Articles tagged emacs at null program

A Makefile for Emacs Packages

2020-01-22T02:54:41Z

Each of my Emacs packages has a Makefile to byte-compile all source files, run the tests, build a package file, and, in some cases, run the package in an interactive, temporary, isolated Emacs instance. These portable Makefiles have a similar structure and follow the same conventions. It would require more thought and feedback before I’d try to make it a standard, but these are conventions I’d like to see in other package Makefiles.

Here’s an incomplete list of examples:

You should make a habit of compiling your Emacs Lisp files even if you don’t think you need the performance. The byte-compiler, while dumb, does static analysis and may spot bugs and other issues early.

First things first: Every portable Makefile starts with a special target, .POSIX, to request standard behavior. This is followed by macro definitions. When compiling a C program, the CC macro is the name of the compiler. Analogously, when compiling Emacs packages the EMACS macro is the name of the Emacs program.

.POSIX:
EMACS = emacs

Users can now override the macro to specify alternate Emacs binaries. I use this all the time to test my packages under different versions of Emacs.

$ make clean
$ make EMACS=emacs-24.3 check
$ make clean
$ make EMACS=emacs-25.1 check

Note: It’s common to use ?= assignment here, but that is both non-standard and unnecessary. If you want to override macro definitions from the environment, use the -e option:

$ export EMACS=emacs-24.3
$ make -e

The first non-special target in the Makefile is the default target. For Emacs packages, this target should byte-compile all the source files, including tests. List the byte-compiled file names as the target dependencies:

compile: foo.elc foo-test.elc

Now for the tedious part: Define the dependencies between your different source files. It would be nice to automate this part somehow, but fortunately most packages just aren’t that complicated. You do not need to list trivial dependencies — i.e. mapping each .el file to its .elc file — since make will figure that out on its own.

Since foo-test.elc relies on foo.elc — it’s testing this file after all — the relationship must be indicated to make. For single file packages (one package file, one test file), this is all that’s needed:

foo-test.elc: foo.elc

I call my testing targets “check” and this target must depend on the byte-compiled files containing tests. It will transiently depend on the other package source files because of the previous section.

check: foo-test.elc
    $(EMACS) -Q --batch -L . -l foo-test.elc -f ert-run-tests-batch

The -Q option runs Emacs with “minimum customizations.” The -L . option puts the current directory in the load path so that (require 'foo) will work. Finally it loads the file containing the tests and instructs ERT to run all defined tests.

A good build can clean up after itself:

clean:
    rm -f foo.elc foo-test.elc

Finally we need one more thing to tie it all together: an inference rule to teach make how to compile .elc files from .el files.

.SUFFIXES: .el .elc
.el.elc:
    $(EMACS) -Q --batch -L . -f batch-byte-compile $<

This is similar to the “check” target, but compiles a source file instead of running tests.

For simple, single source file packages, this is all you need!

Complex packages

My most complex package is Elfeed which has 10 source files and 4 test files. It also includes a target to build a package file, which I would upload to Marmalade when it was still functioning. I did a few extra things to keep this tidy.

First, I define the package version in the Makefile:

VERSION = 1.2.3

It would be nice to grab this information from a reliable place (Git tag, source file, etc.), but I never found a reliable and satisfactory way to do this. Simple wins.

To avoid repeating myself, I list the source files in a macro as well:

EL   = foo-a.el foo-b.el foo-c.el
DOC  = README.md
TEST = foo-test.el

These will still need to have all their interdependencies individually defined for make. For example, if C depends on both A and B, but neither A nor B depend on each other, this is all you’d need:

foo-c.elc: foo-a.elc foo-b.elc

Done correctly you can perform parallel builds with the non-standard but common -j make option. This is pretty nice since Emacs can’t do parallel builds itself.

I use the file list macros in the “compile” and “check” targets:

compile: $(EL:.el=.elc) $(TEST:.el=.elc)
test: $(TEST:.el=.elc)

The “package” target copies everything under a directory and tars it up. The directory is removed first, if it exists, so that any potenntial leftover garbage from doesn’t get included.

package: foo-$(VERSION).tar
foo-$(VERSION).tar: $(EL) $(DOC)
    rm -rf foo-$(VERSION)/
    mkdir foo-$(VERSION)/
    cp $(EL) $(DOC) foo-$(VERSION)/
    tar cf $@ foo-$(VERSION)/
    rm -rf foo-$(VERSION)/

In Elfeed, the target to test in an interactive, temporary Emacs instance is called “virtual”. In Skewer it’s called “run”. The name of the target and the specific rules will depend on the package, should you even want this target at all. It’s handy to have the option test without my own configuration contaminating Emacs, and vice versa. When people report issues, I can also direct them to reproduce their issue in the clean environment.

Here’s what a simple “run” target might look like:

run: $(EL:.el=.elc)
    $(EMACS) -Q -L . -l foo-c.elc -f foo-mode

Make is not really designed to run interactive programs like this, but it works in practice.

Dependencies

What about packages with dependencies? I’ve used Cask in the past but was never satisfied, especially when integrating it into a Makefile. So, again, I’ve opted for the dumb-but-reliable option: request that dependencies are cloned in adjacent directories matching the dependency’s package name. For example, the EmacSQL Makefile header:

# Clone the dependencies of this package in sibling directories:
#     $ git clone https://github.com/cbbrowne/pg.el ../pg

I also define a new “linker flags” macro, LDFLAGS. Like with EMACS, this lets users override it if needed:

LDFLAGS = -L ../pg

Everywhere I use -L . I also include $(LDFLAGS). For example, in the inference rule:

.SUFFIXES: .el .elc
.el.elc:
    $(EMACS) -Q --batch -L . $(LDFLAGS) -f batch-byte-compile $<

If the dependencies follow these conventions, then these can also be compiled in a recursive way with little effort:

$ make -C ../pg

I’m not completely satisfied with this solution, particularly since it’s an odd burden on anyone using the Makefile, but it’s worked well enough for my needs. This is when I wish Emacs had distributed package management.

Efficient Alias of a Built-In Emacs Lisp Function

2019-12-10T02:32:04Z

Suppose you don’t like the names car and cdr, the traditional identifiers for two halves of a lisp cons cell. This is misguided. A cons is really just a 2-tuple, and the halves don’t have any particular meaning on their own, even as “head” and “tail.” However, maybe this is really important to you so you want to do it anyway. What’s the best way to go about it?

defalias

Emacs Lisp has a built-in function just for this, defalias, which is the obvious choice.

(defalias 'car-alias #'car)

The car built-in function is so fundamental to the language that it gets its own byte-code opcode. When you call car in your code, the byte-compiler doesn’t generate a function call, but instead uses a single instruction. For example, here’s an add function that sums the car of its two arguments. I’ve followed the definition with its disassembly (Emacs 26.3, lexical scope):

(defun add (a b)
  (+ (car a) (car b)))
;; 0       stack-ref 1
;; 1       car
;; 2       stack-ref 1
;; 3       car
;; 4       plus
;; 5       return

There are zero function calls because of the dedicated car opcode, and it has the optimal six byte-code instructions.

The problem with defalias is that the definition is permitted change — or be advised — and that robs the byte-compiler of optimization opportunities. It’s a constraint. When the byte-code compiler sees car-alias, it must emit a function call:

(defun add-alias (a b)
  (+ (car-alias a) (car-alias b)))
;; 0       constant  car-alias
;; 1       stack-ref 2
;; 2       call      1
;; 3       constant  car-alias
;; 4       stack-ref 2
;; 5       call      1
;; 6       plus
;; 7       return

This has two function calls and eight byte-code instructions. Those function calls are significantly more expensive than a car instruction, which will show in the benchmark later.

defsubst

An alternative is defsubst, an inlined function definition, which will inline an actual car. The semantics for defsubst are, like macros, explicit that re-definitions may not affect previous uses, so the constraint is gone. Unfortunately the byte-code compiler is pretty dumb, and does a poor job inlining car-subst.

(defsubst car-subst (x)
  (car x))

(defun add-subst (a b)
  (+ (car-subst a) (car-subst b)))
;; 0       stack-ref 1
;; 1       dup
;; 2       car
;; 3       stack-set 1
;; 5       stack-ref 1
;; 6       dup
;; 7       car
;; 8       stack-set 1
;; 10      plus
;; 11      return

There are zero function calls and ten byte-code instructions. The car opcode is in use, but there are five unnecessary instructions. This is still faster than making the function calls, though. If the byte-code compiler was just a little smarter and could compile this to the ideal case, then this would be the end of the discussion.

cl-first

The built-in cl-lib package has a cl-first alias for car. This was written by someone with intimate knowledge of Emacs Lisp, so how how well did they do?

(require 'cl-lib)

(defun add-cl-first (a b)
  (+ (cl-first a) (cl-first b)))
;; 0       stack-ref 1
;; 1       car
;; 2       stack-ref 1
;; 3       car
;; 4       plus
;; 5       return

It’s just like plain old car! How did they manage this? By using a byte-compiler hint:

(defalias 'cl-first 'car)
(put 'cl-first 'byte-optimizer 'byte-compile-inline-expand)

They used defalias, but they also manually told the byte-compiler to inline the definition like defsubst. In fact, defsubst expands to an expression that sets byte-compile-inline-expand, but, as seen above, the inline function overhead gets inlined and doesn’t get eliminated.

Benchmark

So how do the alternatives perform? (benchmark source)

add           (0.594811299 0 0.0)
add-alias     (1.232037132 0 0.0)
add-subst     (0.700044324 0 0.0)
add-cl-first  (0.58332882 0 0.0)

(The car of the list is the running time.) Since add and add-cl-first have the same byte-codes, we shouldn’t, and didn’t, see a significant difference. The simple use of defalias doubles the running time, and using defsubst is about 15% slower.

On-the-fly Linear Congruential Generator Using Emacs Calc

2019-11-19T01:17:50Z

I regularly make throwaway “projects” and do a surprising amount of programming in /tmp. For Emacs Lisp, the equivalent is the *scratch* buffer. These are places where I can make a mess, and the mess usually gets cleaned up before it becomes a problem. A lot of my established projects (ex.) start out in volatile storage and only graduate to more permanent storage once the concept has proven itself.

Throughout my whole career, this sort of throwaway experimentation has been an important part of my personal growth, and I try to encourage it in others. Even if the idea I’m trying doesn’t pan out, I usually learn something new, and occasionally it translates into an article here.

I also enjoy small programming challenges. One of the most abused tools in my mental toolbox is the Monte Carlo method, and I readily apply it to solve toy problems. Even beyond this, random number generators are frequently a useful tool (1, 2), so I find myself reaching for one all the time.

Nearly every programming language comes with a pseudo-random number generation function or library. Unfortunately the language’s standard PRNG is usually a poor choice (C, C++, C#, Go). It’s probably mediocre quality, slower than it needs to be (also), lacks reliable semantics or behavior between implementations, or is missing some other property I want. So I’ve long been a fan of BYOPRNG: Bring Your Own Pseudo-random Number Generator. Just embed a generator with the desired properties directly into the program. The best non-cryptographic PRNGs today are tiny and exceptionally friendly to embedding. Though, depending on what you’re doing, you might need to be creative about seeding.

Crafting a PRNG

On occasion I don’t have an established, embeddable PRNG in reach, and I have yet to commit xoshiro256** to memory. Or maybe I want to use a totally unique PRNG for a particular project. In these cases I make one up. With just a bit of know-how it’s not too difficult.

Probably the easiest decent PRNG to code from scratch is the venerable Linear Congruential Generator (LCG). It’s a simple recurrence relation:

x[1] = (x[0] * A + C) % M

That’s trivial to remember once you know the details. You only need to choose appropriate values for A, C, and M. Done correctly, it will be a full-period generator — a generator that visits a permutation of each of the numbers between 0 and M - 1. The seed — the value of x[0] — is chooses a starting position in this (looping) permutation.

M has a natural, obvious choice: a power of two matching the range of operands, such as 2^32 or 2^64. With this the modulo operation is free as a natural side effect of the computer architecture.

Choosing C also isn’t difficult. It must be co-prime with M, and since M is a power of two, any odd number is valid. Even 1. In theory choosing a small value like 1 is faster since the compiler won’t need to embed a large integer in the code, but this difference doesn’t show up in any micro-benchmarks I tried. If you want a cool, unique generator, then choose a large random integer. More on that below.

The tricky value is A, and getting it right is the linchpin of the whole LCG. It must be coprime with M (i.e. not even), and, for a full-period generator, A-1 must be divisible by four. For better results, A-1 should not be divisible by 8. A good choice is a prime number that satisfies these properties.

If your operands are 64-bit integers, or larger, how are you going to generate a prime number?

Primes from Emacs Calc

Emacs Calc can solve this problem. I’ve noted before how featureful it is. It has arbitrary precision, random number generation, and primality testing. It’s everything we need to choose A. (In fact, this is nearly identical to the process I used to implement RSA.) For this example I’m going to generate a 64-bit LCG for the C programming language, but it’s easy to use whatever width you like and mostly whatever language you like. If you wanted a minimal standard 128-bit LCG, this will still work.

Start by opening up Calc with M-x calc, then:

Push 2 on the stack
Push 64 on the stack
Press ^, computing 2^64 and pushing it on the stack
Press k r to generate a random number in this range
Press d r 16 to switch to hexadecimal display
Press k n to find the next prime following the random value
Repeat step 6 until you get a number that ends with 5 or D
Press k p a few times to avoid false positives.

What’s left on the stack is your A! If you want a random value for C, you can follow a similar process. Heck, make it prime, too!

The reason for using hexadecimal (step 5) and looking for 5 or D (step 7) is that such numbers satisfy both of the important properties for A-1.

Calc doesn’t try to factor your random integer. Instead it uses the Miller–Rabin primality test, a probabilistic test that, itself, requires random numbers. It has false positives but no false negatives. The false positives can be mitigated by repeating the test multiple times, hence step 8.

Trying this all out right now, I got this implementation (in C):

uint64_t lcg1(void)
{
    static uint64_t s = 0;
    s = s*UINT64_C(0x7c3c3267d015ceb5) + UINT64_C(0x24bd2d95276253a9);
    return s;
}

However, we can still do a little better. Outputting the entire state doesn’t have great results, so instead it’s better to create a truncated LCG and only return some portion of the most significant bits.

uint32_t lcg2(void)
{
    static uint64_t s = 0;
    s = s*UINT64_C(0x7c3c3267d015ceb5) + UINT64_C(0x24bd2d95276253a9);
    return s >> 32;
}

This won’t quite pass BigCrush in 64-bit form, but the results are pretty reasonable for most purposes.

But we can still do better without needing to remember much more than this.

Appending permutation

A Permuted Congruential Generator (PCG) is really just a truncated LCG with a permutation applied to its output. Like LCGs themselves, there are arbitrarily many variations. The “official” implementation has a data-dependent shift, for which I can never remember the details. Fortunately a couple of simple, easy to remember transformations is sufficient. Basically anything I used while prospecting for hash functions. I love xorshifts, so lets add one of those:

uint32_t pcg1(void)
{
    static uint64_t s = 0;
    s = s*UINT64_C(0x7c3c3267d015ceb5) + UINT64_C(0x24bd2d95276253a9);
    uint32_t r = s >> 32;
    r ^= r >> 16;
    return r;
}

This is a big improvement, but it still fails one BigCrush test. As they say, when xorshift isn’t enough, use xorshift-multiply! Below I generated a 32-bit prime for the multiply, but any odd integer is a valid permutation.

uint32_t pcg2(void)
{
    static uint64_t s = 0;
    s = s*UINT64_C(0x7c3c3267d015ceb5) + UINT64_C(0x24bd2d95276253a9);
    uint32_t r = s >> 32;
    r ^= r >> 16;
    r *= UINT32_C(0x60857ba9);
    return r;
}

This passes BigCrush, and I can reliably build a new one entirely from scratch using Calc any time I need it.

Bonus: Adapting to other languages

Sometimes it’s not so straightforward to adapt this technique to other languages. For example, JavaScript has limited support for 32-bit integer operations (enough for a poor 32-bit LCG) and no 64-bit integer operations. Though BigInt is now a thing, and should make a great 96- or 128-bit LCG easy to build.

function lcg(seed) {
    let s = BigInt(seed);
    return function() {
        s *= 0xef725caa331524261b9646cdn;
        s += 0x213734f2c0c27c292d814385n;
        s &= 0xffffffffffffffffffffffffn;
        return Number(s >> 64n);
    }
}

Java doesn’t have unsigned integers, so how could you build the above PCG in Java? Easy! First, remember is that Java has two’s complement semantics, including wrap around, and that two’s complement doesn’t care about unsigned or signed for multiplication (or addition, or subtraction). The result is identical. Second, the oft-forgotten >>> operator does an unsigned right shift. With these two tips:

long s = 0;

int pcg2() {
    s = s*0x7c3c3267d015ceb5L + 0x24bd2d95276253a9L;
    int r = (int)(s >>> 32);
    r ^= r >>> 16;
    r *= 0x60857ba9;
    return r;
}

So, in addition to the Calc step list above, you may need to know some of the finer details of your target language.

UTF-8 String Indexing Strategies

2019-05-29T21:52:06Z

This article was discussed on Hacker News.

When designing or, in some cases, implementing a programming language with built-in support for Unicode strings, an important decision must be made about how to represent or encode those strings in memory. Not all representations are equal, and there are trade-offs between different choices.

One issue to consider is that strings typically feature random access indexing of code points with a time complexity resembling constant time (O(1)). However, not all string representations actually support this well. Strings using variable length encoding, such as UTF-8 or UTF-16, have O(n) time complexity indexing, ignoring special cases (discussed below). The most obvious choice to achieve O(1) time complexity — an array of 32-bit values, as in UCS-4 — makes very inefficient use of memory, especially with typical strings.

Despite this, UTF-8 is still chosen in a number of programming languages, or at least in their implementations. In this article I’ll discuss three examples — Emacs Lisp, Julia, and Go — and how each takes a slightly different approach.

Emacs Lisp

Emacs Lisp has two different types of strings that generally can be used interchangeably: unibyte and multibyte. In fact, the difference between them is so subtle that I bet that most people writing Emacs Lisp don’t even realize there are two kinds of strings.

Emacs Lisp uses UTF-8 internally to encode all “multibyte” strings and buffers. To fully support arbitrary sequences of bytes in the files being edited, Emacs uses its own extension of Unicode to precisely and unambiguously represent raw bytes intermixed with text. Any arbitrary sequence of bytes can be decoded into Emacs’ internal representation, then losslessly re-encoded back into the exact same sequence of bytes.

Unibyte strings and buffers are really just byte-strings. In practice, they’re essentially ISO/IEC 8859-1, a.k.a. Latin-1. It’s a Unicode string where all code points are below 256. Emacs prefers the smallest and simplest string representation when possible, similar to CPython 3.3+.

(multibyte-string-p "hello")
;; => nil

(multibyte-string-p "π ≈ 3.14")
;; => t

Emacs Lisp strings are mutable, and therein lies the kicker: As soon as you insert a code point above 255, Emacs quietly converts the string to multibyte.

(defvar fish "fish")

(multibyte-string-p fish)
;; => nil

(setf (aref fish 2) ?ŝ
      (aref fish 3) ?o)

fish
;; => "fiŝo"

(multibyte-string-p fish)
;; => t

Constant time indexing into unibyte strings is straightforward, and Emacs does the obvious thing when indexing into unibyte strings. It helps that most strings in Emacs are probably unibyte, even when the user isn’t working in English.

Most buffers are multibyte, even if those buffers are generally just ASCII. Since Emacs uses gap buffers it generally doesn’t matter: Nearly all accesses are tightly clustered around the point, so O(n) indexing doesn’t often matter.

That leaves multibyte strings. Consider these idioms for iterating across a string in Emacs Lisp:

(dotimes (i (length string))
  (let ((c (aref string i)))
    ...))

(cl-loop for c being the elements of string
         ...)

The latter expands into essentially the same as the former: An incrementing index that uses aref to index to that code point. So is iterating over a multibyte string — a common operation — an O(n^2) operation?

The good news is that, at least in this case, no! It’s essentially just as efficient as iterating over a unibyte string. Before going over why, consider this little puzzle. Here’s a little string comparison function that compares two strings a code point at a time, returning their first difference:

(defun compare (string-a string-b)
  (cl-loop for a being the elements of string-a
           for b being the elements of string-b
           unless (eql a b)
           return (cons a b)))

Let’s examine benchmarks with some long strings (100,000 code points):

(benchmark-run
    (let ((a (make-string 100000 0))
          (b (make-string 100000 0)))
      (compare a b)))
;; => (0.012568031 0 0.0)

With using two, zeroed unibyte strings it takes 13ms. How about changing the last code point in one of them to 256, converting it to a multibyte string:

(benchmark-run
    (let ((a (make-string 100000 0))
          (b (make-string 100000 0)))
      (setf (aref a (1- (length a))) 256)
      (compare a b)))
;; => (0.012680513 0 0.0)

Same running time, so that multibyte string cost nothing more to iterate across. Let’s try making them both multibyte:

(benchmark-run
    (let ((a (make-string 100000 0))
          (b (make-string 100000 0)))
      (setf (aref a (1- (length a))) 256
            (aref b (1- (length b))) 256)
      (compare a b)))
;; => (2.327959762 0 0.0)

That took 2.3 seconds: about 2000x longer to run! Iterating over two multibyte strings concurrently seems to have broken an optimization. Can you reason about what’s happened?

To avoid the O(n) cost on this common indexing operating, Emacs keeps a “bookmark” for the last indexing location into a multibyte string. If the next access is nearby, it can starting looking from this bookmark, forwards or backwards. Like a gap buffer, this gives a big advantage to clustered accesses, including iteration.

However, this string bookmark is global, one per Emacs instance, not once per string. In the last benchmark, the two multibyte strings are constantly fighting over a single string bookmark, and indexing in comparison function is reduced to O(n^2) time complexity.

So, Emacs pretends it has constant time access into its UTF-8 text data, but it’s only faking it with some simple optimizations. This usually works out just fine.

Julia

Another approach is to not pretend at all, and to make this limitation of UTF-8 explicit in the interface. Julia took this approach, and it was one of my complaints about the language. I don’t think this is necessarily a bad choice, but I do still think it’s inappropriate considering Julia’s target audience (i.e. Matlab users).

Julia strings are explicitly byte strings containing valid UTF-8 data. All indexing occurs on bytes, which is trivially constant time, and always decodes the multibyte code point starting at that byte. But it is an error to index to a byte that doesn’t begin a code point. That error is also trivially checked in constant time.

s = "π"

s[1]
# => 'π'

s[2]
# ERROR: UnicodeError: invalid character index
#  in getindex at ./strings/basic.jl:37

Slices are still over bytes, but they “round up” to the end of the current code point:

s[1:1]
# => "π"

Iterating over a string requires helper functions which keep an internal “bookmark” so that each access is constant time:

for i in eachindex(string)
    c = string[i]
    # ...
end

So Julia doesn’t pretend, it makes the problem explicit.

Go

Go is very similar to Julia, but takes an even more explicit view of strings. All strings are byte strings and there are no restrictions on their contents. Conventionally strings contain UTF-8 encoded text, but this is not strictly required. There’s a unicode/utf8 package for working with strings containing UTF-8 data.

Beyond convention, the range clause also assumes the string contains UTF-8 data, and it’s not an error if it does not. Bytes not containing valid UTF-8 data appear as a REPLACEMENT CHARACTER (U+FFFD).

func main() {
    s := "π\xff"
    for _, r := range s {
        fmt.Printf("U+%04x\n", r)
    }
}

// U+03c0
// U+fffd

A further case of the language favoring UTF-8 is that casting a string to []rune decodes strings into code points, like UCS-4, again using REPLACEMENT CHARACTER:

func main() {
    s := "π\xff"
    r := []rune(s)
    fmt.Printf("U+%04x\n", r[0])
    fmt.Printf("U+%04x\n", r[1])
}

// U+03c0
// U+fffd

So, like Julia, there’s no pretending, and the programmer explicitly must consider the problem.

Preferences

All-in-all I probably prefer how Julia and Go are explicit with UTF-8’s limitations, rather than Emacs Lisp’s attempt to cover it up with an internal optimization. Since the abstraction is leaky, it may as well be made explicit.

An Async / Await Library for Emacs Lisp

2019-03-10T20:57:03Z

As part of building my Python proficiency, I’ve learned how to use asyncio. This new language feature first appeared in Python 3.5 (PEP 492, September 2015). JavaScript grew a nearly identical feature in ES2017 (June 2017). An async function can pause to await on an asynchronously computed result, much like a generator pausing when it yields a value.

In fact, both Python and JavaScript async functions are essentially just fancy generator functions with some specialized syntax and semantics. That is, they’re stackless coroutines. Both languages already had generators, so their generator-like async functions are a natural extension that — unlike stackful coroutines — do not require significant, new runtime plumbing.

Emacs officially got generators in 25.1 (September 2016), though, unlike Python and JavaScript, it didn’t require any additional support from the compiler or runtime. It’s implemented entirely using Lisp macros. In other words, it’s just another library, not a core language feature. In theory, the generator library could be easily backported to the first Emacs release to properly support lexical closures, Emacs 24.1 (June 2012).

For the same reason, stackless async/await coroutines can also be implemented as a library. So that’s what I did, letting Emacs’ generator library do most of the heavy lifting. The package is called aio:

https://github.com/skeeto/emacs-aio

It’s modeled more closely on JavaScript’s async functions than Python’s asyncio, with the core representation being promises rather than a coroutine objects. I just have an easier time reasoning about promises than coroutines.

I’m definitely not the first person to realize this was possible, and was beaten to the punch by two years. Wanting to avoid fragmentation, I set aside all formality in my first iteration on the idea, not even bothering with namespacing my identifiers. It was to be only an educational exercise. However, I got quite attached to my little toy. Once I got my head wrapped around the problem, everything just sort of clicked into place so nicely.

In this article I will show step-by-step one way to build async/await on top of generators, laying out one concept at a time and then building upon each. But first, some examples to illustrate the desired final result.

aio example

Ignoring all its problems for a moment, suppose you want to use url-retrieve to fetch some content from a URL and return it. To keep this simple, I’m going to omit error handling. Also assume that lexical-binding is t for all examples. Besides, lexical scope required by the generator library, and therefore also required by aio.

The most naive approach is to fetch the content synchronously:

(defun fetch-fortune-1 (url)
  (let ((buffer (url-retrieve-synchronously url)))
    (with-current-buffer buffer
      (prog1 (buffer-string)
        (kill-buffer)))))

The result is returned directly, and errors are communicated by an error signal (e.g. Emacs’ version of exceptions). This is convenient, but the function will block the main thread, locking up Emacs until the result has arrived. This is obviously very undesirable, so, in practice, everyone nearly always uses the asynchronous version:

(defun fetch-fortune-2 (url callback)
  (url-retrieve url (lambda (_status)
                      (funcall callback (buffer-string)))))

The main thread no longer blocks, but it’s a whole lot less convenient. The result isn’t returned to the caller, and instead the caller supplies a callback function. The result, whether success or failure, will be delivered via callback, so the caller must split itself into two pieces: the part before the callback and the callback itself. Errors cannot be delivered using a error signal because of the inverted flow control.

The situation gets worse if, say, you need to fetch results from two different URLs. You either fetch results one at a time (inefficient), or you manage two different callbacks that could be invoked in any order, and therefore have to coordinate.

Wouldn’t it be nice for the function to work like the first example, but be asynchronous like the second example? Enter async/await:

(aio-defun fetch-fortune-3 (url)
  (let ((buffer (aio-await (aio-url-retrieve url))))
    (with-current-buffer buffer
      (prog1 (buffer-string)
        (kill-buffer)))))

A function defined with aio-defun is just like defun except that it can use aio-await to pause and wait on any other function defined with aio-defun — or, more specifically, any function that returns a promise. Borrowing Python parlance: Returning a promise makes a function awaitable. If there’s an error, it’s delivered as a error signal from aio-url-retrieve, just like the first example. When called, this function returns immediately with a promise object that represents a future result. The caller might look like this:

(defcustom fortune-url ...)

(aio-defun display-fortune ()
  (interactive)
  (message "%s" (aio-await (fetch-fortune-3 fortune-url))))

How wonderfully clean that looks! And, yes, it even works with interactive like that. I can M-x display-fortune and a fortune is printed in the minibuffer as soon as the result arrives from the server. In the meantime Emacs doesn’t block and I can continue my work.

You can’t do anything you couldn’t already do before. It’s just a nicer way to organize the same callbacks: implicit rather than explicit.

Promises, simplified

The core object at play is the promise. Promises are already a rather simple concept, but aio promises have been distilled to their essence, as they’re only needed for this singular purpose. More on this later.

As I said, a promise represents a future result. In practical terms, a promise is just an object to which one can subscribe with a callback. When the result is ready, the callbacks are invoked. Another way to put it is that promises reify the concept of callbacks. A callback is no longer just the idea of extra argument on a function. It’s a first-class thing that itself can be passed around as a value.

Promises have two slots: the final promise result and a list of subscribers. A nil result means the result hasn’t been computed yet. It’s so simple I’m not even bothering with cl-struct.

(defun aio-promise ()
  "Create a new promise object."
  (record 'aio-promise nil ()))

(defsubst aio-promise-p (object)
  (and (eq 'aio-promise (type-of object))
       (= 3 (length object))))

(defsubst aio-result (promise)
  (aref promise 1))

To subscribe to a promise, use aio-listen:

(defun aio-listen (promise callback)
  (let ((result (aio-result promise)))
    (if result
        (run-at-time 0 nil callback result)
      (push callback (aref promise 2)))))

If the result isn’t ready yet, add the callback to the list of subscribers. If the result is ready call the callback in the next event loop turn using run-at-time. This is important because it keeps all the asynchronous components isolated from one another. They won’t see each others’ frames on the call stack, nor frames from aio. This is so important that the Promises/A+ specification is explicit about it.

The other half of the equation is resolving a promise, which is done with aio-resolve. Unlike other promises, aio promises don’t care whether the promise is being fulfilled (success) or rejected (error). Instead a promise is resolved using a value function — or, usually, a value closure. Subscribers receive this value function and extract the value by invoking it with no arguments.

Why? This lets the promise’s resolver decide the semantics of the result. Instead of returning a value, this function can instead signal an error, propagating an error signal that terminated an async function. Because of this, the promise doesn’t need to know how it’s being resolved.

When a promise is resolved, subscribers are each scheduled in their own event loop turns in the same order that they subscribed. If a promise has already been resolved, nothing happens. (Thought: Perhaps this should be an error in order to catch API misuse?)

(defun aio-resolve (promise value-function)
  (unless (aio-result promise)
    (let ((callbacks (nreverse (aref promise 2))))
      (setf (aref promise 1) value-function
            (aref promise 2) ())
      (dolist (callback callbacks)
        (run-at-time 0 nil callback value-function)))))

If you’re not an async function, you might subscribe to a promise like so:

(aio-listen promise (lambda (v)
                      (message "%s" (funcall v))))

The simplest example of a non-async function that creates and delivers on a promise is a “sleep” function:

(defun aio-sleep (seconds &optional result)
  (let ((promise (aio-promise))
        (value-function (lambda () result)))
    (prog1 promise
      (run-at-time seconds nil
                   #'aio-resolve promise value-function))))

Similarly, here’s a “timeout” promise that delivers a special timeout error signal at a given time in the future.

(defun aio-timeout (seconds)
  (let ((promise (aio-promise))
        (value-function (lambda () (signal 'aio-timeout nil))))
    (prog1 promise
      (run-at-time seconds nil
                   #'aio-resolve promise value-function))))

That’s all there is to promises.

Evaluate in the context of a promise

Before we get into pausing functions, lets deal with the slightly simpler matter of delivering their return values using a promise. What we need is a way to evaluate a “body” and capture its result in a promise. If the body exits due to a signal, we want to capture that as well.

Here’s a macro that does just this:

(defmacro aio-with-promise (promise &rest body)
  `(aio-resolve ,promise
                (condition-case error
                    (let ((result (progn ,@body)))
                      (lambda () result))
                  (error (lambda ()
                           (signal (car error) ; rethrow
                                   (cdr error)))))))

The body result is captured in a closure and delivered to the promise. If there’s an error signal, it’s “rethrown” into subscribers by the promise’s value function.

This is where Emacs Lisp has a serious weak spot. There’s not really a concept of rethrowing a signal. Unlike a language with explicit exception objects that can capture a snapshot of the backtrace, the original backtrace is completely lost where the signal is caught. There’s no way to “reattach” it to the signal when it’s rethrown. This is unfortunate because it would greatly help debugging if you got to see the full backtrace on the other side of the promise.

Async functions

So we have promises and we want to pause a function on a promise. Generators have iter-yield for pausing an iterator’s execution. To tackle this problem:

Yield the promise to pause the iterator.
Subscribe a callback on the promise that continues the generator (iter-next) with the promise’s result as the yield result.

All the hard work is done in either side of the yield, so aio-await is just a simple wrapper around iter-yield:

(defmacro aio-await (expr)
  `(funcall (iter-yield ,expr)))

Remember, that funcall is here to extract the promise value from the value function. If it signals an error, this propagates directly into the iterator just as if it had been a direct call — minus an accurate backtrace.

So aio-lambda / aio-defun needs to wrap the body in a generator (iter-lamba), invoke it to produce a generator, then drive the generator using callbacks. Here’s a simplified, unhygienic definition of aio-lambda:

(defmacro aio-lambda (arglist &rest body)
  `(lambda (&rest args)
     (let ((promise (aio-promise))
           (iter (apply (iter-lambda ,arglist
                          (aio-with-promise promise
                            ,@body))
                        args)))
       (prog1 promise
         (aio--step iter promise nil)))))

The body is evaluated inside aio-with-promise with the result delivered to the promise returned directly by the async function.

Before returning, the iterator is handed to aio--step, which drives the iterator forward until it delivers its first promise. When the iterator yields a promise, aio--step attaches a callback back to itself on the promise as described above. Immediately driving the iterator up to the first yielded promise “primes” it, which is important for getting the ball rolling on any asynchronous operations.

If the iterator ever yields something other than a promise, it’s delivered right back into the iterator.

(defun aio--step (iter promise yield-result)
  (condition-case _
      (cl-loop for result = (iter-next iter yield-result)
               then (iter-next iter (lambda () result))
               until (aio-promise-p result)
               finally (aio-listen result
                                   (lambda (value)
                                     (aio--step iter promise value))))
    (iter-end-of-sequence)))

When the iterator is done, nothing more needs to happen since the iterator resolves its own return value promise.

The definition of aio-defun just uses aio-lambda with defalias. There’s nothing to it.

That’s everything you need! Everything else in the package is merely useful, awaitable functions like aio-sleep and aio-timeout.

Composing promises

Unfortunately url-retrieve doesn’t support timeouts. We can work around this by composing two promises: a url-retrieve promise and aio-timeout promise. First define a promise-returning function, aio-select that takes a list of promises and returns (as another promise) the first promise to resolve:

(defun aio-select (promises)
  (let ((result (aio-promise)))
    (prog1 result
      (dolist (promise promises)
        (aio-listen promise (lambda (_)
                              (aio-resolve
                               result
                               (lambda () promise))))))))

We give aio-select both our url-retrieve and timeout promises, and it tells us which resolved first:

(aio-defun fetch-fortune-4 (url timeout)
  (let* ((promises (list (aio-url-retrieve url)
                         (aio-timeout timeout)))
         (fastest (aio-await (aio-select promises)))
         (buffer (aio-await fastest)))
    (with-current-buffer buffer
      (prog1 (buffer-string)
        (kill-buffer)))))

Cool! Note: This will not actually cancel the URL request, just move the async function forward earlier and prevent it from getting the result.

Threads

Despite aio being entirely about managing concurrent, asynchronous operations, it has nothing at all to do with threads — as in Emacs 26’s support for kernel threads. All async functions and promise callbacks are expected to run only on the main thread. That’s not to say an async function can’t await on a result from another thread. It just must be done very carefully.

Processes

The package also includes two functions for realizing promises on processes, whether they be subprocesses or network sockets.

aio-process-filter
aio-process-sentinel

For example, this function loops over each chunk of output (typically 4kB) from the process, as delivered to a filter function:

(aio-defun process-chunks (process)
  (cl-loop for chunk = (aio-await (aio-process-filter process))
           while chunk
           do (... process chunk ...)))

Exercise for the reader: Write an awaitable function that returns a line at at time rather than a chunk at a time. You can build it on top of aio-process-filter.

I considered wrapping functions like start-process so that their aio versions would return a promise representing some kind of result from the process. However there are so many different ways to create and configure processes that I would have ended up duplicating all the process functions. Focusing on the filter and sentinel, and letting the caller create and configure the process is much cleaner.

Unfortunately Emacs has no asynchronous API for writing output to a process. Both process-send-string and process-send-region will block if the pipe or socket is full. There is no callback, so you cannot await on writing output. Maybe there’s a way to do it with a dedicated thread?

Another issue is that the process-send-* functions are preemptible, made necessary because they block. The aio-process-* functions leave a gap (i.e. between filter awaits) where no filter or sentinel function is attached. It’s a consequence of promises being single-fire. The gap is harmless so long as the async function doesn’t await something else or get preempted. This needs some more thought.

Update: These process functions no longer exist and have been replaced by a small framework for building chains of promises. See aio-make-callback.

Testing aio

The test suite for aio is a bit unusual. Emacs’ built-in test suite, ERT, doesn’t support asynchronous tests. Furthermore, tests are generally run in batch mode, where Emacs invokes a single function and then exits rather than pump an event loop. Batch mode can only handle asynchronous process I/O, not the async functions of aio. So it’s not possible to run the tests in batch mode.

Instead I hacked together a really crude callback-based test suite. It runs in non-batch mode and writes the test results into a buffer (run with make check). Not ideal, but it works.

One of the tests is a sleep sort (with reasonable tolerances). It’s a pretty neat demonstration of what you can do with aio:

(aio-defun sleep-sort (values)
  (let ((promises (mapcar (lambda (v) (aio-sleep v v)) values)))
    (cl-loop while promises
             for next = (aio-await (aio-select promises))
             do (setf promises (delq next promises))
             collect (aio-await next))))

To see it in action (M-x sleep-sort-demo):

(aio-defun sleep-sort-demo ()
  (interactive)
  (let ((values '(0.1 0.4 1.1 0.2 0.8 0.6)))
    (message "%S" (aio-await (sleep-sort values)))))

Async/await is pretty awesome

I’m quite happy with how this all came together. Once I had the concepts straight — particularly resolving to value functions — everything made sense and all the parts fit together well, and mostly by accident. That feels good.

Emacs 26 Brings Generators and Threads

2018-05-31T17:45:16Z

Emacs 26.1 was recently released. As you would expect from a major release, it comes with lots of new goodies. Being a bit of an Emacs Lisp enthusiast, the two most interesting new features are generators (iter) and native threads (thread).

Correction: Generators were actually introduced in Emacs 25.1 (Sept. 2016), not Emacs 26.1. Doh!

Update: ThreadSanitizer (TSan) quickly shows that Emacs’ threading implementation has many data races, making it completely untrustworthy. Until this is fixed, nobody should use Emacs threads for any purpose, and threads should disabled at compile time.

Generators

Generators are one of those cool language features that provide a lot of power at a small implementation cost. They’re like a constrained form of coroutines, but, unlike coroutines, they’re typically built entirely on top of first-class functions (e.g. closures). This means no additional run-time support is needed in order to add generators to a language. The only complications are the changes to the compiler. Generators are not compiled the same way as normal functions despite looking so similar.

What’s perhaps coolest of all about lisp-family generators, including Emacs Lisp, is that the compiler component can be implemented entirely with macros. The compiler need not be modified at all, making generators no more than a library, and not actually part of the language. That’s exactly how they’ve been implemented in Emacs Lisp (emacs-lisp/generator.el).

So what’s a generator? It’s a function that returns an iterator object. When an iterator object is invoked (e.g. iter-next) it evaluates the body of the generator. Each iterator is independent. What makes them unusual (and useful) is that the evaluation is paused in the middle of the body to return a value, saving all the internal state in the iterator. Normally pausing in the middle of functions isn’t possible, which is what requires the special compiler support.

Emacs Lisp generators appear to be most closely modeled after Python generators, though it also shares some similarities to JavaScript generators. What makes it most like Python is the use of signals for flow control — something I’m not personally enthused about. When a Python generator completes, it throws a StopItertion exception. In Emacs Lisp, it’s an iter-end-of-sequence signal. A signal is out-of-band and avoids the issue relying on some special in-band value to communicate the end of iteration.

In contrast, JavaScript’s solution is to return a “rich” object wrapping the actual yield value. This object has a done field that communicates whether iteration has completed. This avoids the use of exceptions for flow control, but the caller has to unpack the rich object.

Fortunately the flow control issue isn’t normally exposed to Emacs Lisp code. Most of the time you’ll use the iter-do macro or (my preference) the new cl-loop keyword iter-by.

To illustrate how a generator works, here’s a really simple iterator that iterates over a list:

(iter-defun walk (list)
  (while list
    (iter-yield (pop list))))

Here’s how it might be used:

(setf i (walk '(:a :b :c)))

(iter-next i)  ; => :a
(iter-next i)  ; => :b
(iter-next i)  ; => :c
(iter-next i)  ; error: iter-end-of-sequence

The iterator object itself is opaque and you shouldn’t rely on any part of its structure. That being said, I’m a firm believer that we should understand how things work underneath the hood so that we can make the most effective use of at them. No program should rely on the particulars of the iterator object internals for correctness, but a well-written program should employ them in a way that best exploits their expected implementation.

Currently iterator objects are closures, and iter-next invokes the closure with its own internal protocol. It asks the closure to return the next value (:next operation), and iter-close asks it to clean itself up (:close operation).

Since they’re just closures, another really cool thing about Emacs Lisp generators is that iterator objects are generally readable. That is, you can serialize them out with print and bring them back to life with read, even in another instance of Emacs. They exist independently of the original generator function. This will not work if one of the values captured in the iterator object is not readable (e.g. buffers).

How does pausing work? Well, one of other exciting new features of Emacs 26 is the introduction of a jump table opcode, switch. I’d lamented in the past that large cond and cl-case expressions could be a lot more efficient if Emacs’ byte code supported jump tables. It turns an O(n) sequence of comparisons into an O(1) lookup and jump. It’s essentially the perfect foundation for a generator since it can be used to jump straight back to the position where evaluation was paused.

Buuut, generators do not currently use jump tables. The generator library predates the new switch opcode, and, being independent of it, its author, Daniel Colascione, went with the best option at the time. Chunks of code between yields are packaged as individual closures. These closures are linked together a bit like nodes in a graph, creating a sort of state machine. To get the next value, the iterator object invokes the closure representing the next state.

I’ve manually macro expanded the walk generator above into a form that roughly resembles the expansion of iter-defun:

(defun walk (list)
  (let (state)
    (cl-flet* ((state-2 ()
                 (signal 'iter-end-of-sequence nil))
               (state-1 ()
                 (prog1 (pop list)
                   (when (null list)
                     (setf state #'state-2))))
               (state-0 ()
                 (if (null list)
                     (state-2)
                   (setf state #'state-1)
                   (state-1))))
      (setf state #'state-0)
      (lambda ()
        (funcall state)))))

This omits the protocol I mentioned, and it doesn’t have yield results (values passed to the iterator). The actual expansion is a whole lot messier and less optimal than this, but hopefully my hand-rolled generator is illustrative enough. Without the protocol, this iterator is stepped using funcall rather than iter-next.

The state variable keeps track of where in the body of the generator this iterator is currently “paused.” Continuing the iterator is therefore just a matter of invoking the closure that represents this state. Each state closure may update state to point to a new part of the generator body. The terminal state is obviously state-2. Notice how state transitions occur around branches.

I had said generators can be implemented as a library in Emacs Lisp. Unfortunately theres a hole in this: unwind-protect. It’s not valid to yield inside an unwind-protect form. Unlike, say, a throw-catch, there’s no mechanism to trap an unwinding stack so that it can be restarted later. The state closure needs to return and fall through the unwind-protect.

A jump table version of the generator might look like the following. I’ve used cl-labels since it allows for recursion.

(defun walk (list)
  (let ((state 0))
    (cl-labels
        ((closure ()
           (cl-case state
             (0 (if (null list)
                    (setf state 2)
                  (setf state 1))
                (closure))
             (1 (prog1 (pop list)
                  (when (null list)
                    (setf state 2))))
             (2 (signal 'iter-end-of-sequence nil)))))
      #'closure)))

When byte compiled on Emacs 26, that cl-case is turned into a jump table. This “switch” form is closer to how generators are implemented in other languages.

Iterator objects can share state between themselves if they close over a common environment (or, of course, use the same global variables).

(setf foo
      (let ((list '(:a :b :c)))
        (list
         (funcall
          (iter-lambda ()
            (while list
              (iter-yield (pop list)))))
         (funcall
          (iter-lambda ()
            (while list
              (iter-yield (pop list))))))))

(iter-next (nth 0 foo))  ; => :a
(iter-next (nth 1 foo))  ; => :b
(iter-next (nth 0 foo))  ; => :c

For years there has been a very crude way to “pause” a function and allow other functions to run: accept-process-output. It only works in the context of processes, but five years ago this was sufficient for me to build primitives on top of it. Unlike this old process function, generators do not block threads, including the user interface, which is really important.

Threads

Emacs 26 also bring us threads, which have been attached in a very bolted on fashion. It’s not much more than a subset of pthreads: shared memory threads, recursive mutexes, and condition variables. The interfaces look just like they do in pthreads, and there hasn’t been much done to integrate more naturally into the Emacs Lisp ecosystem.

This is also only the first step in bringing threading to Emacs Lisp. Right now there’s effectively a global interpreter lock (GIL), and threads only run one at a time cooperatively. Like with generators, the Python influence is obvious. In theory, sometime in the future this interpreter lock will be removed, making way for actual concurrency.

This is, again, where I think it’s useful to contrast with JavaScript, which was also initially designed to be single-threaded. Low-level threading primitives weren’t exposed — though mostly because JavaScript typically runs sandboxed and there’s no safe way to expose those primitives. Instead it got a web worker API that exposes concurrency at a much higher level, along with an efficient interface for thread coordination.

For Emacs Lisp, I’d prefer something safer, more like the JavaScript approach. Low-level pthreads are now a great way to wreck Emacs with deadlocks (with no C-g escape). Playing around with the new threading API for just a few days, I’ve already had to restart Emacs a bunch of times. Bugs in Emacs Lisp are normally a lot more forgiving.

One important detail that has been designed well is that dynamic bindings are thread-local. This is really essential for correct behavior. This is also an easy way to create thread-local storage (TLS): dynamically bind variables in the thread’s entrance function.

;;; -*- lexical-binding: t; -*-

(defvar foo-counter-tls)
(defvar foo-path-tls)

(defun foo-make-thread (path)
  (make-thread
   (lambda ()
     (let ((foo-counter-tls 0)
           (foo-name-tls path))
       ...))))

However, cl-letf “bindings” are not thread-local, which makes this otherwise incredibly useful macro quite dangerous in the presence of threads. This is one way that the new threading API feels bolted on.

Building generators on threads

In my stack clashing article I showed a few different ways to add coroutine support to C. One method spawned per-coroutine threads, and coordinated using semaphores. With the new threads API in Emacs, it’s possible to do exactly the same thing.

Since generators are just a limited form of coroutines, this means threads offer another, very different way to implement them. The threads API doesn’t provide semaphores, but condition variables can fill in for them. To “pause” in the middle of the generator, just wait on a condition variable.

So, naturally, I just had to see if I could make it work. I call it a “thread iterator” or “thriter.” The API is very similar to iter:

https://github.com/skeeto/thriter

This is merely a proof of concept so don’t actually use this library for anything. These thread-based generators are about 5x slower than iter generators, and they’re a lot more heavy-weight, needing an entire thread per iterator object. This makes thriter-close all the more important. On the other hand, these generators have no problem yielding inside unwind-protect.

Originally this article was going to dive into the details of how these thread-iterators worked, but thriter turned out to be quite a bit more complicated than I anticipated, especially as I worked towards feature matching iter.

The gist of it is that each side of a next/yield transaction gets its own condition variable, but share a common mutex. Values are passed between the threads using slots on the iterator object. The side that isn’t currently running waits on a condition variable until the other side frees it, after which the releaser waits on its own condition variable for the result. This is similar to asynchronous requests in Emacs dynamic modules.

Rather than use signals to indicate completion, I modeled it after JavaScript generators. Iterators return a cons cell. The car indicates continuation and the cdr holds the yield result. To terminate an iterator early (thriter-close or garbage collection), thread-signal is used to essentially “cancel” the thread and knock it off the condition variable.

Since threads aren’t (and shouldn’t be) garbage collected, failing to run a thread-iterator to completion would normally cause a memory leak, as the thread sits there forever waiting on a “next” that will never come. To deal with this, there’s a finalizer is attached to the iterator object in such a way that it’s not visible to the thread. A lost iterator is eventually cleaned up by the garbage collector, but, as usual with finalizers, this is only a last resort.

The future of threads

This thread-iterator project was my initial, little experiment with Emacs Lisp threads, similar to why I connected a joystick to Emacs using a dynamic module. While I don’t expect the current thread API to go away, it’s not really suitable for general use in its raw form. Bugs in Emacs Lisp programs should virtually never bring down Emacs and require a restart. Outside of threads, the few situations that break this rule are very easy to avoid (and very obvious that something dangerous is happening). Dynamic modules are dangerous by necessity, but concurrency doesn’t have to be.

There really needs to be a safe, high-level API with clean thread isolation. Perhaps this higher-level API will eventually build on top of the low-level threading API.

Emacs Lisp Lambda Expressions Are Not Self-Evaluating

2018-02-22T21:30:57Z

This week I made a mistake that ultimately enlightened me about the nature of function objects in Emacs Lisp. There are three kinds of function objects, but they each behave very differently when evaluated as objects.

But before we get to that, let’s talk about one of Emacs’ embarrassing, old missteps: eval-after-load.

Taming an old dragon

One of the long-standing issues with Emacs is that loading Emacs Lisp files (.el and .elc) is a slow process, even when those files have been byte compiled. There are a number of dirty hacks in place to deal with this issue, and the biggest and nastiest of them all is the dumper, also known as unexec.

The Emacs you routinely use throughout the day is actually a previous instance of Emacs that’s been resurrected from the dead. Your undead Emacs was probably created months, if not years, earlier, back when it was originally compiled. The first stage of compiling Emacs is to compile a minimal C core called temacs. The second stage is loading a bunch of Emacs Lisp files, then dumping a memory image in an unportable, platform-dependent way. On Linux, this actually requires special hooks in glibc. The Emacs you know and love is this dumped image loaded back into memory, continuing from where it left off just after it was compiled. Regardless of your own feelings on the matter, you have to admit this is a very lispy thing to do.

There are two notable costs to Emacs’ dumper:

The dumped image contains hard-coded memory addresses. This means Emacs can’t be a Position Independent Executable (PIE). It can’t take advantage of a security feature called Address Space Layout Randomization (ASLR), which would increase the difficulty of exploiting some classes of bugs. This might be important to you if Emacs processes untrusted data, such as when it’s used as a mail client, a web server or generally parses data downloaded across the network.
It’s not possible to cross-compile Emacs since it can only be dumped by running temacs on its target platform. As an experiment I’ve attempted to dump the Windows version of Emacs on Linux using Wine, but was unsuccessful.

The good news is that there’s a portable dumper in the works that makes this a lot less nasty. If you’re adventurous, you can already disable dumping and run temacs directly by setting CANNOT_DUMP=yes at compile time. Be warned, though, that a non-dumped Emacs takes several seconds, or worse, to initialize before it even begins loading your own configuration. It’s also somewhat buggy since it seems nobody ever runs it this way productively.

The other major way Emacs users have worked around slow loading is aggressive use of lazy loading, generally via autoloads. The major package interactive entry points are defined ahead of time as stub functions. These stubs, when invoked, load the full package, which overrides the stub definition, then finally the stub re-invokes the new definition with the same arguments.

To further assist with lazy loading, an evaluated defvar form will not override an existing global variable binding. This means you can, to a certain extent, configure a package before it’s loaded. The package will not clobber any existing configuration when it loads. This also explains the bizarre interfaces for the various hook functions, like add-hook and run-hooks. These accept symbols — the names of the variables — rather than values of those variables as would normally be the case. The add-to-list function does the same thing. It’s all intended to cooperate with lazy loading, where the variable may not have been defined yet.

eval-after-load

Sometimes this isn’t enough and you need some some configuration to take place after the package has been loaded, but without forcing it to load early. That is, you need to tell Emacs “evaluate this code after this particular package loads.” That’s where eval-after-load comes into play, except for its fatal flaw: it takes the word “eval” completely literally.

The first argument to eval-after-load is the name of a package. Fair enough. The second argument is a form that will be passed to eval after that package is loaded. Now hold on a minute. The general rule of thumb is that if you’re calling eval, you’re probably doing something seriously wrong, and this function is no exception. This is completely the wrong mechanism for the task.

The second argument should have been a function — either a (sharp quoted) symbol or a function object. And then instead of eval it would be something more sensible, like funcall. Perhaps this improved version would be named call-after-load or run-after-load.

The big problem with passing an s-expression is that it will be left uncompiled due to being quoted. I’ve talked before about the importance of evaluating your lambdas. eval-after-load not only encourages badly written Emacs Lisp, it demands it.

;;; BAD!
(eval-after-load 'simple-httpd
                 '(push '("c" . "text/plain") httpd-mime-types))

This was all corrected in Emacs 25. If the second argument to eval-after-load is a function — the result of applying functionp is non-nil — then it uses funcall. There’s also a new macro, with-eval-after-load, to package it all up nicely.

;;; Better (Emacs >= 25 only)
(eval-after-load 'simple-httpd
  (lambda ()
    (push '("c" . "text/plain") httpd-mime-types)))

;;; Best (Emacs >= 25 only)
(with-eval-after-load 'simple-httpd
  (push '("c" . "text/plain") httpd-mime-types))

Though in both of these examples the compiler will likely warn about httpd-mime-types not being defined. That’s a problem for another day.

A workaround

But what if you need to use Emacs 24, as was the situation that sparked this article? What can we do with the bad version of eval-after-load? We could situate a lambda such that it’s evaluated, but then smuggle the resulting function object into the form passed to eval-after-load, all using a backquote.

;;; Note: this is subtly broken
(eval-after-load 'simple-httpd
  `(funcall
    ,(lambda ()
       (push '("c" . "text/plain") httpd-mime-types)))

When everything is compiled, the backquoted form evalutes to this:

(funcall #[0  [httpd-mime-types ("c" . "text/plain")] 2])

Where the second value (#[...]) is a byte-code object. However, as the comment notes, this is subtly broken. A cleaner and correct way to solve all this is with a named function. The damage caused by eval-after-load will have been (mostly) minimized.

(defun my-simple-httpd-hook ()
  (push '("c" . "text/plain") httpd-mime-types))

(eval-after-load 'simple-httpd
  '(funcall #'my-simple-httpd-hook))

But, let’s go back to the anonymous function solution. What was broken about it? It all has to do with evaluating function objects.

Evaluating function objects

So what happens when we evaluate an expression like the one above with eval? Here’s what it looks like again.

(funcall #[...])

First, eval notices it’s been given a non-empty list, so it’s probably a function call. The first argument is the name of the function to be called (funcall) and the remaining elements are its arguments. But each of these elements must be evaluated first, and the result of that evaluation becomes the arguments.

Any value that isn’t a list or a symbol is self-evaluating. That is, it evaluates to its own value:

(eval 10)
;; => 10

If the value is a symbol, it’s treated as a variable. If the value is a list, it goes through the function call process I’m describing (or one of a number of other special cases, such as macro expansion, lambda expressions, and special forms).

So, conceptually eval recurses on the function object #[...]. A function object is not a list or a symbol, so it’s self-evaluating. No problem.

;; Byte-code objects are self-evaluating

(let ((x (byte-compile (lambda ()))))
  (eq x (eval x)))
;; => t

What if this code wasn’t compiled? Rather than a byte-code object, we’d have some other kind of function object for the interpreter. Let’s examine the dynamic scope (shudder) case. Here, a lambda appears to evaluate to itself, but appearances can be deceiving:

(eval (lambda ())
;; => (lambda ())

However, this is not self-evaluation. Lambda expressions are not self-evaluating. It’s merely coincidence that the result of evaluating a lambda expression looks like the original expression. This is just how the Emacs Lisp interpreter is currently implemented and, strictly speaking, it’s an implementation detail that just so happens to be mostly compatible with byte-code objects being self-evaluating. It would be a mistake to rely on this.

Instead, dynamic scope lambda expression evaluation is idempotent. Applying eval to the result will return an equal, but not identical (eq), expression. In contrast, a self-evaluating value is also idempotent under evaluation, but with eq results.

;; Not self-evaluating:

(let ((x '(lambda ())))
  (eq x (eval x)))
;; => nil

;; Evaluation is idempotent:

(let ((x '(lambda ())))
  (equal x (eval x)))
;; => t

(let ((x '(lambda ())))
  (equal x (eval (eval x))))
;; => t

So, with dynamic scope, the subtly broken backquote example will still work, but only by sheer luck. Under lexical scope, the situation isn’t so lucky:

;;; -*- lexical-scope: t; -*-

(lambda ())
;; => (closure (t) nil)

These interpreted lambda functions are neither self-evaluating nor idempotent. Passing t as the second argument to eval tells it to use lexical scope, as shown below:

;; Not self-evaluating:

(let ((x '(lambda ())))
  (eq x (eval x t)))
;; => nil

;; Not idempotent:

(let ((x '(lambda ())))
  (equal x (eval x t)))
;; => nil

(let ((x '(lambda ())))
  (equal x (eval (eval x t) t)))
;; error: (void-function closure)

I can imagine an implementation of Emacs Lisp where dynamic scope lambda expressions are in the same boat, where they’re not even idempotent. For example:

;;; -*- lexical-binding: nil; -*-

(lambda ())
;; => (totally-not-a-closure ())

Most Emacs Lisp would work just fine under this change, and only code that makes some kind of logical mistake — where there’s nested evaluation of lambda expressions — would break. This essentially already happened when lots of code was quietly switched over to lexical scope after Emacs 24. Lambda idempotency was lost and well-written code didn’t notice.

There’s a temptation here for Emacs to define a closure function or special form that would allow interpreter closure objects to be either self-evaluating or idempotent. This would be a mistake. It would only serve as a hack that covers up logical mistakes that lead to nested evaluation. Much better to catch those problems early.

Solving the problem with one character

So how do we fix the subtly broken example? With a strategically placed quote right before the comma.

(eval-after-load 'simple-httpd
  `(funcall
    ',(lambda ()
        (push '("c" . "text/plain") httpd-mime-types)))

So the form passed to eval-after-load becomes:

;; Compiled:
(funcall (quote #[...]))

;; Dynamic scope:
(funcall (quote (lambda () ...)))

;; Lexical scope:
(funcall (quote (closure (t) () ...)))

The quote prevents eval from evaluating the function object, which would be either needless or harmful. There’s also an argument to be made that this is a perfect situation for a sharp-quote (#'), which exists to quote functions.

Options for Structured Data in Emacs Lisp

2018-02-14T17:43:34Z

So your Emacs package has grown beyond a dozen or so lines of code, and the data it manages is now structured and heterogeneous. Informal plain old lists, the bread and butter of any lisp, are not longer cutting it. You really need to cleanly abstract this structure, both for your own organizational sake any for anyone reading your code.

With informal lists as structures, you might regularly ask questions like, “Was the ‘name’ slot stored in the third list element, or was it the fourth element?” A plist or alist helps with this problem, but those are better suited for informal, externally-supplied data, not for internal structures with fixed slots. Occasionally someone suggests using hash tables as structures, but Emacs Lisp’s hash tables are much too heavy for this. Hash tables are more appropriate when keys themselves are data.

Defining a data structure from scratch

Imagine a refrigerator package that manages a collection of food in a refrigerator. A food item could be structured as a plain old list, with slots at specific positions.

(defun fridge-item-create (name expiry weight)
  (list name expiry weight))

A function that computes the mean weight of a list of food items might look like this:

(defun fridge-mean-weight (items)
  (if (null items)
      0.0
    (let ((sum 0.0)
          (count 0))
      (dolist (item items (/ sum count))
        (setf count (1+ count)
              sum (+ sum (nth 2 item)))))))

Note the use of (nth 2 item) at the end, used to get the item’s weight. That magic number 2 is easy to mess up. Even worse, if lots of code accesses “weight” this way, then future extensions will be inhibited. Defining some accessor functions solves this problem.

(defsubst fridge-item-name (item)
  (nth 0 item))

(defsubst fridge-item-expiry (item)
  (nth 1 item))

(defsubst fridge-item-weight (item)
  (nth 2 item))

The defsubst defines an inline function, so there’s effectively no additional run-time costs for these accessors compared to a bare nth. Since these only cover getting slots, we should also define some setters using the built-in gv (generalized variable) package.

(require 'gv)

(gv-define-setter fridge-item-name (value item)
  `(setf (nth 0 ,item) ,value))

(gv-define-setter fridge-item-expiry (value item)
  `(setf (nth 1 ,item) ,value))

(gv-define-setter fridge-item-weight (value item)
  `(setf (nth 2 ,item) ,value))

This makes each slot setf-able. Generalized variables are great for simplifying APIs, since otherwise there would need to be an equal number of setter functions (fridge-item-set-name, etc.). With generalized variables, both are at the same entrypoint:

(setf (fridge-item-name item) "Eggs")

There are still two more significant improvements.

As far as Emacs Lisp is concerned, this isn’t a real type. The type-ness of it is just a fiction created by the conventions of the package. It would be easy to make the mistake of passing an arbitrary list to these fridge-item functions, and the mistake wouldn’t be caught so long as that list has at least three items. An common solution is to add a type tag: a symbol at the beginning of the structure that identifies it.
It’s still a linked list, and nth has to walk the list (i.e. O(n)) to retrieve items. It would be much more efficient to use a vector, turning this into an efficient O(1) operation.

Addressing both of these at once:

(defun fridge-item-create (name expiry weight)
  (vector 'fridge-item name expiry weight))

(defsubst fridge-item-p (object)
  (and (vectorp object)
       (= (length object) 4)
       (eq 'fridge-item (aref object 0))))

(defsubst fridge-item-name (item)
  (unless (fridge-item-p item)
    (signal 'wrong-type-argument (list 'fridge-item item)))
  (aref item 1))

(defsubst fridge-item-name--set (item value)
  (unless (fridge-item-p item)
    (signal 'wrong-type-argument (list 'fridge-item item)))
  (setf (aref item 1) value))

(gv-define-setter fridge-item-name (value item)
  `(fridge-item-name--set ,item ,value))

;; And so on for expiry and weight...

As long as fridge-mean-weight uses the fridge-item-weight accessor, it continues to work unmodified across all these changes. But, whew, that’s quite a lot of boilerplate to write and maintain for each data structure in our package! Boilerplate code generation is a perfect candidate for a macro definition. Luckily for us, Emacs already defines a macro to generate all this code: cl-defstruct.

(require 'cl-lib)

(cl-defstruct fridge-item
  name expiry weight)

In Emacs 25 and earlier, this innocent looking definition expands into essentially all the above code. The code it generates is expressed in the most optimal form for its version of Emacs, and it exploits many of the available optimizations by using function declarations such as side-effect-free and error-free. It’s configurable, too, allowing for the exclusion of a type tag (:named) — discarding all the type checks — or using a list rather than a vector as the underlying structure (:type). As a crude form of structural inheritance, it even allows for directly embedding other structures (:include).

Two pitfalls

There a couple pitfalls, though. First, for historical reasons, the macro will define two namespace-unfriendly functions: make-NAME and copy-NAME. I always override these, preferring the -create convention for the constructor, and tossing the copier since it’s either useless or, worse, semantically wrong.

(cl-defstruct (fridge-item (:constructor fridge-item-create)
                           (:copier nil))
  name expiry weight)

If the constructor needs to be more sophisticated than just setting slots, it’s common to define a “private” constructor (double dash in the name) and wrap it with a “public” constructor that has some behavior.

(cl-defstruct (fridge-item (:constructor fridge-item--create)
                           (:copier nil))
  name expiry weight entry-time)

(cl-defun fridge-item-create (&rest args)
  (apply #'fridge-item--create :entry-time (float-time) args))

The other pitfall is related to printing. In Emacs 25 and earlier, types defined by cl-defstruct are still only types by convention. They’re really just vectors as far as Emacs Lisp is concerned. One benefit from this is that printing and reading these structures is “free” because vectors are printable. It’s trivial to serialize cl-defstruct structures out to a file. This is exactly how the Elfeed database works.

The pitfall is that once a structure has been serialized, there’s no more changing the cl-defstruct definition. It’s now a file format definition, so the slots are locked in place. Forever.

Emacs 26 throws a wrench in all this, though it’s worth it in the long run. There’s a new primitive type in Emacs 26 with its own reader syntax: records. This is similar to hash tables becoming first class in the reader in Emacs 23.2. In Emacs 26, cl-defstruct uses records instead of vectors.

;; Emacs 25:
(fridge-item-create :name "Eggs" :weight 11.1)
;; => [cl-struct-fridge-item "Eggs" nil 11.1]

;; Emacs 26:
(fridge-item-create :name "Eggs" :weight 11.1)
;; => #s(fridge-item "Eggs" nil 11.1)

So far slots are still accessed using aref, and all the type checking still happens in Emacs Lisp. The only practical change is the record function is used in place of the vector function when allocating a structure. But it does pave the way for more interesting things in the future.

The major short-term downside is that this breaks printed compatibility across the Emacs 25/26 boundary. The cl-old-struct-compat-mode function can be used for some degree of backwards, but not forwards, compatibility. Emacs 26 can read and use some structures printed by Emacs 25 and earlier, but the reverse will never be true. This issue initially tripped up Emacs’ built-in packages, and when Emacs 26 is released we’ll see more of these issues arise in external packages.

Dynamic dispatch

Prior to Emacs 25, the major built-in package for dynamic dispatch — functions that specialize on the run-time type of their arguments — was EIEIO, though it only supported single dispatch (specializing on a single argument). EIEIO brought much of the Common Lisp Object System (CLOS) to Emacs Lisp, including classes and methods.

Emacs 25 introduced a more sophisticated dynamic dispatch package called cl-generic. It focuses only on dynamic dispatch and supports multiple dispatch, completely replacing the dynamic dispatch portion of EIEIO. Since cl-defstruct does inheritance and cl-generic does dynamic dispatch, there’s not really much left for EIEIO — besides bad ideas like multiple inheritance and method combination.

Without either of these packages, the most direct way to build single dispatch on top of cl-defstruct would be to shove a function in one of the slots. Then the “method” is just a wrapper that call this function.

;; Base "class"

(cl-defstruct greeter
  greeting)

(defun greet (thing)
  (funcall (greeter-greeting thing) thing))

;; Cow "class"

(cl-defstruct (cow (:include greeter)
                   (:constructor cow--create)))

(defun cow-create ()
  (cow--create :greeting (lambda (_) "Moo!")))

;; Bird "class"

(cl-defstruct (bird (:include greeter)
                    (:constructor bird--create)))

(defun bird-create ()
  (bird--create :greeting (lambda (_) "Chirp!")))

;; Usage:

(greet (cow-create))
;; => "Moo!"

(greet (bird-create))
;; => "Chirp!"

Since cl-generic is aware of the types created by cl-defstruct, functions can specialize on them as if they were native types. It’s a lot simpler to let cl-generic do all the hard work. The people reading your code will appreciate it, too:

(require 'cl-generic)

(cl-defgeneric greet (greeter))

(cl-defstruct cow)

(cl-defmethod greet ((_ cow))
  "Moo!")

(cl-defstruct bird)

(cl-defmethod greet ((_ bird))
  "Chirp!")

(greet (make-cow))
;; => "Moo!"

(greet (make-bird))
;; => "Chirp!"

The majority of the time a simple cl-defstruct will fulfill your needs, keeping in mind the gotcha with the constructor and copier names. Its use should feel almost as natural as defining functions.

Debugging Emacs or: How I Learned to Stop Worrying and Love DTrace

2018-01-17T23:59:49Z

Update: This article was featured on BSD Now 233 (starting at 21:38).

For some time Elfeed was experiencing a strange, spurious failure. Every so often users were seeing an error (spoiler warning) when updating feeds: “error in process sentinel: Search failed.” If you use Elfeed, you might have even seen this yourself. From the surface it appeared that curl, tasked with the responsibility for downloading feed data, was producing incomplete output despite reporting a successful run. Since the run was successful, Elfeed assumed certain data was in curl’s output buffer, but, since it wasn’t, it failed hard.

Unfortunately this issue was not reproducible. Manually running curl outside of Emacs never revealed any issues. Asking Elfeed to retry fetching the feeds would work fine. The issue would only randomly rear its head when Elfeed was fetching many feeds in parallel, under stress. By the time the error was discovered, the curl process had exited and vital debugging information was lost. Considering that this was likely to be a bug in Emacs itself, there really wasn’t a reliable way to capture the necessary debugging information from within Emacs Lisp. And, indeed, this later proved to be the case.

A quick-and-dirty work around is to use condition-case to catch and swallow the error. When the bizarre issue shows up, rather than fail badly in front of the user, Elfeed could attempt to swallow the error — assuming it can be reliably detected — and treat the fetch as simply a failure. That didn’t sit comfortably with me. Elfeed had done its due diligence checking for errors already. Someone was lying to Elfeed, and I intended to catch them with their pants on fire. Someday.

I’d just need to witness the bug on one of my own machines. Elfeed is part of my daily routine, so surely I’d have to experience this issue myself someday. My plan was, should that day come, to run a modified Elfeed, instrumented to capture extra data. I would have also routinely run Emacs under GDB so that I could inspect the failure more deeply.

For now I just had to wait to hunt that zebra.

Bryan Cantrill, DTrace, and FreeBSD

Over the holidays I re-discovered Bryan Cantrill, a systems software engineer who worked for Sun between 1996 and 2010, and is most well known for DTrace. My first exposure to him was in a BSD Now interview in 2015. I had re-watched that interview and decided there was a lot more I had to learn from him. He’s become a personal hero to me. So I scoured the internet for more of his writing and talks. Besides what I’ve already linked in this article, here are a couple more great presentations:

You can also find some of his writing scattered around the DTrace blog.

Some interesting operating system technology came out of Sun during its final 15 or so years — most notably DTrace and ZFS — and Bryan speaks about it passionately. Almost as a matter of luck, most of it survived the Oracle acquisition thanks to Sun releasing it as open source in just the nick of time. Otherwise it would have been lost forever. The scattered ex-Sun employees, still passionate about their prior work at Sun, along with some of their old customers have since picked up the pieces and kept going as a community under the name illumos. It’s like an open source flotilla.

Naturally I wanted to get my hands on this stuff to try it out for myself. Is it really as good as they say? Normally I stick to Linux, but it (generally) doesn’t have these Sun technologies. The main reason is license incompatibility. Sun released its code under the CDDL, which is incompatible with the GPL. Ubuntu does infamously include ZFS, but other distributions are unwilling to take that risk. Porting DTrace is a serious undertaking since it’s got its fingers throughout the kernel, which also makes the licensing issues even more complicated.

(Update Feburary 2018: DTrace has been released under the GPLv2, allowing it to be legally integrated with Linux.)

Linux has a reputation for Not Invented Here (NIH) syndrome, and these licensing issues certainly contribute to that. Rather than adopt ZFS and DTrace, they’ve been reinvented from scratch: btrfs instead of ZFS, and a slew of partial options instead of DTrace. Normally I’m most interested in system call tracing, and my go to is strace, though it certainly has its limitations — including this situation of debugging curl under Emacs. Another famous example of NIH is Linux’s epoll(2), which is a broken version of BSD kqueue(2).

So, if I want to try these for myself, I’ll need to install a different operating system. I’ve dabbled with OmniOS, an OS built on illumos, in virtual machines, using it as an alien environment to test some of my software (e.g. enchive). OmniOS has a philosophy called Keep Your Software To Yourself (KYSTY), which is really just code for “we don’t do packaging.” Honestly, you can’t blame them since they’re a tiny community. The best solution to this is probably pkgsrc, which is essentially a universal packaging system. Otherwise you’re on your own.

There’s also openindiana, which is a more friendly desktop-oriented illumos distribution. Still, the short of it is that you’re very much on your own when things don’t work. The situation is like running Linux a couple decades ago, when it was still difficult to do.

If you’re interested in trying DTrace, the easiest option these days is probably FreeBSD. It’s got a big, active community, thorough documentation, and a huge selection of packages. Its license (the BSD license, duh) is compatible with the CDDL, so both ZFS and DTrace have been ported to FreeBSD.

What is DTrace?

I’ve done all this talking but haven’t yet described what DTrace really is. I won’t pretend to write my own tutorial, but I’ll provide enough information to follow along. DTrace is a tracing framework for debugging production systems in real time, both for the kernel and for applications. The “production systems” part means it’s stable and safe — using DTrace won’t put your system at risk of crashing or damaging data. The “real time” part means it has little impact on performance. You can use DTrace on live, active systems with little impact. Both of these core design principles are vital for troubleshooting those really tricky bugs that only show up in production.

There are DTrace probes scattered all throughout the system: on system calls, scheduler events, networking events, process events, signals, virtual memory events, etc. Using a specialized language called D (unrelated to the general purpose programming language D), you can dynamically add behavior at these instrumentation points. Generally the behavior is to capture information, but it can also manipulate the event being traced.

Each probe is fully identified by a 4-tuple delimited by colons: provider, module, function, and probe name. An empty element denotes a sort of wildcard. For example, syscall::open:entry is a probe at the beginning (i.e. “entry”) of open(2). syscall:::entry matches all system call entry probes.

Unlike strace on Linux which monitors a specific process, DTrace applies to the entire system when active. To run curl under strace from Emacs, I’d have to modify Emacs’ behavior to do so. With DTrace I can instrument every curl process without making a single change to Emacs, and with negligible impact to Emacs. That’s a big deal.

So, when it comes to this Elfeed issue, FreeBSD is much better poised for debugging the problem. All I have to do is catch it in the act. However, it’s been months since that bug report and I’m not really making this connection yet. I’m just hoping I eventually find an interesting problem where I can apply DTrace.

FreeBSD on a Raspberry Pi 2

So I’ve settled in FreeBSD as the playground for these technologies, I just have to decide where. I could always run it in a virtual machine, but it’s always more interesting to try things out on real hardware. FreeBSD supports the Raspberry Pi 2 as a Tier 2 system, and I had a Raspberry Pi 2 sitting around collecting dust, so I put it to use.

I wrote the image to an SD card, and for a few days I stretched my legs on this new system. I cloned a couple dozen of my own git repositories, ran the builds and the tests, and just got a feel for things. I tried out the ports system for the first time, mainly to discover that the low-powered Raspberry Pi 2 takes days to build some of the packages I want to try.

I mostly program in Vim these days, so it’s some days before I even set up Emacs. Eventually I do build Emacs, clone my configuration, fire it up, and give Elfeed a spin.

And that’s when the “search failed” bug strikes! Not just once, but dozens of times. Perfect! This low-powered platform is the jackpot for this particular bug, triggering it left and right. Given that I’ve got DTrace at my disposal, it’s the perfect place to debug this. Something is lying to Elfeed and DTrace will play the judge.

Before I dive in I see three possibilities:

curl is reporting success but truncating its output.
Emacs is quietly truncating curl’s output.
Emacs is misinterpreting curl’s exit status.

With Dtrace I can observe what every curl process writes to Emacs, and I can also double check curl’s exit status. I come up with the following (newbie) DTrace script:

syscall::write:entry
/execname == "curl"/
{
    printf("%d WRITE %d \"%s\"\n",
           pid, arg2, stringof(copyin(arg1, arg2)));
}

syscall::exit:entry
/execname == "curl"/
{
    printf("%d EXIT  %d\n", pid, arg0);
}

The /execname == "curl"/ is a predicate that (obviously) causes the behavior to only fire for curl processes. The first probe has DTrace print a line for every write(2) from curl. arg0, arg1, and arg2 correspond to the arguments of write(2): fd, buf, count. It logs the process ID (pid) of the write, the length of the write, and the actual contents written. Remember that these curl processes are run in parallel by Emacs, so the pid allows me to associate the separate writes and the exit status.

The second probe prints the pid and the exit status (the first argument to exit(2)).

I also want to compare this to exactly what is delivered to Elfeed when curl exits, so I modify the process sentinel — the callback that handles a subprocess exiting — to call write-file before any action is taken. I can compare these buffer dumps to the logs produced by DTrace.

There are two important findings.

First, when the “search failed” bug occurs, the buffer was completely empty (95% of the time) or truncated at the end of the HTTP headers (5% of the time), right at the blank line. DTrace indicates that curl did its job to the full, so it’s Emacs who’s the liar. It’s not delivering all of curl’s data to Elfeed. That’s pretty annoying.

Second, curl was line-buffered. Each line was a separate, independent write(2). I was certainly not expecting this. Normally the C library only does line buffering when the output is a terminal. That’s because it’s guessing a user may be watching, expecting the output to arrive a line at a time.

Here’s a sample of what it looked like in the log:

88188 WRITE 32 "Server: Apache/2.4.18 (Ubuntu)
"
88188 WRITE 46 "Location: https://blog.plover.com/index.atom
"
88188 WRITE 21 "Content-Length: 299
"
88188 WRITE 45 "Content-Type: text/html; charset=iso-8859-1
"
88188 WRITE 2 "
"

Why would curl think Emacs is a terminal?

Oh. That’s right. This is the same problem I ran into four years ago when writing EmacSQL. By default Emacs connects to subprocesses through a pseudo-terminal (pty). I called this a mistake in Emacs back then, and I still stand by that claim. The pty causes weird, annoying problems for little benefit:

Interpreting control characters. Hope you weren’t transferring binary data!
Subprocesses will generally get line buffered. This makes them slower, though in some situations it might be desirable.
Stdout and stderr get mixed together. (Optional since Emacs 25.)
New! There’s a bug somewhere in Emacs that causes truncation when ptys are used heavily in parallel.

Just from eyeballing the DTrace log I knew what to do: dump the pty and switch to a pipe. This is controlled with the process-connection-type variable, and fixing it is a one-liner.

Not only did this completely resolve the truncation issue, Elfeed is noticeably faster at fetching feeds on all machines. It’s no longer receiving mountains of XML one line at a time, like sucking pudding through a straw. It’s now quite zippy even on my Raspberry Pi 2, which had never been the case before (without the “search failed” bug). Even if you were never affected by this bug, you will benefit from the fix.

I haven’t officially reported this as an Emacs bug yet because reproducibility is still an issue. It needs something better than “fire off a bunch of HTTP requests across the internet in parallel from a Raspberry Pi.”

The fix reminds me of that old boilermaker story about charging a lot of money just to swing a hammer. Once the problem arose, DTrace quickly helped to identify the place to hit Emacs with the hammer.

Finally, a big thanks to alphapapa for originally taking the time to report this bug months ago.

What's in an Emacs Lambda

2017-12-14T18:18:57Z

There was recently some interesting discussion about correctly using backquotes to express a mixture of data and code. Since lambda expressions seem to evaluate to themselves, what’s the difference? For example, an association list of operations:

'((add . (lambda (a b) (+ a b)))
  (sub . (lambda (a b) (- a b)))
  (mul . (lambda (a b) (* a b)))
  (div . (lambda (a b) (/ a b))))

It looks like it would work, and indeed it does work in this case. However, there are good reasons to actually evaluate those lambda expressions. Eventually invoking the lambda expressions in the quoted form above are equivalent to using eval. So, instead, prefer the backquote form:

`((add . ,(lambda (a b) (+ a b)))
  (sub . ,(lambda (a b) (- a b)))
  (mul . ,(lambda (a b) (* a b)))
  (div . ,(lambda (a b) (/ a b))))

There are a lot of interesting things to say about this, but let’s first reduce it to two very simple cases:

(lambda (x) x)

'(lambda (x) x)

What’s the difference between these two forms? The first is a lambda expression, and it evaluates to a function object. The other is a quoted list that looks like a lambda expression, and it evaluates to a list — a piece of data.

A naive evaluation of these expressions in *scratch* (C-x C-e) suggests they are are identical, and so it would seem that quoting a lambda expression doesn’t really matter:

(lambda (x) x)
;; => (lambda (x) x)

'(lambda (x) x)
;; => (lambda (x) x)

However, there are two common situations where this is not the case: byte compilation and lexical scope.

Lambda under byte compilation

It’s a little trickier to evaluate these forms byte compiled in the scratch buffer since that doesn’t happen automatically. But if it did, it would look like this:

;;; -*- lexical-binding: nil; -*-

(lambda (x) x)
;; => #[(x) "\010\207" [x] 1]

'(lambda (x) x)
;; => (lambda (x) x)

The #[...] is the syntax for a byte-code function object. As discussed in detail in my byte-code internals article, it’s a special vector object that contains byte-code, and other metadata, for evaluation by Emacs’ virtual stack machine. Elisp is one of very few languages with readable function objects, and this feature is core to its ahead-of-time byte compilation.

The quote, by definition, prevents evaluation, and so inhibits byte compilation of the lambda expression. It’s vital that the byte compiler does not try to guess the programmer’s intent and compile the expression anyway, since that would interfere with lists that just so happen to look like lambda expressions — i.e. any list containing the lambda symbol.

There are three reasons you want your lambda expressions to get byte compiled:

Byte-compiled functions are significantly faster. That’s the main purpose for byte compilation after all.
The compiler performs static checks, producing warnings and errors ahead of time. This lets you spot certain classes of problems before they occur. The static analysis is even better under lexical scope due to its tighter semantics.
Under lexical scope, byte-compiled closures may use less memory. More specifically, they won’t accidentally keep objects alive longer than necessary. I’ve never seen a name for this implementation issue, but I call it overcapturing. More on this later.

While it’s common for personal configurations to skip byte compilation, Elisp should still generally be written as if it were going to be byte compiled. General rule of thumb: Ensure your lambda expressions are actually evaluated.

Lambda in lexical scope

As I’ve stressed many times, you should always use lexical scope. There’s no practical disadvantage or trade-off involved. Just do it.

Once lexical scope is enabled, the two expressions diverge even without byte compilation:

;;; -*- lexical-binding: t; -*-

(lambda (x) x)
;; => (closure (t) (x) x)

'(lambda (x) x)
;; => (lambda (x) x)

Under lexical scope, lambda expressions evaluate to closures. Closures capture their lexical environment in their closure object — nothing in this particular case. It’s a type of function object, making it a valid first argument to funcall.

Since the quote prevents the second expression from being evaluated, semantically it evaluates to a list that just so happens to look like a (non-closure) function object. Invoking a data object as a function is like using eval — i.e. executing data as code. Everyone already knows eval should not be used lightly.

It’s a little more interesting to look at a closure that actually captures a variable, so here’s a definition for constantly, a higher-order function that returns a closure that accepts any number of arguments and returns a particular constant:

(defun constantly (x)
  (lambda (&rest _) x))

Without byte compiling it, here’s an example of its return value:

(constantly :foo)
;; => (closure ((x . :foo) t) (&rest _) x)

The environment has been captured as an association list (with a trailing t), and we can plainly see that the variable x is bound to the symbol :foo in this closure. Consider that we could manipulate this data structure (e.g. setcdr or setf) to change the binding of x for this closure. This is essentially how closures mutate their own environment. Moreover, closures from the same environment share structure, so such mutations are also shared. More on this later.

Semantically, closures are distinct objects (via eq), even if the variables they close over are bound to the same value. This is because they each have a distinct environment attached to them, even if in some invisible way.

(eq (constantly :foo) (constantly :foo))
;; => nil

Without byte compilation, this is true even when there’s no lexical environment to capture:

(defun dummy ()
  (lambda () t))

(eq (dummy) (dummy))
;; => nil

The byte compiler is smart, though. As an optimization, the same closure object is reused when possible, avoiding unnecessary work, including multiple object allocations. Though this is a bit of an abstraction leak. A function can (ab)use this to introspect whether it’s been byte compiled:

(defun have-i-been-compiled-p ()
  (let ((funcs (vector nil nil)))
    (dotimes (i 2)
      (setf (aref funcs i) (lambda ())))
    (eq (aref funcs 0) (aref funcs 1))))

(have-i-been-compiled-p)
;; => nil

(byte-compile 'have-i-been-compiled-p)

(have-i-been-compiled-p)
;; => t

The trick here is to evaluate the exact same non-capturing lambda expression twice, which requires a loop (or at least some sort of branch). Semantically we should think of these closures as being distinct objects, but, if we squint our eyes a bit, we can see the effects of the behind-the-scenes optimization.

Don’t actually do this in practice, of course. That’s what byte-code-function-p is for, which won’t rely on a subtle implementation detail.

Overcapturing

I mentioned before that one of the potential gotchas of not byte compiling your lambda expressions is overcapturing closure variables in the interpreter.

To evaluate lisp code, Emacs has both an interpreter and a virtual machine. The interpreter evaluates code in list form: cons cells, numbers, symbols, etc. The byte compiler is like the interpreter, but instead of directly executing those forms, it emits byte-code that, when evaluated by the virtual machine, produces identical visible results to the interpreter — in theory.

What this means is that Emacs contains two different implementations of Emacs Lisp, one in the interpreter and one in the byte compiler. The Emacs developers have been maintaining and expanding these implementations side-by-side for decades. A pitfall to this approach is that the implementations can, and do, diverge in their behavior. We saw this above with that introspective function, and it comes up in practice with advice.

Another way they diverge is in closure variable capture. For example:

;;; -*- lexical-binding: t; -*-

(defun overcapture (x y)
  (when y
    (lambda () x)))

(overcapture :x :some-big-value)
;; => (closure ((y . :some-big-value) (x . :x) t) nil x)

Notice that the closure captured y even though it’s unnecessary. This is because the interpreter doesn’t, and shouldn’t, take the time to analyze the body of the lambda to determine which variables should be captured. That would need to happen at run-time each time the lambda is evaluated, which would make the interpreter much slower. Overcapturing can get pretty messy if macros are introducing their own hidden variables.

On the other hand, the byte compiler can do this analysis just once at compile-time. And it’s already doing the analysis as part of its job. It can avoid this problem easily:

(overcapture :x :some-big-value)
;; => #[0 "\300\207" [:x] 1]

It’s clear that :some-big-value isn’t present in the closure.

But… how does this work?

How byte compiled closures are constructed

Recall from the internals article that the four core elements of a byte-code function object are:

Parameter specification
Byte-code string (opcodes)
Constants vector
Maximum stack usage

While a closure seems like compiling a whole new function each time the lambda expression is evaluated, there’s actually not that much to it! Namely, the behavior of the function remains the same. Only the closed-over environment changes.

What this means is that closures produced by a common lambda expression can all share the same byte-code string (second element). Their bodies are identical, so they compile to the same byte-code. Where they differ are in their constants vector (third element), which gets filled out according to the closed over environment. It’s clear just from examining the outputs:

(constantly :a)
;; => #[128 "\300\207" [:a] 2]

(constantly :b)
;; => #[128 "\300\207" [:b] 2]

constantly has three of the four components of the closure in its own constant pool. Its job is to construct the constants vector, and then assemble the whole thing into a byte-code function object (#[...]). Here it is with M-x disassemble:

     constant  make-byte-code
     constant  128
     constant  "\300\207"
     constant  vector
     stack-ref 4
     call      1
     constant  2
     call      4
     return

(Note: since byte compiler doesn’t produce perfectly optimal code, I’ve simplified it for this discussion.)

It pushes most of its constants on the stack. Then the stack-ref 5 (5) puts x on the stack. Then it calls vector to create the constants vector (6). Finally, it constructs the function object (#[...]) by calling make-byte-code (8).

Since this might be clearer, here’s the same thing expressed back in terms of Elisp:

(defun constantly (x)
  (make-byte-code 128 "\300\207" (vector x) 2))

To see the disassembly of the closure’s byte-code:

(disassemble (constantly :x))

The result isn’t very surprising:

0       constant  :x
1       return

Things get a little more interesting when mutation is involved. Consider this adder closure generator, which mutates its environment every time it’s called:

(defun adder ()
  (let ((total 0))
    (lambda () (cl-incf total))))

(let ((count (adder)))
  (funcall count)
  (funcall count)
  (funcall count))
;; => 3

(adder)
;; => #[0 "\300\211\242T\240\207" [(0)] 2]

The adder essentially works like this:

(defun adder ()
  (make-byte-code 0 "\300\211\242T\240\207" (vector (list 0)) 2))

In theory, this closure could operate by mutating its constants vector directly. But that wouldn’t be much of a constants vector, now would it!? Instead, mutated variables are boxed inside a cons cell. Closures don’t share constant vectors, so the main reason for boxing is to share variables between closures from the same environment. That is, they have the same cons in each of their constant vectors.

There’s no equivalent Elisp for the closure in adder, so here’s the disassembly:

     constant  (0)
     dup
     car-safe
     add1
     setcar
     return

It puts two references to boxed integer on the stack (constant, dup), unboxes the top one (car-safe), increments that unboxed integer, stores it back in the box (setcar) via the bottom reference, leaving the incremented value behind to be returned.

This all gets a little more interesting when closures interact:

(defun fancy-adder ()
  (let ((total 0))
    `(:add ,(lambda () (cl-incf total))
      :set ,(lambda (v) (setf total v))
      :get ,(lambda () total))))

(let ((counter (fancy-adder)))
  (funcall (plist-get counter :set) 100)
  (funcall (plist-get counter :add))
  (funcall (plist-get counter :add))
  (funcall (plist-get counter :get)))
;; => 102

(fancy-adder)
;; => (:add #[0 "\300\211\242T\240\207" [(0)] 2]
;;     :set #[257 "\300\001\240\207" [(0)] 3]
;;     :get #[0 "\300\242\207" [(0)] 1])

This is starting to resemble object oriented programming, with methods acting upon fields stored in a common, closed-over environment.

All three closures share a common variable, total. Since I didn’t use print-circle, this isn’t obvious from the last result, but each of those (0) conses are the same object. When one closure mutates the box, they all see the change. Here’s essentially how fancy-adder is transformed by the byte compiler:

(defun fancy-adder ()
  (let ((box (list 0)))
    (list :add (make-byte-code 0 "\300\211\242T\240\207" (vector box) 2)
          :set (make-byte-code 257 "\300\001\240\207" (vector box) 3)
          :get (make-byte-code 0 "\300\242\207" (vector box) 1))))

The backquote in the original fancy-adder brings this article full circle. This final example wouldn’t work correctly if those lambdas weren’t evaluated properly.

Make Flet Great Again

2017-10-27T21:02:58Z

Do you long for the days before Emacs 24.3 when flet was dynamically scoped? Well, you probably shouldn’t since there are some very good reasons lexical scope. But, still, a dynamically scoped flet is situationally really useful, particularly in unit testing. The good news is that it’s trivial to get this original behavior back without relying on deprecated functions nor third-party packages.

But first, what is flet and what does it mean for it to be dynamically scoped? The name stands for “function let” (or something to that effect). It’s a macro to bind named functions within a local scope, just as let binds variables within some local scope. It’s provided by the now-deprecated cl package.

(require 'cl)  ; deprecated!

(defun norm (x y)
  (flet ((square (v) (* v v)))
    (sqrt (+ (square x) (square y)))))

However, a gotcha here is that square is visible not just to the body of norm but also to any function called directly or indirectly from the flet body. That’s dynamic scope.

(flet ((sqrt (v) (/ v 2)))  ; close enough
  (norm 2 2))
;; -> 4

Note: This works because sqrt hasn’t (yet?) been assigned a bytecode opcode. One weakness with flet is that, due to being dynamically scoped, it is unable to define or override functions whose calls evaporate under byte compilation. For example, addition:

(defun add-with-flet ()
  (flet ((+ (&rest _) :override))
    (+ 1 2 3)))

(add-with-flet)
;; -> :override

(funcall (byte-compile #'add-with-flet))
;; -> 6

Since + has its own opcode, the function call is eliminated under byte-compilation and flet can’t do its job. This is similar these same functions being unadvisable.

cl-lib and cl-flet

The cl-lib package introduced in Emacs 24.3, replacing cl, adds a namespace prefix, cl-, to all of these Common Lisp style functions. In most cases this was the only change. One exception is cl-flet, which has different semantics: It’s lexically scoped, just like in Common Lisp. Its bindings aren’t visible outside of the cl-flet body.

(require 'cl-lib)

(cl-flet ((sqrt (v) (/ v 2)))
  (norm 2 2))
;; -> 2.8284271247461903

In most cases this is what you actually want. The old flet subtly changes the environment for all functions called directly or indirectly from its body.

Besides being cleaner and less error prone, cl-flet also doesn’t have special exceptions for functions with assigned opcodes. At macro-expansion time it walks the body, taking its action before the byte-compiler can interfere.

(defun add-with-cl-flet ()
  (cl-flet ((+ (&rest _) :override))
    (+ 1 2 3)))

(add-with-cl-flet)
;; -> :override

(funcall (byte-compile #'add-with-cl-flet))
;; -> :override

In order for it to work properly, it’s essential that functions are quoted with sharp-quotes (#') so that the macro can tell the difference between functions and symbols. Just make a general habit of sharp-quoting functions.

In unit testing, temporarily overriding functions for all of Emacs is useful, so flet still has some uses. But it’s deprecated!

Unit testing with flet

Since Emacs can do anything, suppose there is an Emacs package that makes sandwiches. In this package there’s an interactive function to set the default sandwich cheese.

(defvar default-cheese 'cheddar)

(defun set-default-cheese (type)
  (interactive
   (let* ((options '("cheddar" "swiss" "american"))
          (input (completing-read "Cheese: " options nil t)))
     (when input
       (list (intern input)))))
  (setf default-cheese type))

Since it’s interactive, it uses completing-read to prompt the user for input. A unit test could call this function non-interactively, but perhaps we’d also like to test the interactive path. The code inside interactive occasionally gets messy and may warrant testing. It would obviously be inconvenient to prompt the user for input during testing, and it wouldn’t work at all in batch mode (-batch).

With flet we can stub out completing-read just for the unit test:

;;; -*- lexical-binding: t; -*-

(ert-deftest test-set-default-cheese ()
  ;; protect original with dynamic binding
  (let (default-cheese)
    ;; simulate user entering "american"
    (flet ((completing-read (&rest _) "american"))
      (call-interactively #'set-default-cheese)
      (should (eq 'american default-cheese)))))

Since default-cheese was defined with defvar, it will be dynamically scoped despite let normally using lexical scope in this example. Both of the side effects of the tested function — setting a global variable and prompting the user — are captured using a combination of let and flet.

Since cl-flet is lexically scoped, it cannot serve this purpose. If flet is deprecated and cl-flet can’t do the job, what’s the right way to fix it? The answer lies in generalized variables.

cl-letf

What’s really happening inside flet is it’s globally binding a function name to a different function, evaluating the body, and rebinding it back to the original definition when the body completes. It macro-expands to something like this:

(let ((original (symbol-function 'completing-read)))
  (setf (symbol-function 'completing-read)
        (lambda (&rest _) "american"))
  (unwind-protect
      (call-interactively #'set-default-cheese)
    (setf (symbol-function 'completing-read) original)))

The unwind-protect ensures the original function is rebound even if the body of the call were to fail. This is very much a let-like pattern, and I’m using symbol-function as a generalized variable via setf. Is there a generalized variable version of let?

Yes! It’s called cl-letf! In this case the f suffix is analogous to the f suffix in setf. That form above can be reduced to a more general form:

(cl-letf (((symbol-function 'completing-read)
           (lambda (&rest _) "american")))
  (call-interactively #'set-default-cheese))

And that’s the way to reproduce the dynamically scoped behavior of flet since Emacs 24.3. There’s nothing complicated about it.

(ert-deftest test-set-default-cheese ()
  (let (default-cheese)
    (cl-letf (((symbol-function 'completing-read)
               (lambda (&rest _) "american")))
      (call-interactively #'set-default-cheese)
      (should (eq 'american default-cheese)))))

Keep in mind that this suffers the exact same problem with bytecode-assigned functions as flet, and for exactly the same reasons. If completing-read were to ever be assigned its own opcode then cl-letf would no longer work for this particular example.

Gap Buffers Are Not Optimized for Multiple Cursors

2017-09-07T01:34:04Z

Gap buffers are a common data structure for representing a text buffer in a text editor. Emacs famously uses gap buffers — long-standing proof that gap buffers are a perfectly sufficient way to represent a text buffer.

Gap buffers are very easy to implement. A bare minimum implementation is about 60 lines of C.
Gap buffers are especially efficient for the majority of typical editing commands, which tend to be clustered in a small area.
Except for the gap, the content of the buffer is contiguous, making the search and display implementations simpler and more efficient. There’s also the potential for most of the gap buffer to be memory-mapped to the original file, though typical encoding and decoding operations prevent this from being realized.
Due to having contiguous content, saving a gap buffer is basically just two write(2) system calls. (Plus fsync(2), etc.)

A gap buffer is really a pair of buffers where one buffer holds all of the content before the cursor (or point for Emacs), and the other buffer holds the content after the cursor. When the cursor is moved through the buffer, characters are copied from one buffer to the other. Inserts and deletes close to the gap are very efficient.

Typically it’s implemented as a single large buffer, with the pre-cursor content at the beginning, the post-cursor content at the end, and the gap spanning the middle. Here’s an illustration:

The top of the animation is the display of the text content and cursor as the user would see it. The bottom is the gap buffer state, where each character is represented as a gray block, and a literal gap for the cursor.

Ignoring for a moment more complicated concerns such as undo and Unicode, a gap buffer could be represented by something as simple as the following:

struct gapbuf {
    char *buf;
    size_t total;  /* total size of buf */
    size_t front;  /* size of content before cursor */
    size_t gap;    /* size of the gap */
};

This is close to how Emacs represents it. In the structure above, the size of the content after the cursor isn’t tracked directly, but can be computed on the fly from the other three quantities. That is to say, this data structure is normalized.

As an optimization, the cursor could be tracked separately from the gap such that non-destructive cursor movement is essentially free. The difference between cursor and gap would only need to be reconciled for a destructive change — an insert or delete.

A gap buffer certainly isn’t the only way to do it. For example, the original vi used an array of lines, which sort of explains some of its quirky line-oriented idioms. The BSD clone of vi, nvi, uses an entire database to represent buffers. Vim uses a fairly complex rope-like data structure with page-oriented blocks, which may be stored out-of-order in its swap file.

Multiple cursors

Multiple cursors is fairly recent text editor invention that has gained a lot of popularity recent years. It seems every major editor either has the feature built in or a readily-available extension. I myself used Magnar Sveen’s well-polished package for several years. Though obviously the concept didn’t originate in Emacs or else it would have been called multiple points, which doesn’t quite roll off the tongue quite the same way.

The concept is simple: If the same operation needs to done in many different places in a buffer, you place a cursor at each position, then drive them all in parallel using the same commands. It’s super flashy and great for impressing all your friends.

However, as a result of improving my typing skills, I’ve come to the conclusion that multiple cursors is all hat and no cattle. It doesn’t compose well with other editing commands, it doesn’t scale up to large operations, and it’s got all sorts of flaky edge cases (off-screen cursors). Nearly anything you can do with multiple cursors, you can do better with old, well-established editing paradigms.

Somewhere around 99% of my multiple cursors usage was adding a common prefix to a contiguous serious of lines. As similar brute force options, Emacs already has rectangular editing, and Vim already has visual block mode.

The most sophisticated, flexible, and robust alternative is a good old macro. You can play it back anywhere it’s needed. You can zip it across a huge buffer. The only downside is that it’s less flashy and so you’ll get invited to a slightly smaller number of parties.

But if you don’t buy my arguments about multiple cursors being tasteless, there’s still a good technical argument: Gap buffers are not designed to work well in the face of multiple cursors!

For example, suppose we have a series of function calls and we’d like to add the same set of arguments to each. It’s a classic situation for a macro or for multiple cursors. Here’s the original code:

foo();
bar();
baz();

The example is tiny so that it will fit in the animations to come. Here’s the desired code:

foo(x, y);
bar(x, y);
baz(x, y);

With multiple cursors you would place a cursor inside each set of parenthesis, then type x, y. Visually it looks something like this:

Text is magically inserted in parallel in multiple places at a time. However, if this is a text editor that uses a gap buffer, the situation underneath isn’t quite so magical. The entire edit doesn’t happen at once. First the x is inserted in each location, then the comma, and so on. The edits are not clustered so nicely.

From the gap buffer’s point of view, here’s what it looks like:

For every individual character insertion the buffer has to visit each cursor in turn, performing lots of copying back and forth. The more cursors there are, the worse it gets. For an edit of length n with m cursors, that’s O(n * m) calls to memmove(3). Multiple cursors scales badly.

Compare that to the old school hacker who can’t be bothered with something as tacky and modern (eww!) as multiple cursors, instead choosing to record a macro, then play it back:

The entire edit is done locally before moving on to the next location. It’s perfectly in tune with the gap buffer’s expectations, only needing O(m) calls to memmove(3). Most of the work flows neatly into the gap.

So, don’t waste your time with multiple cursors, especially if you’re using a gap buffer text editor. Instead get more comfortable with your editor’s macro feature. If your editor doesn’t have a good macro feature, get a new editor.

If you want to make your own gap buffer animations, here’s the source code. It includes a tiny gap buffer implementation:

https://github.com/skeeto/gap-buffer-animator

Vim vs. Emacs: the Working Directory

2017-08-22T04:51:36Z

Vim and Emacs have different internals models for the current working directory, and these models influence the overall workflow for each editor. They decide how files are opened, how shell commands are executed, and how the build system is operated. These effects even reach outside the editor to influence the overall structure of the project being edited.

In the traditional unix model, which was eventually adopted everywhere else, each process has a particular working directory tracked by the operating system. When a process makes a request to the operating system using a relative path — a path that doesn’t begin with a slash — the operating system uses the process’ working directory to convert the path into an absolute path. When a process forks, its child starts in the same directory. A process can change its working directory at any time using chdir(2), though most programs never need to do it. The most obvious way this system call is exposed to regular users is through the shell’s built-in cd command.

Vim’s spiritual heritage is obviously rooted in vi, one of the classic unix text editors, and the most elaborate text editor standardized by POSIX. Like vi, Vim closely follows the unix model for working directories. At any given time Vim has exactly one working directory. Shell commands that are run within Vim will start in Vim’s working directory. Like a shell, the cd ex command changes and queries Vim’s working directory.

Emacs eschews this model and instead each buffer has its own working directory tracked using a buffer-local variable, default-directory. Emacs internally simulates working directories for its buffers like an operating system, resolving absolute paths itself, giving credence to the idea that Emacs is an operating system (“lacking only a decent editor”). Perhaps this model comes from ye olde lisp machines?

In contrast, Emacs’ M-x cd command manipulates the local variable and has no effect on the Emacs process’ working directory. In fact, Emacs completely hides its operating system working directory from Emacs Lisp. This can cause some trouble if that hidden working directory happens to be sitting on filesystem you’d like to unmount.

Vim can be configured to simulate Emacs’ model with its autochdir option. When set, Vim will literally chdir(2) each time the user changes buffers, switches windows, etc. To the user, this feels just like Emacs’ model, but this is just a convenience, and the core working directory model is still the same.

Single instance editors

For most of my Emacs career, I’ve stuck to running a single, long-lived Emacs instance no matter how many different tasks I’m touching simultaneously. I start the Emacs daemon shortly after logging in, and it continues running until I log out — typically only when the machine is shut down. It’s common to have multiple Emacs windows (frames) for different tasks, but they’re all bound to the same daemon process.

While with care it’s possible to have a complex, rich Emacs configuration that doesn’t significantly impact Emacs’ startup time, the general consensus is that Emacs is slow to start. But since it has a really solid daemon, this doesn’t matter: hardcore Emacs users only ever start Emacs occasionally. The rest of the time they’re launching emacsclient and connecting to the daemon. Outside of system administration, it’s the most natural way to use Emacs.

The case isn’t so clear for Vim. Vim is so fast that many users fire it up on demand and exit when they’ve finished the immediate task. At the other end of the spectrum, others advocate using a single instance of Vim like running a single Emacs daemon. In my initial dive into Vim, I tried the single-instance, Emacs way of doing things. I set autochdir out of necessity and pretended each buffer had its own working directory.

At least for me, this isn’t the right way to use Vim, and it all comes down to working directories. I want Vim to be anchored at the project root with one Vim instance per project. Everything is smoother when it happens in the context of the project’s root directory, from opening files, to running shell commands (ctags in particular), to invoking the build system. With autochdir, these actions are difficult to do correctly, particularly the last two.

Invoking the build

I suspect the Emacs’ model of per-buffer working directories has, in a Sapir-Whorf sort of way, been responsible for leading developers towards poorly-designed, recursive Makefiles. Without a global concept of working directory, it’s inconvenient to invoke the build system (M-x compile) in some particular grandparent directory that is the root of the project. If each directory has its own Makefile, it usually makes sense to invoke make in the same directory as the file being edited.

Over the years I’ve been reinventing the same solution to this problem, and it wasn’t until I spent time with Vim and its alternate working directory model that I truly understood the problem. Emacs itself has long had a solution lurking deep in its bowels, unseen by daylight: dominating files. The function I’m talking about is locate-dominating-file:

(locate-dominating-file FILE NAME)

Look up the directory hierarchy from FILE for a directory containing NAME. Stop at the first parent directory containing a file NAME, and return the directory. Return nil if not found. Instead of a string, NAME can also be a predicate taking one argument (a directory) and returning a non-nil value if that directory is the one for which we’re looking.

The trouble of invoking the build system at the project root is that Emacs doesn’t really have a concept of a project root. It doesn’t know where it is or how to find it. The vi model inherited by Vim is to leave the working directory at the project root. While Vim can simulate Emacs’ working directory model, Emacs cannot (currently) simulate Vim’s model.

Instead, by identifying a file name unique to the project’s root (i.e. a “dominating” file) such as Makefile or build.xml, then locate-dominating-file can discover the project root. All that’s left is wrapping M-x compile so that default-directory is temporarily adjusted to the project’s root.

That looks very roughly like this (and needs more work):

(defun my-compile ()
  (interactive)
  (let ((default-directory (locate-dominating-file "." "Makefile")))
    (compile "make")))

It’s a pattern I’ve used again and again and again, working against the same old friction. By running one Vim instance per project at the project’s root, I get the correct behavior for free.

My Journey with Touch Typing and Vim

2017-04-01T04:02:08Z

Given the title, the publication date of this article is probably really confusing. This was deliberate.

Three weeks ago I made a conscious decision to improve my typing habits. You see, I had a dirty habit. Despite spending literally decades typing on a daily basis, I’ve been a weak typist. It wasn’t exactly finger pecking, nor did it require looking down at the keyboard as I typed, but rather a six-finger dance I developed organically over the years. My technique was optimized towards Emacs’ frequent use of CTRL and ALT combinations, avoiding most of the hand scrunching. It was fast enough to keep up with my thinking most of the time, but was ultimately limiting due to its poor accuracy. I was hitting the wrong keys far too often.

My prime motivation was to learn Vim — or, more specifically, to learn modal editing. Lots of people swear by it, including people whose opinions I hold in high regard. The modal editing community is without a doubt larger than the Emacs community, especially since, thanks to Viper and Evil, a subset of the Emacs community is also part of the modal editing community. There’s obviously something significantly valuable about it, and I wanted to understand what that was.

But I was a lousy typist who couldn’t hit the right keys often enough to make effective use of modal editing. I would need to learn touch typing first.

Touch typing

How would I learn? Well, the first search result for “online touch typing course” was Typing Club, so that’s what I went with. By the way, here’s my official review: “Good enough not to bother checking out the competition.” For a website it’s pretty much the ultimate compliment, but it’s not exactly the sort of thing you’d want to hear from your long-term partner.

My hard rule was that I would immediately abandon my old habits cold turkey. Poor typing is a bad habit just like smoking, minus the cancer and weakened sense of smell. It was vital that I unlearn all that old muscle memory. That included not just my six-finger dance, but also my NetHack muscle memory. NetHack uses “hjkl” for navigation just like Vim. The problem was that I’d spent a couple hundred hours in NetHack over the past decade with my index finger on “h”, not the proper home row location. It was disorienting to navigate around Vim initally, like riding a bicycle with inverted controls.

Based on reading other people’s accounts, I determined I’d need several days of introductory practice where I’d be utterly unproductive. I took a three-day weekend, starting my touch typing lessons on a Thursday evening. Boy, they weren’t kidding about it being slow going. It was a rough weekend. When checking in on my practice, my wife literally said she pitied me. Ouch.

By Monday I was at a level resembling a very slow touch typist. For the rest of the first week I followed all the lessons up through the number keys, never progressing past an exercise until I had exceeded the target speed with at least 90% accuracy. This was now enough to get me back on my feet for programming at a glacial, frustrating pace. Programming involves a lot more numbers and symbols than other kinds of typing, making that top row so important. For a programmer, it would probably be better for these lessons to be earlier in the series.

For that first week I mostly used Emacs while I was finding my feet (or finding my fingers?). That’s when I experienced first hand what all these non-Emacs people — people who I, until recently, considered to be unenlightened simpletons — had been complaining about all these years: Pressing CTRL and ALT key combinations from the home row is a real pain in in the ass! These complaints were suddenly making sense. I was already seeing the value of modal editing before I even started really learning Vim. It made me look forward to it even more.

During the second week of touch typing I went though Derek Wyatt’s Vim videos and learned my way around the :help system enough to bootstrap my Vim education. I then read through the user manual, practicing along the way. I’ll definitely have to pass through it a few more times to pick up all sorts of things that didn’t stick. This is one way that Emacs and Vim are a lot alike.

Update: Practical Vim: Edit Text at the Speed of Thought was recommended in the comments, and it’s certainly a better place to start than the Vim user manual. Unlike the manual, it’s opinionated and focuses on good habits, which is exactly what a newbie needs.

One of my rules when learning Vim was to resist the urge to remap keys. I’ve done it a lot with Emacs: “Hmm, that’s not very convenient. I’ll change it.” It means my Emacs configuration is fairly non-standard, and using Emacs without my configuration is like using an unfamiliar editor. This is both good and bad. The good is that I’ve truly changed Emacs to be my editor, suited just for me. The bad is that I’m extremely dependent on my configuration. What if there was a text editing emergency?

With Vim as a sort of secondary editor, I want to be able to fire it up unconfigured and continue to be nearly as productive. A pile of remappings would prohibit this. In my mind this is like a form of emergency preparedness. Other people stock up food and supplies. I’m preparing myself to sit at a strange machine without any of my configuration so that I can start the rewrite of the software lost in the disaster, so long as that machine has vi, cc, and make. If I can’t code in C, then what’s the point in surviving anyway?

The other reason is that I’m just learning. A different mapping might seem more appropriate, but what do I know at this point? It’s better to follow the beaten path at first, lest I form a bunch of bad habits again. Trust in the knowledge of the ancients.

Future directions

I am absolutely sticking with modal editing for the long term. I’m really enjoying it so far. At three weeks of touch typing and two weeks of modal editing, I’m around 80% caught back up with my old productivity speed, but this time I’ve got a lot more potential for improvement.

For now, Vim will continue taking over more and more of my text editing work. My last three articles were written in Vim. It’s really important to keep building proficiency. I still rely on Emacs for email and for syndication feeds, and that’s not changing any time soon. I also really like Magit as a Git interface. Plus I don’t want to abandon years of accumulated knowledge and leave the users of my various Emacs packages out to dry. Ultimately I believe will end up using Evil, to get what seems to be the best of both worlds: modal editing and Emacs’ rich extensibility.

Asynchronous Requests from Emacs Dynamic Modules

2017-02-14T02:30:00Z

A few months ago I had a discussion with Vladimir Kazanov about his Orgfuse project: a Python script that exposes an Emacs Org-mode document as a FUSE filesystem. It permits other programs to navigate the structure of an Org-mode document through the standard filesystem APIs. I suggested that, with the new dynamic modules in Emacs 25, Emacs itself could serve a FUSE filesystem. In fact, support for FUSE services in general could be an package of his own.

So that’s what he did: Elfuse. It’s an old joke that Emacs is an operating system, and here it is handling system calls.

However, there’s a tricky problem to solve, an issue also present my joystick module. Both modules handle asynchronous events — filesystem requests or joystick events — but Emacs runs the event loop and owns the main thread. The external events somehow need to feed into the main event loop. It’s even more difficult with FUSE because FUSE also wants control of its own thread for its own event loop. This requires Elfuse to spawn a dedicated FUSE thread and negotiate a request/response hand-off.

When a filesystem request or joystick event arrives, how does Emacs know to handle it? The simple and obvious solution is to poll the module from a timer.

struct queue requests;

emacs_value
Frequest_next(emacs_env *env, ptrdiff_t n, emacs_value *args, void *p)
{
    emacs_value next = Qnil;
    queue_lock(requests);
    if (queue_length(requests) > 0) {
        void *request = queue_pop(requests, env);
        next = env->make_user_ptr(env, fin_empty, request);
    }
    queue_unlock(request);
    return next;
}

And then ask Emacs to check the module every, say, 10ms:

(defun request--poll ()
  (let ((next (request-next)))
    (when next
      (request-handle next))))

(run-at-time 0 0.01 #'request--poll)

Blocking directly on the module’s event pump with Emacs’ thread would prevent Emacs from doing important things like, you know, being a text editor. The timer allows it to handle its own events uninterrupted. It gets the job done, but it’s far from perfect:

It imposes an arbitrary latency to handling requests. Up to the poll period could pass before a request is handled.
Polling the module 100 times per second is inefficient. Unless you really enjoy recharging your laptop, that’s no good.

The poll period is a sliding trade-off between latency and battery life. If only there was some mechanism to, ahem, signal the Emacs thread, informing it that a request is waiting…

SIGUSR1

Emacs Lisp programs can handle the POSIX SIGUSR1 and SIGUSR2 signals, which is exactly the mechanism we need. The interface is a “key” binding on special-event-map, the keymap that handles these kinds of events. When the signal arrives, Emacs queues it up for the main event loop.

(define-key special-event-map [sigusr1]
  (lambda ()
    (interactive)
    (request-handle (request-next))))

The module blocks on its own thread on its own event pump. When a request arrives, it queues the request, rings the bell for Emacs to come handle it (raise()), and waits on a semaphore. For illustration purposes, assume the module reads requests from and writes responses to a file descriptor, like a socket.

int event_fd = /* ... */;
struct request request;
sem_init(&request.sem, 0, 0);

for (;;) {
    /* Blocking read for request event */
    read(event_fd, &request.event, sizeof(request.event));

    /* Put request on the queue */
    queue_lock(requests);
    queue_push(requests, &request);
    queue_unlock(requests);
    raise(SIGUSR1);  // TODO: Should raise() go inside the lock?

    /* Wait for Emacs */
    while (sem_wait(&request.sem))
        ;

    /* Reply with Emacs' response */
    write(event_fd, &request.response, sizeof(request.response));
}

The sem_wait() is in a loop because signals will wake it up prematurely. In fact, it may even wake up due to its own signal on the line before. This is the only way this particular use of sem_wait() might fail, so there’s no need to check errno.

If there are multiple module threads making requests to the same global queue, the lock is necessary to protect the queue. The semaphore is only for blocking the thread until Emacs has finished writing its particular response. Each thread has its own semaphore.

When Emacs is done writing the response, it releases the module thread by incrementing the semaphore. It might look something like this:

emacs_value
Frequest_complete(emacs_env *env, ptrdiff_t n, emacs_value *args, void *p)
{
    struct request *request = env->get_user_ptr(env, args[0]);
    if (request)
        sem_post(&request->sem);
    return Qnil;
}

The top-level handler dispatches to the specific request handler, calling request-complete above when it’s done.

(defun request-handle (next)
  (condition-case e
      (cl-ecase (request-type next)
        (:open  (request-handle-open  next))
        (:close (request-handle-close next))
        (:read  (request-handle-read  next)))
    (error (request-respond-as-error next e)))
  (request-complete))

This SIGUSR1+semaphore mechanism is roughly how Elfuse currently processes requests.

Making it work on Windows

Windows doesn’t have signals. This isn’t a problem for Elfuse since Windows doesn’t have FUSE either. Nor does it matter for Joymacs since XInput isn’t event-driven and always requires polling. But someday someone will need this mechanism for a dynamic module on Windows.

Fortunately there’s a solution: input language change events, WM_INPUTLANGCHANGE. It’s also on special-event-map:

(define-key special-event-map [language-change]
  (lambda ()
    (interactive)
    (request-process (request-next))))

Instead of raise() (or pthread_kill()), broadcast the window event with PostMessage(). Outside of invoking the language-change key binding, Emacs will ignore the event because WPARAM is 0 — it doesn’t belong to any particular window. We don’t really want to change the input language, after all.

PostMessageA(HWND_BROADCAST, WM_INPUTLANGCHANGE, 0, 0);

Naturally you’ll also need to replace the POSIX threading primitives with the Windows versions (CreateThread(), CreateSemaphore(), etc.). With a bit of abstraction in the right places, it should be pretty easy to support both POSIX and Windows in these asynchronous dynamic module events.

How to Write Fast(er) Emacs Lisp

2017-01-30T21:08:19Z

Not everything written in Emacs Lisp needs to be fast. Most of Emacs itself — around 82% — is written in Emacs Lisp because those parts are generally not performance-critical. Otherwise these functions would be built-ins written in C. Extensions to Emacs don’t have a choice and — outside of a few exceptions like dynamic modules and inferior processes — must be written in Emacs Lisp, including their performance-critical bits. Common performance hot spots are automatic indentation, AST parsing, and interactive completion.

Here are 5 guidelines, each very specific to Emacs Lisp, that will result in faster code. The non-intrusive guidelines could be applied at all times as a matter of style — choosing one equally expressive and maintainable form over another just because it performs better.

There’s one caveat: These guidelines are focused on Emacs 25.1 and “nearby” versions. Emacs is constantly evolving. Changes to the virtual machine and byte-code compiler may transform currently-slow expressions into fast code, obsoleting some of these guidelines. In the future I’ll add notes to this article for anything that changes.

(1) Use lexical scope

This guideline refers to the following being the first line of every Emacs Lisp source file you write:

;;; -*- lexical-binding: t; -*-

This point is worth mentioning again and again. Not only will your code be more correct, it will be measurably faster. Dynamic scope is still opt-in through the explicit use of special variables, so there’s absolutely no reason not to be using lexical scope. If you’ve written clean, dynamic scope code, then switching to lexical scope won’t have any effect on its behavior.

Along similar lines, special variables are a lot slower than local, lexical variables. Only use them when necessary.

(2) Prefer built-in functions

Built-in functions are written in C and are, as expected, significantly faster than the equivalent written in Emacs Lisp. Complete as much work as possible inside built-in functions, even if it might mean taking more conceptual steps overall.

For example, what’s the fastest way to accumulate a list of items? That is, new items go on the tail but, for algorithm reasons, the list must be constructed from the head.

You might be tempted to keep track of the tail of the list, appending new elements directly to the tail with setcdr (via setf below).

(defun fib-track-tail (n)
  (let* ((a 0)
         (b 1)
         (head (list 1))
         (tail head))
    (dotimes (_ n head)
      (psetf a b
             b (+ a b))
      (setf (cdr tail) (list b)
            tail (cdr tail)))))

(fib-track-tail 8)
;; => (1 1 2 3 5 8 13 21 34)

Actually, it’s much faster to construct the list in reverse, then destructively reverse it at the end.

(defun fib-nreverse (n)
  (let* ((a 0)
         (b 1)
         (list (list 1)))
    (dotimes (_ n (nreverse list))
      (psetf a b
             b (+ a b))
      (push b list))))

It might not look it, but nreverse is very fast. Not only is it a built-in, it’s got its own opcode. Using push in a loop, then finishing with nreverse is the canonical and fastest way to accumulate a list of items.

In fib-track-tail, the added complexity of tracking the tail in Emacs Lisp is much slower than zipping over the entire list a second time in C.

(3) Avoid unnecessary lambda functions

I’m talking about mapcar and friends.

;; Slower
(defun expt-list (list e)
  (mapcar (lambda (x) (expt x e)) list))

Listen, I know you love dash.el and higher order functions, but this habit ain’t cheap. The byte-code compiler does not know how to inline these lambdas, so there’s an additional per-element function call overhead.

Worse, if you’re using lexical scope like I told you, the above example forms a closure over e. This means a new function object is created (e.g. make-byte-code) each time expt-list is called. To be clear, I don’t mean that the lambda is recompiled each time — the same byte-code string is shared between all instances of the same lambda. A unique function vector (#[...]) and constants vector are allocated and initialized each time expt-list is invoked.

Related mini-guideline: Don’t create any more garbage than strictly necessary in performance-critical code.

Compare to an implementation with an explicit loop, using the nreverse list-accumulation technique.

(defun expt-list-fast (list e)
  (let ((result ()))
    (dolist (x list (nreverse result))
      (push (expt x e) result))))

No unnecessary garbage is created.
No unnecessary per-element function calls.

This is the fastest possible definition for this function, and it’s what you need to use in performance-critical code.

Personally I prefer the list comprehension approach, using cl-loop from cl-lib.

(defun expt-list-fast (list e)
  (cl-loop for x in list
           collect (expt x e)))

The cl-loop macro will expand into essentially the previous definition, making them practically equivalent. It takes some getting used to, but writing efficient loops is a whole lot less tedious with cl-loop.

In Emacs 24.4 and earlier, catch/throw is implemented by converting the body of the catch into a lambda function and calling it. If code inside the catch accesses a variable outside the catch (very likely), then, in lexical scope, it turns into a closure, resulting in the garbage function object like before.

In Emacs 24.5 and later, the byte-code compiler uses a new opcode, pushcatch. It’s a whole lot more efficient, and there’s no longer a reason to shy away from catch/throw in performance-critical code. This is important because it’s often the only way to perform an early bailout.

(4) Prefer using functions with dedicated opcodes

When following the guideline about using built-in functions, you might have several to pick from. Some built-in functions have dedicated virtual machine opcodes, making them much faster to invoke. Prefer these functions when possible.

How can you tell when a function has an assigned opcode? Take a peek at the byte-defop listings in bytecomp.el. Optimization often involves getting into the weeds, so don’t be shy.

For example, the assq and assoc functions search for a matching key in an association list (alist). Both are built-in functions, and the only difference is that the former compares keys with eq (e.g. symbol or integer keys) and the latter with equal (typically string keys). The difference in performance between eq and equal isn’t as important as another factor: assq has its own opcode (158).

This means in performance-critical code you should prefer assq, perhaps even going as far as restructuring your alists specifically to have eq keys. That last step is probably a trade-off, which means you’ll want to make some benchmarks to help with that decision.

Another example is eq, =, eql, and equal. Some macros and functions use eql, especially cl-lib which inherits eql as a default from Common Lisp. Take cl-case, which is like switch from the C family of languages. It compares elements with eql.

(defun op-apply (op a b)
  (cl-case op
    (:norm (+ (* a a) (* b b)))
    (:disp (abs (- a b)))
    (:isin (/ b (sin a)))))

The cl-case expands into a cond. Since Emacs byte-code lacks support for jump tables, there’s not much room for cleverness.

Update: Emacs 26.1, released May 2018, introduced a jump table opcode.

(defun op-apply (op a b)
  (cond
   ((eql op :norm) (+ (* a a) (* b b)))
   ((eql op :disp) (abs (- a b)))
   ((eql op :isin) (/ b (sin a)))))

It turns out eql is pretty much always the worst choice for cl-case. Of the four equality functions I listed, the only one lacking an opcode is eql. A faster definition would use eq. (In theory, cl-case could have done this itself because it knows all the keys are symbols.)

(defun op-apply (op a b)
  (cond
   ((eq op :norm) (+ (* a a) (* b b)))
   ((eq op :disp) (abs (- a b)))
   ((eq op :isin) (/ b (sin a)))))

Fortunately eq can safely compare integers in Emacs Lisp. You only need eql when comparing symbols, integers, and floats all at once, which is unusual.

(5) Unroll loops using and/or

Consider the following function which checks its argument against a list of numbers, bailing out on the first match. I used % instead of mod since the former has an opcode (166) and the latter does not.

(defun detect (x)
  (catch 'found
    (dolist (f '(2 3 5 7 11 13 17 19 23 29 31))
      (when (= 0 (% x f))
        (throw 'found f)))))

The byte-code compiler doesn’t know how to unroll loops. Fortunately that’s something we can do for ourselves using and and or. The compiler will turn this into clean, efficient jumps in the byte-code.

(defun detect-unrolled (x)
  (or (and (= 0 (% x 2)) 2)
      (and (= 0 (% x 3)) 3)
      (and (= 0 (% x 5)) 5)
      (and (= 0 (% x 7)) 7)
      (and (= 0 (% x 11)) 11)
      (and (= 0 (% x 13)) 13)
      (and (= 0 (% x 17)) 17)
      (and (= 0 (% x 19)) 19)
      (and (= 0 (% x 23)) 23)
      (and (= 0 (% x 29)) 29)
      (and (= 0 (% x 31)) 31)))

In Emacs 24.4 and earlier with the old-fashioned lambda-based catch, the unrolled definition is seven times faster. With the faster pushcatch-based catch it’s about twice as fast. This means the loop overhead accounts for about half the work of the first definition of this function.

Update: It was pointed out in the comments that this particular example is equivalent to a cond. That’s literally true all the way down to the byte-code, and it would be a clearer way to express the unrolled code. In real code it’s often not quite equivalent.

Unlike some of the other guidelines, this is certainly something you’d only want to do in code you know for sure is performance-critical. Maintaining unrolled code is tedious and error-prone.

I’ve had the most success with this approach by not by unrolling these loops myself, but by using a macro, or similar, to generate the unrolled form.

(defmacro with-detect (var list)
  (cl-loop for e in list
           collect `(and (= 0 (% ,var ,e)) ,e) into conditions
           finally return `(or ,@conditions)))

(defun detect-unrolled (x)
  (with-detect x (2 3 5 7 11 13 17 19 23 29 31)))

How can I find more optimization opportunities myself?

Use M-x disassemble to inspect the byte-code for your own hot spots. Observe how the byte-code changes in response to changes in your functions. Take note of the sorts of forms that allow the byte-code compiler to produce the best code, and then exploit it where you can.

Domain-Specific Language Compilation in Elfeed

2016-12-27T21:46:30Z

Last night I pushed another performance enhancement for Elfeed, this time reducing the time spent parsing feeds. It’s accomplished by compiling, during macro expansion, a jQuery-like domain-specific language within Elfeed.

Heuristic parsing

Given the nature of the domain — an under-specified standard and a lack of robust adherence — feed parsing is much more heuristic than strict. Sure, everyone’s feed XML is strictly conforming since virtually no feed reader tolerates invalid XML (thank you, XML libraries), but, for the schema, the situation resembles the de facto looseness of HTML. Sometimes important or required information is missing, or is only available in a different namespace. Sometimes, especially in the case of timestamps, it’s in the wrong format, or encoded incorrectly, or ambiguous. It’s real world data.

To get a particular piece of information, Elfeed looks in a number of different places within the feed, starting with the preferred source and stopping when the information is found. For example, to find the date of an Atom entry, Elfeed first searches for elements in this order:

Failing to find any of these elements, or if no parsable date is found, it settles on the current time. Only the updated element is required, but published usually has the desired information, so it goes first. The last three are only valid for another namespace, but are useful fallbacks.

Before Elfeed even starts this search, the XML text is parsed into an s-expression using xml-parse-region — a pure Elisp XML parser included in Emacs. The search is made over the resulting s-expression.

For example, here’s a sample from the Atom specification.

 xmlns="http://www.w3.org/2005/Atom">

  </span>Example Feed<span class="nt">
   href="http://example.org/"/>
  2003-12-13T18:30:02Z
  
    John Doe
  
  urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6

  
    </span>Atom-Powered Robots Run Amok<span class="nt">
     rel="alternate" href="http://example.org/2003/12/13/atom03"/>
    urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a
    2003-12-13T18:30:02Z
    Some text.
  

Which is parsed to into this s-expression.

((feed ((xmlns . "http://www.w3.org/2005/Atom"))
       (title () "Example Feed")
       (link ((href . "http://example.org/")))
       (updated () "2003-12-13T18:30:02Z")
       (author () (name () "John Doe"))
       (id () "urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6")
       (entry ()
              (title () "Atom-Powered Robots Run Amok")
              (link ((rel . "alternate")
                     (href . "http://example.org/2003/12/13/atom03")))
              (id () "urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a")
              (updated () "2003-12-13T18:30:02Z")
              (summary () "Some text."))))

Each XML element is converted to a list. The first item is a symbol that is the element’s name. The second item is an alist of attributes — cons pairs of symbols and strings. And the rest are its children, both string nodes and other elements. I’ve trimmed the extraneous string nodes from the sample s-expression.

A subtle detail is that xml-parse-region doesn’t just return the root element. It returns a list of elements, which always happens to be a single element list, which is the root element. I don’t know why this is, but I’ve built everything to assume this structure as input.

Elfeed strips all namespaces stripped from both elements and attributes to make parsing simpler. As I said, it’s heuristic rather than strict, so namespaces are treated as noise.

A domain-specific language

Coding up Elfeed’s s-expression searches in straight Emacs Lisp would be tedious, error-prone, and difficult to understand. It’s a lot of loops, assoc, etc. So instead I invented a jQuery-like, CSS selector-like, domain-specific language (DSL) to express these searches concisely and clearly.

For example, all of the entry links are “selected” using this expression:

(feed entry link [rel "alternate"] :href)

Reading right-to-left, this matches every href attribute under every link element with the rel="alternate" attribute, under every entry element, under the feed root element. Symbols match element names, two-element vectors match elements with a particular attribute pair, and keywords (which must come last) narrow the selection to a specific attribute value.

Imagine hand-writing the code to navigate all these conditions for each piece of information that Elfeed requires. The RSS parser makes up to 16 such queries, and the Atom parser makes as many as 24. That would add up to a lot of tedious code.

The package (included with Elfeed) that executes this query is called “xml-query.” It comes in two flavors: xml-query and xml-query-all. The former returns just the first match, and the latter returns all matches. The naming parallels the querySelector() and querySelectorAll() DOM methods in JavaScript.

(let ((xml (elfeed-xml-parse-region)))
  (xml-query-all '(feed entry link [rel "alternate"] :href) xml))

;; => ("http://example.org/2003/12/13/atom03")

That date search I mentioned before looks roughly like this. The * matches text nodes within the selected element. It must come last just like the keyword matcher.

(or (xml-query '(feed entry published *))
    (xml-query '(feed entry updated *))
    (xml-query '(feed entry date *))
    (xml-query '(feed entry modified *))
    (xml-query '(feed entry issued *))
    (current-time))

Over the past three years, Elfeed has gained more and more of these selectors as it collects more and more information from feeds. Most recently, Elfeed collects author and category information provided by feeds. Each new query slows feed parsing a little bit, and it’s a perfect example of a program slowing down as it gains more features and capabilities.

But I don’t want Elfeed to slow down. I want it to get faster!

Optimizing the domain-specific language

Just like the primary jQuery function ($), both xml-query and xml-query-all are functions. The xml-query engine processes the selector from scratch on each invocation. It examines the first element, dispatches on its type/value to apply it to the input, and then recurses on the rest of selector with the narrowed input, stopping when it hits the end of the list. That’s the way it’s worked from the start.

However, every selector argument in Elfeed is a static, quoted list. Unlike user-supplied filters, I know exactly what I want to execute ahead of time. It would be much better if the engine didn’t have to waste time reparsing the DSL for each query.

This is the classic split between interpreters and compilers. An interpreter reads input and immediately executes it, doing what the input tells it to do. A compiler reads input and, rather than execute it, produces output, usually in a simpler language, that, when evaluated, has the same effect as executing the input.

Rather than interpret the selector, it would be better to compile it into Elisp code, compile that into byte-code, and then have the Emacs byte-code virtual machine (VM) execute the query each time it’s needed. The extra work of parsing the DSL is performed ahead of time, the dispatch is entirely static, and the selector ultimately executes on a much faster engine (byte-code VM). This should be a lot faster!

So I wrote a function that accepts a selector expression and emits Elisp source that implements that selector: a compiler for my DSL. Having a readily-available syntax tree is one of the big advantages of homoiconicity, and this sort of function makes perfect sense in a lisp. For the external interface, this compiler function is called by a new pair of macros, xml-query* and xml-query-all*. These macros consume a static selector and expand into the compiled Elisp form of the selector.

To demonstrate, remember that link query from before? Here’s the macro version of that selection, but only returning the first match. Notice the selector is no longer quoted. This is because it’s consumed by the macro, not evaluated.

(xml-query* (feed entry title [rel "alternate"] :href) xml)

This will expand into the following code.

(catch 'done
  (dolist (v xml)
    (when (and (consp v) (eq (car v) 'feed))
      (dolist (v (cddr v))
        (when (and (consp v) (eq (car v) 'entry))
          (dolist (v (cddr v))
            (when (and (consp v) (eq (car v) 'title))
              (let ((value (cdr (assq 'rel (cadr v)))))
                (when (equal value "alternate")
                  (let ((v (cdr (assq 'href (cadr v)))))
                    (when v
                      (throw 'done v))))))))))))

As soon as it finds a match, it’s thrown to the top level and returned. Without the DSL, the expansion is essentially what would have to be written by hand. This is exactly the sort of leverage you should be getting from a compiler. It compiles to around 130 byte-code instructions.

The xml-query-all* form is nearly the same, but instead of a throw, it pushes the result into the return list. Only the prologue (the outermost part) and the epilogue (the innermost part) are different.

Parsing feeds is a hot spot for Elfeed, so I wanted the compiler’s output to be as efficient as possible. I had three goals for this:

No extraneous code. It’s easy for the compiler to emit unnecessary code. The byte-code compiler might be able to eliminate some of it, but I don’t want to rely on that. Except for the identifiers, it should basically look like a human wrote it.
Avoid function calls. I don’t want to pay function call overhead, and, with some care, it’s easy to avoid. In the xml-query* expansion, the only function call is throw, which is unavoidable. The xml-query-all* version makes no function calls whatsoever. Notice that I used assq rather than assoc. First, it only needs to match symbols, so it should be faster. Second, assq has its own byte-code instruction (158) and assoc does not.
No unnecessary memory allocations. The xml-query* expansion makes no allocations. The xml-query-all* version only conses once per output, which is the minimum possible.

The end result is at least as optimal as hand-written code, but without the chance of human error (typos, fat fingering) and sourced from an easy-to-read DSL.

Performance

In my tests, the xml-query macros are a full order of magnitude faster than the functions. Yes, ten times faster! It’s an even bigger gain than I expected.

In the full picture, xml-query is only one part of parsing a feed. Measuring the time starting from raw XML text (as delivered by cURL) to a list of database entry objects, I’m seeing an overall 25% speedup with the macros. The remaining time is dominated by xml-parse-region, which is mostly out of my control.

With xml-query so computationally cheap, I don’t need to worry about using it more often. Compared to parsing XML text, it’s virtually free.

When it came time to validate my DSL compiler, I was really happy that Elfeed had a test suite. I essentially rewrote a core component from scratch, and passing all of the unit tests was a strong sign that it was correct. Many times that test suite has provided confidence in changes made both by me and by others.

I’ll end by describing another possible application: Apply this technique to regular expressions, such that static strings containing regular expressions are compiled into Elisp/byte-code via macro expansion. I wonder if situationally this would be faster than Emacs’ own regular expression engine.

Some Performance Advantages of Lexical Scope

2016-12-22T02:33:36Z

I recently had a discussion with Xah Lee about lexical scope in Emacs Lisp. The topic was why lexical-binding exists at a file-level when there was already lexical-let (from cl-lib), prompted by my previous article on JIT byte-code compilation. The specific context is Emacs Lisp, but these concepts apply to language design in general.

Until Emacs 24.1 (June 2012), Elisp only had dynamically scoped variables — a feature, mostly by accident, common to old lisp dialects. While dynamic scope has some selective uses, it’s widely regarded as a mistake for local variables, and virtually no other languages have adopted it.

Way back in 1993, Dave Gillespie’s deviously clever lexical-let macro was committed to the cl package, providing a rudimentary form of opt-in lexical scope. The macro walks its body replacing local variable names with guaranteed-unique gensym names: the exact same technique used in macros to create “hygienic” bindings that aren’t visible to the macro body. It essentially “fakes” lexical scope within Elisp’s dynamic scope by preventing variable name collisions.

For example, here’s one of the consequences of dynamic scope.

(defun inner ()
  (setq v :inner))

(defun outer ()
  (let ((v :outer))
    (inner)
    v))

(outer)
;; => :inner

The “local” variable v in outer is visible to its callee, inner, which can access and manipulate it. The meaning of the free variable v in inner depends entirely on the run-time call stack. It might be a global variable, or it might be a local variable for a caller, direct or indirect.

Using lexical-let deconflicts these names, giving the effect of lexical scope.

(defvar v)

(defun lexical-outer ()
  (lexical-let ((v :outer))
    (inner)
    v))

(lexical-outer)
;; => :outer

But there’s more to lexical scope than this. Closures only make sense in the context of lexical scope, and the most useful feature of lexical-let is that lambda expressions evaluate to closures. The macro implements this using a technique called closure conversion. Additional parameters are added to the original lambda function, one for each lexical variable (and not just each closed-over variable), and the whole thing is wrapped in another lambda function that invokes the original lambda function with the additional parameters filled with the closed-over variables — yes, the variables (e.g. symbols) themselves, not just their values, (e.g. pass-by-reference). The last point means different closures can properly close over the same variables, and they can bind new values.

To roughly illustrate how this works, the first lambda expression below, which closes over the lexical variables x and y, would be converted into the latter by lexical-let. The #: is Elisp’s syntax for uninterned variables. So #:x is a symbol x, but not the symbol x (see print-gensym).

;; Before conversion:
(lambda ()
  (+ x y))

;; After conversion:
(lambda (&rest args)
  (apply (lambda (x y)
           (+ (symbol-value x)
              (symbol-value y)))
         '#:x '#:y args))

I’ve said on multiple occasions that lexical-binding: t has significant advantages, both in performance and static analysis, and so it should be used for all future Elisp code. The only reason it’s not the default is because it breaks some old (badly written) code. However, lexical-let doesn’t realize any of these advantages! In fact, it has worse performance than straightforward dynamic scope with let.

New symbol objects are allocated and initialized (make-symbol) on each run-time evaluation, one per lexical variable.
Since it’s just faking it, lexical-let still uses dynamic bindings, which are more expensive than lexical bindings. It varies depending on the C compiler that built Emacs, but dynamic variable accesses (opcode varref) take around 30% longer than lexical variable accesses (opcode stack-ref). Assignment is far worse, where dynamic variable assignment (varset) takes 650% longer than lexical variable assignment (stack-set). How I measured all this is a topic for another article.
The “lexical” variables are accessed using symbol-value, a full function call, so they’re even slower than normal dynamic variables.
Because converted lambda expressions are constructed dynamically at run-time within the body of lexical-let, the resulting closure is only partially byte-compiled even if the code as a whole has been byte-compiled. In contrast, lexical-binding: t closures are fully compiled. How this works is worth its own article.
Converted lambda expressions include the additional internal function invocation, making them slower.

While lexical-let is clever, and occasionally useful prior to Emacs 24, it may come at a hefty performance cost if evaluated frequently. There’s no reason to use it anymore.

Constraints on code generation

Another reason to be weary of dynamic scope is that it puts needless constraints on the compiler, preventing a number of important optimization opportunities. For example, consider the following function, bar:

(defun bar ()
  (let ((x 1)
        (y 2))
    (foo)
    (+ x y)))

Byte-compile this function under dynamic scope (lexical-binding: nil) and disassemble it to see what it looks like.

(byte-compile #'bar)
(disassemble #'bar)

That pops up a buffer with the disassembly listing:

     constant  1
     constant  2
     varbind   y
     varbind   x
     constant  foo
     call      0
     discard
     varref    x
     varref    y
     plus
    unbind    2
    return

It’s 12 instructions, 5 of which deal with dynamic bindings. The byte-compiler doesn’t always produce optimal byte-code, but this just so happens to be nearly optimal byte-code. The discard (a very fast instruction) isn’t necessary, but otherwise no more compiler smarts can improve on this. Since the variables x and y are visible to foo, they must be bound before the call and loaded after the call. While generally this function will return 3, the compiler cannot assume so since it ultimately depends on the behavior foo. Its hands are tied.

Compare this to the lexical scope version (lexical-binding: t):

     constant  1
     constant  2
     constant  foo
     call      0
     discard
     stack-ref 1
     stack-ref 1
     plus
     return

It’s only 8 instructions, none of which are expensive dynamic variable instructions. And this isn’t even close to the optimal byte-code. In fact, as of Emacs 25.1 the byte-compiler often doesn’t produce the optimal byte-code for lexical scope code and still needs some work. Despite not firing on all cylinders, lexical scope still manages to beat dynamic scope in performance benchmarks.

Here’s the optimal byte-code, should the byte-compiler become smarter someday:

     constant  foo
     call      0
     constant  3
     return

It’s down to 4 instructions due to computing the math operation at compile time. Emacs’ byte-compiler only has rudimentary constant folding, so it doesn’t notice that x and y are constants and misses this optimization. I speculate this is due to its roots compiling under dynamic scope. Since x and y are no longer exposed to foo, the compiler has the opportunity to optimize them out of existence. I haven’t measured it, but I would expect this to be significantly faster than the dynamic scope version of this function.

Optional dynamic scope

You might be thinking, “What if I really do want x and y to be dynamically bound for foo?” This is often useful. Many of Emacs’ own functions are designed to have certain variables dynamically bound around them. For example, the print family of functions use the global variable standard-output to determine where to send output by default.

(let ((standard-output (current-buffer)))
  (princ "value = ")
  (prin1 value))

Have no fear: With lexical-binding: t you can have your cake and eat it too. Variables declared with defvar, defconst, or defvaralias are marked as “special” with an internal bit flag (declared_special in C). When the compiler detects one of these variables (special-variable-p), it uses a classical dynamic binding.

Declaring both x and y as special restores the original semantics, reverting bar back to its old byte-code definition (next time it’s compiled, that is). But it would be poor form to mark x or y as special: You’d de-optimize all code (compiled after the declaration) anywhere in Emacs that uses these names. As a package author, only do this with the namespace-prefixed variables that belong to you.

The only way to unmark a special variable is with the undocumented function internal-make-var-non-special. I expected makunbound to do this, but as of Emacs 25.1 it does not. This could possibly be considered a bug.

Accidental closures

I’ve said there are absolutely no advantages to lexical-binding: nil. It’s only the default for the sake of backwards-compatibility. However, there is one case where lexical-binding: t introduces a subtle issue that would otherwise not exist. Take this code for example (and nevermind prin1-to-string for a moment):

;; -*- lexical-binding: t; -*-

(defun function-as-string ()
  (with-temp-buffer
    (prin1 (lambda () :example) (current-buffer))
    (buffer-string)))

This creates and serializes a closure, which is one of Elisp’s unique features. It doesn’t close over any variables, so it should be pretty simple. However, this function will only work correctly under lexical-binding: t when byte-compiled.

(function-as-string)
;; => "(closure ((temp-buffer . #) t) nil :example)"

The interpreter doesn’t analyze the closure, so just closes over everything. This includes the hidden variable temp-buffer created by the with-temp-buffer macro, resulting in an abstraction leak. Buffers aren’t readable, so this will signal an error if an attempt is made to read this function back into an s-expression. The byte-compiler fixes this by noticing temp-buffer isn’t actually closed over and so doesn’t include it in the closure, making it work correctly.

Under lexical-binding: nil it works correctly either way:

(function-as-string)
;; -> "(lambda nil :example)"

This may seem contrived — it’s certainly unlikely — but it has come up in practice. Still, it’s no reason to avoid lexical-binding: t.

Use lexical scope in all new code

As I’ve said again and again, always use lexical-binding: t. Use dynamic variables judiciously. And lexical-let is no replacement. It has virtually none of the benefits, performs worse, and it only applies to let, not any of the other places bindings are created: function parameters, dotimes, dolist, and condition-case.

Faster Elfeed Search Through JIT Byte-code Compilation

2016-12-11T23:16:42Z

Today I pushed an update for Elfeed that doubles the speed of the search filter in the worse case. This is the user-entered expression that dynamically narrows the entry listing to a subset that meets certain criteria: published after a particular date, with/without particular tags, and matching/non-matching zero or more regular expressions. The filter is live, applied to the database as the expression is edited, so it’s important for usability that this search completes under a threshold that the user might notice.

The typical workaround for these kinds of interfaces is to make filtering/searching asynchronous. It’s possible to do this well, but it’s usually a terrible, broken design. If the user acts upon the asynchronous results — say, by typing the query and hitting enter to choose the current or expected top result — then the final behavior is non-deterministic, a race between the user’s typing speed and the asynchronous search. Elfeed will keep its synchronous live search.

For anyone not familiar with Elfeed, here’s a filter that finds all entries from within the past year tagged “youtube” (+youtube) that mention Linux or Linus (linu[sx]), but aren’t tagged “bsd” (-bsd), limited to the most recent 15 entries (#15):

@1-year-old +youtube linu[xs] -bsd #15

The database is primarily indexed over publication date, so filters on publication dates are the most efficient filters. Entries are visited in order starting with the most recently published, and the search can bail out early once it crosses the filter threshold. Time-oriented filters have been encouraged as the solution to keep the live search feeling lively.

Filtering Overview

The first step in filtering is parsing the filter text entered by the user. This string is broken into its components using the elfeed-search-parse-filter function. Date filter components are converted into a unix epoch interval, tags are interned into symbols, regular expressions are gathered up as strings, and the entry limit is parsed into a plain integer. Absence of a filter component is indicated by nil.

(elfeed-search-parse-filter "@1-year-old +youtube linu[xs] -bsd #15")
;; => (31557600.0 (youtube) (bsd) ("linu[xs]") nil 15)

Previously, the next step was to apply the elfeed-search-filter function with this structured filter representation to the database. Except for special early-bailout situations, it works left-to-right across the filter, checking each condition against each entry. This is analogous to an interpreter, with the filter being a program.

Thinking about it that way, what if the filter was instead compiled into an Emacs byte-code function and executed directly by the Emacs virtual machine? That’s what this latest update does.

Benchmarks

With six different filter components, the actual filtering routine is a bit too complicated for an article, so I’ll set up a simpler, but roughly equivalent, scenario. With a reasonable cut-off date, the filter was already sufficiently fast, so for benchmarking I’ll focus on the worst case: no early bailout opportunities. An entry will be just a list of tags (symbols), and the filter will have to test every entry.

My real-world Elfeed database currently has 46,772 entries with 36 distinct tags. For my benchmark I’ll round this up to a nice 100,000 entries, and use 26 distinct tags (A–Z), which has the nice alphabet property and more closely reflects the number of tags I still care about.

First, here’s make-random-entry to generate a random list of 1–5 tags (i.e. an entry). The state parameter is the random state, allowing for deterministic benchmarks on a randomly-generated database.

(cl-defun make-random-entry (&key state (min 1) (max 5))
  (cl-loop repeat (+ min (cl-random (1+ (- max min)) state))
           for letter = (+ ?A (cl-random 26 state))
           collect (intern (format "%c" letter))))

The database is just a big list of entries. In Elfeed this is actually an AVL tree. Without dates, the order doesn’t matter.

(cl-defun make-random-database (&key state (count 100000))
  (cl-loop repeat count collect (make-random-entry :state state)))

Here’s my old time macro. An important change I’ve made since years ago is to call garbage-collect before starting the clock, eliminating bad samples from unlucky garbage collection events. Depending on what you want to measure, it may even be worth disabling garbage collection during the measurement by setting gc-cons-threshold to a high value.

(defmacro measure-time (&rest body)
  (declare (indent defun))
  (garbage-collect)
  (let ((start (make-symbol "start")))
    `(let ((,start (float-time)))
       ,@body
       (- (float-time) ,start))))

Finally, the benchmark harness. It uses a hard-coded seed to generate the same pseudo-random database. The test is run against the a filter function, f, 100 times in search for the same 6 tags, and the timing results are averaged.

(cl-defun benchmark (f &optional (n 100) (tags '(A B C D E F)))
  (let* ((state (copy-sequence [cl-random-state-tag -1 30 267466518]))
         (db (make-random-database :state state)))
    (cl-loop repeat n
             sum (measure-time
                   (funcall f db tags))
             into total
             finally return (/ total (float n)))))

The baseline will be memq (test for membership using identity, eq). There are two lists of tags to compare: the list that is the entry, and the list from the filter. This requires a nested loop for each entry, one explicit (cl-loop) and one implicit (memq), both with early bailout.

(defun memq-count (db tags)
  (cl-loop for entry in db count
           (cl-loop for tag in tags
                    when (memq tag entry)
                    return t)))

Byte-code compiling everything and running the benchmark on my laptop I get:

(benchmark #'memq-count)
;; => 0.041 seconds

That’s actually not too bad. One of the advantages of this definition is that there are no function calls. The memq built-in function has its own opcode (62), and the rest of the definition is special forms and macros expanding to special forms (cl-loop). It’s exactly the thing I need to exploit to make filters faster.

As a sanity check, what would happen if I used member instead of memq? In theory it should be slower because it uses equal for tests instead of eq.

(defun member-count (db tags)
  (cl-loop for entry in db count
           (cl-loop for tag in tags
                    when (member tag entry)
                    return t)))

It’s only slightly slower because member, like many other built-ins, also has an opcode (157). It’s just a tiny bit more overhead.

(benchmark #'member-count)
;; => 0.047 seconds

To test function call overhead while still using the built-in (e.g. written in C) memq, I’ll alias it so that the byte-code compiler is forced to emit a function call.

(defalias 'memq-alias 'memq)

(defun memq-alias-count (db tags)
  (cl-loop for entry in db count
           (cl-loop for tag in tags
                    when (memq-alias tag entry)
                    return t)))

To verify that this is doing what I expect, I M-x disassemble the function and inspect the byte-code disassembly. Here’s a simple example.

(disassemble
 (byte-compile (lambda (list) (memq :foo list))))

When compiled under lexical scope (lexical-binding is true), here’s the disassembly. To understand what this means, see Emacs Byte-code Internals.

     constant  :foo
     stack-ref 1
     memq
     return

Notice the memq instruction. Try using memq-alias instead:

(disassemble
 (byte-compile (lambda (list) (memq-alias :foo list))))

Resulting in a function call:

     constant  memq-alias
     constant  :foo
     stack-ref 2
     call      2
     return

And the benchmark:

(benchmark #'memq-alias-count)
;; => 0.052 seconds

So the function call adds about 27% overhead. This means it would be a good idea to avoid calling functions in the filter if I can help it. I should rely on these special opcodes.

Suppose memq was written in Emacs Lisp rather than C. How much would that hurt performance? My version of my-memq below isn’t quite the same since it returns t rather than the sublist, but it’s good enough for this purpose. (I’m using cl-loop because writing early bailout in plain Elisp without recursion is, in my opinion, ugly.)

(defun my-memq (needle haystack)
  (cl-loop for element in haystack
           when (eq needle element)
           return t))

(defun my-memq-count (db tags)
  (cl-loop for entry in db count
           (cl-loop for tag in tags
                    when (my-memq tag entry)
                    return t)))

And the benchmark:

(benchmark #'my-memq-count)
;; => 0.137 seconds

Oof! It’s more than 3 times slower than the opcode. This means I should use built-ins as much as possible in the filter.

Dynamic vs. lexical scope

There’s one last thing to watch out for. Everything so far has been compiled with lexical scope. You should really turn this on by default for all new code that you write. It has three important advantages:

It allows the compiler to catch more mistakes.
It eliminates a class of bugs related to dynamic scope: Local variables are exposed to manipulation by callees.
Lexical scope has better performance.

Here are all the benchmarks with the default dynamic scope:

(benchmark #'memq-count)
;; => 0.065 seconds

(benchmark #'member-count)
;; => 0.070 seconds

(benchmark #'memq-alias-count)
;; => 0.074 seconds

(benchmark #'my-memq-count)
;; => 0.256 seconds

It halves the performance in this benchmark, and for no benefit. Under dynamic scope, local variables use the varref opcode — a global variable lookup — instead of the stack-ref opcode — a simple array index.

(defun norm (a b)
  (* (- a b) (- a b)))

Under dynamic scope, this compiles to:

     varref    a
     varref    b
     diff
     varref    a
     varref    b
     diff
     mult
     return

And under lexical scope (notice the variable names disappear):

     stack-ref 1
     stack-ref 1
     diff
     stack-ref 2
     stack-ref 2
     diff
     mult
     return

JIT-compiled filters

So far I’ve been moving in the wrong direction, making things slower rather than faster. How can I make it faster than the straight memq version? By compiling the filter into byte-code.

I won’t write the byte-code directly, but instead generate Elisp code and use the byte-code compiler on it. This is safer, will work correctly in future versions of Emacs, and leverages the optimizations performed by the byte-compiler. This sort of thing recently got a bad rap on Emacs Horrors, but I was happy to see that this technique is already established.

(defun jit-count (db tags)
  (let* ((memq-list (cl-loop for tag in tags
                             collect `(memq ',tag entry)))
         (function `(lambda (db)
                      (cl-loop for entry in db
                               count (or ,@memq-list))))
         (compiled (byte-compile function)))
    (funcall compiled db)))

It dynamically builds the code as an s-expression, runs that through the byte-code compiler, executes it, and throws it away. It’s “just-in-time,” though compiling to byte-code and not native code. For the benchmark tags of (A B C D E F), this builds the following:

(lambda (db)
  (cl-loop for entry in db
           count (or (memq 'A entry)
                     (memq 'B entry)
                     (memq 'C entry)
                     (memq 'D entry)
                     (memq 'E entry)
                     (memq 'F entry))))

Due to its short-circuiting behavior, or is a special form, so this function is just special forms and memq in its opcode form. It’s as fast as Elisp can get.

Having s-expressions is a real strength for lisp, since the alternative (in, say, JavaScript) would be to assemble the function by concatenating code strings. By contrast, this looks a lot like a regular lisp macro. Invoking the byte-code compiler does add some overhead compared to the interpreted filter, but it’s insignificant.

How much faster is this?

(benchmark #'jit-count)
;; => 0.017s

It’s more than twice as fast! The big gain here is through loop unrolling. The outer loop has been unrolled into the or expression. That section of byte-code looks like this:

     constant  A
     stack-ref 1
     memq
     goto-if-not-nil-else-pop 1
     constant  B
     stack-ref 1
     memq
     goto-if-not-nil-else-pop 1
    constant  C
    stack-ref 1
    memq
    goto-if-not-nil-else-pop 1
    constant  D
    stack-ref 1
    memq
    goto-if-not-nil-else-pop 1
    constant  E
    stack-ref 1
    memq
    goto-if-not-nil-else-pop 1
    constant  F
    stack-ref 1
    memq
1    return

In Elfeed, not only does it unroll these loops, it completely eliminates the overhead for unused filter components. Comparing to this benchmark, I’m seeing roughly matching gains in Elfeed’s worst case. In Elfeed, I also bind lexical-binding around the byte-compile call to force lexical scope, since otherwise it just uses the buffer-local value (usually nil).

Filter compilation can be toggled on and off by setting elfeed-search-compile-filter. If you’re up to date, try out live filters with it both enabled and disabled. See if you can notice the difference.

Result summary

Here are the results in a table, all run with Emacs 24.4 on x86-64.

(ms)      memq      member    memq-alias my-memq   jit
lexical   41        47        52         137       17
dynamic   65        70        74         256       21

And the same benchmarks on Aarch64 (Emacs 24.5, ARM Cortex-A53), where I also occasionally use Elfeed, and where I have been very interested in improving performance.

(ms)      memq      member    memq-alias my-memq   jit
lexical   170       235       242        614       79
dynamic   274       340       345        1130      92

And here’s how you can run the benchmarks for yourself, perhaps with different parameters:

jit-bench.el

The header explains how to run the benchmark in batch mode:

$ emacs -Q -batch -f batch-byte-compile jit-bench.el
$ emacs -Q -batch -l jit-bench.elc -f benchmark-batch

A Showerthoughts Fortune File

2016-12-01T23:58:15Z

I have created a fortune file for the all-time top 10,000 /r/Showerthoughts posts, as of October 2016. As a word of warning: Many of these entries are adult humor and may not be appropriate for your work computer. These fortunes would be categorized as “offensive” (fortune -o).

Download: showerthoughts (1.3 MB)

The copyright status of this file is subject to each of its thousands of authors. Since it’s not possible to contact many of these authors — some may not even still live — it’s obviously never going to be under an open source license (Creative Commons, etc.). Even more, some quotes are probably from comedians and such, rather than by the redditor who made the post. I distribute it only for fun.

Installation

To install this into your fortune database, first process it with strfile to create a random-access index, showerthoughts.dat, then copy them to the directory with the rest.

$ strfile showerthoughts
"showerthoughts.dat" created
There were 10000 strings
Longest string: 343 bytes
Shortest string: 39 bytes

$ cp showerthoughts* /usr/share/games/fortunes/

Alternatively, fortune can be told to use this file directly:

$ fortune showerthoughts
Not once in my life have I stepped into somebody's house and
thought, "I sure hope I get an apology for 'the mess'."
        ―AndItsDeepToo, Aug 2016

If you didn’t already know, fortune is an old unix utility that displays a random quotation from a quotation database — a digital fortune cookie. I use it as an interactive login shell greeting on my ODROID-C2 server:

if shopt -q login_shell; then
    fortune ~/.fortunes
fi

How was it made?

Fortunately I didn’t have to do something crazy like scrape reddit for weeks on end. Instead, I downloaded the pushshift.io submission archives, which is currently around 70 GB compressed. Each file contains one month’s worth of JSON data, one object per submission, one submission per line, all compressed with bzip2.

Unlike so many other datasets, especially when it’s made up of arbitrary inputs from millions of people, the format of the /r/Showerthoughts posts is surprisingly very clean and requires virtually no touching up. It’s some really fantastic data.

A nice feature of bzip2 is concatenating compressed files also concatenates the uncompressed files. Additionally, it’s easy to parallelize bzip2 compression and decompression, which gives it an edge over xz. I strongly recommend using lbzip2 to decompress this data, should you want to process it yourself.

cat RS_*.bz2 | lbunzip2 > everything.json

jq is my favorite command line tool for processing JSON (and rendering fractals). To filter all the /r/Showerthoughts posts, it’s a simple select expression. Just mind the capitalization of the subreddit’s name. The -c tells jq to keep it one per line.

cat RS_*.bz2 | \
    lbunzip2 | \
    jq -c 'select(.subreddit == "Showerthoughts")' \
    > showerthoughts.json

However, you’ll quickly find that jq is the bottleneck, parsing all that JSON. Your cores won’t be exploited by lbzip2 as they should. So I throw grep in front to dramatically decrease the workload for jq.

cat *.bz2 | \
    lbunzip2 | \
    grep -a Showerthoughts | \
    jq -c 'select(.subreddit == "Showerthoughts")'
    > showerthoughts.json

This will let some extra things through, but it’s a superset. The -a option is necessary because the data contains some null bytes. Without it, grep switches into binary mode and breaks everything. This is incredibly frustrating when you’ve already waited half an hour for results.

To further reduce the workload further down the pipeline, I take advantage of the fact that only four fields will be needed: title, score, author, and created_utc. The rest can — and should, for efficiency’s sake — be thrown away where it’s cheap to do so.

cat *.bz2 | \
    lbunzip2 | \
    grep -a Showerthoughts | \
    jq -c 'select(.subreddit == "Showerthoughts") |
               {title, score, author, created_utc}' \
    > showerthoughts.json

This gathers all 1,199,499 submissions into a 185 MB JSON file (as of this writing). Most of these submissions are terrible, so the next step is narrowing it to the small set of good submissions and putting them into the fortune database format.

It turns out reddit already has a method for finding the best submissions: a voting system. Just pick the highest scoring posts. Through experimentation I arrived at 10,000 as the magic cut-off number. After this the quality really starts to drop off. Over time this should probably be scaled up with the total number of submissions.

I did both steps at the same time using a bit of Emacs Lisp, which is particularly well-suited to the task:

https://github.com/skeeto/showerthoughts

This Elisp program reads one JSON object at a time and sticks each into a AVL tree sorted by score (descending), then timestamp (ascending), then title (ascending). The AVL tree is limited to 10,000 items, with the lowest items being dropped. This was a lot faster than the more obvious approach: collecting everything into a big list, sorting it, and keeping the top 10,000 items.

Formatting

The most complicated part is actually paragraph wrapping the submissions. Most are too long for a single line, and letting the terminal hard wrap them is visually unpleasing. The submissions are encoded in UTF-8, some with characters beyond simple ASCII. Proper wrapping requires not just Unicode awareness, but also some degree of Unicode rendering. The algorithm needs to recognize grapheme clusters and know the size of the rendered text. This is not so trivial! Most paragraph wrapping tools and libraries get this wrong, some counting width by bytes, others counting width by codepoints.

Emacs’ M-x fill-paragraph knows how to do all these things — only for a monospace font, which is all I needed — and I decided to leverage it when generating the fortune file. Here’s an example that paragraph-wraps a string:

(defun string-fill-paragraph (s)
  (with-temp-buffer
    (insert s)
    (fill-paragraph)
    (buffer-string)))

For the file format, items are delimited by a % on a line by itself. I put the wrapped content, followed by a quotation dash, the author, and the date. A surprising number of these submissions have date-sensitive content (“on this day X years ago”), so I found it was important to include a date.

April Fool's Day is the one day of the year when people critically
evaluate news articles before accepting them as true.
        ―kellenbrent, Apr 2015
%
Of all the bodily functions that could be contagious, thank god
it's the yawn.
        ―MKLV, Aug 2015
%

There’s the potential that a submission itself could end with a lone % and, with a bit of bad luck, it happens to wrap that onto its own line. Fortunately this hasn’t happened yet. But, now that I’ve advertised it, someone could make such a submission, popular enough for the top 10,000, with the intent to personally trip me up in a future update. I accept this, though it’s unlikely, and it would be fairly easy to work around if it happened.

The strfile program looks for the % delimiters and fills out a table of file offsets. The header of the .dat file indicates the number strings along with some other metadata. What follows is a table of 32-bit file offsets.

struct {
    uint32_t str_version;  /* version number */
    uint32_t str_numstr;   /* # of strings in the file */
    uint32_t str_longlen;  /* length of longest string */
    uint32_t str_shortlen; /* shortest string length */
    uint32_t str_flags;    /* bit field for flags */
    char str_delim;        /* delimiting character */
}

Note that the table doesn’t necessarily need to list the strings in the same order as they appear in the original file. In fact, recent versions of strfile can sort the strings by sorting the table, all without touching the original file. Though none of this important to fortune.

Now that you know how it all works, you can build your own fortune file from your own inputs!

Emacs, Dynamic Modules, and Joysticks

2016-11-05T04:01:51Z

Two months ago Emacs 25 was released and introduced a new dynamic module feature. Emacs can now load shared libraries built against Emacs’ module API, defined in emacs-module.h. What’s interesting about this API is that it doesn’t require linking against Emacs or any sort of library. Instead, at run time Emacs supplies the module’s initialization function with function pointers for the entire API.

As a demonstration, in this article I’ll build an Emacs joystick interface (Linux only) using a dynamic module. It will allow Emacs to read events from any joystick on the system. All the source code is here:

https://github.com/skeeto/joymacs

It includes a calibration interface (M-x joydemo) within Emacs:

Currently, Emacs’ emacs-module.h header is the entirety of the module documentation. It’s a bit thin and leaves ambiguities that requires some reading of the Emacs source code. Even reading the source, it’s not clear which behaviors are a reliable part of the interface. For example, if there’s a pending non-local exit, it’s safe for a function to return NULL since the return value is never inspected (Emacs 25.1), but will this always be the case? While mistakes are unforgiving (a hard crash), the API is mostly intuitive and it’s been pretty easy to feel my way around it.

Update: Philipp Stephani has written thorough, reliable module documentation.

Dynamic Module Types

All Emacs values — integers, floats, cons cells, vectors, strings, etc. — are represented as the polymorphic, pointer-valued type, emacs_value. Despite being a pointer, NULL is not a valid value, as convenient as that would be. The API includes functions for creating and extracting the fundamental types: integers, floats, strings. Almost all other object types can only be accessed by making Lisp function calls to regular Emacs functions from the module.

Modules also introduce a brand new Emacs object type: a user pointer. These are non-readable, opaque pointer values returned by modules, typically representing a handle to some resource, be it a memory block, database connection, or a joystick. These objects include a finalizer function pointer — which, surprisingly, is not permitted to be NULL — and their lifetime is managed by Emacs’ garbage collector.

User pointers are a somewhat dangerous feature since there’s little to stop Emacs Lisp code from misusing them. A Lisp program can take a user pointer from one module and pass it to a function in a different module. Since it’s just a pointer, there’s no way to type check it. At best, a module could maintain a table of all its live pointers, checking all user pointer arguments against the table before dereferencing. But I don’t expect this to be normal practice.

Module Initialization

After loading the module through the platform’s mechanism, the first thing Emacs does is check for the symbol plugin_is_GPL_compatible. While tacky, this is not surprising given the culture around Emacs.

Next it calls emacs_module_init(), passing it the first function pointer. From this, the module can get a Lisp environment and start doing Emacs things, such as binding module functions to Lisp symbols.

Here’s a complete “Hello, world!” example:

#include "emacs-module.h"

int plugin_is_GPL_compatible;

int
emacs_module_init(struct emacs_runtime *ert)
{
    emacs_env *env = ert->get_environment(ert);
    emacs_value message = env->intern(env, "message");
    const char hi[] = "Hello, world!";
    emacs_value string = env->make_string(env, hi, sizeof(hi) - 1);
    env->funcall(env, message, 1, &string);
    return 0;
}

In a real module, it’s common to create function objects for native functions, then fetch the fset symbol and make a Lisp call on it to bind the newly-created function object to a name. You’ll see this in action later.

Joystick API

The joystick API will closely resemble Linux’s own joystick API, making for a fairly thin wrapper. It’s so thin that Emacs almost doesn’t even need a dynamic module. This is because, on Linux, joysticks are just files under /dev/input/. Want to see the input events on the first joystick? Just read /dev/input/js0. So Plan 9.

Emacs already knows how to read files, but these virtual files are a little too special for that. The header linux/joystick.h defines a struct js_event:

struct js_event {
    uint32_t time;  /* event timestamp in milliseconds */
    int16_t value;
    uint8_t type;
    uint8_t number; /* axis/button number */
};

The idea is to read from the joystick device into this structure. The first several reads are initialization that define the axes and buttons of the joystick and their initial state. Further events are queued up for the file descriptor. This all means that the file can’t just be opened each time joystick input is needed. It has to be held open for the duration, and is typically configured non-blocking.

The Emacs package will be called joymacs and there will be three functions:

(joymacs-open N)
(joymacs-close JOYSTICK)
(joymacs-read JOYSTICK EVENT-VECTOR)

joymacs-open

The joymacs-open function will take an integer, opening the Nth joystick (/dev/input/jsN). It will create a file descriptor for the joystick device, returning it as a user pointer. Think of it as a sort of “joystick handle.” Now, it could instead return the file descriptor as an integer, but the user pointer has two significant benefits:

The resource will be garbage collected. If the caller loses track of a file descriptor returned as an integer, the joystick device will be held open until Emacs shuts down, using up one of Emacs’ file descriptors. By putting it in a user pointer, the garbage collector will have the module to release the file descriptor if the user loses track of it.
It should be difficult for the user to make a dangerous call. Emacs Lisp can’t create user pointers — they only come from modules — and so the module is less likely to get passed the wrong thing. In the case of joystick-close, the module will be calling close(2) on the argument. We definitely don’t want to make that system call on file descriptors owned by Emacs. Further, since user pointers are mutable, the module can ensure it doesn’t call close(2) twice.

Here’s the implementation for joymacs-open. I’ll over over each part in detail.

static emacs_value
joymacs_open(emacs_env *env, ptrdiff_t n, emacs_value *args, void *ptr)
{
    (void)ptr;
    (void)n;
    int id = env->extract_integer(env, args[0]);
    if (env->non_local_exit_check(env) != emacs_funcall_exit_return)
        return nil;
    char buf[64];
    int buflen = sprintf(buf, "/dev/input/js%d", id);
    int fd = open(buf, O_RDONLY | O_NONBLOCK);
    if (fd == -1) {
        emacs_value signal = env->intern(env, "file-error");
        emacs_value message = env->make_string(env, buf, buflen);
        env->non_local_exit_signal(env, signal, message);
        return nil;
    }
    return env->make_user_ptr(env, fin_close, (void *)(intptr_t)fd);
}

The C function name doesn’t matter to Emacs. It’s static because it doesn’t even matter if the function visible to Emacs. It will get the function pointer later as part of initialization.

This is the prototype for all functions callable by Emacs Lisp, regardless of its arity. It has four arguments:

It gets an environment, env, through which to call back into Emacs.
It gets n, the number of arguments. This is guaranteed to be the correct number of arguments, as specified later when creating the function object, so only variadic functions need to inspect this argument.
The Lisp arguments are passed as an array of values, args. There’s no type declaration when declaring a function object, so these may be of the wrong type. I’ll go over how to deal with this.
Finally, it gets an arbitrary pointer, supplied at function object creation time. This allows the module to create closures, but will usually be ignored.

The first thing the function does is extract its integer argument. This is actually an intmax_t, but I don’t think anyone has that many USB ports. An int will suffice.

    int id = env->extract_integer(env, args[0]);
    if (env->non_local_exit_check(env) != emacs_funcall_exit_return)
        return nil;

As for not underestimating fools, what if the user passed a value that isn’t an integer? Will the world come crashing down? Fortunately Emacs checks that in extract_integer and, if there’s a mismatch, sets a pending error signal in the environment. This is really great because checking types directly in the module is a real pain the ass. So, before committing to anything further, such as opening a file, I check for this signal and bail out early if necessary. In Emacs 25.1 it’s safe to return NULL since the return value will be completely ignored, but I’d rather hedge my bets.

By the way, the nil here is a global variable set in initialization. You don’t just get that for free!

The next step is opening the joystick device, read-only and non-blocking. The non-blocking is vital because the module would otherwise hang Emacs later if there are no events (well, except for the read being quickly interrupted by a POSIX signal).

    char buf[64];
    int buflen = sprintf(buf, "/dev/input/js%d", id);
    int fd = open(buf, O_RDONLY | O_NONBLOCK);

If the joystick fails to open (e.g. it doesn’t exist, or the user lacks permission), manually set an error signal for a non-local exit. I chose the file-error signal and I’m just using the filename as the signal data.

    if (fd == -1) {
        emacs_value signal = env->intern(env, "file-error");
        emacs_value message = env->make_string(env, buf, buflen);
        env->non_local_exit_signal(env, signal, message);
        return nil;
    }

Otherwise create the user pointer. No need to allocate any memory; just stuff it in the pointer itself. If the user mistakenly passes it to another module, it will sure be in for a surprise when it tries to dereference it.

    return env->make_user_ptr(env, fin_close, (void *)(intptr_t)fd);

The fin_close() function is defined as:

static void
fin_close(void *fdptr)
{
    int fd = (intptr_t)fdptr;
    if (fd != -1)
        close(fd);
}

The garbage collector will call this function when the user pointer is lost. If the user closes it early with joymacs-close, that function will set the user pointer to -1, an invalid file descriptor, so that it doesn’t get closed a second time here.

joymacs-close

Here’s joymacs-close, which is a bit simpler.

static emacs_value
joymacs_close(emacs_env *env, ptrdiff_t n, emacs_value *args, void *ptr)
{
    (void)ptr;
    (void)n;
    int fd = (intptr_t)env->get_user_ptr(env, args[0]);
    if (env->non_local_exit_check(env) != emacs_funcall_exit_return)
        return nil;
    if (fd != -1) {
        close(fd);
        env->set_user_ptr(env, args[0], (void *)(intptr_t)-1);
    }
    return nil;
}

Again, it starts by extracting its argument, relying on Emacs to do the check:

    int fd = (intptr_t)env->get_user_ptr(env, args[0]);
    if (env->non_local_exit_check(env) != emacs_funcall_exit_return)
        return nil;

If the user pointer hasn’t been closed yet, then close it and strip out the file descriptor to prevent further closes.

    if (fd != -1) {
        close(fd);
        env->set_user_ptr(env, args[0], (void *)(intptr_t)-1);
    }

joymacs-read

The joymacs-read function is doing something a little unusual for an Emacs Lisp function. It takes two arguments: the joystick handle and a 5-element vector. Instead of returning the event in some representation, it fills the vector with the event details. The are two reasons for this:

The API has no function for creating vectors … though the module could get the make-symbol vector and call it to create a vector.
The idiom for event pumps is for the caller to supply a buffer to the pump. This has better performance by avoiding lots of unnecessary allocations, especially since events tend to be message-like objects with a short, well-defined extent.

Here’s the full definition:

static emacs_value
joymacs_read(emacs_env *env, ptrdiff_t n, emacs_value *args, void *ptr)
{
    (void)n;
    (void)ptr;
    int fd = (intptr_t)env->get_user_ptr(env, args[0]);
    if (env->non_local_exit_check(env) != emacs_funcall_exit_return)
        return nil;
    struct js_event e;
    int r = read(fd, &e, sizeof(e));
    if (r == -1 && errno == EAGAIN) {
        /* No more events. */
        return nil;
    } else if (r == -1) {
        /* An actual read error (joystick unplugged, etc.). */
        emacs_value signal = env->intern(env, "file-error");
        const char *error = strerror(errno);
        size_t len = strlen(error);
        emacs_value message = env->make_string(env, error, len);
        env->non_local_exit_signal(env, signal, message);
        return nil;
    } else {
        /* Fill out event vector. */
        emacs_value v = args[1];
        emacs_value type = e.type & JS_EVENT_BUTTON ? button : axis;
        emacs_value value;
        if (type == button)
            value = e.value ? t : nil;
        else
            value =  env->make_float(env, e.value / (double)INT16_MAX);
        env->vec_set(env, v, 0, env->make_integer(env, e.time));
        env->vec_set(env, v, 1, type);
        env->vec_set(env, v, 2, value);
        env->vec_set(env, v, 3, env->make_integer(env, e.number));
        env->vec_set(env, v, 4, e.type & JS_EVENT_INIT ? t : nil);
        return args[1];
    }
}

As before, extract the first argument and check for a signal. Then call read(2) to get an event. If the read fails with EAGAIN, it’s not a real failure. There are just no more events, so return nil.

    struct js_event e;
    int r = read(fd, &e, sizeof(e));
    if (r == -1 && errno == EAGAIN) {
        /* No more events. */
        return nil;
    }

If the read failed with something else — perhaps the joystick was unplugged — signal an error. The strerror(3) string is used for the signal data.

    if (r == -1) {
        /* An actual read error (joystick unplugged, etc.). */
        emacs_value signal = env->intern(env, "file-error");
        const char *error = strerror(errno);
        emacs_value message = env->make_string(env, error, strlen(error));
        env->non_local_exit_signal(env, signal, message);
        return nil;
    }

Otherwise fill out the event vector. If the second argument isn’t a vector, or if it’s too short, the signal will automatically get raised by Emacs. The module can keep plowing through the vec_set() calls safely since it’s not committing to anything.

        /* Fill out event vector. */
        emacs_value v = args[1];
        emacs_value type = e.type & JS_EVENT_BUTTON ? button : axis;
        emacs_value value;
        if (type == button)
            value = e.value ? t : nil;
        else
            value =  env->make_float(env, e.value / (double)INT16_MAX);
        env->vec_set(env, v, 0, env->make_integer(env, e.time));
        env->vec_set(env, v, 1, type);
        env->vec_set(env, v, 2, value);
        env->vec_set(env, v, 3, env->make_integer(env, e.number));
        env->vec_set(env, v, 4, e.type & JS_EVENT_INIT ? t : nil);
        return args[1];

The Linux event struct has four fields and the function fills out five values of the vector. This is because the type field has a bit flag indicating initialization events. This is split out into an extra t/nil value. It also normalizes axis values and converts button values into t/nil, which makes more sense for Emacs Lisp. The event itself is returned since it’s a truthy value and it’s convenient for the caller.

The astute programmer might notice that the negative side of the axis could go just below -1.0, since INT16_MIN has one extra value over INT16_MAX (two’s complement). It doesn’t seem to be documented, but the joystick drivers I’ve seen never exactly return INT16_MIN, so this is in fact the correct way to normalize it.

Initialization

Update 2021: In a previous version of this article, I talked about interning symbols during initialziation so that they do not need to be re-interned each time the module is called. This no longer works, and it was probably never intended to be work in the first place. The lesson is simple: Do not reuse Emacs objects between module calls.

First grab the fset symbol since this function will be needed to bind names to the module’s functions.

    emacs_value fset = env->intern(env, "fset");

Using fset, bind the functions. The second and third arguments to make_function are the minimum and maximum number of arguments, which may look familiar. The last argument is that closure pointer I mentioned at the beginning.

    emacs_value args[2];
    args[0] = env->intern(env, "joymacs-open");
    args[1] = env->make_function(env, 1, 1, joymacs_open, doc, 0);
    env->funcall(env, fset, 2, args);

If the module is to be loaded with require like any other package, it needs to provide: (provide 'joymacs).

    emacs_value provide = env->intern(env, "provide");
    emacs_value joymacs = env->intern(env, "joymacs");
    env->funcall(env, provide, 1, &joymacs);

And that’s it!

The source repository now includes a port to Windows (XInput). If you’re on Linux or Windows, have Emacs 25 with modules enabled, and a joystick is plugged in, then make run in the repository should bring up Emacs running a joystick calibration demonstration. The module can’t poke at Emacs when events are ready, so instead there’s a timer that polls the module for events.

I’d like to someday see an Emacs Lisp game well-suited for a joystick.

An Elfeed Database Analysis

2016-08-12T03:20:16Z

The end of the month marks Elfeed’s third birthday. Surprising to nobody, it’s also been three years of heavy, daily use by me. While I’ve used Elfeed concurrently on a number of different machines over this period, I’ve managed to keep an Elfeed database index with a lineage going all the way back to the initial development stages, before the announcement. It’s a large, organically-grown database that serves as a daily performance stress test. Hopefully this means I’m one of the first people to have trouble if an invisible threshold is ever exceeded.

I’m also the sort of person who gets excited when I come across an interesting dataset, and I have this gem sitting right in front of me. So a couple of days ago I pushed a new Elfeed function, elfeed-csv-export, which exports a database index into three CSV files. These are intended to serve as three tables in a SQL database, exposing the database to interesting relational queries and joins. Entry content (HTML, etc.) has always been considered volatile, so this is not exported. The export function isn’t interactive (yet?), so if you want to generate your own you’ll need to (require 'elfeed-csv) and evaluate it yourself.

All the source code for performing the analysis below on your own database can be found here:

https://github.com/skeeto/elfeed-analysis

The three exported tables are feeds, entries, and tags. Here are the corresponding columns (optional CSV header) for each:

url, title, canonical-url, author
id, feed, title, link, date
entry, feed, tag

And here’s the SQLite schema I’m using for these tables:

CREATE TABLE feeds (
    url TEXT PRIMARY KEY,
    title TEXT,
    canonical_url TEXT,
    author TEXT
);

CREATE TABLE entries (
    id TEXT NOT NULL,
    feed TEXT NOT NULL REFERENCES feeds (url),
    title TEXT,
    link TEXT NOT NULL,
    date REAL NOT NULL,
    PRIMARY KEY (id, feed)
);

CREATE TABLE tags (
    entry TEXT NOT NULL,
    feed TEXT NOT NULL,
    tag TEXT NOT NULL,
    FOREIGN KEY (entry, feed) REFERENCES entries (id, feed)
);

Web authors are notoriously awful at picking actually-unique entry IDs, even when using the smarter option, Atom. I still simply don’t trust that entry IDs are unique, so, as usual, I’ve qualified them by their source feed URL, hence the primary key on both columns in entries.

At this point I wish I had collected a lot more information. If I were to start fresh today, Elfeed’s database schema would not only fully match Atom’s schema, but also exceed it with additional logging:

When was each entry actually fetched?
How did each entry change since the last fetch?
When and for what reason did a feed fetch fail?
When did an entry stop appearing in a feed?
How long did fetching take?
How long did parsing take?
Which computer (hostname) performed the fetch?
What interesting HTTP headers were included?
Even if not kept for archival, how large was the content?

I may start tracking some of these. If I don’t, I’ll be kicking myself three years from now when I look at this again.

A look at my index

So just how big is my index? It’s 25MB uncompressed, 2.5MB compressed. I currently follow 117 feeds, but my index includes 43,821 entries from 309 feeds. These entries are marked with 53,360 tags from a set of 35 unique tags. Some of these datapoints are the result of temporarily debugging Elfeed issues and don’t represent content that I actually follow. I’m more careful these days to test in a temporary database as to avoid contamination. Some are duplicates due to feeds changing URLs over the years. Some are artifacts from old bugs. This all represents a bit of noise, but should be negligible. During my analysis I noticed some of these anomalies and took a moment to clean up obviously bogus data (weird dates, etc.), all by adjusting tags.

The first thing I wanted to know is the weekday frequency. A number of times I’ve blown entire Sundays working on Elfeed, and, as if to frustrate my testing, it’s not unusual for several hours to pass between new entries on Sundays. Is this just my perception or are Sundays really that slow?

Here’s my query. I’m using SQLite’s strftime to shift the result into my local time zone, Eastern Time. This time zone is the source, or close to the source, of a large amount of the content. This also automatically accounts for daylight savings time, which can’t be done with a simple divide and subtract.

SELECT tag,
       cast(strftime('%w', date, 'unixepoch', 'localtime') AS INT) AS day,
       count(id) AS count
FROM entries
JOIN tags ON tags.entry = entries.id AND tags.feed = entries.feed
GROUP BY tag, day;

The most frequent tag (13,666 appearances) is “youtube”, which marks every YouTube video, and I’ll use gnuplot to visualize it. The input “file” is actually a command since gnuplot is poor at filtering data itself, especially for histograms.

plot '< grep ^youtube, weekdays.csv' using 2:3 with boxes

Wow, things do quiet down dramatically on weekends! From the glass-half-full perspective, this gives me a chance to catch up when I inevitably fall behind on these videos during the week.

The same is basically true for other types of content, including “comic” (12,465 entries) and “blog” (7,505 entries).

However, “emacs” (2,404 entries) is a different story. It doesn’t slow down on the weekend, but Emacs users sure love to talk about Emacs on Mondays. In my own index, this spike largely comes from Planet Emacsen. Initially I thought maybe this was an artifact of Planet Emacsen’s date handling — i.e. perhaps it does a big fetch on Mondays and groups up the dates — but I double checked: they pass the date directly through from the original articles.

Conclusion: Emacs users love Mondays. Or maybe they hate Mondays and talk about Emacs as an escape.

I can reuse the same query to look at different time scales. When during the day do entries appear? Adjusting the time zone here becomes a lot more important.

SELECT tag,
       cast(strftime('%H', date, 'unixepoch', 'localtime') AS INT) AS hour,
       count(id) AS count
FROM entries
JOIN tags ON tags.entry = entries.id AND tags.feed = entries.feed
GROUP BY tag, hour;

Emacs bloggers tend to follow a nice Eastern Time sleeping schedule. (I wonder how Vim bloggers compare, since, as an Emacs user, I naturally assume Vim users’ schedules are as undisciplined as their bathing habits.) However, this also might be prolific the Irreal breaking the curve.

The YouTube channels I follow are a bit more erratic, but there’s still a big drop in the early morning and a spike in the early afternoon. It’s unclear if the timestamp published in the feed is the upload time or the publication time. This would make a difference in the result (e.g. overnight video uploads).

Do you suppose there’s a slow month?

SELECT tag,
       cast(strftime('%m', date, 'unixepoch', 'localtime') AS INT) AS day,
       count(id) AS count
FROM entries
JOIN tags ON tags.entry = entries.id AND tags.feed = entries.feed
GROUP BY tag, day;

December is a big drop across all tags, probably for the holidays. Both “comic” and “blog” also have an interesting drop in August. For brevity, I’ll only show one. This might be partially due my not waiting until the end of this month for this analysis, since there are only 2.5 Augusts in my 3-year dataset.

Unfortunately the timestamp is the only direct numerical quantity in the data. So far I’ve been binning data points and counting to get a second numerical quantity. Everything else is text, so I’ll need to get more creative to find other interesting relationships.

So let’s have a look a the lengths of entry titles.

SELECT tag,
       length(title) AS length,
       count(*) AS count
FROM entries
JOIN tags ON tags.entry = entries.id AND tags.feed = entries.feed
GROUP BY tag, length
ORDER BY length;

The shortest are the webcomics. I’ve complained about poor webcomic titles before, so this isn’t surprising. The spikes are from comics that follow a strict (uncreative) title format.

Emacs article titles follow a nice distribution. You can tell these are programmers because so many titles are exactly 32 characters long. Picking this number is such a natural instinct that we aren’t even aware of it. Or maybe all their database schemas have VARCHAR(32) title columns?

Blogs in general follow a nice distribution. The big spike is from the Dwarf Fortress development blog, which follows a strict date format.

The longest on average are YouTube videos. This is largely due to the kinds of videos I watch (“Let’s Play” videos), which tend to have long, predictable names.

And finally, here’s the most interesting-looking graph of them all.

SELECT ((date - 4*60*60) % (24*60*60)) / (60*60) AS day_time,
       length(title) AS length
FROM entries
JOIN tags ON tags.entry = entries.id AND tags.feed = entries.feed;

This is the title length versus time of day (not binned). Each point is one of the 53,360 posts.

set style fill transparent solid 0.25 noborder
set style circle radius 0.04
plot 'length-vs-daytime.csv' using 1:2 with circles

(This is a good one to follow through to the full size image.)

Again, all Eastern Time since I’m self-centered like that. Vertical lines are authors rounding their post dates to the hour. Horizontal lines are the length spikes from above, such as the line of entries at title length 10 in the evening (Dwarf Fortress blog). There’s a the mid-day cloud of entries of various title lengths, with the shortest title cloud around mid-morning. That’s probably when many of the webcomics come up.

Additional analysis could look further at textual content, beyond simply length, in some quantitative way (n-grams? soundex?). But mostly I really need to keep track of more data!

Elfeed, cURL, and You

2016-06-16T18:22:16Z

This morning I pushed out an important update to Elfeed, my web feed reader for Emacs. The update should be available in MELPA by the time you read this. Elfeed now has support for fetching feeds using a cURL through a curl inferior process. You’ll need the program in your PATH or configured through elfeed-curl-program-name.

I’ve been using it for a couple of days now, but, while I work out the remaining kinks, it’s disabled by default. So in addition to having cURL installed, you’ll need to set elfeed-use-curl to non-nil. Sometime soon it will be enabled by default whenever cURL is available. The original url-retrieve fetcher will remain in place for time time being. However, cURL may become a requirement someday.

Fetching with a curl inferior process has some huge advantages.

It’s much faster

The most obvious change is that you should experience a huge speedup on updates and better responsiveness during updates after the first cURL run. There are important two reasons:

Asynchronous DNS and TCP: Emacs 24 and earlier performs DNS queries synchronously even for asynchronous network processes. This is being fixed on some platforms (including Linux) in Emacs 25, but now we don’t have to wait.

On Windows it’s even worse: the TCP connection is also established synchronously. This is especially bad when fetching relatively small items such as feeds, because the DNS look-up and TCP handshake dominate the overall fetch time. It essentially makes the whole process synchronous.

Conditional GET: HTTP has two mechanism to avoid transmitting information that a client has previously fetched. One is the Last-Modified header delivered by the server with the content. When querying again later, the client echos the date back like a token in the If-Modified-Since header.

The second is the “entity tag,” an arbitrary server-selected token associated with each version of the content. The server delivers it along with the content in the ETag header, and the client hands it back later in the If-None-Match header, sort of like a cookie.

This is highly valuable for feeds because, unless the feed is particularly active, most of the time the feed hasn’t been updated since the last query. This avoids sending anything other hand a handful of headers each way. In Elfeed’s case, it means it doesn’t have to parse the same XML over and over again.

Both of these being outside of cURL’s scope, Elfeed has to manage conditional GET itself. I had no control over the HTTP headers until now, so I couldn’t take advantage of it. Emacs’ url-retrieve function allows for sending custom headers through dynamically binding url-request-extra-headers, but this isn’t available when calling url-queue-retrieve since the request itself is created asynchronously.

Both the ETag and Last-Modified values are stored in the database and persist across sessions. This is the reason the full speedup isn’t realized until the second fetch. The initial cURL fetch doesn’t have these values.

Fewer bugs

As mentioned previously, Emacs has a built-in URL retrieval library called url. The central function is url-retrieve which asynchronously fetches the content at an arbitrary URL (usually HTTP) and delivers the buffer and status to a callback when it’s ready. There’s also a queue front-end for it, url-queue-retrieve which limits the number of parallel connections. Elfeed hands this function a pile of feed URLs all at once and it fetches them N at a time.

Unfortunately both these functions are incredibly buggy. It’s been a thorn in my side for years.

Here’s what the interface looks like for both:

(url-retrieve URL CALLBACK &optional CBARGS SILENT INHIBIT-COOKIES)

It takes a URL and a callback. Seeing this, the sane, unsurprising expectation is the callback will be invoked exactly once for time url-retrieve was called. In any case where the request fails, it should report it through the callback. This is not the case. The callback may be invoked any number of times, including zero.

In this example, suppose you have a webserver that will return an HTTP 404 for a requested URL. Below, I fire off 10 asynchronous requests in a row.

(defvar results ())
(dotimes (i 10)
  (url-retrieve "http://127.0.0.1:8080/404"
                (lambda (status) (push (cons i status) results))))

What would you guess is the length of results? It’s initially 0 before any requests complete and over time (a very short time) I would expect this to top out at 10. On Emacs 24, here’s the real answer:

(length results)
;; => 46

The same error is reported multiple times to the callback. At least the pattern is obvious.

(cl-count 0 results :key #'car)
;; => 9
(cl-count 1 results :key #'car)
;; => 8
(cl-count 2 results :key #'car)
;; => 7

(cl-count 9 results :key #'car)
;; => 1

Here’s another one, this time to the non-existent foo.example. The DNS query should never resolve.

(setf results ())
(dotimes (i 10)
  (url-retrieve "http://foo.example/"
                (lambda (status) (push (cons i status) results))))

What’s the length of results? This time it’s zero. Remember how DNS is synchronous? Because of this, DNS failures are reported synchronously as a signaled error. This gets a lot worse with url-queue-retrieve. Since the request is put off until later, DNS doesn’t fail until later, and you get neither a callback nor an error signal. This also puts the queue in a bad state and necessitated elfeed-unjam for manually clear it. This one should get fixed in Emacs 25 when DNS is asynchronous.

This last one assumes you don’t have anything listening on port 57432 (pulled out of nowhere) so that the connection fails.

(setf results ())
(dotimes (i 10)
  (url-retrieve "http://127.0.0.1:57432/"
                (lambda (status) (push (cons i status) results))))

On Linux, we finally get the sane result of 10. However, on Windows, it’s zero. The synchronous TCP connection will fail, signaling an error just like DNS failures. Not only is it broken, it’s broken in different ways on different platforms.

There are many more cases of callback weirdness which depend on the connection and HTTP session being in various states when thing go awry. These were just the easiest to demonstrate. By using cURL, I get to bypass this mess.

No more GnuTLS issues

At compile time, Emacs can optionally be linked against GnuTLS, giving it robust TLS support so long as the shared library is available. url-retrieve uses this for fetching HTTPS content. Unfortunately, this library is noisy and will occasionally echo non-informational messages in the minibuffer and in *Messages* that cannot be suppressed.

When not linked against GnuTLS, Emacs will instead run the GnuTLS command line program as an inferior process, just like Elfeed now does with cURL. Unfortunately this interface is very slow and frequently fails, basically preventing Elfeed from fetching HTTPS feeds. I suspect it’s in part due to an improper coding-system-for-read.

cURL handles all the TLS negotation itself, so both these problems disappear. The compile-time configuration doesn’t matter.

Windows is now supported

Emacs’ Windows networking code is so unstable, even in Emacs 25, that I couldn’t make any practical use of Elfeed on that platform. Even the Cygwin emacs-w32 version couldn’t cut it. It hard crashes Emacs every time I’ve tried to fetch feeds. Fortunately the inferior process code is a whole lot more stable, meaning fetching with cURL works great. As of today, you can now use Elfeed on Windows. The biggest obstable is getting cURL installed and configured.

Interface changes

With cURL, obviously the values of url-queue-timeout and url-queue-parallel-processes no longer have any meaning to Elfeed. If you set these for yourself, you should instead call the functions elfeed-set-timeout and elfeed-set-max-connections, which will do the appropriate thing depending on the value of elfeed-use-curl. Each also comes with a getter so you can query the current value.

The deprecated elfeed-max-connections has been removed.

Feed objects now have meta tags :etag, :last-modified, and :canonical-url. The latter can identify feeds that have been moved, though it needs a real UI.

See any bugs?

If you use Elfeed, grab the current update and give the cURL fetcher a shot. Please open a ticket if you find problems. Be sure to report your Emacs version, operating system, and cURL version.

As of this writing there’s just one thing missing compared to url-queue: connection reuse. cURL supports it, so I just need to code it up.

9 Elfeed Features You Might Not Know

2015-12-03T22:33:17Z

It’s been two years since I last wrote about Elfeed, my Atom/RSS feed reader for Emacs. I’ve used it every single day since, and I continue to maintain it with help from the community. So far 18 people besides me have contributed commits. Over the last couple of years it’s accumulated some new features, some more obvious than others.

Every time I mark a new release, I update the ChangeLog at the top of elfeed.el which lists what’s new. Since it’s easy to overlook many of the newer useful features, I thought I’d list the more important ones here.

Custom Entry Colors

You can now customize entry faces through elfeed-search-face-alist. This variable maps tags to faces. An entry inherits the face of any tag it carries. Previously “unread” was a special tag that got a bold face, but this is now implemented as nothing more than an initial entry in the alist.

I’ve been using it to mark different kinds of content (videos, podcasts, comics) with different colors.

Autotagging

You can specify the starting tags for entries from particular feeds directly in the feed listing. This has been a feature for awhile now, but it’s not something you’d want to miss. It started out as a feature in my personal configuration that eventually migrated into Elfeed proper.

For example, your elfeed-feeds may initially look like this, especially if you imported from OPML.

("https://nullprogram.com/feed/"
 "http://nedroid.com/feed/"
 "https://www.youtube.com/feeds/videos.xml?user=quill18")

If you wanted certain tags applied to entries from each, you would need to putz around with elfeed-make-tagger. For the most common case — apply certain tags to all entries from a URL — it’s much simpler to specify the information as part of the listing itself,

(("https://nullprogram.com/feed/" blog emacs)
 ("http://nedroid.com/feed/" webcomic)
 ("https://www.youtube.com/feeds/videos.xml?user=quill18" youtube))

Today I only use custom tagger functions in my own configuration to filter within a couple of particularly noisy feeds.

Arbitrary Metadata

Metadata is more for Elfeed extensions (i.e. elfeed-org) than regular users. You can attach arbitrary, readable metadata to any Elfeed object (entry, feed). This metadata is automatically stored in the database. It’s a plist.

Metadata is accessed entirely through one setf-able function: elfeed-meta. For example, you might want to track when you’ve read something, not just that you’ve read it. You could use this to selectively update certain feeds or just to evaluate your own habits.

(defun my-elfeed-mark-read (entry)
  (elfeed-untag entry 'unread)
  (let ((date (format-time-string "%FT%T%z")))
    (setf (elfeed-meta entry :read-date) date)))

Two things motivated this feature. First, without a plist, if I added more properties in the future, I would need to change the database format to support them. I modified the database format to add metadata, requiring an upgrade function to quietly upgrade older databases as they were loaded. I’d really like to avoid this in the future.

Second, I wanted to make it easy for extension authors to store their own data. I still imagine an extension someday to update feeds intelligently based on their history. For example, the database doesn’t track when the feed was last fetched, just the date of the most recent entry (if any). A smart-update extension could use metadata to tag feeds with this information.

Elfeed itself already uses two metadata keys: :failures on feeds and :title on both. :failures counts the total number of times fetching that feed resulted in an error. You could use this get a listing of troublesome feeds like so,

(cl-loop for url in (elfeed-feed-list)
         for feed = (elfeed-db-get-feed url)
         for failures = (elfeed-meta feed :failures)
         when failures
         collect (cons url failures))

The :title property allows for a custom title for both feeds and entries in the search buffer listing, assuming you’re using the default function (see below). It overrides the title provided by the feed itself. This is different than elfeed-entry-title and elfeed-feed-title, which is kept in sync with feed content. Metadata is not kept in sync with the feed itself.

Filter Inversion

You can invert filter components by prefixing them with !. For example, say you’re looking at all my posts from the past 6 months:

@6-months nullprogram.com

But say you’re tired of me and decide you want to see every entry from the past 6 months excluding my posts.

@6-months !nullprogram.com

Filter Limiter

Normally you limit the number of results by date, but you can now limit the result by count using #n. For example, to see my most recent 12 posts regardless of date,

nullprogram.com #12

This is used internally in the live filter to limit the number of results to the height of the screen. If you noticed that live filtering has been much more responsive in the last few months, this is probably why.

Bookmark Support

Elfeed properly integrates with Emacs’ bookmarks (thanks to groks). You can bookmark the current filter with M-x bookmark-set (C-x r m). By default, Emacs will persist bookmarks between sessions. To revisit a filter in the future, M-x bookmark-jump (C-x r b).

Since this requires no configuration, this may serve as an easy replacement for manually building “view” toggles — filters bound to certain keys — which I know many users have done, including me.

New Header

If you’ve updated very recently, you probably noticed Elfeed got a brand new header. Previously it faked a header by writing to the first line of the buffer. This is because somehow I had no idea Emacs had official support for buffer headers (despite notmuch using them all this time).

The new header includes additional information, such as the current filter, the number of unread entries, the total number of entries, and the number of unique feeds currently in view. You’ll see this as /: in the middle of the header.

As of this writing, the new header has not been made part of a formal release. So if you’re only tracking stable releases, you won’t see this for awhile longer.

You can supply your own header via elfeed-search-header-function (thanks to Gergely Nagy).

Scoped Updates

As you already know, in the search buffer listing you can press G to update your feeds. But did you know you it takes a prefix argument? Run as C-u G, it only updates feeds with entries currently listed in the buffer.

As of this writing, this is another feature not yet in a formal release. I’d been wanting something like this for awhile but couldn’t think of a reasonable interface. Directly prompting the user for feeds is neither elegant nor composable. However, groks suggested the prefix argument, which composes perfectly with Elfeed’s existing idioms.

Listing Customizations

In addition to custom faces, there are a number of ways to customize the listing.

Choose the sort order with elfeed-sort-order.
Set a custom date format with elfeed-search-date-format.
Adjust field widths with elfeed-search-*-width.
Or override everything with elfeed-search-print-entry-function.

Gergely Nagy has been throwing lots of commits at me over the last couple of weeks to open up lots of Elfeed’s behavior to customization, so there are more to come.

Thank You, Emacs Community

Apologies about any features I missed or anyone I forgot to mention who’s made contributions. The above comes from my ChangeLogs, the commit log, the GitHub issue listing, and my own memory, so I’m likely to have forgotten some things. A couple of these features I had forgotten about myself!

Quickly Access x86 Documentation in Emacs

2015-11-21T05:42:17Z

I recently released an Emacs package called x86-lookup. Given a mnemonic, Emacs will open up a local copy of an Intel’s software developer manual PDF at the page documenting the instruction. It complements nasm-mode, released earlier this year.

https://github.com/skeeto/x86-lookup

x86-lookup is also available from MELPA.

To use it, you’ll need Poppler’s pdftotext command line program — used to build an index of the PDF — and a copy of the complete Volume 2 of Intel’s instruction set manual. There’s only one command to worry about: M-x x86-lookup.

Minimize documentation friction

This package should be familiar to anyone who’s used javadoc-lookup, one of my older packages. It has a common underlying itch: the context switch to read API documentation while coding should have as little friction as possible, otherwise I’m discouraged from doing it. In an ideal world I wouldn’t ever need to check documentation because it’s already in my head. By visiting documentation frequently with ease, it’s going to become familiar that much faster and I’ll be reaching for it less and less, approaching the ideal.

I picked up x86 assembly [about a year ago][x86] and for the first few months I struggled to find a good online reference for the instruction set. There are little scraps here and there, but not much of substance. The big exception is Félix Cloutier’s reference, which is an amazingly well-done HTML conversion of Intel’s PDF manuals. Unfortunately I could never get it working locally to generate my own. There’s also the X86 Opcode and Instruction Reference, but it’s more for machines than humans.

Besides, I often work without an Internet connection, so offline documentation is absolutely essential. (You hear that Microsoft? Not only do I avoid coding against Win32 because it’s badly designed, but even more so because you don’t offer offline documentation anymore! The friction to API reference your documentation is enormous.)

I avoided the official x86 documentation for awhile, thinking it would be too opaque, at least until I became more accustomed to the instruction set. But really, it’s not bad! With a handle on the basics, I would encourage anyone to dive into either Intel’s or AMD’s manuals. The reason there’s not much online in HTML form is because these manuals are nearly everything you need.

I chose Intel’s manuals for x86-lookup because I’m more familiar with it, it’s more popular, it’s (slightly) easier to parse, it’s offered as a single PDF, and it’s more complete. The regular expression for finding instructions is tuned for Intel’s manual and it won’t work with AMD’s manuals.

For a couple months prior to writing x86-lookup, I had a couple of scratch functions to very roughly accomplish the same thing. The tipping point for formalizing it was that last month I wrote my own x86 assembler. A single mnemonic often has a dozen or more different opcodes depending on the instruction’s operands, and there are often several ways to encode the same operation. I was frequently looking up opcodes, and navigating the PDF quickly became a real chore. I only needed about 80 different opcodes, so I was just adding them to the assembler’s internal table manually as needed.

How does it work?

Say you want to look up the instruction RDRAND.

Initially Emacs has no idea what page this is on, so the first step is to build an index mapping mnemonics to pages. x86-lookup runs the pdftotext command line program on the PDF and loads the result into a temporary buffer.

The killer feature of pdftotext is that it emits FORM FEED (U+0012) characters between pages. Think of these as page breaks. By counting form feed characters, x86-lookup can track the page for any part of the document. In fact, Emacs is already set up to do this with its forward-page and backward-page commands. So to build the index, x86-lookup steps forward page-by-page looking for mnemonics, keeping note of the page. Since this process typically takes about 10 seconds, the index is cached in a file (see x86-lookup-cache-directory) for future use. It only needs to happen once for a particular manual on a particular computer.

The mnemonic listing is slightly incomplete, so x86-lookup expands certain mnemonics into the familiar set. For example, all the conditional jumps are listed under “Jcc,” but this is probably not what you’d expect to look up. I compared x86-lookup’s mnemonic listing against NASM/nasm-mode’s mnemonics to ensure everything was accounted for. Both packages benefited from this process.

Once the index is built, pdftotext is no longer needed. If you’re desperate and don’t have this program available, you can borrow the index file from another computer. But you’re on your own for figuring that out!

So to look up RDRAND, x86-lookup checks the index for the page number and invokes a PDF reader on that page. This is where not all PDF readers are created equal. There’s no convention for opening a PDF to a particular page and each PDF reader differs. Some don’t even support it. To deal with this, x86-lookup has a function specialized for different PDF readers. Similar to browse-url-browser-function, x86-lookup has x86-lookup-browse-pdf-function.

By default it tries to open the PDF for viewing within Emacs (did you know Emacs is a PDF viewer?), falling back to on options if the feature is unavailable. I welcome pull requests for any PDF readers not yet supported by x86-lookup. Perhaps this functionality deserves its own package.

That’s it! It’s a simple feature that has already saved me a lot of time. If you’re ever programming in x86 assembly, give x86-lookup a spin.

RSA Signatures in Emacs Lisp

2015-10-30T22:35:13Z

Emacs comes with a wonderful arbitrary-precision computer algebra system called calc. I’ve discussed it previously and continue to use it on a daily basis. That’s right, people, Emacs can do calculus. Like everything Emacs, it’s programmable and extensible from Emacs Lisp. In this article, I’m going to implement the RSA public-key cryptosystem in Emacs Lisp using calc.

If you want to dive right in first, here’s the repository:

https://github.com/skeeto/emacs-rsa

This is only a toy implementation and not really intended for serious cryptographic work. It’s also far too slow when using keys of reasonable length.

Evaluation with calc

The calc package is particularly useful when considering Emacs’ limited integer type. Emacs uses a tagged integer scheme where integers are embedded within pointers. It’s a lot faster than the alternative (individually-allocated integer objects), but it means they’re always a few bits short of the platform’s native integer type.

calc has a large API, but the user-friendly porcelain for it is the under-documented calc-eval function. It evaluates an expression string with format-like argument substitutions ($n).

(calc-eval "2^16 - 1")
;; => "65535"

(calc-eval "2^$1 - 1" nil 128)
;; => "340282366920938463463374607431768211455"

Notice it returns strings, which is one of the ways calc represents arbitrary precision numbers. For arguments, it accepts regular Elisp numbers and strings just like this function returns. The implicit radix is 10. To explicitly set the radix, prefix the number with the radix and #. This is the same as in the user interface of calc. For example:

(calc-eval "16#deadbeef")
;; => "3735928559"

The second argument (optional) to calc-eval adjusts its behavior. Given nil, it simply evaluates the string and returns the result. The manual documents the different options, but the only other relevant option for RSA is the symbol pred, which asks it to return a boolean “predicate” result.

(calc-eval "$1 < $2" 'pred "4000" "5000")
;; => t

Generating primes

RSA is founded on the difficulty of factoring large composites with large factors. Generating an RSA keypair starts with generating two prime numbers, p and q, and using these primes to compute two mathematically related composite numbers.

calc has a function calc-next-prime for finding the next prime number following any arbitrary number. It uses a probabilistic primarily test — the ~~Fermat~~ Miller-Rabin primality test — to efficiently test large integers. It increments the input until it finds a result that passes enough iterations of the primality test.

(calc-eval "nextprime($1)" nil "100000000000000000")
;; => "100000000000000003"

So to generate a random n-bit prime, first generate a random n-bit number and then increment it until a prime number is found.

;; Generate a 128-bit prime, 10 iterations (0.000084% error rate)
(calc-eval "nextprime(random(2^$1), 10)" nil 128)
"111618319598394878409654851283959105123"

Unfortunately calc’s random function is based on Emacs’ random function, which is entirely unsuitable for cryptography. In the real implementation I read n bits from /dev/urandom to generate an n-bit number.

(with-temp-buffer
  (set-buffer-multibyte nil)
  (call-process "head" "/dev/urandom" t nil "-c" (format "%d" (/ bits 8)))
  (let ((f (apply-partially #'format "%02x")))
    (concat "16#" (mapconcat f (buffer-string) ""))))

(Note: /dev/urandom is the right choice. There’s no reason to use /dev/random for generating keys.)

Computing e and d

From here the code just follows along from the Wikipedia article. After generating the primes p and q, two composites are computed, n = p * q and i = (p - 1) * (q - 1). Lacking any reason to do otherwise, I chose 65,537 for the public exponent e.

The function rsa--inverse is just a straight Emacs Lisp + calc implementation of the extended Euclidean algorithm from the Wikipedia article pseudocode, computing d ≡ e^-1 (mod i). It’s not much use sharing it here, so take a look at the repository if you’re curious.

(defun rsa-generate-keypair (bits)
  "Generate a fresh RSA keypair plist of BITS length."
  (let* ((p (rsa-generate-prime (+ 1 (/ bits 2))))
         (q (rsa-generate-prime (+ 1 (/ bits 2))))
         (n (calc-eval "$1 * $2" nil p q))
         (i (calc-eval "($1 - 1) * ($2 - 1)" nil p q))
         (e (calc-eval "2^16+1"))
         (d (rsa--inverse e i)))
    `(:public  (:n ,n :e ,e) :private (:n ,n :d ,d))))

The public key is n and e and the private key is n and d. From here we can compute and verify cryptographic signatures.

Signatures

To compute signature s of an integer m (where m < n), compute s ≡ m^d (mod n). I chose the right-to-left binary method, again straight from the Wikipedia pseudocode (lazy!). I’ll share this one since it’s short. The backslash denotes integer division.

(defun rsa--mod-pow (base exponent modulus)
  (let ((result 1))
    (setf base (calc-eval "$1 % $2" nil base modulus))
    (while (calc-eval "$1 > 0" 'pred exponent)
      (when (calc-eval "$1 % 2 == 1" 'pred exponent)
        (setf result (calc-eval "($1 * $2) % $3" nil result base modulus)))
      (setf exponent (calc-eval "$1 \\ 2" nil exponent)
            base (calc-eval "($1 * $1) % $2" nil base modulus)))
    result))

Verifying the signature is the same process, but with the public key’s e: m ≡ s^e (mod n). If the signature is valid, m will be recovered. In theory, only someone who knows d can feasibly compute s from m. If n is small enough to factor, revealing p and q, then d can be feasibly recomputed from the public key. So mind your Ps and Qs.

So that leaves one problem: generally users want to sign strings and files and such, not integers. A hash function is used to reduce an arbitrary quantity of data into an integer suitable for signing. Emacs comes with a bunch of them, accessible through secure-hash. It hashes strings and buffers.

(secure-hash 'sha224 "Hello, world!")
;; => "8552d8b7a7dc5476cb9e25dee69a8091290764b7f2a64fe6e78e9568"

Since the result is hexadecimal, just prefix 16# to turn it into a calc integer.

Here’s the signature and verification functions. Any string or buffer can be signed.

(defun rsa-sign (private-key object)
  (let ((n (plist-get private-key :n))
        (d (plist-get private-key :d))
        (hash (concat "16#" (secure-hash 'sha384 object))))
    ;; truncate hash such that hash < n
    (while (calc-eval "$1 > $2" 'pred hash n)
      (setf hash (calc-eval "$1 \\ 2" nil hash)))
    (rsa--mod-pow hash d n)))

(defun rsa-verify (public-key object sig)
  (let ((n (plist-get public-key :n))
        (e (plist-get public-key :e))
        (hash (concat "16#" (secure-hash 'sha384 object))))
    ;; truncate hash such that hash < n
    (while (calc-eval "$1 > $2" 'pred hash n)
      (setf hash (calc-eval "$1 \\ 2" nil hash)))
    (let* ((result (rsa--mod-pow sig e n)))
      (calc-eval "$1 == $2" 'pred result hash))))

Note the hash truncation step. If this is actually necessary, then your n is very easy to factor! It’s in there since this is just a toy and I want it to work with small keys.

Putting it all together

Here’s the whole thing in action with an extremely small, 128-bit key.

(setf message "hello, world!")

(setf keypair (rsa-generate-keypair 128))
;; => (:public  (:n "74924929503799951536367992905751084593"
;;               :e "65537")
;;     :private (:n "74924929503799951536367992905751084593"
;;               :d "36491277062297490768595348639394259869"))

(setf sig (rsa-sign (plist-get keypair :private) message))
;; => "31982247477262471348259501761458827454"

(rsa-verify (plist-get keypair :public) message sig)
;; => t

(rsa-verify (plist-get keypair :public) (capitalize message) sig)
;; => nil

Each of these operations took less than a second. For larger, secure-length keys, this implementation is painfully slow. For example, generating a 2048-bit key takes my laptop about half an hour, and computing a signature with that key (any size message) takes about a minute. That’s probably a little too slow for, say, signing ELPA packages.

Counting Processor Cores in Emacs

2015-10-14T03:17:16Z

One of the great advantages of dependency analysis is parallelization. Modern processors reorder instructions whose results don’t affect each other. Compilers reorder expressions and statements to improve throughput. Build systems know which outputs are inputs for other targets and can choose any arbitrary build order within that constraint. This article involves the last case.

The build system I use most often is GNU Make, either directly or indirectly (Autoconf, CMake). It’s far from perfect, but it does what I need. I almost always invoke it from within Emacs rather than in a terminal. In fact, I do it so often that I’ve wrapped Emacs’ compile command for rapid invocation.

I recently helped a co-worker set this set up for himself, so it had me thinking about the problem again. The situation in my config is much more complicated than it needs to be, so I’ll share a simplified version instead.

First bring in the usual goodies (we’re going to be making closures):

;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

We need a couple of configuration variables.

(defvar quick-compile-command "make -k ")
(defvar quick-compile-build-file "Makefile")

Then a couple of interactive functions to set these on the fly. It’s not strictly necessary, but I like giving each a key binding. I also like having a history available via read-string, so I can switch between a couple of different options with ease.

(defun quick-compile-set-command (command)
  (interactive
   (list (read-string "Command: " quick-compile-command)))
  (setf quick-compile-command command))

(defun quick-compile-set-build-file (build-file)
  (interactive
   (list (read-string "Build file: " quick-compile-build-file)))
  (setf quick-compile-build-file build-file))

Now finally to the good part. Below, quick-compile is a non-interactive function that returns an interactive closure ready to be bound to any key I desire. It takes an optional target. This means I don’t use the above quick-compile-set-command to choose a target, only for setting other options. That will make more sense in a moment.

(cl-defun quick-compile (&optional (target ""))
  "Return an interaction function that runs `compile' for TARGET."
  (lambda ()
    (interactive)
    (save-buffer)  ; so I don't get asked
    (let ((default-directory
            (locate-dominating-file
             default-directory quick-compile-build-file)))
      (if default-directory
          (compile (concat quick-compile-command " " target))
        (error "Cannot find %s" quick-compile-build-file)))))

It traverses up (down?) the directory hierarchy towards root looking for a Makefile — or whatever is set for quick-compile-build-file — then invokes the build system there. I don’t believe in recursive make.

So how do I put this to use? I clobber some key bindings I don’t otherwise care about. A better choice might be the F-keys, but my muscle memory is already committed elsewhere.

(global-set-key (kbd "C-x c") (quick-compile)) ; default target
(global-set-key (kbd "C-x C") (quick-compile "clean"))
(global-set-key (kbd "C-x t") (quick-compile "test"))
(global-set-key (kbd "C-x r") (quick-compile "run"))

Each of those invokes a different target without second guessing me. Let me tell you, having “clean” at the tip of my fingers is wonderful.

Parallel Builds

An extension common to many different make programs is -j, which asks make to build targets in parallel where possible. These days where multi-core machines are the norm, you nearly always want to use this option, ideally set to the number of logical processor cores on your system. It’s a huge time-saver.

My recent revelation was that my default build command could be better: make -k is minimal. It should at least include -j, but choosing an argument (number of processor cores) is a problem. Today I use different machines with 2, 4, or 8 cores, so most of the time any given number will be wrong. I could use a per-system configuration, but I’d rather not. Unfortunately GNU Make will not automatically detect the number of cores. That leaves the matter up to Emacs Lisp.

Emacs doesn’t currently have a built-in function that returns the number of processor cores. I’ll need to reach into the operating system to figure it out. My usual development environments are Linux, Windows, and OpenBSD, so my solution should work on each. I’ve ranked them by order of importance.

Number of cores on Linux

Linux has the /proc virtual filesystem in the fashion of Plan 9, allowing different aspects of the system to be explored through the standard filesystem API. The relevant file here is /proc/cpuinfo, listing useful information about each of the system’s processors. To get the number of processors, count the number of processor entries in this file. I’ve wrapped it in if-file-exists so that it returns nil on other operating systems instead of throwing an error.

(when (file-exists-p "/proc/cpuinfo")
  (with-temp-buffer
    (insert-file-contents "/proc/cpuinfo")
    (how-many "^processor[[:space:]]+:")))

Number of cores on Windows

When I was first researching how to do this on Windows, I thought I would need to invoke the wmic command line program and hope the output could be parsed the same way on different versions of the operating system and tool. However, it turns out the solution for Windows is trivial. The environment variable NUMBER_OF_PROCESSORS gives every process the answer for free. Being an environment variable, it will need to be parsed.

(let ((number-of-processors (getenv "NUMBER_OF_PROCESSORS")))
  (when number-of-processors
    (string-to-number number-of-processors)))

Number of cores on BSD

This seems to work the same across all the BSDs, including OS X, though I haven’t yet tested it exhaustively. Invoke sysctl, which returns an undecorated number to be parsed.

(with-temp-buffer
  (ignore-errors
    (when (zerop (call-process "sysctl" nil t nil "-n" "hw.ncpu"))
      (string-to-number (buffer-string)))))

Also not complicated, but it’s the heaviest solution of the three.

Putting it all together

Join all these together with or, call it numcores, and ta-da.

(setf quick-compile-command (format "make -kj%d" (numcores)))

Now make is invoked correctly on any system by default.

NASM x86 Assembly Major Mode for Emacs

2015-04-19T02:38:23Z

Last weekend I created a new Emacs mode, nasm-mode, for editing Netwide Assembler (NASM) x86 assembly programs. Over the past week I tweaked it until it felt comfortable enough to share on MELPA. It’s got what you’d expect from a standard Emacs programming language mode: syntax highlighting, automatic indentation, and imenu support. It’s not a full parser, but it knows all of NASM’s instructions and directives.

Until recently I didn’t really have preferences about x86 assemblers (GAS, NASM, YASM, FASM, MASM, etc.) or syntax (Intel, AT&T). I stuck to the GNU Assembler (GAS) since it’s already there with all the other GNU development tools I know and love, and it’s required for inline assembly in GCC. However, nasm-mode now marks my commitment to NASM as my primary x86 assembler.

Why NASM?

I need an assembler that can assemble 16-bit code (8086, 8088, 80186, 80286), because real mode is fun. Despite its .code16gcc directive, GAS is not suitable for this purpose. It’s just enough to get the CPU into protected mode — as needed when writing an operating system with GCC — and that’s it. A different assembler is required for serious 16-bit programming.

GAS syntax has problems. I’m not talking about the argument order (source first or destination first), since there’s no right answer to that one. The linked article covers a number of problems, with these being the big ones for me:

The use of % sigils on all registers is tedious. I’m sure it’s handy when generating code, where it becomes a register namespace, but it’s annoying to write.
Integer constants are an easy source of bugs. Forget the $ and suddenly you’re doing absolute memory access, which is a poor default. NASM simplifies this by using brackets [] for all such “dereferences.”
GAS cannot produce pure binaries — raw machine code without any headers or container (ELF, COFF, PE). Pure binaries are useful for developing shellcode, bootloaders, 16-bit COM programs, and just-in-time compilers.

Being a portable assembler, GAS is the jack of all instruction sets, master of none. If I’m going to write a lot of x86 assembly, I want a tool specialized for the job.

YASM

I also looked at YASM, a rewrite of NASM. It supports 16-bit assembly and mostly uses NASM syntax. In my research I found that NASM used to lag behind in features due to slower development, which is what spawned YASM. In recent years this seems to have flipped around, with YASM lagging behind. If you’re using YASM, nasm-mode should work pretty well for you, since it’s still very similar.

YASM optionally supports GAS syntax, but this reintroduces almost all of GAS’s problems. Even YASM’s improvements (i.e. its ORG directive) become broken when switching to GAS syntax.

FASM

FASM is the “flat assembler,” an assembler written in assembly language. This means it’s only available on x86 platforms. While I don’t really plan on developing x86 assembly on a Raspberry Pi, I’d rather not limit my options! I already regard 16-bit DOS programming as a form of embedded programming, and this may very well extend to the rest of x86 someday.

Also, it hasn’t made its way into the various Linux distribution package repositories, including Debian, so it’s already at a disadvantage for me.

MASM

This is Microsoft’s assembler that comes with Visual Studio. Windows only and not open source, this is in no way a serious consideration. But since NASM’s syntax was originally derived from MASM, it’s worth mentioning. NASM takes the good parts of MASM and fixes the mistakes (such as the offset operator). It’s different enough that nasm-mode would not work well with MASM.

NASM

It’s not perfect, but it’s got an excellent manual, it’s a solid program that does exactly what it says it will do, has a powerful macro system, great 16-bit support, highly portable, easy to build, and its semantics and syntax has been carefully considered. It also comes with a simple, pure binary disassembler (ndisasm). In retrospect it seems like an obvious choice!

My one complaint would be that it’s that it’s too flexible about labels. The colon on labels is optional, which can lead to subtle bugs. NASM will warn about this under some conditions (orphan-labels). Combined with the preprocessor, the difference between a macro and a label is ambiguous, short of re-implementing the entire preprocessor in Emacs Lisp.

Why nasm-mode?

Emacs comes with an asm-mode for editing assembly code for various architectures. Unfortunately it’s another jack-of-all-trades that’s not very good. More so, it doesn’t follow Emacs’ normal editing conventions, having unusual automatic indentation and self-insertion behaviors. It’s what prompted me to make nasm-mode.

To be fair, I don’t think it’s possible to write a major mode that covers many different instruction set architectures. Each architecture has its own quirks and oddities that essentially makes gives it a unique language. This is especially true with x86, which, from its 37 year tenure touched by so many different vendors, comes in a number of incompatible flavors. Each assembler/architecture pair needs its own major mode. I hope I just wrote NASM’s.

One area where I’m still stuck is that I can’t find an x86 style guide. It’s easy to find half a dozen style guides of varying authority for any programming language that’s more than 10 years old … except x86. There’s no obvious answer when it comes to automatic indentation. How are comments formatted and indented? How are instructions aligned? Should labels be on the same line as the instruction? Should labels require a colon? (I’ve decided this is “yes.”) What about long label names? How are function prototypes/signatures documented? (The mode could take advantage of such a standard, a la ElDoc.) It seems everyone uses their own style. This is another conundrum for a generic asm-mode.

There are a couple of other nasm-modes floating around with different levels of completeness. Mine should supersede these, and will be much easier to maintain into the future as NASM evolves.

Emacs Autotetris Mode

2014-10-19T21:45:53Z

For more than a decade now, Emacs has come with a built-in Tetris clone, originally written by XEmacs’ Glynn Clements. Just run M-x tetris any time you want to play. For anyone too busy to waste time playing Tetris, earlier this year I wrote an autotetris-mode that will play the Emacs game automatically.

https://github.com/skeeto/autotetris-mode

Load the source, autotetris-mode.el and M-x autotetris. It will start the built-in Tetris but make all the moves itself. It works best when byte compiled.

At the time I had read an article and was interested in trying my hand at my own Tetris AI. Like most things Emacs, the built-in Tetris game is very hackable. It’s also pretty simple and easy to understand. Rather than write my own I chose to build upon this one.

Heuristics

It’s not a particularly strong AI. It doesn’t pay attention to the next piece in queue, it doesn’t know the game’s basic shapes, and it doesn’t try to maximize the score (clearing multiple rows at once). The goal is to continue running for as long as possible. But since it’s able to get to the point where the game is so fast that the AI is unable to move pieces fast enough (it’s rate limited like a human player), that means it’s good enough.

When a new piece appears at the top of the screen, the AI, in memory, tries placing it in all possible positions and all possible orientations. For each of these positions it runs a heuristic on the resulting game state, summing five metrics. Each metric is scaled by a hand-tuned weight to adjust its relative priority. Smaller is better, so the position with the lowest score is selected.

Number of Holes

A hole is any open space that has a solid block above it, even if that hole is accessible without passing through a solid block. Count these holes.

Maximum Height

Add the height of the tallest column. Column height includes any holes in the column. The game ends when a column touches the top of the screen (or something like that), so this should be kept in check.

Mean Height

Add the mean height of all columns. The higher this is, the closer we are to losing the game. Since each row will have at least one hole, this will be a similar measure to the hole count.

Height Disparity

Add the difference between the shortest column height and the tallest column height. If this number is large it means we’re not making effective use of the playing area. It also discourages the AI from getting into that annoying situation we all remember: when you really need a 4x1 piece that never seems to come. Those are the brief moments when I truly believe the version I’m playing has to be rigged.

Surface Roughness

Take the root mean square of the column heights. A rougher surface leaves fewer options when placing pieces. This measure will be similar to the disparity measurement.

Emacs-specific Details

With a position selected, the AI sends player inputs at a limited rate to the game itself, moving the piece into place. This is done by calling tetris-move-right, tetris-move-left, and tetris-rotate-next, which, in the normal game, are bound to the arrow keys.

The built-in tetris-mode isn’t quite designed for this kind of extension, so it needs a little bit of help. I defined two pieces of advice to create hooks. These hooks alert my AI to two specific events in the game: the game start and a fresh, new piece.

(defadvice tetris-new-shape (after autotetris-new-shape-hook activate)
  (run-hooks 'autotetris-new-shape-hook))

(defadvice tetris-start-game (after autotetris-start-game-hook activate)
  (run-hooks 'autotetris-start-game-hook))

I talked before about the problems with global state. Fortunately, tetris-mode doesn’t store any game state in global variables. It stores everything in buffer-local variables, which can be exploited for use in the AI. To perform the “in memory” heuristic checks, it creates a copy of the game state and manipulates the copy. The copy is made by way of clone-buffer on the *Tetris* buffer. The tetris-mode functions all work equally as well on the clone, so I can use the existing game rules to properly place the next piece in each available position. The game’s own rules take care of clearing rows and checking for collisions for me. I wrote an autotetris-save-excursion function to handle the messy details.

(defmacro autotetris-save-excursion (&rest body)
  "Restore tetris game state after BODY completes."
  (declare (indent defun))
  `(with-current-buffer tetris-buffer-name
     (let ((autotetris-saved (clone-buffer "*Tetris-saved*")))
       (unwind-protect
           (with-current-buffer autotetris-saved
             (kill-local-variable 'kill-buffer-hook)
             ,@body)
         (kill-buffer autotetris-saved)))))

The kill-buffer-hook variable is also cloned, but I don’t want tetris-mode to respond to the clone being killed, so I clear out the hook.

That’s basically all there is to it! While watching it feels like it’s making dumb mistakes, not placing pieces in optimal positions, but it recovers well from these situations almost every time, so it must know what it’s doing. Currently it’s a better player than me, which is my rule-of-thumb for calling an AI successful.

Emacs Unicode Pitfalls

2014-06-13T05:58:34Z

GNU Emacs is seven years older than Unicode. Support for Unicode had to be added relatively late in Emacs’ existence. This means Emacs has existed longer without Unicode support (16 years) than with it (14 years). Despite this, Emacs has excellent Unicode support. It feels as if it was there the whole time.

However, as a natural result of Unicode covering all sorts of edge cases for every known human language, there are pitfalls and complications. As a user of Emacs, you’re not particularly affected by these, but extension developers might run into trouble while handling Emacs character-oriented data structures: strings and buffers.

In this article I’ll go over Elisp’s Unicode surprises. I’ve been caught by some of these myself. In fact, as a result of writing this article, I’ve discovered subtle encoding bugs in some of my own extensions. None of these pitfalls are Emacs’ fault. They’re just the result of complexities of natural language.

Unicode and Code Points

First, there are excellent materials online for learning Unicode. I recommend starting with UTF-8 and Unicode FAQ for Unix/Linux. There’s no reason for me to repeat all this information here, but I’ll attempt to quickly summarize it.

Unicode maps code points (integers) to specific characters, along with a standard name. As of this writing, Unicode defines over 110,000 characters. For backwards compatibility, the first 128 code points are mapped to ASCII. This trend continues for other character standards, like Latin-1.

In Emacs, Unicode characters are entered into a buffer with C-x 8 RET (insert-char). You can enter either the official name of the character (e.g. “GREEK SMALL LETTER PI” for π) or the hexadecimal code point. Outside of Emacs it depends on the application, but C-S-u followed by the hexadecimal code works for most of the applications I care about.

Encodings

The Unicode standard also describes several methods for encoding sequences of code points into sequences of bytes. Obviously a selection of 110,000 characters cannot be encoded with one byte per letter, so these are multibyte encodings. The two most popular encodings are probably UTF-8 and UTF-16.

UTF-8 was designed to be backwards compatible with ASCII, Unix, and existing C APIs (null-terminated C strings). The first 128 code points are encoded directly as a single byte. Every other character is encoded with two to six bytes, with the highest bit of each byte set to 1. This ensures that no part of a multibyte character will be interpreted as ASCII, nor will it contain a null (0). The latter means that C programs and C APIs can handle UTF-8 strings with few or no changes. Most importantly, every ASCII encoded file is automatically a UTF-8 encoded file.

UTF-16 encodes all the characters from the Basic Multilingual Plane (BMP) with two bytes. Even the original ASCII characters get two bytes (16 bits). The BMP covers virtually all modern languages and is generally all you’ll ever practically need. However, this doesn’t include the important TROPICAL DRINK or PILE OF POO characters from the supplemental (“astral”) plane. If you need to use these characters in UTF-16, you’re going to run into problems: characters outside the BMP don’t fit in two bytes. To accommodate these characters, UTF-16 uses surrogate pairs: these characters are encoded with two 16-bit units.

Because of this last point, UTF-16 offers no practical advantages over UTF-8. Its existence was probably a big mistake. You can’t do constant-time character lookup because you have to scan for surrogate pairs. It’s not backwards compatible and cannot be stored in null-terminated strings. In both Java and JavaScript, it leads to the awkward situation where the “length” of a string is not the number of characters, code points, or even bytes. Worst of all, it has serious security implications. New applications should avoid it whenever possible.

Emacs and UTF-8

Emacs internally stores all text as UTF-8. This was an excellent choice! When text leaves Emacs, such as writing to a file or to a process, Emacs automatically converts it to the coding system configured for that particular file or process. When it accepts text from a file or process, it either converts it to UTF-8 or preserves it as raw bytes.

There are two modes for this in Emacs: unibyte and multibyte. Unibyte strings/buffers are just raw bytes. They have constant access O(1) time but can only hold single-byte values. The byte-code compiler outputs unibyte strings.

Multibyte strings/buffers hold UTF-8 encoded code points. Character access is O(n) because the string/buffer has to be scanned to count characters.

The actual encoding is rarely relevant because there’s little way (and need) to access it directly. Emacs automatically converts text as needed when it leaves Emacs and arrives in Emacs, so there’s no need to know the internal encoding. If you really want to see it anyway, you can use string-as-unibyte to get a copy of a string with the exact same bytes, but as a byte-string.

(string-as-unibyte "π")
;; => "\317\200"

This can be reversed with string-as-multibyte), to change a unibyte string holding UTF-8 encoded text back into a multibyte string. Note that these functions are different than string-to-unibyte and string-to-multibyte, which will attempt a conversion rather than preserving the raw bytes.

The length and buffer-size functions always count characters in multibyte and bytes in unibyte. Being UTF-8, there are no surrogate pairs to worry about here. The string-bytes and position-bytes functions return byte information for both multibyte and unibyte.

To specify a Unicode character in a string literal without using the character directly, use \uXXXX. The XXXX is the hexadecimal code point for the character and is always 4 digits long. For characters outside the BMP, which won’t fit in four digits, use a capital U with eight digits: \UXXXXXXXX.

"\u03C0"
;; => "π"

"\U0001F4A9"
;; => "💩"  (PILE OF POO)

Finally, Emacs extends Unicode with 256 additional “characters” representing raw bytes. This allows raw bytes to be embedded distinctly within UTF-8 sequences. For example, it’s used to distinguish the code point U+0041 from the raw byte #x41. As far as I can tell, this isn’t used very often.

Combining Characters

Some Unicode characters are defined as combining characters. These characters modify the non-combining character that appears before it, typically with accents or diacritical marks.

For example, the word “naïve” can be written as six characters as "nai\u0308ve". The fourth character, U+0308 (COMBINING DIAERESIS), is a combining character that changes the “i” (U+0069 LATIN SMALL LETTER I) into an umlaut character.

The most commonly accented characters have a code of their own. These are called precomposed characters. This includes ï (U+00EF LATIN SMALL LETTER I WITH DIAERESIS). This means “naïve” can also be written as five characters as "na\u00EFve".

Normalization

So what happens when comparing two different representations of the same text? They’re not equal.

(string= "nai\u0308ve" "na\u00EFve")
;; => nil

To deal with situations like this, the Unicode standard defines four different kinds of normalization. The two most important ones are NFC (composition) and NFD (decomposition). The former uses precomposed characters whenever possible and the latter breaks them apart. The functions ucs-normalize-NFC-string and ucs-normalize-NFD-string perform this operation.

Pitfall #1: Proper string comparison requires normalization. It doesn’t matter which normalization you use (though NFD should be slightly faster), you just need to use it consistently. Unfortunately this can get tricky when using equal to compare complex data structures with multiple strings.

(string= (ucs-normalize-NFD-string "nai\u0308ve")
         (ucs-normalize-NFD-string "na\u00EFve"))
;; => t

Emacs itself fails to do this. It doesn’t normalize strings before interning them, which is probably a mistake. This means you can have differently defined variables and functions with the same canonical name.

(eq (intern "nai\u0308ve")
    (intern "na\u00EFve"))
;; => nil

(defun print-résumé ()
  "NFC-normalized form."
  (print "I'm going to sabotage your team."))

(defun print-résumé ()
  "NFD-normalized form."
  (print "I'd be a great asset to your team."))

(print-résumé)
;; => "I'm going to sabotage your team."

String Width

There are three ways to quantify multibyte text. These are often the same value, but in some circumstances they can each be different.

length: number of characters, including combining characters
bytes: number of bytes in its UTF-8 encoding
width: number of columns it would occupy in the current buffer

Most of the time, one character is one column (a width of one). Some characters, like combining characters, consume no columns. Many Asian characters consume two columns (U+4000, 䀀). Tabs consume tab-width columns, usually 8.

Generally, a string should have the same width regardless of which whether it’s NFD or NFC. However, due to bugs and incomplete Unicode support, this isn’t strictly true. For example, some combining characters, such as U+20DD ⃝, won’t combine correctly in Emacs nor in other applications.

Pitfall #2: Always measure text by width, not length, when laying out a buffer. Width is measured with the string-width function. This comes up when laying out tables in a buffer. The number of characters that fit in a column depends on what those characters are.

Fortunately I accidentally got this right in Elfeed because I used the format function for layout. The %s directive operates on width, as would be expected. However, this has the side effect that the output of may format change depending on the current buffer! Pitfall #3: Be mindful of the current buffer when using the format function.

(let ((tab-width 4))
  (length (format "%.6s" "\t")))
;; => 1

(let ((tab-width 8))
  (length (format "%.6s" "\t")))
;; => 0

String Reversal

Say you want to reverse a multibyte string. Simple, right?

(defun reverse-string (string)
  (concat (reverse (string-to-list string))))

(reverse-string "abc")
;; => "cba"

Wrong! The combining characters will get flipped around to the wrong side of the character they’re meant to modify.

(reverse-string "nai\u0308ve")
;; => "ev̈ian"

Pitfall #4: Reversing Unicode strings is non-trivial. The Rosetta Code page is full of incorrect examples, and I’m personally guilty of this, too. The other day I submitted a patch to s.el to correct its s-reverse function for Unicode. If it’s accepted, you should never need to worry about this.

Regular Expressions

Regular expressions operate on code points. This means combining characters are counted separately and the match may change depending on how characters are composed. To avoid this, you might want to consider NFC normalization before performing some kinds of regular expressions.

;; Like string= from before:
(string-match-p  "na\u00EFve" "nai\u0308ve")
;; => nil

;; The . only matches part of the composition
(string-match-p "na.ve" "nai\u0308ve")
;; => nil

Pitfall #5: Be mindful of combining characters when using regular expressions. Prefer NFC normalization when dealing with regular expressions.

Another potential problem is ranges, though this is quite uncommon. Ranges of characters can be expressed in inside brackets, e.g. [a-zA-Z]. If the range begins or ends with a decomposed combining character you won’t get the proper range because its parts are considered separately by the regular expression engine.

(defvar match-weird "[\u00E0-\u00F6]+")

(string-match-p match-weird "áâãäå")
;; => 0  (successful match)

(string-match-p (ucs-normalize-NFD-string match-weird) "áâãäå")
;; => nil

It’s especially important to keep all of this in mind when sanitizing untrusted input, such as when using Emacs as a web server. An attacker might use a denormalized or strange grapheme cluster to bypass a filter.

Interacting with the World

Here’s a mistake I’ve made twice now. Emacs uses UTF-8 internally, regardless of whatever encoding the original text came in. Pitfall #6: When working with bytes of text, the counts may be different than the original source of the text.

For example, HTTP/1.1 introduced persistent connections. Before this, a client connects to a server and asks for content. The server sends the content and then closes the connection to signal the end of the data. In HTTP/1.1, when Connection: close isn’t specified, the server will instead send a Content-Length header indicating the length of the content in bytes. The connection can then be re-used for more requests, or, more importantly, pipelining requests.

The main problem is that HTTP headers usually have a different encoding than the content body. Emacs is not prepared to handle multiple encodings from a single source, so the only correct way to talk HTTP with a network process is raw. My mistake was allowing Emacs to do the UTF-8 conversion, then measuring the length of the content in its UTF-8 encoding. This just happens to work fine about 99.9% of the time since clients tend to speak UTF-8, or something like it, anyway, but it’s not correct.

An Emacs Foreign Function Interface

2014-04-26T16:25:51Z

For many years Richard Stallman (RMS) prohibited a foreign function interface (FFI) in GNU Emacs. An FFI is an API for dynamically calling native libraries at run-time, like the Java Native Interface (JNI). He was concerned that people might use it to make proprietary extensions to the popular editor. This was the same (paranoid) justification for rejecting a package manager in Emacs for many years, that someone might use it to distribute proprietary packages.

Fortunately, times have changed. RMS reevaluated his stances on FFI and on package managers. Today Emacs comes with a package manager (package.el), and there are multiple package repositories with no proprietary packages in sight. Though, outside of some unaccepted patches, no significant progress has been made to add an FFI.

A few weeks ago I did something about that by writing a package that adds an FFI. It requires no patches or any other changes to Emacs itself. Instead, it drives a subprocess running libffi, passing arguments and return values back and forth through a pipe, in the spirit of EmacSQL. It’s not as efficient as a built-in API, but it could potentially be distributed through an ELPA repository.

Emacs Lisp Foreign Function Interface

The API is modeled loosely after Julia’s elegant FFI. A call interface (CIF) doesn’t need to be prepared ahead of time. Provide all the necessary information at the call site and the library takes care of building and caching CIFs and handles for you.

API Examples

The core function for the FFI is ffi-call. Here’s an example that calls the system’s srand() and then rand().

;; seed with 0
(ffi-call nil "srand" [:void :uint32] 0)
;; => :void

(ffi-call nil "rand" [:sint32])
;; => 1102520059

The first two arguments are similar to the first two arguments of dlsym(). For ffi-call, the first argument is the library shared object name. The back-end automatically takes care of obtaining a handle on the library with dlopen(). In this case we’re accessing a function that’s already in the main program, so we pass nil. This is identical to passing NULL to dlsym(). In this FFI, nil always corresponds to NULL.

The second argument is the function name, just like dlsym()’s second argument.

The third argument is the function signature. It’s a vector of keywords declaring the return value type followed by the types of each argument. In this example, srand() returns nothing (void) and accepts a single 32-bit unsigned argument, so the signature is [:void :uint32].

The remaining arguments are the native function arguments. I can keep making the second FFI call (“rand”) to retrieve different numbers, using the first FFI call (“srand”) to reset the sequence.

Using a Library

Here’s another example, loading libm and calling cos.

;; cos(1.2)
(ffi-call "libm.so" "cos" [:double :double] 1.2)
;; => 0.362357754476674

The first time a library is used, the back-end creates a handle for it with dlopen(). Further calls will reuse the handle, trying to be as efficient as possible. Handles are never closed.

Pointers

Here are a couple of examples that use pointers. As stated before, nil is used to pass a NULL pointer. Like the underlying libffi, the FFI doesn’t care what kind of pointer you’re passing, just that it’s a pointer, so it’s declared with :pointer.

;; time(NULL);
(ffi-call nil "time" [:uint64 :pointer] nil)
;; => 1396496875

Strings are automatically copied to the subprocess, their lifetime tied to the lifetime of the Elisp string (note: this detail is still unimplemented). When used as arguments, they become pointers.

;; getenv("DISPLAY")
(ffi-call nil "getenv" [:pointer :pointer] "DISPLAY")
;; => 0x7fffc13ceb29

(ffi-get-string '0x7fffc13ceb29)
;; => ":0"

Pointers can be handled as values on the Elisp side. They’re represented as symbols whose name is an address. In the above example, 0x7fffc13ceb29 is one of these symbols. I would have preferred to use a plain integer to represent pointers, but, because Elisp integers are tagged, they’re guaranteed not to be wide enough for this. I plan to add pointer operators to do pointer arithmetic on these special pointer values.

The function ffi-get-string is used to retrieve the null-terminated string referenced by a pointer. If the string returned by getenv() needed to be freed (it doesn’t and shouldn’t), the FFI caller would need to be careful to call free() as another FFI call.

How It Works: The Stack Machine

My goal is to keep the back-end as simple as possible. All resource management is handled by Emacs, tied to garbage collection. For example, the pointer returned by dlopen() isn’t stored anywhere in the subprocess. It’s passed to Emacs and managed there. To call a function using the handle, the pointer is transmitted back to the subprocess.

To keep it simple, the back-end is just a stack machine with a simple human-readable bytecode. You can see the instruction set by looking at the big switch statement in ffi-glue.cc. For example, to push a signed 2-byte integer 237 onto the stack, send a j followed by an ASCII representation of the number (terminated by a space if needed): j237.

As usual, my assumption is that the Elisp printer and reader is faster than any possible serialization I could implement within Elisp itself. This also nicely sidesteps the byte-order issue.

The function signature is declared by pushing zeros of the return/argument types onto the stack, with a special void “value” used to communicate void. Once it’s all set up, the C instruction is called, collapsing the signature into a CIF handle: a pointer for the Elisp side to manage.

Pointers to raw strings of bytes are pushed onto the stack with the M instruction. It pops the top integer on the stack to get the byte count, reads that number of bytes from input into a buffer, null-terminates the buffer in case it’s used as a string, and finally puts a pointer to that buffer on the stack.

Calling functions is just a matter of pushing all the needed information onto the stack, invoking libffi to magically call the function, then popping the result off the stack. Popping a value transmits it to Elisp.

Stack Machine Example

Here’s a concise example that calls cos(1.2) (assuming libm.so is already linked). The actual Elisp-generated FFI bytecode doesn’t plan things quite this way — particularly because it needs to keep track of the various pointers involved — but this example keeps it simple.

d1.2d0d0w1Cp0w3McosSco

You can run this example manually by executing the ffi-glue program and pasting in that line as standard input. The result will be printed.

d1.2 : Push a double, 1.2, onto the stack. This will be the function argument.
d0d0 : Push a couple of zero doubles onto the stack. This is our function signature. It takes a double and returns a double.
w1 : Push an unsigned 32-bit 1 onto the stack. Instructions that use integers accept unsigned 32-bit integers. This 1 indicates that our function accepts one argument.
C : create a CIF. The integer 1 and the two 0 doubles are consumed and a pointer to a CIF is put on the stack. Elisp would normally pop this off and save it for future use, but we’re going to leave it there (and ultimately leak it in the example).
p0 : Push a NULL onto the stack. p means push a pointer and 0 is a NULL pointer. This is our library handle. We’re assuming cos will be in the main program.
w3Mcos : Put a pointer to the string “cos” into the stack. First push on the number 3 (string length), then M to read from input, then pass three bytes: “cos”. In our example, this buffer will be leaked because we lose the buffer pointer.
S : Call dlsym() on the string and handle on top of the stack. This consumes the top two values (NULL and “cos”), and pushes a function handle on top of the stack. At this point the stack has three values: 1.2, the CIF, and the function handle.
c : Call the function pointed to by the top of the stack. This consumes the top pointer, the CIF below it, and the CIF indicates how many more values to consume: just one in this case, since the function takes one argument. The function’s return value is pushed on the stack. If the function is void, the special void “value” is pushed on the stack.
o : Pop the top stack value, sending it to Emacs. This is what would be returned by ffi-call.

Before I got the Elisp side of things going, I was testing out the back-end by writing lots of little programs like this by hand.

A Safe FFI

While using an FFI through a pipe is slow compared to a built-in FFI, there is a distinct advantage. The FFI can never crash Emacs! Normally, making calls to an FFI is unsafe. It allows the programmer to violate normal language constraints. If the programmer misuses the FFI, the whole process may crash or become corrupt. This will lose any state held behind foreign interface, but Emacs will be safe.

In my package, the handle for the FFI Emacs subprocess is called the context. A context is automatically established and bound to the ffi-context global variable as needed. This context keeps track of CIFs, string buffers, handles, and any other resources held by the subprocess. If the subprocess dies, the context becomes meaningless since the pointers it holds are dead.

Limitations

This FFI package is about 80% complete. It occasionally leaks memory in the subprocess, it’s overly-sensitive to mis-typing, it doesn’t manage stdin/stdout, it can’t inspect/modify structs, and it can’t set up closures.

The last point, closures, would require some changes to the interprocess communication. The purpose here would be to allow foreign functions to call Elisp functions. The subprocess would need to be able to initiate activity with Elisp.

Manipulating structs is complex, and even libffi has limited support for working with them. It allows structs to be declared, but leaves alignment and access up to the user to sort out. That’s where the previously-mentioned pointer arithmetic comes into play.

Currently stdin, stdout, and stderr are problems, especially when I was trying to write a test GTK application with Elisp. Any command line junkie knows that GTK (and Qt) applications are ridiculously noisy. It spews hundreds of lines of warnings and notifications as part of its normal operation. This noise interferes with FFI communication with Emacs. I need to figure out how to separate this and get standard input/output/error to/from Emacs through separate channels.

Like libffi, there are no guarantees about variadic function calls. It should generally Just Work, but you can’t rely on it.

The whole thing will not work as well in 32-bit Emacs, where integers are limited to a tiny 29 bits. For example, those rand() return values will simply not fit. In the long run, this is probably the single largest barrier to making the FFI work smoothly. It’s too easy to run into large integer values.

Right now I consider it a proof of concept; an FFI really can be done this way. I don’t have any particular uses in mind, and, outside of the “cool factor,” I can’t actually think of any useful applications. If a solid FFI already existed, I may have tried to use it for EmacSQL rather than use this subprocess trick. My FFI is probably mature enough to drive SQLite, so maybe this is the future of EmacSQL.

If you can think of a good use for an Emacs FFI, please share it. I need good test ideas.

Emacs Lisp Defstruct Namespace Convention

2014-03-19T01:41:52Z

One of the drawbacks of Emacs Lisp is the lack of namespaces. Every defun, defvar, defcustom, defface, defalias, defstruct, and defclass establishes one or more names in the global scope. To work around this, package authors are strongly encouraged to prefix every global name with the name of its package. That way there should never be a naming conflict between two different packages.

(defvar mypackage-foo-limit 10)

(defvar mypackage--bar-counter 0)

(defun mypackage-init ()
  ...)

(defun mypackage-compute-children (node)
  ...)

(provide 'mypackage)

While this has solved the problem for the time being, attaching the package name to almost every identifier, including private function and variable names, is quite cumbersome. Namespaces can almost be hacked into the language by using multiple obarrays, but symbols have internal linked lists that prohibit inclusion in multiple obarrays.

By convention, private names are given a double-dash after the namespace. If a “bar counter” is an implementation detail that may disappear in the future, it will be called mypackage--bar-counter to warn users and other package authors not to rely on it.

There’s been a recent push to follow this namespace-prefix policy more strictly, particularly with the depreciation of cl and introduction of cl-lib. I suspect someday when namespaces are finally introduced, packages with strictly clean namespaces with be at an advantage, somehow automatically supported. Nic Ferrier has proposed ideas for how to move forward on this.

How strict are we talking?

Over the last few years I’ve gotten much stricter in my own packages when it comes to namespace prefixes. You can see the progression going from javadoc-lookup (2010) where I was completely sloppy about it, to EmacSQL (2014) where every single global identifier is meticulously prefixed.

For a time I considered names such as make-* and with-* to be exceptions to the rule, since these names are idioms inherited from Common Lisp. The namespace comes after the expected prefix. I’ve changed my mind about this, which has caused me to change my usage of defstruct (now cl-defstruct).

Just as in Common Lisp, by default cl-defstruct defines a constructor starting with make-*. This is fine in Common Lisp, where it’s a package-private function by default, but in Emacs Lisp this pollutes the global namespace.

(require 'cl-lib)

;; Defines make-circle, circle-x, circle-y, circle-radius, circle-p
(cl-defstruct circle
  x y radius)

(defvar unit-circle (make-circle :x 0.0 :y 0.0 :radius 1.0))

unit-circle
;; => [cl-struct-circle 0.0 0.0 1.0]

(circle-radius unit-circle)
;; => 1.0

This constructor isn’t namespace clean, so package authors should avoid defstruct’s default. If the package is named circle then all of the accessors are perfectly fine, though.

To fix this, I now use another, more recent Emacs Lisp idiom: name the constructor create. That is, for the package circle, we desire circle-create. To get this behavior from cl-defstruct, use the :constructor option.

;; Clean!
(cl-defstruct (circle (:constructor circle-create))
  x y radius)

(circle-create :x 0 :y 0 :radius 1)
;; => [cl-struct-circle 0 0 1]

(provide 'circle)

This affords a new opportunity to craft a better constructor. Have cl-defstruct define a private constructor, then manually write a constructor with a nicer interface. It may also do additional work, like enforce invariants or initialize dependent slots.

(cl-defstruct (circle (:constructor circle--create))
  x y radius)

(defun circle-create (x y radius)
  (let ((circle (circle--create :x x :y y :radius radius)))
    (if (< radius 0)
        (error "must have non-negative radius")
      circle)))

(circle-create 0 0 1)
;; => [cl-struct-circle 0 0 1]

(circle-create 0 0 -1)
;; error: "must have non-negative radius"

This is now how I always use cl-defstruct in Emacs Lisp. It’s a tidy convention that will probably become more common in the future.

Introducing EmacSQL

2014-02-06T05:52:37Z

Yesterday I made the first official release of EmacSQL, an Emacs package I’ve been working on for the past few weeks. EmacSQL is a high-level SQL database for Emacs. It primarily targets SQLite as a back-end, but it also currently supports PostgreSQL and MySQL.

https://github.com/skeeto/emacsql

It’s available on MELPA and is ready for immediate use. It depends on the finalizers package I added last week.

While there’s a non-Elisp component, SQLite, there are no special requirements for the user to worry about. When the package’s Elisp is compiled, if a C compiler is available it will use it to compile a SQLite binary for EmacSQL. If not, it will later offer to download a pre-built binary that I built. Ideally this makes the non-Elisp part of EmacSQL completely transparent and users can pretend Emacs has a built-in relational database.

The official SQLite command line shell is not used even if present, and I’ll explain why below.

Just as Skewer jump started my web development experience, EmacSQL has been a crash course in SQL and relational databases. Before starting this project I knew little about this topic and I’ve gained a lot of appreciation for it in the process. Building an Emacs extension is a very rapid way to dive into a new topic.

If you’re a total newb about this stuff like I was and want to learn SQL for SQLite yourself, I highly recommend Using SQLite. It’s a really solid introduction.

High-level SQL Compiler

By “high-level” I mean that it goes beyond assembling strings containing SQL code. In EmacSQL, statements are assembled from s-expressions which, behind the scenes, are compiled into SQL using some simple rules. This means if you already know SQL you should be able to hit the ground running with EmacSQL. Here’s an example,

(require 'emacsql)

;; Connect to the database, SQLite in this case:
(defvar db (emacsql-connect "~/office.db"))

;; Create a table with 3 columns:
(emacsql db [:create-table patients
             ([name (id integer :primary-key) (weight float)])])

;; Insert a few rows:
(emacsql db [:insert :into patients
             :values (["Jeff" 1000 184.2] ["Susan" 1001 118.9])])

;; Query the database:
(emacsql db [:select [name id]
             :from patients
             :where (< weight 150.0)])
;; => (("Susan" 1001))

;; Queries can be templates, using $s1, $i2, etc. as parameters:
(emacsql db [:select [name id]
             :from patients
             :where (> weight $s1)]
         100)
;; => (("Jeff" 1000) ("Susan" 1001))

A query is a vector of keywords, identifiers, parameters, and data. Thanks to parameters, these s-expression statements should not need to be constructed dynamically at run-time.

The compilation rules are listed in the EmacSQL documentation so I won’t repeat them in detail here. In short, lisp keywords become SQL keywords, row-oriented information is always presented as vectors, expressions are lists, and symbols are identifiers, except when quoted.

[:select [name weight] :from patients :where (< weight 150.0)]

That compiles to this,

SELECT name, weight FROM patients WHERE weight < 150.0;

Also, any readable lisp value can be stored in an attribute. Integers are mapped to INTEGER, floats are mapped to REAL, nil is mapped to NULL, and everything else is printed and stored as TEXT. The specifics vary depending on the back-end.

Parameters

A symbol beginning with a dollar sign is a parameter. It has a type — identifier (i), scalar (s), vector (v), schema (S) — and an argument position.

[:select [$i1] :from $i2 :where (< $i3 $s4)]

Given the arguments name people age 21, three symbols and an integer, it compiles to:

SELECT name FROM people WHERE age < 21;

A vector parameter refers to rows to be inserted or as a set for an IN expression.

[:insert-into people [name age] :values $v1]

Given the argument (["Jim" 45] ["Jeff" 34]), a list of two rows, this becomes,

INSERT INTO people (name, age) VALUES ('"Jim"', 45), ('"Jeff"', 34);

And this,

[:select * :from tags :where (in tag $v1)]

Given the argument [hiking camping biking] becomes,

SELECT * FROM tags WHERE tag IN ('hiking', 'camping', 'biking');

When writing these expressions keep in mind the command emacsql-show-last-sql. It will display in the minibuffer the SQL result of the s-expression statement before the point.

Schemas

A table schema is a list whose first element is a column specification vector (i.e. row-oriented information is presented as vectors). The remaining elements are table constraints. Here are the examples from the documentation,

;; No constraints schema with four columns:
([name id building room])

;; Add some column constraints:
([(name :unique) (id integer :primary-key) building room])

;; Add some table constraints:
([(name :unique) (id integer :primary-key) building room]
 (:unique [building room])
 (:check (> id 0)))

In the handful of EmacSQL databases I’ve created for practice and testing, I’ve put the schema in a global constant. A table schema is a part of a program’s type specifications, and rows are instances of that type, so it makes sense to declare schemas up top with things like defstructs.

These schemas can be substituted into a SQL statement using a $S parameter (capital “S” for Schema).

(defconst foo-schema-people
  '([(person-id integer :primary-key) name age]))

;; ...

(defun foo-init (db)
  (emacsql db [:create-table $i1 $S2] 'people foo-schema-people))

Back-ends

Everything I’ve discussed so far is restricted to the SQL statement compiler. It’s completely independent of the back-end implementations, themselves mostly handling strings of SQL statements.

SQLite Implementation Difficulties

A little over a year ago I wrote a pastebin webapp in Elisp. I wanted to use SQLite as a back-end for storing pastes but struggled to get the SQLite command shell, sqlite3, to cooperate with Emacs. The problem was that all of the output modes except for “tcl” are ambiguous. This includes the “csv” formatted output. TEXT values can dump newlines, allowing rows to span an arbitrary number of lines. They can dump things that look like the sqlite3 prompt, so it’s impossible to know when sqlite3 is done printing results. I ultimately decided the command shell was inadequate as an Emacs subprocess.

Recently there was some discussion from alexbenjm and Andres Ramirez on an Elfeed post about using SQLite as an Elfeed back-end. This inspired me to take another look and that’s when I came up with a workaround for SQLite’s ambiguity: only store printed Elisp values for TEXT values! With print-escape-newlines set, TEXT values no longer span multiple lines, and I can use read to pull in data from sqlite3. All of sqlite3’s output modes were now unambiguous.

However, after making significant progress I discovered an even bigger issue: GNU Readline. The sqlite3 binary provided by Linux package repositories is almost always compiled with Readline support. This makes the tool much more friendly to use, but it’s a huge problem for Emacs.

First, sqlite3 the command shell is not up to the same standards as SQLite the database. Not by a long shot. In my short time working with SQLite I’ve already discovered several bugs in the command shell. For one, it’s not properly integrated with GNU Readline. There’s an .echo meta-command that turns command echoing on and off. That is, it repeats your command back to you. Useful in some circumstances, though not mine. The bug is that this echo is separate from GNU Readline’s echo. When Readline is active and .echo is enabled, there are actually two echos. Turn it off and there’s one echo.

Pseudo-terminals

Under some circumstances, like when communicating over a pipe rather than a PTY, Readline will mostly become deactivated. This would have been a workaround, but when Readline is disabled sqlite3 heavily buffers its output. This breaks any sort of interaction. Even worse, on Windows stderr is not always unbuffered, so sqlite3’s error messages may not appear for a long time (another bug).

Besides the problem of getting Readline to shut up, another problem is getting Readline to stop acting on control characters. The first 32 characters in ASCII are control characters. A pseudo-terminal (PTY) that is not in raw mode will immediately act upon any control characters it sees. There’s no escaping them.

Emacs communicates with subprocesses through a PTY by default (probably an early design mistake), limiting the kind of data that can be transmitted. You can try this yourself in a comint mode sometime where a subprocess is used (not a socket like SLIME). Fire up M-x sql-sqlite (part of Emacs) and try sending a string containing byte 0x1C (28, file separator). You can type one by pressing C-q C-\. Send that byte and the subprocess dies.

There are two ways to work around this. One is to use a pipe (bind process-connection-type to nil). Pipes don’t respond to control characters. This doesn’t work with sqlite3 because of the previously-mentioned buffering issue.

The other way to work around this is to put the PTY in raw mode. Unfortunately there’s no function to do this so you need to call stty. Of course, this program needs to run on the same PTY, so a start-process-shell-command is required.

(start-process-shell-command name buffer "stty raw && ")

Windows has neither stty nor PTYs (nor any of PTY’s issues) so you’ll need to check the operating system before starting the process. Even this still doesn’t work for sqlite3 because Readline itself will respond to control characters. There’s no option to disable this.

There’s a package called esqlite that is also a SQLite front-end. It’s built to use sqlite3 and therefore suffers from all of these problems.

A Custom SQLite Binary

Since sqlite3 proved unreliable I developed my own protocol and external program. It’s just a tiny bit of C that accepts a SQL string and returns results as an s-expression. I’m not longer constrained to storing readable values, but I’m still keeping that paradigm. First, it keeps the C glue program simple and, more importantly, I can rely entirely on the Emacs reader to parse the results. This makes communication between Emacs and the subprocess as fast as it can possibly be. The reader is faster than any possible Elisp program.

As I mentioned before, this C program is compiled when possible, and otherwise a pre-built binary is fetched from my server (popular platforms only, obviously). It’s likely EmacSQL will have at least one working back-end on whatever you’re using.

Other Back-ends

Both PostgreSQL and MySQL are also supported, though these require the user have the appropriate client programs installed (psql or mysql). Both of these are much better behaved than sqlite3 and, with the stty trick, each can reliably be used without any special help. Both pass all of the unit tests, so, in theory, they’ll work just as well as SQLite.

To use them with the example at the beginning of this article, require emacsql-psql or emacsql-mysql, then swap emacsql-connect for the constructors emacsql-psql or emacsql-mysql (along with the proper arguments). All three of these constructors return an emacsql-connection object that works with the same API.

EmacSQL only goes so far to normalize the interfaces to these databases, so for any non-trivial program you may not be able to swap back-ends without some work. All of the EmacSQL functions that operate on connections are generic functions (EIEIO), so changing back-ends will only have an effect on the program’s SQL statements. For example, if you use q SQLite-ism (dynamic typing) it won’t translate to either of the other databases should they be swapped in.

I’ll cover the connections API, and what it takes to implement a new back-end, in a future post. Outside of the PTY caveats, it’s actually very easy. The MySQL implementation is just 80 lines of code.

EmacSQL’s Future

I hope this becomes a reliable and trusted database solution that other packages can depend upon. Twice so far, the pastebin demo and Elfeed, I’ve really wanted something like this and, instead, ended up having to hack together my own database.

I’ve already started a branch on Elfeed re-implementing its database in EmacSQL. Someday it may become Elfeed’s primary database if I feel there’s no disadvantage to it. EmacSQL builds SQLite with the full-text search engine enabled, which opens to the door to a powerful, fast Elfeed search API. Currently the main obstacle is actually Elfeed’s database API being somewhat incompatible with ACID database transactions — shortsightedness on my part!

Emacs Lisp Object Finalizers

2014-01-27T05:24:16Z

*Update: Emacs 25.1 (released Sept. 2016) formally introduced finalizers to Emacs Lisp. This article is left here for historical purposes.

Problem: You have a special resource, such as a buffer or process, associated with an Emacs Lisp object which is not managed by the garbage collector. You want this resource to be cleaned up when the owning lisp object is garbage collected. Unlike some other languages, Elisp doesn’t provide finalizers for this job, so what do you do?

Solution: This is Emacs Lisp. We can just add this feature to the language ourselves!

I’ve already implemented this feature as a package called finalize, available on MELPA. I will be using it as part of a larger, upcoming project.

https://github.com/skeeto/elisp-finalize

In this article I will describe how it works.

Processes and Buffers

Process and buffers are special types of objects. Immediately after instantiation these objects are added to a global list. They will never become unreachable without explicitly being killed. The garbage collector will never manage them for you.

This is a problem for APIs like those provided by the url package. The functions url-retrieve and url-retrieve-synchronously create buffers and hand them back to their callers. Ownership is transfered to the caller and the caller must be careful to kill the buffer, or transfer ownership again, before it returns. Otherwise the buffer is “leaked.” The url package tries to manage this a little bit with url-gc-dead-buffers, but this can’t be relied upon.

Another issue is when a process is started and is stored in a struct or some other kind of object. There is probably a “close” function that accepts one of these structs and kills the process. But if that function isn’t called, due to a bug or an error condition, it will become a “dangling” process. If the struct is completely lost, it will probably be inconvenient to deal with the process — the “close” function is no longer useful.

With Macros

A common way to deal with this problem is using a with- macro. This macro establishes a resource, evaluates a body, and ensures the resource is properly cleaned up regardless of the body’s termination state. The latter is accomplished using unwind-protect. For example, with-temp-buffer,

;; Fetch the first 10 bytes of foo.txt
(with-temp-buffer
  (insert-file-contents "foo.txt" nil 0 10)
  (buffer-string))

This expands (roughly) to the following expression.

(let ((temp-buffer (generate-new-buffer "*temp*")))
  (with-current-buffer temp-buffer
    (unwind-protect
        (progn
          (insert-file-contents "foo.txt" nil 0 10)
          (buffer-string))
      (and (buffer-live-p temp-buffer)
           (kill-buffer temp-buffer)))))

For dealing with open files, Common Lisp has with-open-stream. It establishes a binding for a new stream over its body and ensures the stream is closed when the body is complete. There’s no chance for a stream to be left open, leaking a system resource.

However, with- macros aren’t useful in asynchronous situations. In Emacs this would be the case for asynchronous sub-processes, such as an attached language interpreter. The extent of the process goes beyond a single body.

Finalizers

What would really be useful is to have a callback — a finalizer — that runs when an object is garbage collected. This ensures that the resource will not outlive its owner, restoring management back to the garbage collector. However, Emacs provides no such hook.

Fortunately this feature can be built using weak hash tables and the post-gc-hook, a list of functions that are run immediately after garbage collection.

Weak References

I’ve discussed before how to create weak references in Elisp. The only weak references in Emacs are built into weak hash tables. Normally the language provides weak references first and hash tables are built on top of them. With Emacs we do this backwards.

The make-hash-table function accepts a key argument :weakness to specify how strongly keys and values should be held by the table. To make a weak reference just create a hash table of size 1 and set :weakness to t.

(defun weak-ref (thing)
  (let ((ref (make-hash-table :size 1 :weakness t :test 'eq)))
    (prog1 ref
      (setf (gethash t ref) thing))))

(defun deref (ref)
  (gethash t ref))

The same trick can be used to detect when an object is garbage collected. If the result of deref is nil, then the object was garbage collected. (Or the weakly-referenced object is nil, but this object will never be garbage collected anyway.)

To check if we need to run a finalizer all we have to do is create a weak reference to the object, then check the reference after garbage collection. This check can be done in a post-gc-hook function.

Registration

To avoid cluttering up post-gc-hook with one closure per object we’ll keep a register of all watched objects.

(defvar finalizable-objects ())

(defun register (object callback)
  (push (cons (weak-ref object) callback) finalizable-objects))

Now a function to check for missing objects, try-finalize.

(defun try-finalize ()
  (let ((alive (cl-remove-if-not #'deref finalizable-objects :key #'car))
        (dead (cl-remove-if #'deref finalizable-objects :key #'car)))
    (setf finalizable-objects alive)
    (mapc #'funcall (mapcar #'cdr dead))))

(add-hook 'post-gc-hook #'try-finalize)

Now to try it out. Create a process, stuff it in a vector (like a defstruct), register delete-process as a finalizer, and, for the sake of demonstration, immediately forget the vector.

;;; -*- lexical-binding: t; -*-
(let ((process (start-process "ping" nil "ping" "localhost")))
  (register (vector process) (lambda () (delete-process process))))

;; Assuming the garbage collector has not already run.
(get-process "ping")
;; => #

;; Force garbage collection.
(garbage-collect)

(get-process "ping")
;; => nil

The garbage collector killed the process for us!

There are some problems with this implementation. Using cl-remove-if is unwise in a post-gc-hook function. It allocates lots of new cons cells but garbage collection is inhibited while the function is run. The docstring warns us:

Garbage collection is inhibited while the hook functions run, so be careful writing them.

Similarly, all of the finalizers are run within the context of this memory-sensitive hook. Instead they should be delayed until the next evaluation turn (i.e. run-at-time of 0). Some of the finalizers could also fail, which would cause the remaining finalizers to never run. The real implementation deals with all of these issues.

A major drawback to these Emacs Lisp finalizers compared to other languages is that the actual object is not available. We don’t know it’s getting collected until after it’s already gone. This solves the object resurrection problem, but it’s darn inconvenient. One possible workaround in the case of defstructs and EIEIO objects is to make a copy of the original object (copy-sequence or clone) and run the finalizer on the copy as if it was the original.

The Real Implementation

The real implementation is more carefully namespaced and its API has just one function: finalize-register. It works just like register above but it accepts &rest arguments to be passed to the finalizer. This makes the registration call simpler and avoids some significant problems with closures.

(let ((process (start-process "ping" nil "ping" "localhost")))
  (finalize-register (vector process) #'delete-process process))

Here’s a more formal example of how it might really be used.

(cl-defstruct (pinger (:constructor pinger--create))
  process host)

(defun pinger-create (host)
  (let* ((process (start-process "pinger" nil "ping" host))
         (object (pinger--create :process process :host host)))
    (finalize-register object #'delete-process process)
    object))

To make things cleaner for EIEIO classes there’s also a finalizable mixin class that ensures the finalize generic function is called on a copy of the object (the original object is gone) when it’s garbage collected.

Here’s how it would be used for the same “pinger” concept, this time as an EIEIO class. An advantage here is that anyone can manually call finalize early if desired.

(require 'eieio)
(require 'finalizable)

(defclass pinger (finalizable)
  ((process :initarg :process :reader pinger-process)
   (host :initarg :host :reader pinger-host)))

(defun pinger-create (host)
  (make-instance 'pinger
                 :process (start-process "ping" nil "ping" host)
                 :host host))

(defmethod finalize ((pinger pinger))
  (delete-process (pinger-process pinger)))

It’s a small package but I think it can be quite handy.

Measure Elisp Object Memory Usage with Calipers

2014-01-26T01:15:02Z

A couple of weeks ago I wrote a library to measure the retained memory footprint of arbitrary Elisp objects for the purposes of optimization. It’s called Caliper.

https://github.com/skeeto/caliper

Note, Caliper requires predd, my predicate dispatch library. Neither of these packages are on MELPA or Marmalade since they’re mostly for fun.

The reason I wanted this was that I came across a post on reddit where someone had scraped 217,000 Jeopardy! questions from J! Archive and dumped them out into a single, large JSON file. The significance of the effort is that it dealt with some of the inconsistencies of J! Archive’s data presentation, normalizing them for the JSON output.

JEOPARDY_QUESTIONS1.json.gz (12MB, 53MB uncompressed)

When I want to examine a JSON dataset like this I have three preferred options:

Load it into a browser page and poke at it from JavaScript remotely with Skewer. With the JSON text weighing in at 53MB and with such a large object count, I decided this was too large for a browser page. It definitely could be done, it’s just that the browser is not the place to be working on large datasets.
Load it into Clojure. I’m familiar with Clojure’s data.json. This is not a bad choice, but there’s something else I always reach for first if I can.
Load it into Emacs using json.el (part of Emacs). This is what I ended up doing.

(defvar jeopardy
  (with-temp-buffer
    (insert-file-contents "/tmp/JEOPARDY_QUESTIONS1.json")
    (json-read)))

(length jeopardy)
;; => 216930

Here, jeopardy is bound to a vector of 216,930 association lists (alists). I’m curious exactly how much heap memory this data structure is using. To find out, we need to walk the data structure and sum the sizes of everything we come across. However, care must be taken not to count the identical objects twice, such as symbols, which, being interned, appear many times in this data.

Measuring Object Sizes

This is lisp so let’s start with the cons cell. A cons cell is just a pair of pointers, called car and cdr.

These are used to assemble lists.

So a cons cell itself — the shallow size — is two words: 16 bytes on a 64-bit operating system. To make sure Elisp doesn’t happen to have any additional information attached to cons cells, let’s take a look at the Emacs source code.

struct Lisp_Cons
  {
    /* Car of this cons cell.  */
    Lisp_Object car;

    union
    {
      /* Cdr of this cons cell.  */
      Lisp_Object cdr;

      /* Used to chain conses on a free list.  */
      struct Lisp_Cons *chain;
    } u;
  };

The return value from garbage-collect backs this up. The first value after each type is the shallow size of that type. From here on, all values have been computed for 64-bit Emacs running on x86-64 GNU/Linux.

(garbage-collect)
;; => ((conses 16 9923172 2036943)
;;     (symbols 48 57017 54)
;;     (miscs 40 10203 18892)
;;     (strings 32 4810027 197961)
;;     (string-bytes 1 104599635)
;;     (vectors 16 103138)
;;     (vector-slots 8 2921744 131076)
;;     (floats 8 12494 5816)
;;     (intervals 56 119911 69249)
;;     (buffers 960 134)
;;     (heap 1024 593412 133853))

A Lisp_Object is just a pointer to a lisp object. The retained size of a cons cell is its shallow size plus, recursively, the retained size of the objects in its car and cdr.

Integers and Floats

Integers are a special case. Elisp uses what is called tagged integers. They’re not heap-allocated objects. Instead they’re embedded inside the object pointers. That is, those Lisp_Object pointers in Lisp_Cons will hold integers directly. This means to Caliper integers have retained size of 0. We can use this to verify Caliper’s return value for cons cells.

(caliper-object-size 100)
;; => 0

(caliper-object-size (cons 100 200))
;; => 16

Tagged integers are fast and save on memory. They also compare properly with eq, which is just a pointer (identity) comparison. However, because a few bits need to be reserved for differentiating them from actual pointers these integers have a restricted dynamic range.

Floats are not tagged and exist as immutable objects in the heap. That’s why eql is still useful in Elisp — it’s like eq but will handle numbers properly. (By convention you should use eql for integers, too.)

Symbols and Strings

Not counting the string’s contents, a string’s base size is 32 bytes according to garbage-collect. The length of the string can’t be used here because that counts characters, which vary in size. There’s a string-bytes function for this. A string’s size is 32 plus its string-bytes value.

(string-bytes "naïveté")
;; => 9
(caliper-object-size "naïveté")
;; => 41  (i.e. 32 + 9)

As you can see from above, symbols are huge. Without even counting either the string holding the name of the symbol or the symbol’s plist, a symbol is 48 bytes.

(caliper-object-size 'hello)
;; => 1038

This 1,038 bytes is a little misleading. The symbol itself is 48 bytes, the string "hello" is 37 bytes, and the plist is nil. The retained size of nil is significant. On my system, nil’s plist has 4 key-value pairs, which themselves have retained sizes. When examining symbols, caliper doesn’t care if they’re interned or not, including symbols like nil and t. However, nil is only counted once, so it will have little impact on a large data structure.

Miscellaneous

Outside of vectors, measuring object sizes starts to get fuzzy. For example, it’s not possible to examine the exact internals of a hash table from Elisp. We can see its contents and the number of elements it can hold without re-sizing, but there’s intermediate structure that’s not visible. Caliper makes rough estimates for each of these types.

Circularity and Double Counting

To avoid double counting objects, a hash table with a test of eq is dynamically bound by the top level call. It’s used like a set. Before an object is examined, the hash table is checked. If the object is listed, the reported size is 0 (it consumes no additional space than already accounted for).

This automatically solves the circularity problem. There’s no way we can traverse into the same data structure a second time because we’ll stop when we see it twice.

Using Caliper

So what’s the total retained size of the jeopardy structure? About 124MB.

(caliper-object-size jeopardy)
;; => 130430198

For fun, let’s see if how much we can improve on this.

json.el will return alists for objects by default, but this can be changed by setting json-object-type to something else. Initially I thought maybe using plists instead would save space, but I later realized that plists use exactly the same number of cons cells as alists. If this doesn’t sound right, try to picture the cons cells in your head (an exercise for the reader).

(defvar jeopardy
  (let ((json-object-type 'plist))
    (with-temp-buffer
      (insert-file-contents "~/JEOPARDY_QUESTIONS1.json")
      (setf (point) (point-min))
      (json-read))))

(caliper-object-size jeopardy)
;; => 130430077 (plist)

Strangely this is 121 bytes smaller. I don’t know why yet, but in the scope of 124MB that’s nothing.

So what do these questions look like?

(elt jeopardy 0)
;; => (:show_number "4680"
;;     :round "Jeopardy!"
;;     :answer "Copernicus"
;;     :value "$200"
;;     :question "..." ;; omitted
;;     :air_date "2004-12-31"
;;     :category "HISTORY")

They’re (now) plists of 7 pairs. All of the keys are symbols, and, as such, are interned and consuming very little memory. All of the values are strings. Surely we can do better here. The strings can be interned and the numbers can be turned into tagged integers. The :category values would probably be good candidates for conversion into symbols.

Here’s an interesting fact about Jeopardy! that can be exploited for our purposes. While Jeopardy! covers a broad range of trivia, it does so very shallowly. The same answers appear many times. For example, the very first answer from our dataset, Copernicus, appears 14 times. That makes even the answers good candidates for interning.

(cl-loop for question across jeopardy
         for answer = (plist-get question :answer)
         count (string= answer "Copernicus"))
;; => 14

A string pool is trivial to implement. Just use a weak, equal hash table to track strings. Making it weak keeps it from leaking memory by holding onto strings for longer than necessary.

(defvar string-pool
  (make-hash-table :test 'equal :weakness t))

(defun intern-string (string)
  (or (gethash string string-pool)
      (setf (gethash string string-pool) string)))

(defun jeopardy-fix (question)
  (cl-loop for (key value) on question by #'cddr
           collect key
           collect (cl-case key
                     (:show_number (read value))
                     (:value (if value (read (substring value 1))))
                     (:category (intern value))
                     (otherwise (intern-string value)))))

(defvar jeopardy-interned
  (cl-map 'vector #'jeopardy-fix jeopardy))

So how are we looking now?

(caliper-object-size jeopardy-interned)
;; => 83254322

That’s down to 79MB of memory. Not bad! If we print-circle this, taking advantage of string interning in the printed representation, I wonder how it compares to the original JSON.

(with-temp-buffer
  (let ((print-circle nil))
    (prin1 jeopardy-interned (current-buffer))
    (buffer-size)))
;; => 45554437

About 44MB, down from JSON’s 53MB. With print-circle set to nil it’s about 48MB.

Emacs Byte-code Internals

2014-01-04T05:07:26Z

Byte-code compilation is an underdocumented — and in the case of the recent lexical binding updates, undocumented — part of Emacs. Most users know that Elisp is usually compiled into a byte-code saved to .elc files, and that byte-code loads and runs faster than uncompiled Elisp. That’s all users really need to know, and the GNU Emacs Lisp Reference Manual specifically discourages poking around too much.

People do not write byte-code; that job is left to the byte compiler. But we provide a disassembler to satisfy a cat-like curiosity.

Screw that! What if I want to handcraft some byte-code myself? :-) The purpose of this article is to introduce the internals of Elisp byte-code interpreter. I will explain how it works, why lexically scoped code is faster, and demonstrate writing some byte-code by hand.

The Humble Stack Machine

The byte-code interpreter is a simple stack machine. The stack holds arbitrary lisp objects. The interpreter is backwards compatible but not forwards compatible (old versions can’t run new byte-code). Each instruction is between 1 and 3 bytes. The first byte is the opcode and the second and third bytes are either a single operand or a single intermediate value. Some operands are packed into the opcode byte.

As of this writing (Emacs 24.3) there are 142 opcodes, 6 of which have been declared obsolete. Most opcodes refer to commonly used built-in functions for fast access. (Looking at the selection, Elisp really is geared towards text!) Considering packed operands, there are up to 27 potential opcodes unused, reserved for the future.

opcodes 48 - 55
opcode 97
opcode 128
opcodes 169 - 174
opcodes 180 - 181
opcodes 183 - 191

The easiest place to access the opcode listing is in bytecomp.el. Beware that some of the opcode comments are currently out of date.

Segmentation Fault Warning

Byte-code does not offer the same safety as normal Elisp. Bad byte-code can, and will, cause Emacs to crash. You can try out for yourself right now,

emacs -batch -Q --eval '(print (#[0 "\300\207" [] 0]))'

Or evaluate the code manually in a buffer (save everything first!),

(#[0 "\300\207" [] 0])

This segfault, caused by referencing beyond the end of the constants vector, is not an Emacs bug. Doing a boundary test would slow down the byte-code interpreter. Not performing this test at run-time is a practical engineering decision. The Emacs developers have instead chosen to rely on valid byte-code output from the compiler, making a disclaimer to anyone wanting to write their own byte-code,

You should not try to come up with the elements for a byte-code function yourself, because if they are inconsistent, Emacs may crash when you call the function. Always leave it to the byte compiler to create these objects; it makes the elements consistent (we hope).

You’ve been warned. Now it’s time to start playing with firecrackers.

The Byte-code Object

A byte-code object is functionally equivalent to a normal Elisp vector except that it can be evaluated as a function. Elements are accessed in constant time, the syntax is similar to vector syntax ([...] vs. #[...]), and it can be of any length, though valid functions must have at least 4 elements.

There are two ways to create a byte-code object: using a byte-code object literal or with make-byte-code. Like vector literals, byte-code literals don’t need to be quoted.

(make-byte-code 0 "" [] 0)
;; => #[0 "" [] 0]

#[1 2 3 4]
;; => #[1 2 3 4]

(#[0 "" [] 0])
;; error: Invalid byte opcode

The elements of an object literal are:

Function parameter (lambda) list
Unibyte string of byte-code
Constants vector
Maximum stack usage
Docstring (optional, nil for none)
Interactive specification (optional)

Parameter List

The parameter list takes on two different forms depending on if the function is lexically or dynamically scoped. If the function is dynamically scoped, the argument list is exactly what appears in lisp code.

(byte-compile (lambda (a b &optional c)))
;; => #[(a b &optional c) "\300\207" [nil] 1]

There’s really no shorter way to represent the parameter list because preserving the argument names is critical. Remember that, in dynamic scope, while the function body is being evaluated these variables are globally bound (eww!) to the function’s arguments.

When the function is lexically scoped, the parameter list is packed into an Elisp integer, indicating the counts of the different kinds of parameters: required, &optional, and &rest.

The least significant 7 bits indicate the number of required arguments. Notice that this limits compiled, lexically-scoped functions to 127 required arguments. The 8th bit is the number of &rest arguments (up to 1). The remaining bits indicate the total number of optional and required arguments (not counting &rest). It’s really easy to parse these in your head when viewed as hexadecimal because each portion almost always fits inside its own “digit.”

(byte-compile-make-args-desc '())
;; => #x000  (0 args, 0 rest, 0 required)

(byte-compile-make-args-desc '(a b))
;; => #x202  (2 args, 0 rest, 2 required)

(byte-compile-make-args-desc '(a b &optional c))
;; => #x302  (3 args, 0 rest, 2 required)

(byte-compile-make-args-desc '(a b &optional c &rest d))
;; => #x382  (3 args, 1 rest, 2 required)

The names of the arguments don’t matter in lexical scope: they’re purely positional. This tighter argument specification is one of the reasons lexical scope is faster: the byte-code interpreter doesn’t need to parse the entire lambda list and assign all of the variables on each function invocation.

Unibyte String Byte-code

The second element is a unibyte string — it strictly holds octets and is not to be interpreted as any sort of Unicode encoding. These strings should be created with unibyte-string because string may return a multibyte string. To disambiguate the string type to the lisp reader when higher values are present (> 127), the strings are printed in an escaped octal notation, keeping the string literal inside the ASCII character set.

(unibyte-string 100 200 250)
;; => "d\310\372"

It’s unusual to see a byte-code string that doesn’t end with 135 (#o207, byte-return). Perhaps this should have been implicit? I’ll talk more about the byte-code below.

Constants Vector

The byte-code has very limited operands. Most operands are only a few bits, some fill an entire byte, and occasionally two bytes. The meat of the function that holds all the constants, function symbols, and variables symbols is the constants vector. It’s a normal Elisp vector and can be created with vector or a vector literal. Operands reference either this vector or they index into the stack itself.

(byte-compile (lambda (a b) (my-func b a)))
;; => #[(a b) "\302\134\011\042\207" [b a my-func] 3]

Note that the constants vector lists the variable symbols as well as the external function symbol. If this was a lexically scoped function the constants vector wouldn’t have the variables listed, being only [my-func].

Maximum Stack Usage

This is the maximum stack space used by this byte-code. This value can be derived from the byte-code itself, but it’s pre-computed so that the byte-code interpreter can quickly check for stack overflow. Under-reporting this value is probably another way to crash Emacs.

Docstring

The simplest component and completely optional. It’s either the docstring itself, or if the docstring is especially large it’s a cons cell indicating a compiled .elc and a position for lazy access. Only one position, the start, is needed because the lisp reader is used to load it and it knows how to recognize the end.

Interactive Specification

If this element is present and non-nil then the function is an interactive function. It holds the exactly contents of interactive in the uncompiled function definition.

(byte-compile (lambda (n) (interactive "nNumber: ") n))
;; => #[(n) "\010\207" [n] 1 nil "nNumber: "]

(byte-compile (lambda (n) (interactive (list (read))) n))
;; => #[(n) "\010\207" [n] 1 nil (list (read))]

The interactive expression is always interpreted, never byte-compiled. This is usually fine because, by definition, this code is going to be waiting on user input. However, it slows down keyboard macro playback.

Opcodes

The bulk of the established opcode bytes is for variable, stack, and constant access opcodes, most of which use packed operands.

0 - 7 : (stack-ref) stack reference
8 - 15 : (varref) variable reference (from constants vector)
16 - 23 : (varset) variable set (from constants vector)
24 - 31 : (varbind) variable binding (from constants vector)
32 - 39 : (call) function call (immediate = number of arguments)
40 - 47 : (unbind) variable unbinding (from constants vector)
129, 192-255 : (constant) direct constants vector access

Except for the last item, each kind of instruction comes in sets of 8. The nth such instruction means access the nth thing. For example, the instruction “2” copies the third stack item to the top of the stack. An instruction of “9” pushes onto the stack the value of the variable named by the second element listed in the constants vector.

However, the 7th and 8th such instructions in each set take an operand byte or two. The 7th instruction takes a 1-byte operand and the 8th takes a 2-byte operand. A 2-byte operand is written in little-endian byte-order regardless of the host platform.

For example, let’s manually craft an instruction that returns the value of the global variable foo. Each opcode has a named constant of byte-X so we don’t have to worry about their actual byte-code number.

(require 'bytecomp)  ; named opcodes

(defvar foo "hello")

(defalias 'get-foo
  (make-byte-code
    #x000                 ; no arguments
    (unibyte-string
      (+ 0 byte-varref)   ; ref variable under first constant
      byte-return)        ; pop and return
    [foo]                 ; constants
    1))                   ; only using 1 stack space

(get-foo)
;; => "hello"

Ta-da! That’s a handcrafted byte-code function. I left a “+ 0” in there so that I can change the offset. This function has the exact same behavior, it’s just less optimal,

(defalias 'get-foo
  (make-byte-code
    #x000
    (unibyte-string
      (+ 3 byte-varref)     ; 4th form of varref
      byte-return)
    [nil nil nil foo]
    1))

If foo was the 10th constant, we would need to use the 1-byte operand version. Again, the same behavior, just less optimal.

(defalias 'get-foo
  (make-byte-code
    #x000
    (unibyte-string
      (+ 6 byte-varref)     ; 7th form of varref
      9                     ; operand, (constant index 9)
      byte-return)
    [nil nil nil nil nil nil nil nil nil foo]
    1))

Dynamically-scoped code makes heavy use of varref but lexically-scoped code rarely uses it (global variables only), instead relying heavily on stack-ref, which is faster. This is where the different calling conventions come into play.

Calling Convention

Each kind of scope gets its own calling convention. Here we finally get to glimpse some of the really great work by Stefan Monnier updating the compiler for lexical scope.

Dynamic Scope Calling Convention

Remembering back to the parameter list element of the byte-code object, dynamically scoped functions keep track of all its argument names. Before executing a function the interpreter examines the lambda list and binds (varbind) every variable globally to an argument.

If the caller was byte-compiled, each argument started on the stack, was popped and bound to a variable, and, to be accessed by the function, will be pushed back right onto the stack (varref). There’s a lot of argument indirection for each function call.

Lexical Scope Calling Convention

With lexical scope, the argument names are not actually bound for the evaluation byte-code. The names are completely gone because the compiler has converted local variables into stack offsets.

When calling a lexically-scoped function, the byte-code interpreter examines the integer parameter descriptor. It checks to make sure the appropriate number of arguments have been provided, and for each unprovided &optional argument it pushes a nil onto the stack. If the function has a &rest parameter, any extra arguments are popped off into a list and that list is pushed onto the stack.

From here the function can access its arguments directly on the stack without any named variable misdirection. It can even consume them directly.

;; -*- lexical-binding: t -*-
(defun foo (x) x)

(symbol-function #'foo)
;; => #[#x101 "\207" [] 2]

The byte-code for foo is a single instruction: return. The function’s argument is already on the stack so it doesn’t have to do anything. Strangely the maximum stack usage element is wrong here (2), but it won’t cause a crash.

;; (As of this writing `byte-compile' always uses dynamic scope.)

(byte-compile 'foo)
;; => #[(x) "\010\207" [x] 1]

It takes longer to set up (x is implicitly bound), it has to make an explicit variable dereference (varref), then it has to clean up by unbinding x (implicit unbind). It’s no wonder lexical scope is faster!

Note that there’s also a disassemble function for examining byte-code, but it only reveals part of the story.

(disassemble #'foo)
;; byte code:
;;   args: (x)
;; 0       varref    x
;; 1       return

Compiler Intermediate “lapcode”

The Elisp byte-compiler has an intermediate language called lapcode (“Lisp Assembly Program”), which is much easier to optimize than byte-code. It’s basically an assembly language built out of s-expressions. Opcodes are referenced by name and operands, including packed operands, are handled whole. Each instruction is a cons cell, (opcode . operand), and a program is a list of these.

Let’s rewrite our last get-foo using lapcode.

(defalias 'get-foo
  (make-byte-code
    #x000
    (byte-compile-lapcode
      '((byte-varref . 9)
        (byte-return)))
    [nil nil nil nil nil nil nil nil nil foo]
    1))

We didn’t have to worry about which form of varref we were using or even how to encode a 2-byte operand. The lapcode “assembler” took care of that detail.

Project Ideas?

The Emacs byte-code compiler and interpreter are fascinating. Having spent time studying them I’m really tempted to build a project on top of it all. Perhaps implementing a programming language that targets the byte-code interpreter, improving compiler optimization, or, for a really big project, JIT compiling Emacs byte-code.

People can write byte-code!

Emacs Lisp Readable Closures

2013-12-30T23:52:38Z

I’ve stated before that one of the unique features of Emacs Lisp is that its closures are readable. Closures can be serialized by the printer and read back in with the reader. I am unaware of any other programming language that has this feature. In fact it’s essential for Elisp byte-code compilation because byte-compiled Elisp files are merely s-expressions of byte-code dumped out as source.

Lisp Printing

The Lisp family of languages are homoiconic. Lisp source code is written in the syntax of its own data structures, s-expressions. Since a compiler/interpreter is usually provided at run-time, a consequence of this is that reading and printing are a fundamental feature of Lisps. A value can be handed to the printer, which will serialize the value into an s-expression as a sequence of characters. Later on the reader can parse the s-expression back into an equal value.

To compare, JavaScript originally had half of this in place. JavaScript has convenient object syntax for defining an associative array, known today as JSON. The eval function could (dangerously) be used as a reader for parsing a string containing JSON-encoded data into a value. But until JSON.stringify() became standard, developers had to write their own printer. Lisp s-expression syntax is much more powerful (and complicated) than JSON, maintaining both identity and cycles (e.g. *print-circle*).

Not all values can be read. They’ll still print (when *print-readably* is nil) but will do so using special syntax that will signal an error in the reader: #<. For example, in Emacs Lisp buffers cannot be serialized so they print using this syntax.

(prin1-to-string (current-buffer))
;; => "#"

It doesn’t matter what’s between the angle brackets, or even that there’s a closing angle bracket. The reader will signal an error as soon as it hits a #<.

Almost Everything Prints Readably

Elisp has a small set of primitive data types. All of these primitive types print readably:

integer (1024, ?a)
float (1.7)
cons/list ((...))
vector (one-dimensional, [...])
bool-vector (#&n"...")
string ("...")
char-table (#^[...])
hash-table (readable as of Emacs 23.3, #s(hash-table ...))
byte-code function object (#[...])
symbol

Here are all the non-readable types. Each one has a good reason for not being serializable.

buffer
process (external state)
frame (user interface element)
marker (live, automatically updates)
overlay (belongs to a buffer)
built-in functions (native code)
user-ptr (opaque pointers from Emacs 25 dynamic modules)

And that’s it. Every other value in Elisp is constructed from one or more of these primitives, including keymaps, functions, macros, syntax tables, defstruct structs, and EIEIO objects. This means that as long as these values don’t refer to an unreadable value, they themselves can be printed.

An interesting note here is that, unlike the Common Lisp Object System (CLOS), EIEIO objects are readable by default. To Elisp they’re just vectors, so of course they print. CLOS objects are unreadable without manually defining a print method per class.

Elisp Closures

Elisp got lexical scoping in Emacs 24, released in June 2012. It’s now one of the relatively few languages to have both dynamic and lexical scope. Like Common Lisp, variables declared with defvar (and family) continue to have dynamic scope. For backwards compatibility with old Lisp code, lexical scope is disabled by default. It’s enabled for a specific file or buffer by setting lexical-binding to non-nil.

With lexical scope, anonymous functions become closures, a powerful functional programming primitive: a function plus a captured lexical environment. It also provides some performance benefits. In my own tests, compiled Elisp with lexical scope enabled is about 10% to 15% faster than with the default dynamic scope.

What do closures look like in Emacs Lisp? It takes on two forms depending on whether the closure is compiled or not. For example, consider this function, foo, that takes two arguments and returns a closure that returns the first argument.

;; -*- lexical-binding: t; -*-
(defun foo (x y)
  (lambda () x))

(foo :bar :ignored)
;; => (closure ((y . :ignored) (x . :bar) t) () x)

An uncompiled closure is a list beginning with the symbol closure. The second element is the lexical environment, the third is the argument list (lambda list), and the rest is the body of the function. Here we can see that both x and y have been “closed over.” This is a little bit sloppy because the function never makes use of y. Capturing it has a few problems.

The closure has a larger footprint than necessary.
Values are held longer than necessary, delaying collection.
It affects the readability of the closure, which I’ll get to later.

Fortunately the compiler is smart enough to see this and will avoid capturing unused variables. To prove this, I’ve now compiled foo so that it returns a compiled closure.

(foo :bar :ignored)
;; => #[0 "\300\207" [:bar] 1]

What’s returned here is a byte-code function object, with the #[...] syntax. It has these elements:

The function’s lambda list (zero arguments)
Byte-codes stored in a unibyte string
Constants vector
Maximum stack space needed by this function

Notice that the lexical environment has been captured in the constants vector, specifically noting the lack of :ignored in this vector. The compiler didn’t capture it.

For those curious about the byte-code here’s an explanation. The string syntax shown is in octal, representing a string containing two bytes: 192 and 135. The Elisp byte-code interpreter is stack-based. The 192 (constant 0) says to push the first constant onto the stack. The 135 (return) says to pop the top element from the stack and return it.

(coerce "\300\207" 'list)
;; => (192 135)

The Readable Closures Catch

Since closures are byte-code function objects, they print readably. You can capture an environment in a closure, serialize it, read it back in, and evaluate it. That’s pretty cool! This means closures can be transmitted to other Emacs instances in a multi-processing setup (i.e. Elnode, Async)

The catch is that it’s easy to accidentally capture an unreadable value, especially buffers. Consider this function bar which uses a temporary buffer as an efficient string builder. It returns a closure that returns the result. (Weird, but stick with me here!)

(defun bar (n)
  (with-temp-buffer
    (let ((standard-output (current-buffer)))
      (loop for i from 0 to n do (princ i))
      (let ((string (buffer-string)))
        (lambda () string)))))

The compiled form looks fine,

(bar 3)
;; => #[0 "\300\207" ["0123"] 1]

But the interpreted form of the closure has a problem. The with-temp-buffer macro silently introduced a new binding — an abstraction leak.

(bar 3)
;; => (closure ((string . "0123")
;;              (temp-buffer . #)
;;              (n . 3) t)
;;      () string)

The temporary buffer is mistakenly captured in the closure making it unreadable, but only in its uncompiled form. This creates the awkward situation where compiled and uncompiled code has different behavior.

Clojure-style Multimethods in Emacs Lisp

2013-12-18T23:06:15Z

This past week I added Clojure-style multimethods to Emacs Lisp through a package I call predd (predicate dispatch). I believe it is Elisp’s very first complete multiple dispatch object system! That is, methods are dispatched based on the dynamic, run-time type of more than one of its arguments.

https://github.com/skeeto/predd

(Unfortunately I was unaware of the other Clojure-style multimethod library when I wrote mine. However, my version is much more complete, has better performance, and is public domain.)

As of version 23.2, Emacs includes a CLOS-like object system cleverly named EIEIO. While CLOS (Common Lisp Object System) is multiple dispatch, EIEIO is, like most object systems, only single dispatch. The predd package is also very different than my other Elisp object system, @, which was prototype based and, therefore, also single dispatch (and comically slow).

The Clojure multimethods documentation provides a good introduction. The predd package works almost exactly the same way, except that due to Elisp’s lack of namespacing the function names are prefixed with predd-. Also different is that the optional hierarchy (h) argument is handled by the dynamic variable predd-hierarchy, which holds the global hierarchy.

Combination Example

To define a multimethod, pick a name and give it a classifier function. The classifier function will look at the method’s arguments and return a dispatch value. This value is used to select a particular method. What makes predd a multiple dispatch system is the dispatch value can be derived from any number of methods arguments. Because the dispatch value is computed at run-time this is called a late binding.

Here I’m going to define a multimethod called combine that takes two arguments. It combines its arguments appropriately depending on their dynamic run-time types.

(predd-defmulti combine (lambda (a b) (vector (type-of a) (type-of b)))
  "Appropriately combine A and B.")

The classifier uses type-of, an Elisp built-in, to examine its argument types. It returns them as tuple in the form of a vector. The classifier of a method can be accessed with predd-classifier, which I’ll use to demonstrate what these dispatch values will look like.

(funcall (predd-classifier 'combine) 1 2)    ; => [integer integer]
(funcall (predd-classifier 'combine) 1 "2")  ; => [integer string]

I chose a vector for the dispatch value because I like the bracket style when defining methods (you’ll see below). The dispatch value can be literally anything that equal knows how to compare, not just vectors. Note that it’s actually faster to create a list than a vector up to a length of about 6, so this multimethod would be faster if the classifier returned a list — or even better: a single cons.

Now define some methods for different dispatch values.

(predd-defmethod combine [integer integer] (a b)
  (+ a b))

(predd-defmethod combine [string string] (a b)
  (concat a b))

(predd-defmethod combine [cons cons] (a b)
  (append a b))

Now try it out.

(combine 1 2)            ; => 3
(combine "a" "b")        ; =>"ab"
(combine '(1 2) '(3 4))  ; => (1 2 3 4)

(combine 1 '(3 4))
; error: "No method found in combine for [integer cons]"

Notice in the last case it didn’t know how to combine these two types, so it threw an error. In this simple example where we’re only calling a single function, so rather than use the predd-defmethod macro these methods can be added directly with the predd-add-method function. This has the exact same result except that it has slightly better performance (no wrapper functions).

(predd-add-method 'combine [integer integer] #'+)
(predd-add-method 'combine [string string]   #'concat)
(predd-add-method 'combine [cons cons]       #'append)

Use the Hierarchy

Hmmm, the + function is already polymorphic. It seamlessly operates on both floats and integers. So far it seems there’s no way to exploit this with multimethods. Fortunately we can solve this by defining our own ad hoc hierarchy using predd-derive. Both integers and floats are a kind of number. It’s important to note that type-of never returns number. We’re introducing that name here ourselves.

(type-of 1.0)  ; => float

(predd-derive 'integer 'number)
(predd-derive 'float 'number)

;; Types can derive from multiple parents, like multiple inheritance
(predd-derive 'integer 'exact)
(predd-derive 'float 'inexact)

This says that integer and float are each a kind of number. Now we can use number in a dispatch value. When it sees something like [float integer] it knows that it matches [number number].

(predd-add-method 'combine [number number] #'+)

(combine 1.5 2)  ; => 3.5

We can check the hierarchy explicitly with predd-isa-p (like Clojure’s isa?). It compares two values just like equal, but it also accounts for all predd-derive declarations. Because of this extra concern, unlike equal, predd-isa-p is not commutative.

(predd-isa-p 'number 'number)  ; => 0
(predd-isa-p 'float 'number)   ; => 1
(predd-isa-p 'number 'float)   ; => nil

(predd-isa-p [float float] [number number])  ; => 2

(Remember that 0 is truthy in Elisp.) The integer returned is a distance metric used by method dispatch to determine which values are “closer” so that the most appropriate method is selected.

You might be worried that introducing number will make the multimethod slower. Examining the hierarchy will definitely have a cost after all. Fortunately predd has a dispatch cache, so introducing this indirection will have no additional performance penalty after the first call with a particular dispatch value.

Struct Example

Something that really sets these multimethods apart from other object systems is a lack of concern about encapsulation — or really about object data in general. That’s the classifier’s concern. So here’s an example of how to combine predd with defstruct from cl/cl-lib.

Imagine we’re making some kind of game where each of the creatures is represented by an actor struct. Each actor has a name, hit points, and active status effects.

(defstruct actor
  (name "Unknown")
  (hp 100)
  (statuses ()))

The defstruct macro has a useful inheritance feature that we can exploit for our game to create subtypes. The parent accessors will work on these subtypes, immediately providing some (efficient) polymorphism even before multimethods are involved.

(defstruct (player (:include actor))
  control-scheme)

(defstruct (stinkmonster (:include actor))
  (type 'sewage))

(actor-hp (make-stinkmonster))  ; => 100

As a side note: this isn’t necessarily the best way to go about modeling a game. We probably shouldn’t be relying on inheritance too much, but bear with me for this example.

Say we want an attack method for handling attacks between different types of monsters. Elisp structs have a very useful property by default: they’re simply vectors whose first element is a symbol denoting its type. We can use this in a multimethod classifier.

(make-player)
;; => [cl-struct-player "Unknown" 100 nil nil]

(predd-defmulti attack
    (lambda (attacker victim)
      (vector (aref attacker 0) (aref victim 0)))
  "Perform an attack from ATTACKER on VICTIM.")

Let’s define a base case. This will be overridden by more specific methods (determined by that distance metric).

(predd-defmethod attack [cl-struct-actor cl-struct-actor] (a v)
  (decf (actor-hp v) 10))

We could have instead used :default for the dispatch value, which is a special catch-all value. The actor-hp function will signal an error for any victim non-actors anyway. However, not using :default will force both argument types to be checked. It will also demonstrate specialization for the example.

However, before we can make use of this we need to teach predd about the relationship between these structs. It doesn’t check defstruct hierarchies. This step is what makes combining defstruct and predd a little unwieldy. A wrapper macro is probably due for this.

(predd-derive 'cl-struct-player 'cl-struct-actor)
(predd-derive 'cl-struct-stinkmonster 'cl-struct-actor)

(let ((player (make-player))
      (monster (make-stinkmonster)))
  (attack player monster)
  (actor-hp monster))
;; => 90

When the stinkmonster attacks players it doesn’t do damage. Instead it applies a status effect.

(predd-defmethod attack [cl-struct-stinkmonster cl-struct-player] (a v)
  (pushnew (stinkmonster-type a) (actor-statuses v)))

(let ((player (make-player))
      (monster (make-stinkmonster)))
  (attack monster player)
  (actor-statuses player))
;; => (sewage)

If the monster applied a status effect in addition to the default attack behavior then CLOS-style method combination would be far more appropriate here (if only it was available in Elisp). The method would instead be defined as an “after” method and it would automatically run in addition to the default behavior.

If I was actually building a system combing structs and predd, I would be using this helper function for building classifiers. It returns a dispatch value for selected arguments.

;;; -*- lexical-binding: t; -*-

(defun struct-classifier (&rest pattern)
  (lambda (&rest args)
    (loop for select-p in pattern and arg in args
          when select-p collect (elt arg 0))))

;; Takes 3 arguments, dispatches on the first 2 argument types.
(predd-defmulti speak (struct-classifier t t nil))

;; Messages sent to the player are displayed.
(predd-defmethod speak '(cl-struct-actor cl-struct-player) (from to message)
  (message "%s says %s." (actor-name from) message))

The Future

As of this writing there isn’t yet a prefer-method for disambiguating equally preferred dispatch values. I will add it in the future. I think prefer-method gets unwieldy quickly as the type hierarchy grows, so it should be avoided anyway.

I haven’t put predd in MELPA or otherwise published it yet. That’s what this post is for. But I think it’s ready for prime time, so feel free to try it out.

Emacs Lisp Reddit API Wrapper

2013-12-16T23:27:23Z

A couple of months ago I wrote an Emacs Lisp wrapper for the reddit API. I didn’t put it in MELPA, not yet anyway. If anyone is finding it useful I’ll see about getting that done. My intention was give it some exercise and testing before putting it out there for people to use, locking down the API. You can find it here,

https://github.com/skeeto/emacs-reddit-api

Except for logging in, the library is agnostic about the actual API endpoints themselves. It just knows how to translate between Elisp and the reddit API protocol. This makes the library dead simple to use. I had considered supporting OAuth2 authentication rather than password authentication, but reddit’s OAuth2 support is pretty rough around the edges.

Library Usage

The reddit API has two kinds of endpoints, GET and POST, so there are really only three functions to concern yourself with.

reddit-login
reddit-get
reddit-post

And one variable,

reddit-session

The reddit-login function is really just a special case of reddit-post. It returns a session value (cookie/modhash tuple) that is used by the other two functions for authenticating the user. Just as you get automatically with almost all Elisp data structures — probably more so than any other popular programming language — it can be serialized with the printer and reader, allowing a reddit session to be maintained across Emacs sessions.

The return value of reddit-login generally doesn’t need to be captured. It automatically sets the dynamic variable reddit-session, which is what the other functions access for authentication. This can be bound with let to other session values in order to switch between different users.

Both reddit-get and reddit-post take an endpoint name and a list of key-value pairs in the form of a property list (plist). (The api-type key is automatically supplied.) They each return the JSON response from the server in association list (alist) form. The actual shape of this data matches the response from reddit, which, unfortunately, is inconsistent and unspecified, so writing any sort of program to operate on the API requires lots of trial and error. If the API responded with an error, these functions signal a reddit-error.

Typical usage looks like so. Notice that values need not be only strings; they just need to print to something reasonable.

;; Login first
(reddit-login "your-username" "your-password")

;; Subscribe to a subreddit
(reddit-post "/api/subscribe" '(:sr "t5_2s49f" :action sub))

;; Post a comment
(reddit-post "/api/comment/" '(:text "Hello world." :thing_id "t1_cd3ar7y"))

For plists keys I considered automatically converting between dashes and underscores so that the keywords could have Lisp-style names. But the reddit API is inconsistent, using both, so there’s no correct way to do this.

To further refine the API it might be worth defining a function for each of the reddit endpoints, forming a facade for the wrapper library, hiding way the plist arguments and complicated responses. That would eliminate the trial and error of using the API.

(defun reddit-api-comment (parent comment)
  (if (null reddit-session)
      (error "Not logged in.")
    ;; TODO: reduce the return value into a thing/struct
    (reddit-post "/api/comment/" '(:thing_id parent :text comment))))

Furthermore there could be defstructs for comments, posts, subreddits, etc. so that the “thing” ID stuff is hidden away. This is basically what was already done for sessions out of necessity. I might add these structs and functions someday but I don’t currently have a need for it.

It would be neat to use this API to create an interface to reddit from within Emacs. I imagine it might look like one of the Emacs mail clients, or like Elfeed. Almost everything, including viewing image posts within Emacs, should be possible.

Background

For the last 3.5 years I’ve been a moderator of /r/civ, starting back when it had about 100 subscribers. As of this writing it’s just short of 60k subscribers and we’re now up to 9 moderators.

A few months ago we decided to institute a self-post-only Sunday. All day Sunday, midnight to midnight Eastern time, only self-posts are allowed in the subreddit. One of the other moderators was turning this on and off manually, so I offered to write a bot to do the job. There weren’t any Lisp wrappers yet (though raw4j could be used with Clojure), so I decided to write one.

As mentioned before, the reddit API leaves a lot to be desired. It randomly returns errors, so a correct program needs to be prepared to retry requests after a short delay, depending on the error. My particular annoyance is that the /api/site_admin endpoint requires that most of its keys are supplied, and it’s not documented which ones are required. Even worse, there’s no single endpoint to get all of the required values, the key names between endpoints are inconsistent, and even the values themselves can’t be returned as-is, requiring massaging/fixing before returning them back to the API.

I hope other people find this library useful!

Emacs, Thanksgiving, and Hanukkah

2013-11-28T22:25:36Z

Today is Thanksgiving in the United States. It also happens to be Hanukkah. There’s been news going around that Thanksgiving and Hanukkah will not coincide again for about 80,000 years. This sounded somewhat unbelievable to me because the Gregorian repeats every 400 years. I decided to compute it for myself to double-check this figure.

I’m not Jewish and I know very little about Hanukkah, so I had to look it up. After learning that Hanukkah is based on the Hebrew calendar, the rumors were sounding more believable. The Hebrew calendar repeats every 689,472 Hebrew years. This means the correspondence between Gregorian and Hebrew calendars is about 14 billion years. That 80,000 seems lowball.

Since I decided to use Emacs Lisp for the computation, I fortunately was able to ignore all the unfamiliar, complicated rules for the Hebrew calendar: Emacs knows how to compute Hebrew dates. It can be accessed through the function calendar-hebrew-date-string.

;; Thanksgiving 2013
(calendar-hebrew-date-string '(11 28 2013))
;; => "Kislev 25, 5774"

Hanukkah begins on the 25th of Kislev, so I can write a quick-and-dirty function to detect if a date is the first day of Hanukkah.

(defun hanukkah-p (date)
  "Return non-nil if DATE is Hanukkah."
  (string-match-p "^Kislev 25" (calendar-hebrew-date-string date)))

Next I need a function to compute Thanksgiving, which is really simple. Thanksgiving falls on the fourth Thursday of November.

(defun thanksgiving (year)
  "Return the date of Thanksgiving for YEAR."
  (loop for day from 1 upto 7
        when (= 4 (calendar-day-of-week `(11 ,day ,year)))
        return `(11 ,(+ day 21) ,year)))

If there was no calendar-day-of-week I could compute it using Zeller’s algorithm, which I already happen to have implemented,

(defun cal/day-of-week (year month day)
  "Return day of week number (0-7)."
  (let* ((Y (if (< month 3) (1- year) year))
         (m (1+ (mod (+ month 9) 12)))
         (y (mod Y 100))
         (c (/ Y 100)))
    (mod (+ day (floor (- (* 26 m) 2) 10) y (/ y 4) (/ c 4) (* -2 c)) 7)))

Now for each year find Thanksgiving and test it for Hanukkah. I started with 1942 because that’s when the fourth-Thursday-of-November rule was established. Presumably due to the regexp part, this expression takes a moment to compute.

(loop for year from 1942 to 80000
      when (hanukkah-p (thanksgiving year))
      collect year)
;; => (2013 79043 79290 79537 79564 79635 79784 79811 79882)

My result exactly matches what I’m seeing elsewhere. The rumors are correct! The next coincidence occurs on November 23rd, 79043. Thanks, Emacs!

Elfeed Tips and Tricks

2013-11-26T00:38:20Z

This past weekend I had some questions from next-user-here (NUH) on my original Elfeed post about changing some of Elfeed’s behavior. NUH is an Elisp novice so accomplishing some of the requested modifications wasn’t obvious. A novice is mostly limited to setting variables, not defining advice or using hooks. I’ve also been using Elfeed daily for about three months now as my sole web feed reader and along the way I’ve developed some best practices. In addition to responding to some of NIH’s questions here, I’d like to share some tips and tricks.

Custom Entry Launchers

Currently you can press “b” to launch one or more entries in your browser. You can use “y” to copy an single entry to the clipboard. What if you want to make another action.

In my configuration I have a fancy binding that sends the entry URLs in the selected region to youtube-dl for downloading the videos. It’s too large to share as a snippet so here’s a small example of something similar using a program called xcowsay.

(defun xcowsay (message)
  (call-process "xcowsay" nil nil nil message))

(defun elfeed-xcowsay ()
  (interactive)
  (let ((entry (elfeed-search-selected :single)))
    (xcowsay (elfeed-entry-title entry))))

(define-key elfeed-search-mode-map "x" #'elfeed-xcowsay)

Now when I hit “x” over an entry in Elfeed I’m greeted by a cow announcing the title.

Entry Listing Customization

The search buffer you see when starting Elfeed, where entries are listed, can be customized a few different ways. First, this buffer does grow dynamically. After re-sizing the window/frame horizontally you just have to refresh the view by pressing g (an Emacs convention). How it fills out depends on the settings of these variables,

elfeed-search-title-max-width
elfeed-search-title-min-width
elfeed-search-trailing-width

They control how wide the different columns should be as the window size changes. An important caveat to this is that the cache stored in elfeed-search-cache must be cleared before the changes will be reflected in the display. This cache exists because building the display, assembling all the special faces, is actually quite CPU-intensive. It was an optimization I established early on.

(clrhash elfeed-search-cache)

If you set these variables in your start-up configuration you don’t need to worry about clearing the cache because it will already be empty. It’s only a concern when playing with the settings.

Date Display

Another question was about adding time to the entry listing. Elfeed only displays the entry’s date. Dates are formatted by the function elfeed-search-format-date. This can be redefined to display dates differently.

(defun elfeed-search-format-date (date)
  (format-time-string "%Y-%m-%d %H:%M" (seconds-to-time date)))

It’s given epoch seconds as a float and it returns a string to display as a date.

Faces and Colors

All of the faces used in the display are declared for customization, so these can be changed to whatever you like.

elfeed-search-date-face
elfeed-search-title-face
elfeed-search-feed-face
elfeed-search-tag-face

Say you suffered a head injury and decided you want your Elfeed dates to be bold, purple, and underlined,

(custom-set-faces
 '(elfeed-search-date-face
   ((t :foreground "#f0f"
       :weight extra-bold
       :underline t))))

Database Manipulation

Feeds and entries in the database can be manipulated to become whatever you want them to be. Because Elfeed is regularly modifying the database, the trick is to perform the manipulation at just the right time.

Feed Title Changes

Say you want to change a feed title because you don’t like the title supplied by the feed. For example, the title to my blog’s feed is “null program” but instead you think it should be “Seriously Handsome Programmer” (head injury, remember?). The function elfeed-db-get-feed can be used to fetch a feed’s data structure from the database, given it’s exact URL as listed in your elfeed-feeds.

(let ((feed (elfeed-db-get-feed "https://nullprogram.com/feed/")))
  (setf (elfeed-feed-title feed) "Seriously Handsome Programmer"))

Hold it, that didn’t work. First, that display cache is getting in the way again. Feed titles change very infrequently so they’re cached aggressively. More importantly, next time you update your feeds Elfeed will re-synchronize the feed title with the official title. It’s going to fight against your intervention.

The solution is to do it with a little bit of advice just before the title is displayed. Advise the function elfeed-search-update with some “before” advice.

(defadvice elfeed-search-update (before nullprogram activate)
  (let ((feed (elfeed-db-get-feed "https://nullprogram.com/feed/")))
    (setf (elfeed-feed-title feed) "Seriously Handsome Programmer")))

Entry Tweaking

Automatic entry modification should happen immediately upon discovery so that it looks like the entry arrived that way. This is done through the elfeed-new-entry-hook. Generally this would be used for applying custom tags. These examples are from the documentation:

;; Mark all YouTube entries
(add-hook 'elfeed-new-entry-hook
          (elfeed-make-tagger :feed-url "youtube\\.com"
                              :add '(video youtube)))

;; Entries older than 2 weeks are marked as read
(add-hook 'elfeed-new-entry-hook
          (elfeed-make-tagger :before "2 weeks ago"
                              :remove 'unread))

;; Building subset feeds
(add-hook 'elfeed-new-entry-hook
          (elfeed-make-tagger :feed-url "example\\.com"
                              :entry-title '(not "something interesting")
                              :add 'junk
                              :remove 'unread))

Due to a feature I recently ported from my personal configuration, this tagger helper function is less necessary. You can put lists in your elfeed-feeds list to supply automatic tags.

(setq elfeed-feeds
      '(("https://nullprogram.com/feed/" blog emacs)
        "http://www.50ply.com/atom.xml"  ; no autotagging
        ("http://nedroid.com/feed/" webcomic)))

Content Tweaking

Going beyond tagging you could change the content of the feed. Say you want to make feeds 100 times better.

(defun hundred-times-better (entry)
  (let* ((original (elfeed-deref (elfeed-entry-content entry)))
         (replace (replace-regexp-in-string "keyboard" "leopard" original)))
    (setf (elfeed-entry-content entry) (elfeed-ref replace))))

(add-hook 'elfeed-new-entry-hook #'hundred-times-better)

The same trick could be used to remove advertising, change the date, change the title, etc. The elfeed-deref and elfeed-ref parts are needed to fetch and store content in the content database. Only a reference is stored on the structure. You can actually use these functions at any time outside of Elfeed, but they’ll eventually get garbage collected if Elfeed doesn’t know about them.

(setf ref (elfeed-ref "Hello, World"))
;; => [cl-struct-elfeed-ref "907d14fb3af2b0d4f18c2d46abe8aedce17367bd"]

(elfeed-deref ref)
;; => "Hello, World"

Deletion

A question that’s been asked few times is if entries can be deleted. To start off, the answer to that question is “no.” There is no function provided to remove entries from the database. If you want to remove entries you’re probably taking the wrong approach.

The main problem with removal is that Elfeed needs to keep track of what it’s seen before. If an entry is removed and then rediscovered, it will reappear as unread. There are better ways to “remove” entries, such as tagging them specially.

On a moderately-powerful computer Elfeed can easily handle at least several tens of thousands of database entries. If “too many entries” ever becomes a performance problem I’d rather solve it by making the database faster than by removing information from the database. It’s already very date-oriented so that older entries are infrequently touched.

If storage is a concern, you shouldn’t get too worked up about that. As of this post I have about 6,000 entries in my database and the index file is only 3.5 MB. The content database after garbage collection, which is the data/ directory under ~/.elfeed/, with these 6k entries is 17MB. When I run M-x elfeed-db-compact, currently an experimental feature, it drops down to 1.8MB. That’s less than 1 kB per entry. It’s also less than my personal Liferea database of roughly the same amount of content (~15MB) before I wrote Elfeed.

If even this storage is still too much you can always blow away your data/ content database directory. This is safe to do even while Emacs is running. You’ll still see all of the entries listed in the search buffer but won’t be able to read them within Emacs until after the next database update (when it re-fetches the most recent entry content).

You can also clear out the content database from within Elisp by visiting every entry and clearing its content field.

(with-elfeed-db-visit (entry _)
  (setf (elfeed-entry-content entry) nil))

(elfeed-db-gc)  ;; garbage collect everything

The same sort of expression can be used to run over all known entries to perform other changes. If there was a delete function you might use it here to remove entries older than a certain date, then hope they’re not rediscovered.

If you never want to store entry content (you never read entries within Emacs), you can use a hook to always drop it on the floor as it arrives,

(add-hook 'elfeed-new-entry-hook
          (lambda (entry) (setf (elfeed-entry-content entry) nil)))

Questions?

If you have any questions or suggestions about how to make Elfeed do what you want it to do, feel free to ask. Some things may actually require that I make changes to Elfeed to support it, though I hope I’ve anticipated your particular need well enough to avoid that.

The Elfeed Database

2013-09-09T05:53:41Z

The design of Elfeed’s database took some experimentation before any part of it was settled. A major design constraint was Emacs’ very limited file input/output. There’s no random access and, without the aid of an external program, files must always be read and written wholesale. That’s not database-friendly at all! In the end I settled on a design that minimized the size of the frequently rewritten parts, an index with two different data models, by storing immutable data in a loose-file, content-addressable database.

At the moment there really aren’t any pure-Elisp database solutions for Emacs. This is almost certainly due to the aforementioned I/O limitations. I ran into this same problem last year when I created an Emacs pastebin server. I attempted, and failed, to interface with a SQLite database through it’s command line program. Nic Ferrier has published a generic database interface, but it lacks concrete implementations.

As a bit of good news, as far as I know Emacs does properly handle atomic file updates across all platforms, so a pure-Elisp database developer would never have to worry about only writing half the database. It’s always a safe operation. Worst case scenario you’re left with an old version of data rather than no data at all.

A real possibility for a database would be connecting to an established database server via TCP with an Emacs network process. If the server has a specified wire protocol Elisp could talk to it efficiently. In fact, there’s exists pg.el that does exactly this for PostgreSQL. Unfortunately I was not able to get this working with my pastebin, nor is this solution appropriate for Elfeed. It would be unreasonable to require users to first set up a PostgreSQL server just to read web feeds!

Ultimately it would seem that any efficient Emacs database requires the help of an external program. The notmuch mail client, which inspired Elfeed, does this. To access the notmuch database a command line program is run once for each request. A query is passed as a program argument and the output of the program is parsed into the result.

The Early Database

For the first few days of its existence Elfeed only had an in-memory database. Closing Emacs would lose everything. For my personal usage patterns, where I read, or at least address, all entries that arrive — and especially because I use Elfeed on a couple of different computers — I don’t really need to track things long term. I could easily mark everything after a certain date as read and forget about them. However, it would be nice to have and, more importantly, many people wouldn’t use Elfeed without persistence between Emacs sessions.

So, for the first database I did what I always do: dumped the data structure to a file using the printer and parsed it back in later using the reader. This is dead simple in Lisp, it’s very fast, and it even works for circular data structures. It’s something I missed so much with the much-less-capable JSON format earlier this year that I wrote a JavaScript library to do it.

(defun save-data (file data)
  (with-temp-file file
    (let ((standard-output (current-buffer))
          (print-circle t))  ; Allow circular data
      (prin1 data))))

(defun load-data (file)
  (with-temp-buffer
    (insert-file-contents file)
    (read (current-buffer))))

(save-data "demo.dat" '(a b c ["1" 2 3]))
(load-data "demo.dat")
;; => (a b c ["1" 2 3])

Anything with a printed representation can be serialized and stored this way, including symbols, string, numbers, lists, vectors (structs, objects), hash tables, and even compiled functions (.elc files). Basically every Emacs library that stores data on disk uses this technique.

Unfortunately, this is where I hit another serious database constraint: print-circle is broken in Emacs 24.3, the current stable release. This means Elfeed cannot take advantage of this useful feature, at least not for a long time, as I had been counting on. The final database is slightly slower and larger than strictly required as a result.

The Content Database

After breaking the circular references of the in-memory database I finally had persistence for the first time. With the naive printer/reader approach it was slow, almost 1 second to write just a few thousand entries on my 6-year-old laptop (my minimum requirements target machine). I wanted Elfeed to support hundreds of thousands of entries, if not millions, so this was much too slow.

The big slowdown was writing out all the entry content each time the database is saved. These large strings containing HTML that rarely change. There’s no reason to write these out every time, nor is there a reason to even keep them in memory all the time, as it’s rarely accessed. The solution is a loose-file, content-addressable database, very similar to an unpacked Git object database.

The content database stores immutable sequences of characters — not just raw bytes, but rather multibyte strings — using an unspecified coding system (right now it’s UTF-8 for all platforms). The filename for the content is the content hashed with SHA-1 (“content-addressable”). To limit the number of files per directory, these files are stored in subdirectories named by the first hex-encoded byte of the hash (just like Git). A database of 4 items might look like this:

data/
   18/
      18ff6f11945b1e9f3e3c4cae8b5275d36b9944e1
      184c06a83f0bc73a8345c6d886f9043bcae095f8
   6b/
      6b59ae257f2bea24703d8adf5747049c138dfc82
   cc/
      cc47d53872ae2a9186151ef1a68392a94e1f091f

Something really neat about the content database is that it’s completely agnostic about Elfeed. If it weren’t for Elfeed’s garbage collector, anyone could use it to store arbitrary content. The function elfeed-ref accepts a string and returns a reference into the database. Because of the hash, providing the same string in the future will return the same reference without actually performing a write. References are dereferenced with elfeed-deref.

(setf ref (elfeed-ref "Hello, world!"))
;; => [cl-struct-elfeed-ref "943a702d06f34599aee1f8da8ef9f7296031d699"]

(elfeed-deref ref)
;; => "Hello, world"

With content stored elsewhere, entries are a struct containing only some small metadata: title, link, date, and a content database reference. Writing out many of them at once is much, much faster.

I don’t expect it happens often, but this also means content is de-duplicated. If two entries happen to have the same content they’ll share content database storage. A small savings.

At this point it’s really tempting to get fancier and really put this content database to use. The core index itself could be stored as raw content, and the root to accessing the database would be a single SHA-1 hash referencing it — again, very similar to Git. If an index stores a reference to the previously written index, then the the Elfeed database would be an immutable structure tracking its entire history. Such a change would cost virtually nothing in performance, just disk space.

Multiple Representations

With all the content out of the way, the database is now just a lean index. At this point it’s a hash table mapping feed IDs to feeds. Feeds contain a list of its entries. To build the entry listing for the elfeed-search buffer, Elfeed needs to visit each feed in the hash table, gather its entries into one giant list, then finally sort that list by date. At around O(n log n), that sort operation is a real performance killer. Completely unacceptable. To fix this we need to think about how the data is updated and used.

First, entries are always viewed in date order, no exceptions. From my experience of using web feeds for the last six years I never had a reason to list feed entries by any other order. The vast majority of the time, newer entries are most relevant, and if I need to look for something specific I can search for it.

We definitely want to store entries in date-order so we can create entry listings without performing a sort: something around O(n) or so. Inserting new entries into this structure should also be efficient.

Second, entries are never removed from the database. This isn’t e-mail. Even if a user doesn’t want to see an entry again, we have to keep track of it. Otherwise it will show up as new if it’s discovered in a feed again, which is likely. Things are added to the database and never removed. In Elfeed, I use a junk tag to completely hide entries I don’t want to see, and I always have a -junk element in my filter.

There’s an important caveat to this one that I had missed until after the public release: entry dates can change! When a previously discovered entry is read from a feed, Elfeed updates (read: mutates) the entry struct to reflect the new state. This includes the date. It’s very likely that a date-sorted representation won’t tolerate date changes underneath it since it’s keying off of them. Either we refuse to update the entry date, or we remove the entry, update the date, and then re-insert it (how it currently works).

Third, entries are generally added with a recent date. After the database is initially populated, it’s only picking up new items. We should prefer adding recently-dated entries be faster than adding older entries. I didn’t get a chance to take advantage of this, but it’s something to keep in mind.

Fourth, entries need to be keyed by an ID string. Each entry has a unique, unchanging identifier string, either provided by the feed itself (RSS’s guid or Atom’s id) or generated intelligently by Elfeed. Especially because of the print-circle bug, we need to be able to talk about feeds in terms of their ID — an indirect pointer.

(Actually, even when RSS guid tags are present, they’re permalinks by default. So, unfortunately, RSS IDs are not at all resistant to collisions across feeds. To work around this, entry identifiers are a pair of strings: feed ID and entry ID. Atom doesn’t have this problem, but we’re stuck with the lowest common denominator.)

A date-oriented representation would be unable to efficiently look up an entry by its ID, so it needs to be supplemented by an ID-oriented representation. This means we need two representations in our database: date-oriented and ID-oriented.

So what do we use? Well, for keeping entries sorted by date we want some sort of balanced tree. A B-tree is probably a good choice. Rather than write one I went with an AVL tree since Emacs comes with a library for it (avl-tree). It’s already debugged and optimized! The bad news is that the internal structure is unspecified, so there are no guarantees that it can be serialized. A future update to the library may break the Elfeed database. I also had to hack into it to work around a security issue. The comparison function is embedded in the tree. After deserializing the database, Elfeed needs to ensure that no one stuck a malicious function in there.

The choice for an ID database was super-easy: a hash table. Due to the print-circle bug, this is actually the main representation. The AVL tree only stores IDs and it has to reach into the hash table to do any date comparisons. If print-circle was working I could store the same exact entry objects in the AVL tree as the hash table, so mutating them would update them in all representations. However, with print-circle off, on deserialization these would become unique objects and updates would break.

The Future

That’s where the database is today. I put in a few extra fields that aren’t actually used yet, so that there’s room to make a few changes without breaking the database. Perhaps someday I’ll work out a whole new database structure, or maybe a proper database library will come into existence, and this post will simply document the old database.

Introducing Elfeed, an Emacs Web Feed Reader

2013-09-04T05:33:10Z

Unsatisfied with my the results of recent search for a new web feed reader, I created my own from scratch, called Elfeed. It’s built on top of Emacs and is available for download through MELPA. I intend it to be highly extensible, a power user’s web feed reader. It supports both Atom and RSS.

https://github.com/skeeto/elfeed

The design of Elfeed was inspired by notmuch, which is my e-mail client of choice. I’ve enjoyed the notmuch search interface and the extensibility of the whole system — a side-effect of being written in Emacs Lisp — so much that I wanted a similar interface for my web feed reader.

The search buffer

Unlike many other feed readers, Elfeed is oriented around entries — the Atom term for articles — rather than feeds. It cares less about where entries came from and more about listing relevant entries for reading. This listing is the *elfeed-search* buffer. It looks like this,

This buffer is not necessarily about listing unread or recent entries, it’s a filtered view of all entries in the local Elfeed database. Hence the “search” buffer. Entries are marked with various tags, which play a role in view filtering — the notmuch model. By default, all new entries are tagged unread (customize with elfeed-initial-tags). I’ll cover the filtering syntax shortly.

From the search buffer there are a number of ways to interact with entries. You can select an single entry with the point, or multiple entries at once with a region, and interact with them.

b: visit the selected entries in a browser
y: copy the selected entry URL to the clipboard
r: mark selected entries as read
u: mark selected entries as unread
+: add a specific tag to selected entries
-: remove a specific tag from selected entries
RET: view selected entry in a buffer

(This list can be viewed within Emacs with the standard C-h m.)

The last action uses the Simple HTTP Renderer (shr), now part of Emacs, to render entry content into a buffer for viewing. It will even fetch and display images in the buffer, assuming your Emacs has been built for it. (Note: the GNU-provided Windows build of Emacs doesn’t ship with the necessary libraries.) It looks a lot like reading an e-mail within Emacs,

The standard read-only keys are in action. Space and backspace are for page up/down. The n and p keys switch between the next and previous entries from the search buffer. The idea is that you should be able to hop into the first entry and work your way along reading them within Emacs when possible.

Configuration

Elfeed maintains a database in ~/.elfeed/ (configurable). It will start out empty because you need to tell it what feeds you’d like to follow. List your feeds elfeed-feeds variable. You would do this in your .emacs or other initialization files.

(setq elfeed-feeds
      '("http://www.50ply.com/atom.xml"
        "http://possiblywrong.wordpress.com/feed/"
        ;; ...
        "http://www.devrand.org/feeds/posts/default"))

Once set, hitting G (capitalized) in the search buffer or running elfeed-update will tell Elfeed to fetch each of these feeds and load in their entries. Entries will populate the search buffer as they are discovered (assuming they pass the current filter), where they can be immediately acted upon. Pressing g (lower case) refreshes the search buffer view without fetching any feeds.

Everything fetched will be added to the database for next time you run Emacs. It’s not required at all in order to use Elfeed, but I’ll discuss some of the details of the database format in another post.

Pressing s in the search buffer will allow you to edit the search filter in action.

There are three kinds of ways to filter on entries, in order of efficiency: by age, by tag, and by regular expression. For an entry to be shown, it must pass each of the space-delimited components of the filter.

Ages are described by plain language relative time, starting with @. This component is ultimately parsed by Emacs’ time-duration function. Here are some examples.

@1-year-old
@5-days-ago
@2-weeks

Tag filters start with + and -. When +, entries must be tagged with that tag. When -, entries must not be tagged with that tag. Some examples,

+unread: show only unread posts.
-junk +unread: don’t show unread “junk” entries.

Anything else is treated like a regular expression. However, the regular expression is applied only to titles and URLs for both entries and feeds. It’s not currently possible to filter on entry content, and I’ve found that I never want to do this anyway.

Putting it all together, here are some examples.

linu[xs] @1-year-old: only show entries about Linux or Linus from the last year.
-unread +youtube: only show previously-read entries tagged with youtube.

Note: the database is date-oriented, so age filtering is by far the fastest. Including an age limit will greatly increase the performance of the search buffer, so I recommend adding it to the default filter (elfeed-search-search-filter).

Tagging

Generally you don’t want to spend time tagging entries. Fortunately this step can easily be automated using elfeed-make-tagger. To tag all YouTube entries with youtube and video,

(add-hook 'elfeed-new-entry-hook
          (elfeed-make-tagger :feed-url "youtube\\.com"
                              :add '(video youtube)))

Any functions added to elfeed-new-entry-hook are called with the new entry as its argument. The elfeed-make-tagger function returns a function that applies tags to entries matching specific criteria.

This tagger tags old entries as read. It’s handy for initializing an Elfeed database on a new computer, since I’ve likely already read most of the entries being discovered.

(add-hook 'elfeed-new-entry-hook
          (elfeed-make-tagger :before "2 weeks ago"
                              :remove 'unread))

Creating custom subfeeds

Tagging is also really handy for fixing some kinds of broken feeds or otherwise filtering out unwanted content. I like to use a junk tag to indicate uninteresting entries.

(add-hook 'elfeed-new-entry-hook
          (elfeed-make-tagger :feed-url "example\\.com"
                              :entry-title '(not "something interesting")
                              :add 'junk
                              :remove 'unread))

There are a few feeds I’d like to follow but do not because the entries lack dates. This makes them difficult to follow without a shared, persistent database. I’ve contacted the authors of these feeds to try to get them fixed but have not gotten any responses. I haven’t quite figured out how to do it yet, but I will eventually create a function for elfeed-new-entry-hook that adds reasonable dates to these feeds.

Custom actions

In my own .emacs.d configuration I’ve added a new entry action to Elfeed: video downloads with youtube-dl. When I hit d on a YouTube entry either in the entry “show” buffer or the search buffer, Elfeed will download that video into my local drive. I consume quite a few YouTube videos on a regular basis (I’m a “cord-never”), so this has already saved me a lot of time.

Adding custom actions like this to Elfeed is exactly the extensibility I’m interested in supporting. I want this to be easy. After just a week of usage I’ve already customized Elfeed a lot for myself — very specific customizations which are not included with Elfeed.

Web interface

Elfeed also includes a web interface! If you’ve loaded/installed elfeed-web, start it with elfeed-web-start and visit this URL in your browser (check your httpd-port).

http://localhost:8080/elfeed/

Elfeed exposes a RESTful JSON API, consumable by any application. The web interface builds on this using AngularJS, behaving as a single-page application. It includes a filter search box that filters out entries as you type. I think it’s pretty slick, though still a bit rough.

It still needs some work to truly be useful. I’m intending for this to become the “mobile” interface to Elfeed, for remote access on a phone or tablet. Patches welcome.

Try it out

After Google Reader closed I tried The Old Reader for awhile. When that collapsed under its own popularity I decided to go with a local client reader. Canto was crushed under the weight of all my feeds, so I ended up using Liferea for awhile. Frustrated at Liferea’s lack of extensibility and text-file configuration, I ended up writing Elfeed.

Elfeed now serving 100% of my personal web feed reader needs. I think it’s already far better than any reader I’ve used before. Another case of “I should have done this years ago,” though I think I lacked the expertise to pull it off well until fairly recently.

At the moment I believe Elfeed is already the most extensible and powerful web feed reader in the world.

Leaving Gmail Behind

2013-09-03T03:45:58Z

Update May 2017: Shortly after switching to modal editing, I stopped using Emacs and Notmuch as my mail client. I now use Mutt and Vim.

For the last 8 years I have been using Gmail as my e-mail provider, during which 3 years I was a student. It was very convenient, inexpensive (cost-free!), and especially suitable for a student using various lab computers and stuck behind a fairly restrictive firewall (an anti-file-sharing measure). I didn’t have to worry about filtering spam or maintaining or paying for a server. Easy, easy, easy. This has finally come to an end.

That convenience kept me as a user after college up until now. As of two weeks ago I changed my e-mail address. You can find it listed below my portrait on this page (if you’re reading this on my website). Not only am I using my own domain name, but I’m running my own e-mail server. After attempting, and failing, at creating a decent setup with mu4e, I ended up following this guide:

A Hacker’s Replacement for GMail

It’s built around the superb notmuch mail indexer, which includes a powerful, fast Emacs e-mail client. Just from its technical superiority I’m wishing I switched to notmuch years ago. I’m now understanding for the first time why all those old fogey hackers like to use e-mail for everything: mailing lists, software patches, bug reporting, etc. E-mail a user interface agnostic system, giving everyone their own choice. The tricky part is setting up a decent interface to it.

Prompting the change

As you know, Google Reader shut down this past July. This left me using only two Google services: Talk and Mail. Not only is Talk easily replaceable — it has no significant data on Google’s servers — it will be shut down soon as well. I would need to move to a different XMPP server anyway. If I could move off of Gmail, I would finally be able to discontinue my Google account for good!

Why would I want to do this? It’s become increasingly apparent, especially this year, that there is very little privacy to be had when logged into a Google account. Various intelligence and law enforcement organizations have easy — likely automated — access to user data, especially e-mail. I’d really like to take that privacy back.

There are also the technical reasons. It’s a bit embarrassing to admit, but the final straw that pushed me to finally leaving Gmail was the new compose interface — a technical issue rather than a privacy issue — which was pushed out just a few days before I left. I found it to be very unpleasant and, worse, completely incompatible with Pentadactyl, as there is no longer a plain-text option (the one listed is a fake). A huge technical step backwards, towards the layman and away from the power user. I could do so much better than this.

Also embarrassing was being unable to have any meaningful use of PGP with e-mail all these years. That’s something I’ve always wanted to fix.

Brian was a guinea pig for me, because, between the two of us, he was actually the first one to move to his own e-mail platform. Seeing his success was a big encouragement for me. Not only could it be done fairly easily, but the results would be a huge improvement. This was all within my grasp!

A daunting task

Switching everything about my e-mail — provider, client, spam filter, server, domain — and running it myself would be a daunting task. There’s 8 years of archived e-mail to manage, though I kept it trimmed to a relatively light 1 GB of storage (which I cut down to 200 MB before exporting). I’ve made backups of all this e-mail on occasion, but I never had to worry about searching or actually using it. The backups were done “just in case.”

Gmail has excellent spam filtering, an advantage of having so many samples available, and I’m a complete newbie at dealing with it myself. Despite having my e-mail address published on this blog for the last 6 years, I surprisingly only receive about one spam message per day, so this wasn’t actually a huge risk for me.

Then there’s the issue of not looking like spam myself. My e-mail server needs to sit in a friendly IP neighborhood. I need to have a proper PTR record (reverse DNS). I need to generally look legitimate. No showing up to deliver mail to other mail servers in just a t-shirt. This is actually something I’m still struggling with right now. In fact, if I’ve sent you personal e-mail in the past and you’re using Gmail, you should check your spam folder right now, because something I’ve sent recently may likely have been caught in it. I don’t know why yet.

I would also need to learn the ropes of a new e-mail client. I used Eudora from 1995 to 2005, then Gmail up until now. Now, I’m the last person to be reluctant about learning how to operate a new piece of software. I’m constantly on the lookout for better software. The problem is that I use e-mail for a lot of very important things. I can’t afford mistakes. I need to hit the ground running on this one.

notmuch vs mu4e

Since I decided early on to go with an Emacs-based e-mail client, learning a new e-mail client got a lot easier. I know Emacs Lisp pretty darn well. In the worst case of getting stuck, I could very easily study the client’s source code and work out for myself whatever is going on. I could even monkey patch it in my configuration if it was causing problems for me (and I’ve already done exactly that).

I wanted to use the Maildir format, something I could hack on if needed. The two obvious choices for this were mu4e and notmuch, both started in 2009. I initially reached for mu4e. Compared to notmuch, it follows Emacs idioms more closely. For example, the e-mail listing is oriented around a mark and execute paradigm, like a dired buffer. After an initial glance, it felt more integrated.

Unfortunately, mu4e is still not mature enough for real productive use, making it far too risky for e-mail. I found out the hard way that the database format has varied regularly between versions. Worse, mu4e is not suitable for remote access. Not only does it assume the Maildir directory is on the local host, it uses absolute paths to access it, so it won’t work over sshfs as I had hoped. Bummer.

In contrast, the notmuch client is specifically designed to be operated remotely. Emacs doesn’t realize that it runs the notmuch client over SSH. Emacs doesn’t need to touch the Maildir directly. It’s a beautiful setup, one very friendly to versioning dotfiles. I’ve already done some pretty heavy configuration to get it exactly the way I want it. On top of this, notmuch is incredibly fast and stable. It’s been a very enjoyable client. So much so that it inspired me to build a web feed reader with a similar interface (to be described in my next post).

The mail server

My early plan was to run an e-mail server on a Raspberry Pi. It’s low-powered, making it very inexpensive and quiet to operate. It’s also very portable, so I wouldn’t need to lug a server around if I needed to move it — no more difficult than moving a cell phone charger. I could run it from my nightstand next to my bed if I wanted to. On the other hand, I would be a little nervous running my mail server on a residential connection. The downtime would be a product of my ISP, my power company, and my router. It’s not bad, but it’s risky when I’m worried about receiving important e-mail. If I was in the middle of job hunting I probably wouldn’t attempt it at all. Fortunately e-mail servers will retry over the course of several days, so I think this would generally be manageable.

This plan was struck down by Comcast’s network policies. The good news is that my IP address has not changed in years. The bad news is that they block port 25 both incoming and outgoing as an anti-spam measure. This makes it impossible to run an e-mail server, because e-mail must be received on port 25 (MX records were misdesigned feature). For outgoing, I would need to send e-mail through Comcast’s smarthost server, which brings up the privacy issue again. I assume the same organizations have this tapped as well. Even if port 25 wasn’t blocked, I wouldn’t be able to set a PTR record and my IP neighborhood would be suspicious.

I ended up going with Digital Ocean, as the linked guide suggested. The smallest, cheapest offering is more than suitable for my needs both as an e-mail server and an XMPP server. It will probably be handy for other short-lived, experimental servers too.

I used getmail to get all of my old mail onto the mail server in the Maildir format. It was completely straightforward and probably the easiest part of it all.

PGP

I can finally use GnuPG with my e-mail. An important factor in this setup is that encryption and decryption is done locally, not on my e-mail server. I don’t need to trust the server with my private keys. However, verification of signatures is done on the server, which is slightly less than ideal, but manageable.

I thought I might need to generate a fresh PGP key for this new e-mail address, but instead I learned something new about PGP. A key can have multiple identities attached to it, so all I needed to do was edit the key and add the new e-mail address. The PGP designers had already thought of this problem two decades ago! The updated key is linked next to my portrait, as well as distributed on the public keyservers.

I look forward to making more use of cryptography with my e-mail.

The old address

It will take me awhile, a year or more, to move everything off of my old e-mail address. I have a lot of accounts associated with it, many of which won’t allow me to change my e-mail address. So, in the meantime, anything sent there will continue to be forwarded to me. That’s now the only purpose of my Google account.

I’m a little worried about using my new e-mail address as my Digital Ocean account because it presents a circularity problem. I could easily get myself locked out of everything. I’ll have to figure out how I’m going to handle that situation. (Use my work e-mail address? So long as I don’t get locked out of my e-mail and get fired all at the same time.)

If you’re a hacker, I encourage you to run your own e-mail server if you’re not doing so already. It’s been extremely liberating for me.

Personal OS Configuration Live System

2013-06-17T00:00:00Z

I don’t know what to title or name this thing, so bear with me.

Two years ago I started versioning my Emacs configuration. One year ago I started versioning the rest of my dotfiles the same way. This has composed beautifully with Debian, which truly is the universal operating system. To create my comfortable development environment from scratch on a new (or used) computer, all I need to do is install a bare-bones Debian system, then direct it to automatically install a short list of my preferred software (apt-get), and finally clone these two repositories into place. Given a decent Internet connection, the whole process takes under an hour to go from blank hard drive to highly-productive computer system.

In fact, this whole process is so straightforward that it can be automated using an amazing tool called live-build! Taking the next step in versioning and automation, I wrote a live-build build that creates a live system with my personal configuration baked in.

https://github.com/skeeto/live-dev-env

A link to the latest ISO build can be found in the above link. To try it out, burn it to a CD, write it onto a flash drive (it’s a hybrid ISO), or just fire it up in your favorite virtual machine. You will be booted directly into very nearly my exact configuration, down to the same random wallpaper selection. It’s extremely minimal and will look like this.

I don’t know for sure yet if this will be useful to anyone except me. On occasions where I need to make quick use of some arbitrary computer, or maybe just for system rescue, having this will be incredibly handy. Knoppix is nice, but working without my own configuration can be discouraging; it’s so slow in comparison.

It’s also a chance for others to glimpse at my workflow without any commitment (i.e. potential dotfile clobbering). I like to study other developers’ workflows, stealing their ideas for my own, so I want to make mine easy to study. I think I’m doing some innovative things with a hacked-together pseudo-tiling window manager, my Firefox configuration, and my Emacs configuration. At least two of my co-workers’ Emacs configurations are forked from mine (you know who you are!). From a selfish perspective, the more people using workflows like mine, the better these workflows will be supported by the community at large!

I had said that it’s “very nearly” my exact configuration. At the time of this writing, what’s missing is Quicklisp and Leiningen, since these aren’t available as fully-functioning Debian packages at the moment. I’ll work them in eventually. The build is non-incremental and takes about an hour right now, so adding these little extras by trial-and-error will take some time.

Pentadactyl really shines here, because it allows me to completely configure Firefox/Iceweasel from a version-friendly text file. Except for some user scripts (still figuring out how to install those at build time), the browser in that image is identical to what I’m using right now. Even though V8 is the king of performance, Firefox still wins hands-down for power users due to its superior configurability.

However, this Firefox configuration still has an annoyance. Several of the browser add-ons that I pre-install always pop up their first-run welcome messages. These messages flip a setting in Firefox’s registry so they don’t run again, a setting that is lost when the system shuts down. I can toggle these settings from the Pentadactyl configuration file, but not early enough. By the time Pentadactyl gets to applying these settings it’s too late. The messages, including Pentadactyl’s, are already queued to be shown. I don’t think it’s possible to completely fix this, even if Pentadactyl is fixed.

Anything not already installed is readily installable through apt-get. Right now I’m considering the feasibility of some sort of lazy-install system based on command-not-found. Debian has a package called command-not-found which intercepts the shell’s error handler when an issued command is not found. Instead of just giving the normal warning, it prints out what package needs to be installed in order to provide the command. What I think would be really neat is if the needed package is automatically installed, then the requested command is then re-run, all without returning control to the shell in the interim. It would be a lot like Emacs autoloads. As long as I have an Internet connection, most of Debian’s packages would be virtually installed on my live system as far as I’m concerned. The initial run of any program just takes a little longer.

I’ll continue to tweak this image over time, not only as I figure out how to make things work in Debian live-build, but also as my preferences and workflows evolve. Adjusting my configuration to work on a live-system has been enlightening, revealing all sorts of little manual things that I hadn’t yet automated. Perhaps someday this build will replace traditional operating system installations for me, at least for productive work. I could do all of my work from a portable read-only live system with a bit of short-lived (i.e. local cache) user-data persistence stored on a separate writable medium.

Emacs Mouse Slider Mode for Numbers

2013-06-07T00:00:00Z

One of my regular commenters, and as of recently co-worker, Ahmed Fasih, sent me a video, Live coding in Lua. The author of the video added support to his IDE for scaling numbers in source code by dragging over them with the mouse. This feature was directly inspired by Bret Victor, a user interface visionary, probably best introduced through his presentation Inventing on Principle.

I think Bret’s interface ideas are interesting and his demos very impressive. However, I feel they’re too specialized to generally be very useful. Skewer suffers from the same problem: in order to truly be useful, programs need to be written in a form that expose themselves well enough for Skewer to manipulate at run-time. Some styles of programming are simply better suited to live development than others. This problem is amplified in Bret’s case by the extreme specialty of the tools. They’re fun to play with, and probably great for education, but I can’t imagine any time I would find them useful while being productive.

Anyway, Ahmed wanted to know if it would be possible to implement this feature in Emacs. I said yes, knowing that Emacs ships with artist-mode, where the mouse can be used to draw with characters in an editing buffer. That’s proof that Emacs has the necessary mouse events to do the job. After spending a couple of hours on the problem I was able to create a working prototype: mouse-slider-mode.

https://github.com/skeeto/mouse-slider-mode

Demo video requires HTML5 with WebM support.

It’s a bit rough around the edges, but it works. When this minor mode is enabled, right-clicking and dragging left or right on any number will decrease or increase that number’s value. More so, if the current major mode has an entry in the mouse-slider-mode-eval-funcs alist, as the value is scaled the expression around it is automatically evaluated in the live environment. The documentation shows how to enable this in js2-mode buffers using skewer-mode. This is actually a step up from the other, non-Emacs implementations of this mouse slider feature. If I understood correctly, the other implementations re-evaluate the entire buffer on each update. My version only needs to evaluate the surrounding expression, so the manipulated code doesn’t need to be so isolated.

There is one limitation that cannot be fixed using Elisp. If the mouse exits the Emacs window, Elisp stops receiving valid mouse events. Number scaling is limited by the width of the Emacs window. Fixing this would require patching Emacs itself.

This is purely a proof-of-concept. It’s not installed in my Emacs configuration and I probably won’t ever use it myself, except to show it off as a flashy demo with an HTML5 canvas. If anyone out there finds it useful, or thinks it could be better, go ahead and adopt it.

A Handy Emacs Package Configuration Macro

2013-06-02T00:00:00Z

Update April 2015: I now use use-package instead of the with-package macro explained below. It’s cleaner, nicer, and better maintained.

I was inspired by a post recently written by Milkypostman (the M in MELPA). He describes some of his init.el configuration, specifically focusing on an after macro that wraps the misdesigned eval-after-load function. I wanted to take this macro further in three ways:

The delayed expression should be properly byte-compiled, which doesn’t happen by default with eval-after-load.
In a few cases my expression depends on multiple, independent packages but eval-after-load only accepts one.
If I’m specifying packages when using my macro, why bother listing them at the top of my initialize file? I could DRY things up by learning what packages to install when the macro is used. Here’s the kicker: I can pretend that every available package is already installed like built-in packages!

The result is a pair of macros with-package and with-package* which can be found in package-helper.el. The latter form doesn’t wait but immediately loads the specified packages with require. It’s shaped just like Milkypostman’s after macro, except that it can accept a list of packages in place of a single symbol. Also, the package names aren’t quoted; they don’t need to be since this is a macro instead of a function.

Here’s a typical use case for each macro. That expose higher-order function is from my personal utility library. The expressions to be evaluated depend on both packages and neither needs to be loaded immediately, so I’m using the first form of the macro.

(with-package (skewer-mode utility)
  (skewer-setup)
  (define-key skewer-mode-map (kbd "C-c $")
    (expose #'skewer-bower-load "jquery" "1.9.1")))

(with-package* smex
  (smex-initialize)
  (global-set-key (kbd "M-x") 'smex))

For the second one, I’m going to be using smex right away (takes over M-x), so I use the second form, which immediately loads smex. The macro isn’t really necessary at all here since I could just use require and follow it with these expressions, but I really like how this organizes my init.el. It creates a domain-specific language (DSL) just for Emacs configuration. Each package configuration is grouped up in a clean let-like form. Since I’ve added syntax highlighting to with-package it looks very elegant. Normal syntax highlighters aren’t going to do this, so here’s a screenshot of my buffer.

JavaScript developers with a keen eye may notice a familiar pattern here. This macro is shaped a bit like the Asynchronous Module Definition (AMD), with asynchronousy in mind. Since this is Lisp with a powerful macro system, I get to hide away the function wrapper part.

Using this macro has caused me to use eval-after-load with just about everything. This has cut my initialization time down to about 10% of what it was before! On those occasions that I do restart Emacs, it’s really nice that it’s back to under 1 second (0.6 seconds vs 6 seconds).

The problem of eval-after-load

I’m calling eval-after-load poorly designed because it’s a perfect example of an inappropriate use of eval. In function form it should have accepted a function as its second argument instead of an s-expression, so it would work like a hook. This is even more inappropriate now that Emacs has proper lexical closures, which is the perfect mechanism for delayed evaluation. The whole point of eval-after-load is to speed up Emacs initialization time, but using eval is slow. To the compiler, this isn’t code, just data. This means no byte-compilation and no compiler warnings.

A possible alternative design for eval-after-load would be a hook named something like -load-hook. Then when load or require loads a file, it runs the hook with the matching name. This removes eval-after-load as its own standalone language concept.

(add-hook 'skewer-mode-load-hook (lambda () ...))

The problem here is when the package is already loaded the hook is never run. In contrast, when eval-after-load is used on an already-loaded package, the expression is immediately evaluated.

Given this, if there was something I could change about this it would simply be for eval-after-load, whatever it would be called, to take a function for the second argument. I would also provide a simple macro just like after that wraps this function. Why not just a macro? The function form would be really useful for a situation like this,

(eval-after-load 'skewer-mode #'skewer-setup)

Here there’s no need to instantiate a new anonymous function or s-expression. If all it’s doing is calling a zero-arity function, that function can be passed in directly.

Skewer Gets HTML Interaction

2013-06-01T00:00:00Z

A month ago Zane Ashby made a pull request that added another minor mode to Skewer: skewer-html-mode. It’s analogous to the skewer-css minor mode in that it evaluates HTML “expressions” in the context of the current page. The original pull request was mostly a proof of concept, with evaluated HTML snippets being appended to the end of the page (body) unless a target selector is manually specified.

This mode is still a bit rough around this edges, but since I think it’s useful enough for productive work I’ve merged it in.

Replacing HTML

Unsatisfied with just appending content, I ran with the idea and updated it to automatically replace structurally-matching content on the page when possible. Zane’s fundamental idea remained intact: a CSS selector is sent to the browser along with the HTML. Skewer running in the browser uses querySelector() to find the relevant part of the document and replaces it with the provided HTML. This is done with the command skewer-html-eval-tag (default: C-M-x), which selects the innermost tag enclosing the point.

To accomplish this, an important piece of skewer-html exists to compute this CSS selector. It’s a purely structural selector, ignoring classes, IDs, and so on, instead relying on the pseudo-selector :nth-of-type. For example, say this is the content of the buffer and the point is somewhere inside the second heading (Bar).


  
  
     id="main">
      Foo
      I am foo.
      Bar
      I am bar.

The function skewer-html-compute-selector will generate this selector. Note that :nth-of-type is 1-indexed.

body:nth-of-type(1) > div:nth-of-type(1) > h1:nth-of-type(2)

The > syntax requires that these all be direct descendants and :nth-of-type allows it to ignore all those paragraph elements. This means other types of elements can be added around these headers, like additional paragraphs, without changing the selector. The :nth-of-type on body is obviously unnecessary, but this is just to keep skewer-html dead simple. It doesn’t need to know the semantics of HTML, just the surface syntax. There will only ever be one body tag, but to skewer-html it’s just another HTML tag.

Side note: this is why I strongly prefer to use /> self-closing syntax in HTML5 even though it’s unnecessary. Unlike XML, that closing slash is treated as whitespace and it’s impossible to self-close tags. The schema specifies which tags are “void” (always self-closing: img, br) and which tags are “normal” (explicitly closed: script, canvas). This means if you don’t use /> syntax, your editor would need to know the HTML5 schema in order to properly understand the syntax. I prefer not to require this of a text editor — or anything else doing dumb manipulations of HTML text — especially with the HTML5 specification constantly changing.

When I was writing this I originally included html in the selector. Selector computation would just walk up to the root of the document regardless of what the tags were. Curiously, including this causes the selector to fail to match even though this is literally the page structure. So, out of necessity, skewer-html knows enough to leave it off.

For replacement, rather than a simple innerHTML assignment on the selected element, Skewer is parsing the HTML into an node object, removing the selected node object, and putting the new one in its place. The reason for this is that I want to include all of the replacement element’s attributes.

Another HTML oddity is that the body and head elements cannot be replaced. It’s a limitation of the DOM. This means these tags cannot be “evaluated” directly, only their descendants. Brian and I also ran into this issue in impatient-mode while trying to work around a strange HTML encoding corner case: scripts loaded with a script tag created by document.write() are parsed with a different encoding than when loaded directly by adding a script element to the page.

This last part is actually a small saving grace for skewer-css, which works by appending new stylesheets to the end of body. Why body and not head? Because some documents out there have stylesheets linked from body, and properly overriding these requires appending stylesheets after them. If body is replaced by skewer-html, all of the dynamic stylesheets appended by skewer-css would be lost, reverting the style of the page. Since we can’t do that, this isn’t an issue!

Appending HTML

So what happens when the selector doesn’t match anything in the current document? Skewer fills in the missing part of the structure and sticks the content in the right place. Next time the tag is evaluated, the structure exists and it becomes a replacement operation. This means the document in the browser can start completely empty (like the run-skewer page) and you can fill in content as you write it.

But what if the page already has content? There’s an interactive command skewer-html-fetch-selector-into-buffer. You select a part of the page and it gets inserted into the current buffer (probably a scratch buffer). The idea is that you can then modify and then evaluate it to update the page. This is the roughest part of skewer-html right now since I’m still figuring out a good workflow around it.

If you have Skewer installed and updated, you already have skewer-html. It was merged into master about a month ago. If you have any ideas or opinions for how you think this minor mode should work, please share it. The intended workflow is still not a fully-formed idea.

Should Emacs Packages Self Configure?

2013-05-23T00:00:00Z

Update 2013-06-01: I ultimately decided that Skewer should not modify any mode hooks automatically. Instead the major mode hooks can be configured by putting (skewer-setup) in your initialization file. This function is designed to play well with autoloading, so using it won’t increase your startup time.

There’s a discussion happening on a Skewer issue on GitHub: Problems with skewer-css autoload. The issue was opened by Steve Purcell. Right now Skewer’s CSS minor mode is enabled by default in less-css-mode, which is, of course, incompatible with the minor mode. It’s enabled because less-css-mode is derived from css-mode, so it runs all of css-mode’s hooks.

There are actually two separate problems here.

The hook to activate the minor mode needs to check what major mode is activating it, because css-mode-hook is run by other modes. This is easy to fix, though it’s not very elegant. I would need to do the same for skewer-html-mode.
Skewer is eagerly configuring itself once it’s installed. This is intentional: I want Skewer to be really easy to use right out-of-the-box. There’s no “install the package, then add these lines to your startup configuration.” If I remove this behavior, the previous problem becomes the user’s problem, since it’s up to them to activate the minor mode.

Steve is telling me that this auto-configuration is a bad idea; it so often causes these sorts of messes. The gist of his argument is that installing a package is separate from enabling a package. It’s up to the user to decide how and when they want to use a package. Steve maintains a number of popular Emacs packages and, even more importantly, he’s one of the MELPA maintainers. He knows this stuff much better than I do. He even used to share my current opinion up until two months ago when someone changed his mind.

On the other hand, I really dislike when software has such awful defaults that it’s unusable without first configuring it. Skewer’s case wouldn’t be too bad since it can be enabled manually without editing any configuration, but practical use would require that users configure Skewer in their startup files. It would also make it harder for users to discover Skewer’s features. They might not otherwise even be aware there are CSS and HTML minor modes! If the concern is separating installation and activation, package.el does have a variable, package-load-list, for this purpose, though using it isn’t very convenient.

Right now I’m stuck in this dilemma like a deer caught in the headlights. Formal packages are a very new thing for Emacs, so there doesn’t seem to be a community consensus on this issue yet. I really like when the packages I install behave like Skewer does now (i.e. nrepl.el) so I don’t need to configure them. But I would also be easily frustrated if that configuration magic was getting in my way. This particular annoyance happens to me outside of Emacs often enough (i.e. Chromium), though it’s worse because it generally can’t easily be fixed like in Emacs.

I’m still trying to make up my mind about this. If you have an opinion on the matter I’d like to hear it. You can leave a comment here, or much better, leave your comment on the issue on GitHub. It’s not going to come down to a vote or anything like that. I just want to get a feel for how people expect Emacs packages to work.

Load Libraries in Skewer with Bower

2013-05-18T00:00:00Z

I recently added support to Skewer for loading libraries on the fly using Bower’s package infrastructure. Just make sure you’re up to date, then while skewering a page run M-x skewer-bower-load. It will prompt for a package and version, download the library, then inject it into the currently skewered page.

Because the Bower infrastructure is so simple, Bower is not actually needed in order to use this. Only Git is required, configured by skewer-bower-git-executable, which it tries to configure itself from Magit if it’s been loaded.

Motivation

Skewer comes with a userscript that adds a small toggle button to the top-right corner every page I visit. Here’s a screenshot of the toggle on this page.

When that little red triangle is clicked, the page is connected to Emacs and the triangle turns green. Click it again and it disconnects, turning red. It remembers its state between page refreshes so that I’m not constantly having to toggle.

It’s mainly for development purposes, but it’s occasionally useful to Skewer an arbitrary page on the Internet so that I can poke at it from Emacs. One habit that I noticed comes up a lot is that I want to use jQuery as I fiddle with the page, but jQuery isn’t actually loaded for this page. What I’ll do is visit a jQuery script in Emacs and load this buffer (C-c C-k). As expected, this is tedious and easily automated.

Rather than add specific support for jQuery, I thought it would be more useful to hook into one of the existing JavaScript package managers. Not only would I get jQuery but I’d be able to load anything else provided by the package manager. This means if I learn about a cool new library, chances are I could just switch to my *javascript* scratch buffer, load the library with this new Skewer feature, and play with it. Very convenient.

How it Works

There are a number of package managers out there. I chose Bower because of its emphasis on client-side JavaScript and, more so, because its infrastructure is so simple that I wouldn’t actually need to use Bower itself to access it. In adding this feature to Skewer, I wrote half a Bower client from scratch very easily.

The only part of the Bower infrastructure hosted by Bower itself is a tiny registry that maps package names to Git repositories. This host also accepts new mappings, unauthenticated, for registering new packages. The entire database is served up as plain old JSON.

https://bower.herokuapp.com/packages

To find out what versions are available, clone this repository with Git and inspect the repository tags. Tags that follow the Semantic Versioning scheme are versions of the package available for use with Bower. Once a version is specified, look at bower.json in the tree-ish referenced by that tag to get the rest of the package metadata, such as dependencies, endpoint listing, and description.

This is all very clever. The Bower registry doesn’t have to host any code, so it remains simple and small. It could probably be rewritten from scratch in 15-30 minutes. Almost all the repositories are on GitHub, which most package developers are already comfortable with. Package maintainers don’t need to use any tools or interact with any new host systems. Except for adding some metadata they just keep doing what they’re doing. I think this last point is a big part of MELPA’s success.

Bower’s Fatal Weaknesses

Unfortunately Bower has two issues, one of which is widespread, that seriously impacts its usefulness.

Dependency Specification

Even though Bower specifies Semantic Versioning for package versions, which very precisely describes version syntax and semantics, the dependencies field in bower.json is underspecified. There’s no agreed upon method for specifying relative dependency versions.

Say your package depends on jQuery and it relies on the newer jQuery 1.6 behavior of attr(). You would mark down that you depend on jQuery 1.6.0. Say a user of your package is also using another package that depends on jQuery, it’s using the on() method, which requires jQuery 1.7 or newer. It specifies jQuery 1.7.0. This is a dependency conflict.

Of course your package works perfectly fine with 1.7.0. It works fine at 1.6.0 and later. In other package management systems, you would probably have marked that you depend on “>=1.6.0” rather than just 1.6.0. Unfortunately, Bower doesn’t specify this as a valid dependency version. Some package maintainers have gone ahead and specified relative versions anyway, but inconsistently. Some use the “>=” prefix like I did above, some prefix with “~” (“about this version”), which is pretty useless.

And this leads into the other flaw.

Most Bower Packages are Broken

While some parts of Bower are underspecified, most packages don’t follow the simple specifications that already exist! That is to say, most Bower packages are broken. This is incredibly unfortunate because it means at least half of the packages can’t be loaded by this new Skewer feature.

How are they broken? As of this writing, there are 2,195 packages list in Bower’s registry.

113 (5%) of them have unreachable or unresponsive repositories. About half of these are due to invalid repository URLs.
1,830 (83%) have no bower.json metadata file. This means the client has to guess at the metadata.
1,034 (47%) have unguessable endpoints. My client looks for other package management metadata outside of Bower’s, as well as tries to guess base on the package name. Failing to guess causes the package to fail to load. These packages aren’t a subset of the last set with missing bower.json files. Sometimes the bower.json files contain incorrect information, which causes my client to drop into guessing mode.
1400 (64%) don’t use Semantic Versioning: either no versioning at all or some other arbitrary versioning system.
In total, 2041 (93%) of all Bower packages have invalid or missing metadata — bad registry entry, missing bower.json file, or lack of semantic version tags.

The good news is that most of the important libraries, like jQuery and Underscore, work properly. I’ve also registered two of my JavaScript libraries, ResurrectJS and rng-js, so these can be loaded on the fly in Skewer.

Tracking Mobile Device Orientation with Emacs

2013-04-27T00:00:00Z

Nine years ago I bought my first laptop computer. For the first time I could carry my computer around and do productive things at places beyond my desk. In the meantime a new paradigm of mobile computing has arrived. Following a similar pattern, this month I bought a Samsung Galaxy Note 10.1, an Android tablet computer. Having never owned a smartphone, this is my first taste of modern mobile computing.

Once the technology caught up, laptops were capable enough to fully replace desktops. However, this tablet is no replacement for my laptop. Mobile devices are purely for consumption, so I will continue to use desktops and laptops for the majority of my computing. I’m writing this post on my laptop, not my tablet, for example.

Owning a tablet has opened up a whole new platform for me to explore as a programmer. I’m not particularly interested in writing Android apps, though. I’m obviously not alone in this, as I’ve found that nearly all Android software available right now is somewhere between poor and mediocre in quality. The hardware was worth the cost of the device, but the software still has a long way to go. I’m optimistic about this so I have no regrets.

A New Web Platform

Instead, I’m interested in mobile devices as a web platform. One of the few high-quality pieces of software on Android are the web browsers (Chrome and Firefox), and I’m already familiar with developing for these. Even more, I can develop software live on the tablet remotely from my laptop using Skewer — i.e. the exact same development tools and workflow I’m already using.

What’s new and challenging is the user interface. Instead of traditional clicking and typing, mobile users tap, hold, swipe, and even tilt the screen. Most challenging of all is probably accommodating both kinds of interfaces at once.

One of the first things I wanted to play with after buying the tablet was the gyro. The tablet knows its acceleration and orientation at all times. This information can be accessed in JavaScript using a fairly new API. The two events of interest are ondevicemotion and ondeviceorientation. Using simple-httpd I can transmit all this information to Emacs as it arrives.

Instead of writing a new servlet for this, to try it out I used skewer.log(). Connect a web page viewed on the tablet to Skewer hosted on the laptop, then evaluate this in a js2-mode buffer on the laptop.

window.addEventListener('devicemotion', function(event) {
    var a = event.accelerationIncludingGravity;
    skewer.log([a.x, a.y, a.z]);
});

Or for orientation,

window.addEventListener('deviceorientation', function(event) {
    skewer.log([event.alpha, event.beta, event.gamma]);
});

These orientation values appeared in my *skewer-repl* buffer as I casually rolled the tablet on one axis. The units are obviously degrees.

[157.4155398727678, 0.38583511837777246, -44.61023992234689]
[155.4477623728871, -0.6438986350040569, -44.69645057005079]
[154.32208572596647, -0.7516393196323073, -45.79730289443301]
[155.437674183483, -0.48375529832044045, -46.406449900466015]
[156.2974174150692, 0.21938214098430556, -47.482812581579154]
[154.85869270791937, 0.11046702400456986, -48.67378583696511]
[153.3284161451347, -0.9344782009891125, -48.61755630462298]
[154.11860073021347, -0.6553947505116874, -49.949668589018074]
[155.85919247792117, 0.05473832995756562, -49.84400214746339]
[156.92487274317241, 0.4946305069438346, -49.86369016774595]
[158.06542554210534, 0.712759801803332, -49.61875275392013]
[159.356905031128, 1.3387109941852697, -49.9372717956745]

It would be neat to pump these into a 3D plot display as they come in, such that my laptop displays the current tablet orientation on the screen as I move it around, but I didn’t see any quick way to do this.

Here are some acceleration values at rest. Since I took these samples on Earth the units are obviously in meters per second per second.

[-0.009576806798577309, 0.31603461503982544, 9.816226959228516]
[-0.047884032130241394, 0.3064578175544739, 9.806650161743164]
[-0.009576806798577309, 0.28730419278144836, 9.787496566772461]
[0.009576806798577309, 0.3064578175544739, 9.816226959228516]
[-0.06703764945268631, 0.3256114423274994, 9.797073364257812]
[-0.047884032130241394, 0.2968810200691223, 9.864110946655273]
[-0.028730420395731926, 0.2968810200691223, 9.576807022094727]
[-0.019153613597154617, 0.363918662071228, 9.691728591918945]
[-0.05746084079146385, 0.3734954595565796, 10.199298858642578]

Now that I have the hardware for it, I really want to use this API to do something interesting in a web application. I just don’t have any specific ideas yet.

Prototype-based Elisp Objects with @

2013-04-07T00:00:00Z

Reflection from the future: This library is super slow and inefficient. It should probably not be used for anything serious.

Last weekend I had the itch to play around with a multiple-inheritance prototype-based object system in lisp. It would look a lot like JavaScript’s object system but wanted to try experimenting some different ideas. My favorite lisp to hack in is Emacs Lisp, so that’s what I built it on. What I ended up with is actually pretty neat. Despite the lack of reader macros in Elisp, I still managed to introduce new syntax by manipulating symbols at compile time.

https://github.com/skeeto/at-el

See the README for a quick demonstration. What follows is the long explanation.

It’s called @, due to the syntax that it adds to Elisp as a domain-specific language. It’s a mini-language, really. The name is also a challenge to the code that supports Elisp, because so much of it — including emacs-lisp-mode and Paredit — doesn’t properly handle @ in identifiers. ~~Even Maruku, the Markdown to HTML translator I use for this blog, has bugs that won’t allow it to handle the @ characters in my code, so I had to forgo most syntax highlighting for this post.~~ (Update: I now use Kramdown so this is no longer an issue.)

Fortunately require does manage just fine.

(require '@)

Objects in @ are vectors with the symbol @ as the first element. The rest of the elements are implementation specific, but, at the moment, the second element is a plist (property list) of all of that object’s properties.

The root object of @ is @, and all other objects are instances of this object, either directly or indirectly. Because it’s prototype based, creating a new object is a matter of extending one or more (multiple-inheritance) existing objects. This is done with the function @extend.

;; Create a brand new object
(defvar foo (@extend @))

If no objects are given to @extend, @ will be used as the parent object, so it’s not necessary as an argument above. This is actually very important, as objects that don’t inherit from @ will not work at all! I’ll get into that detail in a bit. Additionally, @extend accepts keyword arguments, which become properties on the created object.

The function @ is used to access properties on an object. Remember, Elisp is a lisp-2 meaning that variables and functions exist in their own namespaces. This means there can be both a variable @ (the root object) and function @ (property accessor).

(setf rectangle (@extend :width 3 :height 4))
(@ rectangle :width)  ; => 3
(@ rectangle :height)  ; => 4

The @ function is also setf-able, so setting properties should be obvious to any lisper.

(setf (@ rectangle :width) 13)
(@ rectangle :width)  ; => 13

Like JavaScript, methods are just functions stored in properties on an object. In @, the first argument for a method is the object itself, which is called @@ by convention.

(setf (@ rectangle :area)
  (lambda (@) (* (@ @@ :width) (@ @@ :height))))

(funcall (@ rectangle :area) rectangle)  ; => 52

New Syntax

Here’s the first really neat part. I find all that (@ @@ ...) business to be visually unpleasing. Fortunately this can be fixed by adding syntax. The macro def@ transforms variables that look like @: into these @ accessors. The following declaration is equivalent to the lambda assignment above. It’s meant to be very convenient.

(def@ rectangle :area ()
  (* @:width @:height))

This macro walks the body of the function at compile-time (macro expansion time) and transforms these symbols into the full @ calls above. Like most lisp macros, this has no run-time performance cost.

Because using funcall all the time and remembering to pass the object as the first argument is tedious, the @! function is provided for calling methods.

(@! rectangle :area)  ; => 52

The @: variables become function calls when in function position.

(def@ rectangle :double-area ()
  (* 2 (@:area))

In a lisp-1 this would happen for free, but in Elisp this situation expands to the @! form.

Inheritance

This rectangle is starting to look like a nice re-usable object. There’s a @ convention for this: prefix “class” object names with @.

(setf @rectangle rectangle)

Now to create new rectangle objects.

(setf foo (@extend @rectangle :width 3 :height 7.1))
(@! foo :area)  ; => 21.3

Notice that the foo object doesn’t actually have an :area property on itself. It was found on its parent, @rectangle by inheritance. :width and :height were not looked up on the parent because they’re already bound on foo.

Here’s another re-usable prototype. Notice that @: variables are also setf-able — using push in this case.

(defvar @colored (@extend :color ()))

(def@ @colored :mix (color)
  (push color @:color))

The object system has multiple-inheritance, so colored rectangles can be created from these two objects. The parent objects of an object are listed in the :proto property as a list (similar to JavaScript’s __proto__), which can be modified at any time to change an object’s prototype chain.

(defvar foo (@extend @colored @rectangle :width 10 :height 4))

(@! foo :area)  ; => 40
(@! foo :mix :red)
(@! foo :mix :blue)
(@ foo :color)  ; => (:blue :red)

Even though the initial property was read from the parent, the assignment (push), like all assignments, actually occurred on foo.

Setters and Getters

Remember how I said that objects that don’t eventually inherit from @ will be broken? This is because properties are actually set and accessed through :set and :get methods. That is, @ calls these methods as needed. The @ object provides the default actions for these. An interesting part of the @ code: initially setting :set on @ is a circularity problem, so there’s a special bootstrap step to accomplish it.

By providing your own you can fundamentally change how your object works. For example, here’s an @immutable mix-in which prevents all property assignments. It’s provided as part of @.

(defvar @immutable (@extend))

(def@ @immutable :set (property _value)
  (error "Object is immutable, cannot set %s" property))

This :set method will be found before the @ :set method, so it gets overridden.

Remember how I said all object have a :proto that can be used to modify the objects inheritance? This can be used to freeze an object’s properties in place. Here’s a :freeze method for all objects.

(def@ @ :freeze ()
  "Make this object immutable."
  (push @immutable @:proto))

Pretty cool, eh?

The :get method can be used to provide virtual properties.

(defvar @squares (@extend))

(def@ @squares :get (property)
  (if (numberp property)
      (expt property 2)
    (@^:get property)))  ; explained in a moment

(mapcar (lambda (n) (@ @squares n)) '(0 1 2 3 4))
; => (0 1 4 9 16)

I use this technique in the @vector class under lib/ to expose the elements of the internal vector as if they were properties. Brian used this trick to make a @buffer prototype that wraps Emacs’ buffers, with methods provided virtually by :get. For example, the :string property would return a lambda that calls buffer-string.

With multiple-inheritance and these setters and getters, there are a lot of interesting mix-in possibilities. I’m only just discovering some of them now.

Supermethods

Sometimes it’s really useful to call supermethods. There’s syntax for this: @^:. This calls the next method of that name in the prototype chain. For example, here’s a @watchable mix-in (also provided by @) that allows other code to be notified of changes to an object. It needs to override :set but still call the original :set.

(defvar @watchable (@extend :watchers nil))

(def@ @watchable :watch (callback)
  (push callback @:watchers))

(def@ @watchable :unwatch (callback)
  (setf @:watchers (remove callback @:watchers)))

(def@ @watchable :set (property new)
  (dolist (callback @:watchers)
    (funcall callback @@ property new))
  (@^:set property new))

This behavior is also used for constructors. By convention, the :init method is the constructor. It should generally call the next constructor with (@^:init). @ has a no-op, no-argument :init method to bottom-out this process.

(def@ @rectangle :init (width height)
  (@^:init)
  (setf @:width width @:height height))

(@! (@! @rectangle :new 13.2 2.1) :area) ; => 27.72

As shown, the :new method provided by the @ object combines both @extend and :init to provide simple single-object inheritance.

The Cost of @

In the lib/ directory there are a bunch of example objects implemented: including @vector, @queue, @stack, and @heap. I found these to be very enjoyable to write, and they’ve been the testing grounds for @. @heap uses an internal @vector instance and exercises @’s features the most.

The performance cost of @ very apparent with @heap. Even byte-compiled it’s slower than the naive implementation (compose push and sort) for even as high as 1,000 elements. While I think @ leads to elegant code, there’s still plenty to do for performance. It’s comically slow.

This really caught Brian’s interest, because it was an opportunity to put on his programming language designer’s hat — which I believe to be his favorite hat. He’s been trying different caching strategies to reduce all the walking of the prototype chain. This effort can be found in the other repository branches and in his fork. The system is so dynamic that cache invalidation is a really complex problem.

Every time a property is set, @ has to find the :set property for that object, which generally means walking all the way up to @. Because :proto can be modified at any time, every property look-up requires computing the precedence order (lazily). This all makes property assignment quite expensive! I can understand why real object systems aren’t this flexible. It comes at a high price.

Fast Monte Carlo Method with JavaScript

2013-02-25T00:00:00Z

How many times should a random number from [0, 1] be drawn to have it sum over 1?

If you want to figure it out for yourself, stop reading now and come back when you’re done.

The answer is e. When I came across this question I took the lazy programmer route and, rather than work out the math, I estimated the answer using the Monte Carlo method. I used the language I always use for these scratchpad computations: Emacs Lisp. All I need to do is switch to the *scratch* buffer and start hacking. No external program needed.

The downside is that Elisp is incredibly slow. Fortunately, Elisp is so similar to Common Lisp that porting to it is almost trivial. My preferred Common Lisp implementation, SBCL, is very, very fast so it’s a huge speed upgrade with little cost, should I need it. As far as I know, SBCL is the fastest Common Lisp implementation.

Even though Elisp was fast enough to determine that the answer is probably e, I wanted to play around with it. This little test program doubles as a way to estimate the value of e, similar to estimating pi. The more trial runs I give it the more accurate my answer will get — to a point.

Here’s the Common Lisp version. (I love the loop macro, obviously.)

(defun trial ()
  (loop for count upfrom 1
     sum (random 1.0) into total
     until (> total 1)
     finally (return count)))

(defun monte-carlo (n)
  (loop repeat n
     sum (trial) into total
     finally (return (/ total 1.0 n))))

Using SBCL 1.0.57.0.debian on an Intel Core i7-2600 CPU, once everything’s warmed up this takes about 9.4 seconds with 100 million trials.

(time (monte-carlo 100000000))
Evaluation took:
  9.423 seconds of real time
  9.388587 seconds of total run time (9.380586 user, 0.008001 system)
  99.64% CPU
  31,965,834,356 processor cycles
  99,008 bytes consed
2.7185063

Since this makes for an interesting benchmark I gave it a whirl in JavaScript,

function trial() {
    var count = 0, sum = 0;
    while (sum <= 1) {
        sum += Math.random();
        count++;
    }
    return count;
}

function monteCarlo(n) {
    var total = 0;
    for (var i = 0; i < n; i++) {
        total += trial();
    }
    return total / n;
}

I ran this on Chromium 24.0.1312.68 Debian 7.0 (180326) which uses V8, currently the fastest JavaScript engine. With 100 million trials, this only took about 2.7 seconds!

monteCarlo(100000000); // ~2.7 seconds, according to Skewer
// => 2.71850356

Whoa! It beat SBCL! I was shocked. Let’s try using C as a baseline. Surely C will be the fastest.

#include 
#include 

int trial() {
    int count = 0;
    double sum = 0;
    while (sum <= 1.0) {
        sum += rand() / (double) RAND_MAX;
        count++;
    }
    return count;
}

double monteCarlo(int n) {
    int i, total = 0;
    for (i = 0; i < n; i++) {
        total += trial();
    }
    return total / (double) n;
}

int main() {
    printf("%f\n", monteCarlo(100000000));
    return 0;
}

I used the highest optimization setting on the compiler.

$ gcc -ansi -W -Wall -Wextra -O3 temp.c
$ time ./a.out
2.718359

real	0m3.782s
user	0m3.760s
sys	0m0.000s

Incredible! JavaScript was faster than C! That was completely unexpected.

The Circumstances

Both the Common Lisp and C code could probably be carefully tweaked to improve performance. In Common Lisp’s case I could attach type information and turn down safety. For C I could use more compiler flags to squeeze out a bit more performance. Then maybe they could beat JavaScript.

In contrast, as far as I can tell the JavaScript code is already as optimized as it can get. There just aren’t many knobs to tweak. Note that minifying the code will make no difference, especially since I’m not measuring the parsing time. Except for the functions themselves, the variables are all local, so they are never “looked up” at run-time. Their name length doesn’t matter. Remember, in JavaScript global variables are expensive, because they’re (generally) hash table lookups on the global object at run-time. For any decent compiler, local variables are basically precomputed memory offsets — very fast.

The function names themselves are global variables, but the V8 compiler appears to eliminate this cost (inlining?). Wrapping the entire thing in another function, turning the two original functions into local variables, makes no difference in performance.

While Common Lisp and C may be able to beat JavaScript if time is invested in optimizing them — something to be done rarely — in a casual implementation of this algorithm, JavaScript beats them both. I find this really exciting.

How to Make an Emacs Minor Mode

2013-02-06T00:00:00Z

An Emacs buffer always has one major mode and zero or more minor modes. Major modes tend to be significant efforts, especially when it comes to automatic indentation. In contrast, minor modes are often simple, perhaps only overlaying a small keymap for additional functionality. Creating a new minor mode is really easy, it’s just a matter of understanding Emacs’ conventions.

Mode names should end in -mode and the command for toggling the mode should be the same name. They keymap for the mode should be called mode-map and the mode’s toggle hook should be called mode-hook. Keep all of this in mind when picking a name for your minor mode.

There are a number of other tedious issues that need to be taken into account when manually building a minor mode. The good news is that no one needs to worry about most of it! Lisp has macros for cutting down on boilerplate code and so there’s a macro for this very purpose: define-minor-mode. Here’s all it takes to make a new minor mode, foo-mode.

(define-minor-mode foo-mode
  "Get your foos in the right places.")

This creates a command foo-mode for toggling the minor mode and a hook called foo-mode-hook. There’s a strange caveat about the hook: it’s not immediately declared as a variable. My guess is that this is some archaic optimization which now exists as bad design. The hook function add-hook will create this variable lazily when needed and the function run-hooks will ignore hook variables that don’t yet exist, so it doesn’t get tripped up by this situation. So despite its strange initial absence, the new minor mode will use this hook as soon as functions are added to it.

Minor Mode Options

This mode doesn’t do anything yet. It doesn’t have its own keymap and it doesn’t even show up in the modeline. It’s just a toggle and a hook that’s run when the toggle is used. To add more to the mode, define-minor-mode accepts a number of keywords. Here are the important ones.

:lighter: the name, a string, to show in the modeline
:keymap: the mode’s keymap
:global: specifies if the minor mode is global

The :lighter option has one caveat: it’s concatenated to the rest of the modeline without any delimiter. This means it needs to be prefixed with a space. I think this is mistake, but we’re stuck with it probably forever. Otherwise this string should be kept short: there’s generally not much room on the modeline.

(define-minor-mode foo-mode
  "Get your foos in the right places."
  :lighter " foo")

New, empty keymaps are created with (make-keymap) or (make-sparse-keymap). The latter is more efficient when the map will contain a small number of keybindings, as is the case with most minor modes. The fact that these separate functions exist is probably another outdated, premature optimization. To avoid confusing others, I recommend you use the one that matches your intended usage.

The keymap can be provided directly to :keymap and it will be bound to foo-mode-map automatically. I could just put an empty keymap here and define keys separately outside the define-minor-mode declaration, but I like the idea of creating the whole map in one expression.

(defun insert-foo ()
  (interactive)
  (insert "foo"))

(define-minor-mode foo-mode
  "Get your foos in the right places."
  :lighter " foo"
  :keymap (let ((map (make-sparse-keymap)))
            (define-key map (kbd "C-c f") 'insert-foo)
            map))

The :global option means the minor mode is not local to a buffer, it’s present everywhere. As far as I know, the only global minor mode I’ve ever used is YASnippet.

Minor Mode Body

The rest of define-minor-mode is a body for arbitrary Lisp, like a defun. It’s run every time the mode is toggled off or on, so it’s like a built-in hook function. Use it to do any sort of special setup or teardown, such hooking or unhooking Emacs’ hooks. A likely thing to be done in here is specifying buffer-local variables.

Any time the Emacs interpreter is evaluating an expression there’s always a current buffer acting as context. Many functions that operate on buffers don’t actually accept a buffer as an argument. Instead they operate on the current buffer. Furthermore, some variables are buffer-local: the binding is dynamic over the current buffer. This is useful for maintaining state relevant only to a particular buffer.

Side note: the with-current-buffer macro is used to specify a different current buffer for a body of code. It can be used to access other buffer’s local variables. Similarly, with-temp-buffer creates a brand new buffer, uses it as the current buffer for its body, and then destroys the buffer.

For example, let’s say I want to keep track of how many times foo-mode inserted “foo” into the current buffer.

(defvar foo-count 0
  "Number of foos inserted into the current buffer.")

(defun insert-foo ()
  (interactive)
  (setq foo-count (1+ foo-count))
  (insert "foo"))

(define-minor-mode foo-mode
  "Get your foos in the right places."
  :lighter " foo"
  :keymap (let ((map (make-sparse-keymap)))
            (define-key map (kbd "C-c f") 'insert-foo)
            map)
  (make-local-variable 'foo-count))

The built-in function make-local-variable creates a new buffer-local version of a global variable in the current buffer. Here, the buffer-local foo-count will be initialized with the value 0 from the global variable but all reassignments will only be visible in the current buffer.

However, in this case it may be better to use make-variable-buffer-local on the global variable and skip the make-local-variable. The main reason is that I don’t want insert-foo to clobber the global variable if it happens to be used in a buffer that doesn’t have the minor mode enabled.

(make-variable-buffer-local
 (defvar foo-count 0
   "Number of foos inserted into the current buffer."))

A big advantage is that this buffer-local intention for the variable is documented globally. This message will appear in the variable’s documentation.

Automatically becomes buffer-local when set in any fashion.

Which method you use is up to your personal preference. The Emacs documentation encourages the former but I think the latter is nicer in many situations.

Automatically Enabling the Minor Mode

Some minor modes don’t have any particular major mode association and the user will toggle it at will. Some minor modes only make sense when used with particular major mode and it might make sense to automatically enable along with that mode. This is done by hooking that major mode’s hook. So long as the mode follows Emacs’ conventions as mentioned at the top, this hook should be easy to find.

(add-hook 'text-mode-hook 'foo-mode)

Here, foo-mode will automatically be activated in all text-mode buffers.

Full Code

Here’s the final code for our minor mode, saved to foo-mode.el. It has one keybinding and it’s easily open for users to define more keys in foo-mode-map. It also automatically activates when the user is editing a plain text file.

(make-variable-buffer-local
 (defvar foo-count 0
   "Number of foos inserted into the current buffer."))

(defun insert-foo ()
  (interactive)
  (setq foo-count (1+ foo-count))
  (insert "foo"))

;;;###autoload
(define-minor-mode foo-mode
  "Get your foos in the right places."
  :lighter " foo"
  :keymap (let ((map (make-sparse-keymap)))
            (define-key map (kbd "C-c f") 'insert-foo)
            map))

;;;###autoload
(add-hook 'text-mode-hook 'foo-mode)

(provide 'foo-mode)

I added some autoload declarations and a provide in case this mode is ever distributed or used as a package. If an autoloads script is generated for this minor mode, a temporary function called foo-mode will be defined whose sole purpose is to load the real foo-mode.el and then call foo-mode again with its new definition, which was loaded overtop the temporary definition.

The autoloads script also adds this temporary foo-mode function to the text-mode-hook. If a text-mode buffer is created, the hook will call foo-mode which will load foo-mode.el, redefining foo-mode to its real definition, then activate foo-mode.

The point of autoloads is to defer loading code until it’s needed. You may notice this as a short delay the first time you activate a mode after starting Emacs. This is what keeps Emacs’ start time reasonable despite having millions of lines of Elisp virtually loaded at startup.

Emacs Javadoc Lookups Get a Facelift

2013-01-30T00:00:00Z

Ever since I started using the Emacs package archive, specifically MELPA, I’d been wanting to tidy up my Emacs Java extensions, java-mode-plus, into a nice, official package. Observing my own attitude after the switch, I noticed that if a package isn’t available on ELPA or MELPA, it practically doesn’t exist for me. Manually installing anything now seems like so much trouble in comparison, and getting a package on MELPA is so easy that there’s little excuse for package authors not to have their package in at least one of the three major Elisp archives. This is exactly the attitude my own un-archived package would be facing from other people, and rightfully so.

Before I dive in, this is what the user configuration now looks like,

(javadoc-add-artifacts [org.lwjgl.lwjg lwjgl "2.8.2"]
                       [com.nullprogram native-guide "0.2"]
                       [org.apache.commons commons-math3 "3.0"])

That’s right: it knows how to find, fetch, and index documentation on its own. Keep reading if this sounds useful to you.

The Problem

The problem was that java-mode-plus was doing two unrelated things:

Supporting Ant-oriented Java projects. Not being a fan of Maven, I’ve used Ant for all of my own personal projects. (However, I really do like the Maven infrastructure, so I use Apache Ivy.) It seems Maven is a lot more popular, so this part isn’t useful for many people.
Quick Javadoc referencing, which I was calling java-docs. I think this is generally useful for anyone writing Java in Emacs, even if they’re using another suite like JDEE or writing in another JVM language. It would be nice for people to be able to use this without pulling in all of java-mode-plus — which was somewhat intrusive.

I also didn’t like the names I had picked. java-mode-plus wasn’t even a mode until recently and its name isn’t conventional. And “java-docs” is just stupid. I recently solved all this by splitting the java-mode-plus into two new packages,

ant-project-mode — A minor mode that performs the duties of the first task above. Since I’ve phased Java out from my own personal projects and no longer intend to write Java anymore, this part isn’t very useful to me personally at the moment. If I do need to write Java for work again I’ll probably dust this off. It’s by no means un-maintained, it’s just in maintenance mode for now. Because of this, this is not in any Emacs package archive
javadoc-lookup — This is java-docs renamed and with some new goodies! I also put this on MELPA, where it’s easy for anyone to use. This is continues to be useful for me as I use Clojure.

javadoc-lookup

This is used like java-docs before it, just under a different name. The function javadoc-lookup asks for a Java class for documentation. I like to bind this to C-h j.

The function javadoc-add-roots provides filesystem paths to be indexed for lookup.

(javadoc-add-roots "/usr/share/doc/openjdk-6-jdk/api"
                   "~/src/project/doc")

Also, as before, if you don’t provide a root for the core Java API, it will automatically load an index of the official Javadoc hosted online. This means it can be installed from MELPA and used immediately without any configuration. Good defaults and minimal required configuration is something I highly value.

Back in the java-docs days, when I started using a new library I’d track down the Javadoc jar, unzip it somewhere on my machine, and add it to be indexed. I regularly do development on four different computers, so this gets tedious fast. Since the Javadoc jars are easily available from the Maven repository, I maintained a small Ant project within my .emacs.d for awhile just to do this fetching, but it was a dirty hack.

Finally, the Goodies

Here’s the cool new part: I built this functionality into javadoc-lookup. It can fetch all your documentation for you! Instead of providing a path on your filesystem, you name an artifact that Maven can find. javadoc-lookup will call Maven to fetch the Javadoc jar, unzip it into a cache directory, and index it for lookups. You will need Maven installed either on your $PATH or at maven-program-name (Elisp variable).

Here’s a sample configuration. It’s group, artifact, version provided as a sequence. I say “sequence” because it can be either a list or a vector and those names can be either strings or symbols. I prefer the vector/symbol method because it requires the least quoting, plus it looks Clojure-ish.

(javadoc-add-artifacts [org.lwjgl.lwjg lwjgl "2.8.2"]
                       [com.nullprogram native-guide "0.2"]
                       [org.apache.commons commons-math3 "3.0"])

Put that in your initialization and all this documentation will appear in the lookup index. It only needs to fetch from Maven once per artifact per system — a very very slow process. After that it operates entirely from its own cache which is very fast, so it won’t slow down your startup.

This has been extremely convenient for me so I hope other people find it useful, too.

As a final note, javadoc-lookup also exploits structural sharing in its tables, using a lot less memory than java-docs. Not that it was a problem before; it’s a feel-good feature.

Live CSS Interaction with Skewer

2013-01-24T00:00:00Z

This evening Skewer gained support for live CSS. When editing CSS code, you can send your rules and declarations from the editing buffer to be applied in the open page in the browser. It makes experimenting with CSS really, really easy. The functionality is exposed through the familiar interaction keybindings, so if you’re already familiar with other Emacs interaction modes (SLIME, nREPL, Skewer, Geiser, Emacs Lisp), this should feel right at home.

To provide the keybindings in css-mode there’s a new minor mode, skewer-css-mode. CSS “expressions” are sent to the browser through the communication channel already provided by Skewer. It’s essentially an extension to Skewer: it could have been created without making any changes to Skewer itself.

Unfortunately Emacs’ css-mode is nowhere near as sophisticated as js2-mode — which reads in and exposes a full JavaScript AST. I had to write my own very primitive CSS parsing routines to tease things apart. It should generally be able to parse declarations and rules reasonably no matter how it’s indented, but it’s not very good at navigating around comments, especially when they contain CSS syntax. If I find a way to parse CSS more easily sometime I’ll see about fixing it, but it’s plenty good enough for now.

To “evaluate” the CSS, the code is simply dropped into the page as a new

Articles tagged emacs at null program

A Makefile for Emacs Packages

Complex packages

Dependencies

Efficient Alias of a Built-In Emacs Lisp Function

defalias

defsubst

cl-first

Benchmark

On-the-fly Linear Congruential Generator Using Emacs Calc

Crafting a PRNG

Primes from Emacs Calc

Appending permutation

Bonus: Adapting to other languages

UTF-8 String Indexing Strategies

Emacs Lisp

Julia

Go

Preferences

An Async / Await Library for Emacs Lisp

aio example

Promises, simplified

Evaluate in the context of a promise

Async functions

Composing promises

Threads

Processes

Testing aio

Async/await is pretty awesome

Emacs 26 Brings Generators and Threads

Generators

Threads

Building generators on threads

The future of threads

Emacs Lisp Lambda Expressions Are Not Self-Evaluating

Taming an old dragon

eval-after-load

A workaround

Evaluating function objects

Solving the problem with one character

Options for Structured Data in Emacs Lisp

Defining a data structure from scratch

Two pitfalls

Dynamic dispatch

Debugging Emacs or: How I Learned to Stop Worrying and Love DTrace

Bryan Cantrill, DTrace, and FreeBSD

What is DTrace?

FreeBSD on a Raspberry Pi 2

What's in an Emacs Lambda

Lambda under byte compilation

Lambda in lexical scope

Overcapturing

How byte compiled closures are constructed

Make Flet Great Again

cl-lib and cl-flet

Unit testing with flet

cl-letf

Gap Buffers Are Not Optimized for Multiple Cursors

Multiple cursors

Vim vs. Emacs: the Working Directory

Single instance editors

Invoking the build

My Journey with Touch Typing and Vim

Touch typing

Modal editing

Future directions

Asynchronous Requests from Emacs Dynamic Modules

SIGUSR1

Making it work on Windows

How to Write Fast(er) Emacs Lisp

(1) Use lexical scope

(2) Prefer built-in functions

(3) Avoid unnecessary lambda functions

(4) Prefer using functions with dedicated opcodes

(5) Unroll loops using and/or

How can I find more optimization opportunities myself?

Domain-Specific Language Compilation in Elfeed

Heuristic parsing

A domain-specific language

Optimizing the domain-specific language