nullprogram.com/blog/2013/12/30/
I’ve stated before that one of the unique features of Emacs Lisp is
that its closures are readable. Closures can be serialized by the
printer and read back in with the reader. I am unaware of any other
programming language that has this feature. In fact it’s essential for
Elisp byte-code compilation because byte-compiled Elisp files are
merely s-expressions of byte-code dumped out as source.
Lisp Printing
The Lisp family of languages is homoiconic. Lisp source code is
written in the syntax of its own data structures, s-expressions. Since
a compiler/interpreter is usually provided at run-time, a consequence
of this is that reading and printing are fundamental features of
Lisps. A value can be handed to the printer, which will serialize the
value into an s-expression as a sequence of characters. Later on the
reader can parse the s-expression back into an equal value.
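As a minimal sketch of that round trip (my-round-trip is a made-up helper name, not something from Emacs itself), the printer and reader can be chained directly:

```elisp
(defun my-round-trip (value)
  "Print VALUE to a string, then read it back (hypothetical helper)."
  (let ((text (prin1-to-string value)))  ; printer: value -> characters
    (read text)))                        ; reader: characters -> value

(let* ((value '(1 "two" [3 4.5] (a . b)))
       (copy (my-round-trip value)))
  (list (equal value copy) (eq value copy)))
;; => (t nil)
```

The copy is equal to the original but not eq to it: the reader builds a fresh object with the same structure.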
To compare, JavaScript originally had half of this in place.
JavaScript has convenient object syntax for defining an associative
array, known today as JSON. The eval function could (dangerously) be
used as a reader for parsing a string containing JSON-encoded data
into a value. But until JSON.stringify() became standard, developers
had to write their own printer. Lisp s-expression syntax is much more
powerful (and complicated) than JSON, maintaining both identity and
cycles (e.g. *print-circle*).
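To illustrate identity and cycles surviving a round trip, here is a small sketch using Emacs Lisp's print-circle variable (the Elisp counterpart of *print-circle*). The helper name my-print-shared is hypothetical.

```elisp
(defun my-print-shared (value)
  "Print VALUE with shared structure and cycles labeled (hypothetical helper)."
  (let ((print-circle t))
    (prin1-to-string value)))

;; Shared structure: both elements are the same cons, before and after.
(let* ((shared (list 1 2))
       (copy (read (my-print-shared (list shared shared)))))
  (eq (car copy) (cadr copy)))
;; => t

;; A cycle: the tail of the list points back at its head.
(let ((cycle (list 1 2)))
  (setcdr (cdr cycle) cycle)
  (let ((copy (read (my-print-shared cycle))))
    (eq (cddr copy) copy)))
;; => t
```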
Not all values can be read. They’ll still print (when *print-readably*
is nil) but will do so using special syntax that will signal an error
in the reader: #<. For example, in Emacs Lisp buffers cannot be
serialized so they print using this syntax.
(prin1-to-string (current-buffer))
;; => "#<buffer *scratch*>"
It doesn’t matter what’s between the angle brackets, or even that
there’s a closing angle bracket. The reader will signal an error as
soon as it hits a #<.
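One way to probe this from Lisp is to attempt the round trip and catch the reader's error. This is only a sketch: my-readable-p is a name I made up, and it assumes the reader reports the problem with the invalid-read-syntax condition.

```elisp
(defun my-readable-p (object)
  "Return non-nil if OBJECT's printed form can be read back.
Hypothetical helper for illustration."
  (condition-case nil
      (progn (read (prin1-to-string object)) t)
    ;; The reader signals this condition when it hits #<.
    (invalid-read-syntax nil)))

(my-readable-p '(1 2 3))          ; => t
(my-readable-p (current-buffer))  ; => nil
```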
Almost Everything Prints Readably
Elisp has a small set of primitive data types. All of these primitive
types print readably:
- integer (1024, ?a)
- float (1.7)
- cons/list ((...))
- vector (one-dimensional, [...])
- bool-vector (#&n"...")
- string ("...")
- char-table (#^[...])
- hash-table (readable as of Emacs 23.3, #s(hash-table ...))
- byte-code function object (#[...])
- symbol
Here are all the non-readable types. Each one has a good reason for
not being serializable.
- buffer
- process (external state)
- frame (user interface element)
- marker (live, automatically updates)
- overlay (belongs to a buffer)
- built-in functions (native code)
- user-ptr (opaque pointers from Emacs 25 dynamic modules)
And that’s it. Every other value in Elisp is constructed from one or
more of these primitives, including keymaps, functions, macros, syntax
tables, defstruct structs, and EIEIO objects. This means that as
long as these values don’t refer to an unreadable value, they
themselves can be printed.
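For instance, a struct made only of readable parts should survive the printer and reader. In this sketch my-pair is a hypothetical struct defined purely for illustration. The exact printed text differs between Emacs versions, so it isn't shown.

```elisp
(require 'cl-lib)

;; `my-pair' is a made-up struct used only for this example.
(cl-defstruct my-pair left right)

(let* ((original (make-my-pair :left 1 :right "two"))
       (copy (read (prin1-to-string original))))
  (list (my-pair-p copy) (equal original copy)))
;; => (t t)
```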
An interesting note here is that, unlike the Common Lisp Object System
(CLOS), EIEIO objects are readable by default. To Elisp they’re just
vectors, so of course they print. CLOS objects are unreadable without
manually defining a print method per class.
Elisp Closures
Elisp got lexical scoping in Emacs 24, released in June 2012. It’s now
one of the relatively few languages to have both dynamic and lexical
scope. Like Common Lisp, variables declared with defvar (and family)
continue to have dynamic scope. For backwards compatibility with old
Lisp code, lexical scope is disabled by default. It’s enabled for a
specific file or buffer by setting lexical-binding to non-nil.
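Here is a small sketch of the two scoping rules side by side. The names my-dynamic, my-read-dynamic, and my-scope-demo are hypothetical. So that the example doesn't depend on the lexical-binding setting of whatever file it's pasted into, it passes t as the second argument to eval, which requests lexical scope for that form.

```elisp
(defvar my-dynamic 'global
  "A hypothetical special (dynamically scoped) variable.")

(defun my-read-dynamic ()
  "Return the current dynamic value of `my-dynamic'."
  my-dynamic)

(defun my-scope-demo ()
  "Return a list (DYNAMIC-RESULT LEXICAL-RESULT)."
  (eval
   '(list
     ;; defvar'd variable: the callee sees the caller's rebinding.
     (let ((my-dynamic 'rebound))
       (my-read-dynamic))
     ;; Ordinary variable: the lambda keeps the binding it was created
     ;; with, even though another x is bound when it is called.
     (let ((f (let ((x 'captured)) (lambda () x)))
           (x 'shadowed))
       (funcall f)))
   t))  ; t = evaluate with lexical scope

(my-scope-demo)
;; => (rebound captured)
```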
With lexical scope, anonymous functions become closures, a powerful
functional programming primitive: a function plus a captured lexical
environment. It also provides some performance benefits. In my own
tests, compiled Elisp with lexical scope enabled is about 10% to 15%
faster than with the default dynamic scope.
What do closures look like in Emacs Lisp? They take one of two forms
depending on whether the closure is compiled or not. For example,
consider this function, foo, that takes two arguments and returns a
closure that returns the first argument.
;; -*- lexical-binding: t; -*-
(defun foo (x y)
  (lambda () x))
(foo :bar :ignored)
;; => (closure ((y . :ignored) (x . :bar) t) () x)
An uncompiled closure is a list beginning with the symbol closure.
The second element is the lexical environment, the third is the
argument list (lambda list), and the rest is the body of the function.
Here we can see that both x and y have been “closed over.” This is
a little bit sloppy because the function never makes use of y.
Capturing it has a few problems.
- The closure has a larger footprint than necessary.
- Values are held longer than necessary, delaying collection.
- It affects the readability of the closure, which I’ll get to later.
Fortunately the compiler is smart enough to see this and will avoid
capturing unused variables. To prove this, I’ve now compiled foo so
that it returns a compiled closure.
(foo :bar :ignored)
;; => #[0 "\300\207" [:bar] 1]
What’s returned here is a byte-code function object, with the #[...]
syntax. It has these elements:
- The function’s lambda list (zero arguments)
- Byte-codes stored in a unibyte string
- Constants vector
- Maximum stack space needed by this function
Notice that the lexical environment has been captured in the constants
vector, and that :ignored is absent from this vector. The compiler
didn’t capture it.
For those curious about the byte-code here’s an explanation. The
string syntax shown is in octal, representing a string containing two
bytes: 192 and 135. The Elisp byte-code interpreter is stack-based.
The 192 (constant 0) says to push the first constant onto the stack.
The 135 (return) says to pop the top element from the stack and
return it.
(require 'cl-lib)  ; provides cl-coerce
(cl-coerce "\300\207" 'list)
;; => (192 135)
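The four elements listed earlier can also be pulled out of a byte-code function object with aref. This is a sketch under assumptions: my-bytecode-slots is a hypothetical helper, and the exact contents of each slot may vary with the Emacs version and compiler optimizations, so no output is shown.

```elisp
(defun my-bytecode-slots (fn)
  "Return the first four slots of byte-code function FN as a plist.
Hypothetical helper for illustration."
  (list :args (aref fn 0)                  ; lambda list / argument descriptor
        :bytes (append (aref fn 1) nil)    ; byte-code string as a list of ints
        :constants (aref fn 2)             ; constants vector
        :stack-depth (aref fn 3)))         ; maximum stack space

(my-bytecode-slots
 ;; Build a closure under lexical scope, then compile it.
 (byte-compile (eval '(let ((x :bar)) (lambda () x)) t)))
```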
The Readable Closures Catch
Since closures are byte-code function objects, they print readably.
You can capture an environment in a closure, serialize it, read it
back in, and evaluate it. That’s pretty cool! This means closures can
be transmitted to other Emacs instances in a multi-processing setup
(e.g. Elnode, Async).
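A rough sketch of that serialize-and-restore cycle within a single Emacs follows. The helper name my-closure-round-trip is made up. Binding print-escape-nonascii is my own precaution so that the raw bytes in the byte-code string are written as octal escapes; treat that as an assumption about what a robust round trip needs rather than a requirement stated anywhere above.

```elisp
(defun my-closure-round-trip (value)
  "Capture VALUE in a compiled closure, print it, read it back, call it.
Hypothetical helper for illustration."
  (let* ((closure (byte-compile
                   ;; Second argument t: evaluate with lexical scope.
                   (eval `(let ((x ',value)) (lambda () x)) t)))
         ;; Assumption: print raw bytes as \OOO escapes for safe re-reading.
         (print-escape-nonascii t)
         (text (prin1-to-string closure))  ; serialize the closure
         (copy (read text)))               ; restore it from text
    (funcall copy)))

(my-closure-round-trip :bar)
```

If the round trip works as described, the call hands back the captured value. In a real multi-process setup the text would be sent to the other Emacs instance and read there instead.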
The catch is that it’s easy to accidentally capture an unreadable
value, especially buffers. Consider this function bar which uses a
temporary buffer as an efficient string builder. It returns a closure
that returns the result. (Weird, but stick with me here!)
(require 'cl-lib)  ; provides cl-loop

(defun bar (n)
  (with-temp-buffer
    (let ((standard-output (current-buffer)))
      (cl-loop for i from 0 to n do (princ i))
      (let ((string (buffer-string)))
        (lambda () string)))))
The compiled form looks fine:
(bar 3)
;; => #[0 "\300\207" ["0123"] 1]
But the interpreted form of the closure has a problem. The
with-temp-buffer macro silently introduced a new binding: an
abstraction leak.
(bar 3)
;; => (closure ((string . "0123")
;; (temp-buffer . #<killed buffer>)
;; (n . 3) t)
;; () string)
The temporary buffer is mistakenly captured in the closure, making it
unreadable, but only in its uncompiled form. This creates the
awkward situation where compiled and uncompiled code have different
behavior.