nullprogram.com/blog/2018/04/13/
My first exposure to C and C++ was a little over 20 years ago. I
remember it being some version of Borland C++, either 4.x
or 5.x, running on Windows 95. I didn’t have a mentor, so I
did the best I could slowly working through what was probably a poorly
written beginner C++ book, typing out the examples and exercises with
little understanding. Since I didn’t learn much from the experience,
there was a 7 or 8 year gap before I’d revisit C and C++ in college.

I thought it would be interesting to revisit this software, to
reevaluate it from a far more experienced perspective. Keep in mind
that C++ wasn’t even standardized yet, and the most recent C standard
was from 1989. Given this, what was it like to be a professional
software developer using a Borland toolchain on Windows 20 years ago?
Was it miserable, made bearable only by ignorance of how much better
the tooling could be? Or maybe it actually wasn’t so bad, and these
tools are better than I expect?
Ultimately my conclusion is that it’s a little bit of both. There are
some significant capability gaps compared to today, but the core
toolchain itself is actually quite reasonable, especially for the mid
1990s.
The setup
Before getting into the evaluation, let’s discuss how I got it all up
and running. While it’s technically possible to run Windows 95 on a
modern x86-64 machine thanks to the architecture’s extreme backwards
compatibility, it’s more compatible, simpler, and safer to
virtualize it. Most importantly, I can emulate older hardware that
will have better driver support.
Despite that early start in Windows all those years ago, I’m primarily
a Linux user. The premier virtualization solution on Linux these days
is KVM, a kernel module that turns Linux into a hypervisor and
makes efficient use of hardware virtualization extensions.
Unfortunately pre-XP Windows doesn’t work well on KVM, so instead I’m
using QEmu (with KVM disabled), a hardware emulator closely
associated with KVM. Since it doesn’t take advantage of hardware
virtualization extensions, it will be slower. This is fine since my
goal is to emulate slow, 20+ year old hardware anyway.
There’s very little practical difference between Windows 95 and
Windows 98. Since Windows 98 runs a lot smoother virtualized, I
decided to go with that instead. This will be perfectly sufficient for
my toolchain evaluation.
Software
To get started, I’ll need an installer for Windows 98. I thought this
would be difficult to find, but there’s a copy available on the
Internet Archive. I don’t know how “legitimate” this is, but it works.
Since it’s running in a virtual machine without network access, I also
don’t really care if this copy is somehow infected with malware.
Internet Archive: Windows 98 Second Edition
Also on the Internet Archive is a complete copy of Borland C++ 5.02,
with the same caveats of legitimacy. It works, which is good enough for
my purposes.
Internet Archive: Borland C++ 5.02
Thank you Internet Archive!
Hardware
I’ve got my software, now to set up the virtualized hardware. First I
create a drive image:
$ qemu-img create -f qcow2 win98.img 8G
I gave it 8GB, which is actually a bit overkill. Giving Windows 98 a
virtual hard drive with modern sizes would probably break the
installer. This sort of issue is a common theme among old software,
where there may be complaints about negative available disk space due
to signed integer overflow.
I decided to give the machine 256MB of memory (-m 256). This is also a
little excessive, but I wanted to be sure memory didn’t limit Borland’s
capabilities. This amount of memory is close to the upper bound, and
going much beyond will likely cause problems with Windows 98.
For the CPU I settled on a Pentium (-cpu pentium). My original goal
was to go a little simpler with a 486 (-cpu 486), but the Windows 98
installer kept crashing when I tried this.
I experimented with different configurations for the network card, but
I couldn’t get anything to work. So I’ve disabled networking (-net
none). The only reason I’d want this is that it would be easier to
move files in and out of the virtual machine.
Finally, here’s how I ran QEmu. The last two lines are only needed when
installing.
$ qemu-system-x86_64 \
-localtime \
-cpu pentium \
-no-acpi \
-no-hpet \
-m 256 \
-hda win98.img \
-soundhw sb16 \
-vga cirrus \
-net none \
-cdrom "Windows 98 Second Edition.iso" \
-boot d

Installation
Installation is just a matter of following the instructions. You’ll
need that product key listed on the Internet Archive site.

That copy of Borland is just a big .zip file. This presents two
problems.
- Without network access, I’ll need to figure out how to get this
  inside the virtual machine.
- This version of Windows doesn’t come with software to unzip this
  file. I’d need to find and install an unzip tool first.
Fortunately I can kill two birds with one stone by converting that .zip
archive into a .iso and mounting it in the virtual machine.
unzip "BORLAND C++.zip"
genisoimage -R -J -o borland.iso "BORLAND C++"
Then in the QEmu console (C-A-2) I attach it:
change ide1-cd0 borland.iso
This little trick of generating .iso files and mounting them is how I
will be moving all the other files into the virtual machine.
Borland C++
The first thing I did was play around with the Borland IDE. This is
what I would have been using 20 years ago.

Despite being Borland C++, I’m personally most interested in its ANSI
C compiler. As I already pointed out, this software pre-dates C++’s
standardization, and a lot has changed over the past two decades. On the
other hand, C hasn’t really changed all that much. The 1999 update to
the C standard (i.e. “C99”) was big and important, but otherwise little
has changed. The biggest drawback is the lack of “declare anywhere”
variables, including in for-loop initializers. Otherwise it’s the same
as writing C today.
To test drive the IDE, I made a couple of test projects, built and ran
them with different options, and poked around with the debugger. The
debugger is actually pretty decent, especially for the 1990s. It can be
operated via the IDE or standalone, so I could use it without firing up
the IDE and making a project.
The toolchain includes an assembler, and I can inspect the compiler’s
assembly output. To nobody’s surprise this is Intel-flavored assembly,
which is very welcome. Imagining myself as a software developer
in the mid 1990s, this means I can see exactly what the compiler’s doing
as well as write some of the performance sensitive parts in assembly if
necessary.
The built-in editor is the worst part of the IDE, which is unfortunate
since it really spoils the whole experience. It’s easy to jump between
warnings and errors, it has incremental search, and it has good syntax
highlighting. But these are the only positive things I can say about it.
If I had to work with this editor full-time, I’d spend my days pretty
irritated.
Like with the debugger, the Borland people did a good job modularizing
their development tools. As part of the installation process, all of the
Borland command line tools are added to the system PATH
(reminder:
this is a single-user system). This includes compiler, linker,
assembler, debugger, and even an incomplete implementation of
make.
With this, I can essentially pretend the IDE doesn’t exist and replace
that crummy editor with something better: Vim.
The last version of Vim to support MS-DOS and Windows 95/98 is Vim 7.3,
released in 2010. I download those binaries, trim a few things from my
.vimrc, and smuggle it all into my virtual machine via a
virtual CD. I’ve now got a powerful text editor in Windows 98 and my
situation has drastically improved.

Since I hardly use features added since Vim 7.3, this feels right at
home to me. I can invoke the build from Vim, and it
can populate the quickfix list from Borland’s output, so I could
actually be fairly productive in these circumstances! I’m honestly
really impressed with how well this all works together.
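My .vimrc isn’t shown here, but the hookup amounts to a couple of
lines. Here is a sketch; the errorformat pattern is only my guess at
the “Error file.c 12: message” shape of Borland’s diagnostics and
would need tuning against real compiler output:
" Build with :make, then jump between diagnostics with :cnext/:cprev
set makeprg=make
" Guessed patterns for Borland-style 'Error file.c 12: message' lines
set errorformat=%trror\ %f\ %l:\ %m,%tarning\ %f\ %l:\ %m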
At this point I only have two significant annoyances:
- Borland’s command line tools belong to that category of irritating
  programs that print their version banner on every invocation.
  There’s not even a command line switch to turn this off. All this
  noise is quickly tiresome. The Visual Studio toolchain does the
  same thing by default, though it can be turned off (-nologo). I
  dislike that some GNU tools also commit this sin, but at least GNU
  limits this to interactive programs.
- The Windows/DOS command shell and console are even worse than they
  are today. I didn’t think that was possible. This is back when it
  was still genuinely DOS and not just pretending to be (e.g. in NT).
  The worst part by far is the lack of command history. There’s no
  using the up-arrow to get previous commands. There’s no tab
  completion. Forward slash is not a substitute for backslash in
  paths. If I wanted to improve my productivity, replacing this
  console and shell would be the first priority.
Update: In an email, Aristotle Pagaltzis informed me that Windows 98
comes with DOSKEY.COM, which provides command history for
COMMAND.EXE. Alternatively there’s Enhanced DOSKEY.com, an
open source, alternative implementation that also provides tab
completion for commands and filenames. This makes the console a lot
more usable (and, honestly, in some ways better than the modern
defaults).
Building Enchive with Borland
Last year I wrote a backup encryption tool called Enchive,
and I still use it regularly. One of my design goals was high
portability since it may be needed to decrypt something important in
the distant future. It should be as bit-rot-proof as
possible. In software, the best way to future-proof is to
past-proof.
If I had a time machine that could send source code back in time, and
I sent Enchive to a competent developer 20 years ago, would they be
able to compile it and run it? If the answer is yes, then that means
Enchive already has 20 years of future-proofing built into it.
To accomplish this, Enchive is 3,300 lines of strict ANSI C,
1989-style, with no dependencies other than the C standard library and
a handful of operating system functions — e.g. functionality not in
the C standard library. In practice, any ANSI C compiler targeting
either POSIX, or Windows 95 or later, should be able to compile it.
My Windows 98 virtual machine includes an ANSI C compiler, and can be
used to simulate this time machine. I generated an “amalgamation” build
(make amalgamation) — essentially a concatenation of all the source
files — and sent this into the virtual machine. Before Borland was able
to compile it, I needed to make three small changes.
First, Enchive includes stdint.h
to get fixed-width integers needed
for the encryption routines. This header comes from C99, and C89 has
no equivalent. I anticipated this problem from the beginning and made
it easy for the person performing the build to correct it. This header
is included exactly once, in config.h, and this is placed at the top
of the amalgamation build. The include only needs to be replaced with
a handful of manual typedefs. For Borland that looks like this:
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned long uint32_t;
typedef unsigned __int64 uint64_t;
typedef long int32_t;
typedef __int64 int64_t;
#define INT8_C(n) (n)
#define INT16_C(n) (n)
#define INT32_C(n) (n##U)
Second, in more recent versions of Windows, GetFileAttributes() can
return the value INVALID_FILE_ATTRIBUTES. Checking for an error that
cannot happen is harmless, but this value isn’t defined in Borland’s
SDK. I only had to eliminate that check.
Third, the CryptGenRandom()
interface isn’t defined in
Borland’s SDK. This is used by Enchive to generate keys. MSDN reports
this function wasn’t available until Windows XP, but it’s definitely
there in Windows 98, exported by ADVAPI32.dll. I’m able to call it,
though it always reports an error. Perhaps it’s been disabled in this
version due to cryptographic export restrictions?
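For reference, calling this interface usually looks something like
the sketch below. This is my own illustration, not Enchive’s actual
code, and the helper name is made up:
#include <windows.h>
#include <wincrypt.h>  /* the declarations missing from Borland's SDK */

/* Fill buf with len cryptographically random bytes; nonzero on success. */
static int
win32_random(void *buf, DWORD len)
{
    HCRYPTPROV provider;
    if (!CryptAcquireContext(&provider, 0, 0, PROV_RSA_FULL,
                             CRYPT_VERIFYCONTEXT))
        return 0;
    if (!CryptGenRandom(provider, len, buf)) {
        CryptReleaseContext(provider, 0);
        return 0;
    }
    return CryptReleaseContext(provider, 0);
}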
Regardless of what’s wrong, I ripped this out and replaced it with a
fatal error. This version of Enchive can’t generate new keys — unless
derived from a passphrase — nor encrypt files, including the use of a
protection key to encrypt the secret key. However, it can decrypt
files, which is the important part that needs to be future-proofed.
With these three changes — which took me about 10 minutes to sort out —
Enchive builds and runs, and it correctly decrypts files I encrypted on
Linux. So Enchive has at least 20 years of past-proofing! The
screenshot at the top of this article shows it running successfully in
an MS-DOS console window.
What’s wrong? What’s missing?
I mentioned that there were some gaps. The most obvious is the lack of
the standard POSIX utilities, especially a decent shell. I don’t know if
any had been ported to Windows in the mid 1990s. But that could be
solved one way or another without too much trouble, even if it meant
doing some of that myself.
No, the biggest capability I’d miss, and which wouldn’t be easily
obtained, is Git, or at least a decent source control system. I really
don’t want to work without proper source control. Git’s support for
Windows is second tier, and the port to modern Windows is already a
bit of a hack. Getting it to run in Windows 98 would probably be a
challenge, especially if I had to compile it with Borland.
The other major issue is the lack of stability. In this experiment, I’ve
been seeing this screen a lot:

I remember Windows crashing a lot back in those days, and it certainly
had a bad reputation for being unstable, but this is far worse than I
remembered. While the hardware emulator may be somewhat at fault here,
keep in mind that I never installed third party drivers. Most of these
crashes are Windows’ fault. I found I can reliably bring the whole
system down with a single GetProcAddress()
call on a system DLL. The
only way I can imagine this instability was so tolerated back then was
general ignorance that computing could be so much better.
I was tempted to write this article in Vim on Windows 98, but all this
crashing made me too nervous. I didn’t want some stupid filesystem
corruption to wipe out my work. Too risky.
A better alternative
If I was stuck working in Windows 98 — or was at least targeting it as a
platform — but had access to a modern tooling ecosystem, could I do
better than Borland? Yes! Programs built by Mingw-w64 can be
run even as far back as Windows 95.
Now, there’s a catch. I thought it would be this simple:
$ i686-w64-mingw32-gcc -Os hello.c
But when I brought the resulting binary into the virtual machine it
crashed when I ran it: illegal instruction. Turns out it contained a
conditional move (cmov), an instruction not available until
the Pentium Pro (686). The “pentium” emulation is just a 586.
I tried to disable cmov
by picking the specific architecture:
$ i686-w64-mingw32-gcc -march=pentium -Os hello.c
This still didn’t work because the statically-linked part of the CRT
contained the cmov. I’d have to recompile that as well.
I could have switched the QEmu options to “upgrade” to a Pentium Pro,
but remember that my goal was really the 486. Fortunately this was easy
to fix: compile my own Mingw-w64 cross-compiler. I’ve done this a number
of times before, so I knew it wouldn’t be difficult.
I could go step by step, but it’s all fairly well documented in the
Mingw-w64 “howto-build” document. I used GCC 7.3 (the latest version),
and for the target I picked “i486-w64-mingw32”. When it was done I could
compile binaries on Linux to run in my Windows 98 virtual machine:
$ i486-w64-mingw32-gcc -Os hello.c
This should enable quite a bit of modern software to run inside my
virtual machine if I so wanted. I didn’t actually try this (yet?),
but, to take this concept all the way, I could use this cross-compiler
to cross-compile Mingw-w64 itself to run inside the virtual machine,
directly replacing Borland C++.
And the only thing I’d miss about Borland is its debugger.
nullprogram.com/blog/2018/03/27/
For the past couple of months I’ve been using a custom package manager
to manage a handful of software packages within various unix-like
environments. Packages are installed in my home directory under
~/.local, and the package manager itself is just a 110 line Bourne
shell script. It’s not intended to replace the system’s package
manager but, instead, complement it in some cases where I need more
flexibility. I use it to run custom versions of specific pieces of
software — newer or older than the system-installed versions, or with my
own patches and modifications — without interfering with the rest of
system, and without a need for root access. It’s worked out really
well so far and I expect to continue making heavy use of it in the
future.
It’s so simple that I haven’t even bothered putting the script in its
own repository. It sits unadorned within my dotfiles repository with the
name qpkg (“quick package”).
Sitting alongside my dotfiles means it’s always there when I need it,
just as if it was a built-in command.
I say it’s crude because its “install” (-I) procedure is little more
than a wrapper around tar. It doesn’t invoke libtool after installing a
library, and there’s no post-install script — or postinst
as Debian
calls it. It doesn’t check for conflicts between packages, though
there’s a command for doing so manually ahead of time. It doesn’t manage
dependencies, nor even have them as a concept. That’s all on the user to
screw up.
In other words, it doesn’t attempt to solve most of the hard problems
tackled by package managers… except for three important issues:
- It provides a clean, guaranteed-to-work uninstall procedure. Some
  Makefiles do have a token “uninstall” target, but it’s often
  unreliable.
- Unlike blindly using a Makefile “install” target, I can check for
  conflicts before installing the software. I’ll know if and how a
  package clobbers an already-installed package, and I can manage, or
  ignore, that conflict manually as needed.
- It produces a compact, reusable package file that I can reinstall
  later, even on a different machine (with a couple of caveats). I
  don’t need to keep around the original source and build directories
  should I want to install or uninstall later. I can also rapidly
  switch back and forth between different builds of the same
  software.
The first caveat is that the package will be configured for exactly my
own home directory, so I usually can’t share it with other users, or
install it on machines where I have a different home directory. Though I
could still create packages for different installation prefixes.
The second caveat is that some builds tailor themselves by default to
the host (e.g. -march=native). If care isn’t taken, those packages may
not be very portable. This is more common than I had expected and has
mildly annoyed me.
Birth of a package manager
While the package manager is new, I’ve been building and installing
software in my home directory for years. I’d follow the normal process
of setting the install prefix to $HOME/.local, running the build,
and then letting the “install” target do its thing.
$ tar xzf name-version.tar.gz
$ cd name-version/
$ ./configure --prefix=$HOME/.local
$ make -j$(nproc)
$ make install
This worked well enough for years. However, I’ve come to rely a lot on
this technique, and I’m using it for increasingly sophisticated
purposes, such as building custom cross-compiler toolchains.
A common difficulty has been handling the release of new versions of
software. I’d like to upgrade to the new version, but lack a way to
cleanly uninstall the previous version. Simply clobbering the old
version by installing it on top usually works. Occasionally it
wouldn’t, and I’d have to blow away ~/.local
and start all over again.
With more and more software installed in my home directory, restarting
has become more and more of a chore that I’d like to avoid.
What I needed was a way to track exactly which files were installed so
that I could remove them later when I needed to uninstall. Fortunately
there’s a widely-used convention for exactly this purpose: DESTDIR.
It’s expected that when a Makefile provides an “install” target, it
prefixes the installation path with the DESTDIR
macro, which is
assigned to the empty string by default. This allows the user to install
the software to a temporary location for the purposes of packaging.
Unlike the installation prefix (--prefix) configured before the build
began, the software is not expected to function properly when run in the
DESTDIR
location.
$ DESTDIR=_destdir
$ mkdir $DESTDIR
$ make DESTDIR=$DESTDIR install
A different tool will be used to copy these files into place and actually
install it. This tool can track what files were installed, allowing them
to be removed later when uninstalling. My package manager uses the tar
program for both purposes. First it creates a package by packing up the
DESTDIR
(at the root of the actual install prefix):
$ tar czf package.tgz -C $DESTDIR$HOME/.local .
So a package is nothing more than a gzipped tarball. To install, it
unpacks the tarball in ~/.local.
$ cd $HOME/.local
$ tar xzf ~/package.tgz
But how does it uninstall a package? It didn’t keep track of what was
installed. Easy! The tarball itself contains the package list, and it’s
printed with tar’s t mode.
cd $HOME/.local
for file in $(tar tzf package.tgz | grep -v '/$'); do
    rm -f "$file"
done
I’m using grep
to skip directories, which are conveniently listed with
a trailing slash. Note that in the example above, there are a couple of
issues with file names containing whitespace. If the file contains a
space character, it will word split incorrectly in the for loop. A
Makefile couldn’t handle such a file in the first place, but, in case
it’s still necessary, my package manager sets IFS
to just a newline.
If the file name contains a newline, then my package manager relies on
a cosmic ray striking just the right bit at just the right
instant to make it all work out, because no version of tar can
unambiguously print such file names. Crossing your fingers during this
process may help.
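Putting the pieces together, the uninstall step amounts to something
like this sketch (the idea, not the script’s verbatim contents):
# Split only on newlines so spaces in file names survive word
# splitting, then delete every file listed in the package.
IFS='
'
cd "$HOME/.local"
for file in $(tar tzf ~/package.tgz | grep -v '/$'); do
    rm -f "$file"
done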
Commands
There are five commands, each assigned to a capital letter: -B, -C,
-I, -V, and -U. It’s an interface pattern inspired by Ted Unangst’s
signify (see signify(1)). I also used this pattern with Blowpipe and,
in retrospect, wish I had also used it with Enchive.
Build (-B)
Unlike the other four commands, the “build” command isn’t essential,
and is just for convenience. It assumes the build uses an
Autoconf-like configure script and runs it automatically, followed by
make with the appropriate -j (jobs) option. It automatically sets the
--prefix argument when running the configure script.
If the build uses something other than an Autoconf-like configure script,
such as CMake, then you can’t use the “build” command and must perform
the build yourself. For example, I must do this when building LLVM and
Clang.
Before using the “build” command, the package must first be unpacked and
patched if necessary. Then the package manager can take over to run the
build.
$ tar xzf name-version.tar.gz
$ cd name-version/
$ patch -p1 < ../0001.patch
$ patch -p1 < ../0002.patch
$ patch -p1 < ../0003.patch
$ cd ..
$ mkdir build
$ cd build/
$ qpkg -B ../name-version/
In this example I’m doing an out-of-source build by invoking the
configure script from a different directory. Did you know Autoconf
scripts support this? I didn’t know until recently! Unfortunately some
hand-written Autoconf-like scripts don’t, though this will
be immediately obvious.
Once qpkg
returns, the program will be fully built — or stuck on a
build error if you’re unlucky. If you need to pass custom configure
options, just tack them on the qpkg
command:
$ qpkg -B ../name-version/ --without-libxml2 --with-ncurses
Since the second and third steps — creating the build directory and
moving into it — are so common, there’s an optional switch for it:
-d. This option’s argument is the build directory. qpkg creates that
directory and runs the build inside it. In practice I just use “x” for
the build directory since it’s so quick to add “dx” to the command.
$ tar xzf name-version.tar.gz
$ qpkg -Bdx ../name-version/
With the software compiled, the next step is creating the package.
Create (-C)
The “create” command creates the DESTDIR (_destdir in the working
directory) and runs the “install” Makefile target to fill it with
files. Continuing with the example above and its x/ build directory,
the invocation looks roughly like this (the exact argument order is
my assumption):
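$ qpkg -C name x/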
Where “name” is the name of the package, without any file name
extension. Like with “build”, extra arguments after the package name are
passed to make
in case there needs to be any additional tweaking.
When the “create” command finishes, there will be a new package named
name.tgz in the working directory. At this point the source and build
directories are no longer needed, assuming everything went fine.
$ rm -rf name-version/
$ rm -rf x/
This package is ready to install, though you may want to verify it
first.
Verify (-V)
The “verify” command checks for collisions against installed packages.
It works like uninstallation, but rather than deleting files, it checks
if any of the files already exist. If they do, it means there’s a
conflict with an existing package. These file names are printed.
The most common conflict I’ve seen is in the info index (info/dir)
file, which is safe to ignore since I don’t care about it.
If the package has already been installed, there will of course be tons
of conflicts. This is the easiest way to check if a package has been
installed.
Install (-I)
The “install” command is just the dumb tar xzf
explained above. It
will clobber anything in its way without warning, which is why, if that
matters, “verify” should be used first.
When qpkg
returns, the package has been installed and is probably
ready to go. A lot of packages complain that you need to run libtool to
finalize an installation, but I’ve never had a problem skipping it. This
dumb unpacking generally works fine.
Uninstall (-U)
Obviously the last command is “uninstall”. As explained above, this
needs the original package that was given to the “install” command.
Just as “install” is dumb, so is “uninstall,” blindly deleting anything
listed in the tarball. One thing I like about dumb tools is that there
are no surprises.
I typically suffix the package name with the version number to help keep
the packages organized. When upgrading to a new version of a piece of
software, I build the new package, which, thanks to the version suffix,
will have a distinct name. Then I uninstall the old package, and,
finally, install the new one in its place. So far I’ve been keeping the
old package around in case I still need it, though I could always
rebuild it in a pinch.
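Concretely, an upgrade looks something like this (the package file
names here are hypothetical):
$ qpkg -V foo-2.0.tgz    # check what the new package would clobber
$ qpkg -U foo-1.0.tgz    # uninstall the old version
$ qpkg -I foo-2.0.tgz    # install the new one in its place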
Package by accumulation
Building a GCC cross-compiler toolchain is a tricky case that doesn’t
fit so well with the build, create, and install process illustrated
above. It would be nice for the cross-compiler to be a single, big
package, but due to the way it’s built, it would need to be five or so
packages, a couple of which will conflict (one being a subset of
another):
- binutils
- C headers
- core GCC
- C runtime
- rest of GCC
Each step needs to be installed before the next step will work. (I don’t
even want to think about cross-compiling a cross-compiler.)
To deal with this, I added a “keep” (-k) option that leaves the
DESTDIR around after creating the package. To keep things tidy, the
intermediate packages exist and are installed, but the final, big
cross-compiler package accumulates into the DESTDIR. The final
package at the end is actually the whole cross compiler in one package,
a superset of them all.
Complicated situations like these are where I can really understand the
value of Debian’s fakeroot tool.
My use case, and an alternative
The role filled by my package manager is actually pretty well suited for
pkgsrc, which is NetBSD’s ports system made available to other
unix-like systems. However, I just need something really lightweight
that gives me absolute control — even more than I get with pkgsrc — in
the dozen or so cases where I really need it.
All I need is a standard C toolchain in a unix-like environment (even a
really old one), the source tarballs for the software I need, my 110
line shell script package manager, and one to two cans of elbow grease.
From there I can bootstrap everything I might need without root access,
even in a disaster. If the software I need isn’t written in C, it
can ultimately get bootstrapped from some crusty old C compiler, which
might even involve building some newer C compilers in between. After a
certain point it’s C all the way down.
nullprogram.com/blog/2018/02/22/
This week I made a mistake that ultimately enlightened me about the
nature of function objects in Emacs Lisp. There are three kinds of
function objects, but they each behave very differently when evaluated
as objects.
But before we get to that, let’s talk about one of Emacs’
embarrassing, old missteps: eval-after-load.
Taming an old dragon
One of the long-standing issues with Emacs is that loading Emacs Lisp
files (.el and .elc) is a slow process, even when those files have
been byte compiled. There are a number of dirty hacks in place to deal
with this issue, and the biggest and nastiest of them all is the
dumper, also known as unexec.
The Emacs you routinely use throughout the day is actually a previous
instance of Emacs that’s been resurrected from the dead. Your undead
Emacs was probably created months, if not years, earlier, back when it
was originally compiled. The first stage of compiling Emacs is to
compile a minimal C core called temacs. The second stage is loading
a bunch of Emacs Lisp files, then dumping a memory image in an
unportable, platform-dependent way. On Linux, this actually requires
special hooks in glibc. The Emacs you know and love is this
dumped image loaded back into memory, continuing from where it left
off just after it was compiled. Regardless of your own feelings on the
matter, you have to admit this is a very lispy thing to do.
There are two notable costs to Emacs’ dumper:
- The dumped image contains hard-coded memory addresses. This means
  Emacs can’t be a Position Independent Executable (PIE). It can’t
  take advantage of a security feature called Address Space Layout
  Randomization (ASLR), which would increase the difficulty of
  exploiting some classes of bugs. This might be important to you if
  Emacs processes untrusted data, such as when it’s used as a mail
  client, a web server, or generally parses data downloaded across
  the network.
- It’s not possible to cross-compile Emacs since it can only be
  dumped by running temacs on its target platform. As an experiment
  I’ve attempted to dump the Windows version of Emacs on Linux using
  Wine, but was unsuccessful.
The good news is that there’s a portable dumper in the works
that makes this a lot less nasty. If you’re adventurous, you can
already disable dumping and run temacs directly by setting
CANNOT_DUMP=yes at compile time. Be warned, though, that a
non-dumped Emacs takes several seconds, or worse, to initialize
before it even begins loading your own configuration. It’s also
somewhat buggy since it seems nobody ever runs it this way
productively.
The other major way Emacs users have worked around slow loading is
aggressive use of lazy loading, generally via autoloads. The major
package interactive entry points are defined ahead of time as stub
functions. These stubs, when invoked, load the full package, which
overrides the stub definition, then finally the stub re-invokes the
new definition with the same arguments.
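A stub looks something like this (the package and command names are
hypothetical):
;; Stub: invoking foo-mode loads foo.el, which overrides this
;; definition, and the real foo-mode is then re-invoked.
(autoload 'foo-mode "foo" "A hypothetical major mode." t)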
To further assist with lazy loading, an evaluated defvar form will
not override an existing global variable binding. This means you can,
to a certain extent, configure a package before it’s loaded. The
package will not clobber any existing configuration when it loads.
This also explains the bizarre interfaces for the various hook
functions, like add-hook and run-hooks. These accept symbols — the
names of the variables — rather than values of those variables as
would normally be the case. The add-to-list
function does the same
thing. It’s all intended to cooperate with lazy loading, where the
variable may not have been defined yet.
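For example, both of these are safe to evaluate before the
(hypothetical) foo package has been loaded:
;; The package's own (defvar foo-cache-directory ...) will not
;; clobber this existing binding when it finally loads.
(setq foo-cache-directory "~/.cache/foo")

;; add-hook takes the hook's symbol, so the variable need not be
;; defined yet.
(add-hook 'foo-mode-hook #'flyspell-mode)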
eval-after-load
Sometimes this isn’t enough and you need some configuration to
take place after the package has been loaded, but without forcing it
to load early. That is, you need to tell Emacs “evaluate this code
after this particular package loads.” That’s where eval-after-load
comes into play, except for its fatal flaw: it takes the word “eval”
completely literally.
The first argument to eval-after-load
is the name of a package. Fair
enough. The second argument is a form that will be passed to eval
after that package is loaded. Now hold on a minute. The general rule
of thumb is that if you’re calling eval, you’re probably doing
something seriously wrong, and this function is no exception. This is
completely the wrong mechanism for the task.
The second argument should have been a function — either a (sharp
quoted) symbol or a function object. And then instead of eval it
would be something more sensible, like funcall. Perhaps this improved
version would be named call-after-load or run-after-load.
The big problem with passing an s-expression is that it will be left
uncompiled due to being quoted. I’ve talked before about the
importance of evaluating your lambdas. eval-after-load not only
encourages badly written Emacs Lisp, it demands it.
;;; BAD!
(eval-after-load 'simple-httpd
  '(push '("c" . "text/plain") httpd-mime-types))
This was all corrected in Emacs 25. If the second argument to
eval-after-load is a function — the result of applying functionp is
non-nil — then it uses funcall. There’s also a new macro,
with-eval-after-load, to package it all up nicely.
;;; Better (Emacs >= 25 only)
(eval-after-load 'simple-httpd
  (lambda ()
    (push '("c" . "text/plain") httpd-mime-types)))

;;; Best (Emacs >= 25 only)
(with-eval-after-load 'simple-httpd
  (push '("c" . "text/plain") httpd-mime-types))
Though in both of these examples the compiler will likely warn about
httpd-mime-types not being defined. That’s a problem for another day.
A workaround
But what if you need to use Emacs 24, as was the situation that
sparked this article? What can we do with the bad version of
eval-after-load? We could situate a lambda such that it’s evaluated,
but then smuggle the resulting function object into the form passed
to eval-after-load, all using a backquote.
;;; Note: this is subtly broken
(eval-after-load 'simple-httpd
  `(funcall
    ,(lambda ()
       (push '("c" . "text/plain") httpd-mime-types))))
When everything is compiled, the backquoted form evaluates to this:
(funcall #[0 <bytecode> [httpd-mime-types ("c" . "text/plain")] 2])
Where the second value (#[...]) is a byte-code object.
However, as the comment notes, this is subtly broken. A cleaner and
correct way to solve all this is with a named function. The damage
caused by eval-after-load
will have been (mostly) minimized.
(defun my-simple-httpd-hook ()
  (push '("c" . "text/plain") httpd-mime-types))

(eval-after-load 'simple-httpd
  '(funcall #'my-simple-httpd-hook))
But, let’s go back to the anonymous function solution. What was broken
about it? It all has to do with evaluating function objects.
Evaluating function objects
So what happens when we evaluate an expression like the one above
with eval? Here’s what it looks like again.
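;;; Note: this is subtly broken
(eval-after-load 'simple-httpd
  `(funcall
    ,(lambda ()
       (push '("c" . "text/plain") httpd-mime-types))))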
First, eval
notices it’s been given a non-empty list, so it’s probably
a function call. The first argument is the name of the function to be
called (funcall) and the remaining elements are its arguments. But
each of these elements must be evaluated first, and the result of that
evaluation becomes the arguments.
Any value that isn’t a list or a symbol is self-evaluating. That is,
it evaluates to its own value:
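;; Numbers, strings, and vectors, for example, evaluate to themselves:
(eval 10.0)     ; => 10.0
(eval "hello")  ; => "hello"
(eval [1 2 3])  ; => [1 2 3]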
If the value is a symbol, it’s treated as a variable. If the value is a
list, it goes through the function call process I’m describing (or one
of a number of other special cases, such as macro expansion, lambda
expressions, and special forms).
So, conceptually eval recurses on the function object #[...]. A
function object is not a list or a symbol, so it’s self-evaluating. No
problem.
;; Byte-code objects are self-evaluating
(let ((x (byte-compile (lambda ()))))
  (eq x (eval x)))
;; => t
What if this code wasn’t compiled? Rather than a byte-code object,
we’d have some other kind of function object for the interpreter.
Let’s examine the dynamic scope (shudder) case. Here, a lambda
appears to evaluate to itself, but appearances can be deceiving:
(eval (lambda ()))
;; => (lambda ())
However, this is not self-evaluation. Lambda expressions are not
self-evaluating. It’s merely coincidence that the result of
evaluating a lambda expression looks like the original expression.
This is just how the Emacs Lisp interpreter is currently implemented
and, strictly speaking, it’s an implementation detail that just so
happens to be mostly compatible with byte-code objects being
self-evaluating. It would be a mistake to rely on this.
Instead, dynamic scope lambda expression evaluation is idempotent.
Applying eval to the result will return an equal, but not identical
(eq), expression. In contrast, a self-evaluating value is also
idempotent under evaluation, but with eq results.
;; Not self-evaluating:
(let ((x '(lambda ())))
  (eq x (eval x)))
;; => nil

;; Evaluation is idempotent:
(let ((x '(lambda ())))
  (equal x (eval x)))
;; => t

(let ((x '(lambda ())))
  (equal x (eval (eval x))))
;; => t
So, with dynamic scope, the subtly broken backquote example will still
work, but only by sheer luck. Under lexical scope, the situation isn’t
so lucky:
;;; -*- lexical-binding: t; -*-
(lambda ())
;; => (closure (t) nil)
These interpreted lambda functions are neither self-evaluating nor
idempotent. Passing t as the second argument to eval tells it to use
lexical scope, as shown below:
;; Not self-evaluating:
(let ((x '(lambda ())))
  (eq x (eval x t)))
;; => nil

;; Not idempotent:
(let ((x '(lambda ())))
  (equal x (eval x t)))
;; => nil

(let ((x '(lambda ())))
  (equal x (eval (eval x t) t)))
;; error: (void-function closure)
I can imagine an implementation of Emacs Lisp where dynamic
scope lambda expressions are in the same boat, where they’re not even
idempotent. For example:
;;; -*- lexical-binding: nil; -*-
(lambda ())
;; => (totally-not-a-closure ())
Most Emacs Lisp would work just fine under this change, and only code
that makes some kind of logical mistake — where there’s nested
evaluation of lambda expressions — would break. This essentially
already happened when lots of code was quietly switched over to
lexical scope after Emacs 24. Lambda idempotency was lost and
well-written code didn’t notice.
There’s a temptation here for Emacs to define a closure
function or
special form that would allow interpreter closure objects to be either
self-evaluating or idempotent. This would be a mistake. It would only
serve as a hack that covers up logical mistakes that lead to nested
evaluation. Much better to catch those problems early.
Solving the problem with one character
So how do we fix the subtly broken example? With a strategically
placed quote right before the comma.
(eval-after-load 'simple-httpd
  `(funcall
    ',(lambda ()
        (push '("c" . "text/plain") httpd-mime-types))))
So the form passed to eval-after-load
becomes:
;; Compiled:
(funcall (quote #[...]))
;; Dynamic scope:
(funcall (quote (lambda () ...)))
;; Lexical scope:
(funcall (quote (closure (t) () ...)))
The quote prevents eval
from evaluating the function object, which
would be either needless or harmful. There’s also an argument to be
made that this is a perfect situation for a sharp-quote (#'), which
exists to quote functions.
nullprogram.com/blog/2018/02/15/
I’ve put together two online, interactive, demonstrations of chaotic
motion. One is 2D and the other is 3D, but both are rendered
using WebGL — which, for me, is the most interesting part.
Both are governed by ordinary differential equations. Both are
integrated using the Runge–Kutta method, specifically RK4.
Far more knowledgeable people have already written introductions for
chaos theory, so here’s just a quick summary. A chaotic system is
deterministic but highly sensitive to initial conditions. Tweaking a
single bit of the starting state of either of my demos will quickly
lead to two arbitrarily different results. Both demonstrations have
features that aim to show this in action.
This ain’t my first chaotic system rodeo. About eight years ago I made
water wheel Java applet, and that was based on some Matlab code I
collaborated on some eleven years ago. I really hope you’re not equipped
to run a crusty old Java applet in 2018, though. (Update: now
upgraded to HTML5 Canvas.)
If you want to find either of these demos again in the future, you
don’t need to find this article first. They’re both listed in my
Showcase page, linked from the header of this site.
Double pendulum
First up is the classic double pendulum. This one’s more intuitive
than my other demo since it’s modeling a physical system you could
actually build and observe in the real world.

Source: https://github.com/skeeto/double-pendulum
I lifted the differential equations straight from the Wikipedia
article (derivative() in my code). Same for the Runge–Kutta method
(rk4() in my code). It’s all pretty straightforward. RK4 may not have
been the
best choice for this system since it seems to bleed off energy over
time. If you let my demo run overnight, by morning there will obviously
be a lot less activity.
I’m not a fan of buttons and other fancy GUI widgets — neither
designing them nor using them myself — preferring more cryptic, but
easy-to-use keyboard-driven interfaces. (Hey, it works well for
mpv and MPlayer.) I haven’t bothered with a mobile
interface, so sorry if you’re reading on your phone. You’ll just have
to enjoy watching a single pendulum.
Here are the controls:
- a: add a new random pendulum
- c: imperfectly clone an existing pendulum
- d: delete the most recently added pendulum
- m: toggle between WebGL and Canvas rendering
- SPACE: pause the simulation (toggle)
To witness chaos theory in action:
- Start with a single pendulum (the default).
- Pause the simulation (SPACE).
- Make a dozen or so clones (press c for awhile).
- Unpause.
At first it will appear as a single pendulum, but they’re actually all
stacked up, each starting from slightly randomized initial conditions.
Within a minute you’ll witness the pendulums diverge, and after a few
minutes they’ll all be completely different. It’s pretty to watch them come
apart at first.
It might appear that the m key doesn’t actually do
anything. That’s because the HTML5 Canvas rendering — which is what I
actually implemented first — is really close to the WebGL rendering.
I’m really proud of this. There are just three noticeable differences.
First, there’s a rounded line cap in the Canvas rendering where the
pendulum is “attached.” Second, the tail line segments aren’t properly
connected in the Canvas rendering. The segments are stroked separately
in order to get that gradient effect along its path. Third, it’s a lot
slower, particularly when there are many pendulums to render.

In WebGL the two “masses” are rendered using that handy old circle
rasterization technique on a quad. Either a triangle fan
or pre-rendering the circle as a texture would probably have been
better choices. The two bars are the same quad buffers, just squeezed
and rotated into place. Both were really simple to create. It’s the
tail that was tricky to render.
When I wrote the original Canvas renderer, I set the super convenient
lineWidth
property to get a nice, thick tail. In my first cut at
rendering the tail I used GL_LINE_STRIP
to draw a line primitive.
The problem with the line primitive is that an OpenGL implementation
is only required to support single pixel wide lines. If I wanted
wider, I’d have to generate the geometry myself. So I did.
Like before, I wasn’t about to dirty my hands manipulating a
graphite-filled wooden stick on a piece of paper to solve this
problem. No, I lifted the math from something I found on the internet
again. In this case it was a forum post by paul.houx, which
provides a few vector equations to compute a triangle strip from a
line strip. My own modification was to add a miter limit, which keeps
sharp turns under control. You can find my implementation in
polyline() in my code. Here’s a close-up with the skeleton rendered
on top in black:

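The core of those equations is displacing each segment’s endpoints
along the segment’s unit normal. Here is a minimal sketch of the
idea, my own simplification that omits the miter handling done by
polyline() (the function name below is mine):
// Expand one segment (x0,y0)-(x1,y1) into the four corners of a
// width-wide quad, ordered for a two-triangle strip.
function segmentQuad(x0, y0, x1, y1, width) {
    let dx = x1 - x0;
    let dy = y1 - y0;
    let d = Math.sqrt(dx * dx + dy * dy);
    let nx = -dy / d * (width / 2);  // unit normal scaled to half-width
    let ny = +dx / d * (width / 2);
    return [
        x0 + nx, y0 + ny,
        x0 - nx, y0 - ny,
        x1 + nx, y1 + ny,
        x1 - nx, y1 - ny,
    ];
}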
For the first time I’m also using ECMAScript’s new template
literals to store the shaders inside the JavaScript source. These
string literals can contain newlines, but, even cooler, they support
string interpolation, meaning I can embed JavaScript variables
directly into the shader code:
let massRadius = 0.12;
let vertexShader = `
attribute vec2 a_point;
uniform vec2 u_center;
varying vec2 v_point;
void main() {
    v_point = a_point;
    gl_Position = vec4(a_point * ${massRadius} + u_center, 0, 1);
}`;
Allocation avoidance
If you’ve looked at my code you might have noticed something curious.
I’m using a lot of destructuring assignments, which is another
relatively new addition to ECMAScript. This was part of a little
experiment.
function normalize(v0, v1) {
    let d = Math.sqrt(v0 * v0 + v1 * v1);
    return [v0 / d, v1 / d];
}
/* ... */
let [nx, ny] = normalize(-ly, lx);
One of my goals for this project was zero heap allocations in the
main WebGL rendering loop. There are no garbage collector hiccups
if there’s no garbage to collect. This sort of thing is trivial
in a language with manual memory management, such as C and C++. Just
having value semantics for aggregates would be sufficient.
But with JavaScript I don’t get to choose how my objects are allocated.
I either have to pre-allocate everything, including space for all the
intermediate values (e.g. an object pool). This would be clunky and
unconventional. Or I can structure and access my allocations in such a
way that the JIT compiler can eliminate them (via escape analysis,
scalar replacement, etc.).
In this case, I’m trusting that JavaScript implementations will
flatten these destructuring assignments so that the intermediate array
never actually exists. It’s like pretending the array has value
semantics. This seems to work as I expect with V8, but not so well
with SpiderMonkey (yet?), at least in Firefox 52 ESR.
Single precision
I briefly considered using Math.fround() to convince JavaScript to
compute all the tail geometry in single precision. The
double pendulum system would remain double precision, but the geometry
doesn’t need all that precision. It’s all rounded to single precision
going out to the GPU anyway.
Normally when pulling values from a Float32Array, they’re cast to
double precision — JavaScript’s only numeric type — and all operations
are performed in double precision, even if the result is stored back
in a Float32Array. This is because the JIT compiler is required to
correctly perform all the intermediate rounding. To relax this
requirement, surround each operation with a call to Math.fround().
Since the result of doing each operation in
double precision with this rounding step in between is equivalent to
doing each operation in single precision, the JIT compiler can choose
to do the latter.
let x = new Float32Array(n);
let y = new Float32Array(n);
let d = new Float32Array(n);
// ...
for (let i = 0; i < n; i++) {
    let xprod = Math.fround(x[i] * x[i]);
    let yprod = Math.fround(y[i] * y[i]);
    d[i] = Math.sqrt(Math.fround(xprod + yprod));
}
I ultimately decided not to bother with this since it would
significantly obscure my code for what is probably a minuscule
performance gain (in this case). It’s also really difficult to tell if
I did it all correctly. So I figure this is better suited for
compilers that target JavaScript rather than something to do by hand.
Lorenz system
The other demo is a Lorenz system with its famous butterfly
pattern. I actually wrote this one a year and a half ago but never got
around to writing about it. You can tell it’s older because I’m still
using var.

Source: https://github.com/skeeto/lorenz-webgl
Like before, the equations came straight from the Wikipedia article
(Lorenz.lorenz() in my code). The math is a lot simpler this time,
too.
This one’s a bit more user friendly with a side menu displaying all
your options. The keys are basically the same. This was completely by
accident, I swear. Here are the important ones:
- a: add a new random solution
- c: clone a solution with a perturbation
- C: remove all solutions
- SPACE: toggle pause/unpause
- You can click, drag, and toss it to examine it in 3D
Witnessing chaos theory in action is the same process as before: clear
it down to a single solution (C then a), then add
a bunch of randomized clones (c).
There is no Canvas renderer for this one. It’s pure WebGL. The tails are
drawn using GL_LINE_STRIP, but in this case it works fine that
they’re a single pixel wide. If heads are turned on, those are just
GL_POINTS.
The geometry is threadbare for this one.
There is one notable feature: The tails are stored exclusively in
GPU memory. Only the “head” is stored CPU-side. After it computes
the next step, it updates a single spot of the tail with
glBufferSubData(), and the VBO is actually a circular buffer. OpenGL
doesn’t directly support rendering from circular buffers, but it
does have element arrays. An element array is an additional buffer
of indices that tells OpenGL what order to use the elements in the
other buffers.
Naively, this would mean that for a tail of 4 segments, I need 4
different element arrays, one for each possible rotation:
array 0: 0 1 2 3
array 1: 1 2 3 0
array 2: 2 3 0 1
array 3: 3 0 1 2
With the knowledge that element arrays can start at an offset, and
with a little cleverness, you might notice these can all overlap in a
single, 7-element array:
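combined: 0 1 2 3 0 1 2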
Array 0 is at offset 0, array 1 is at offset 1, array 2 is at offset 2,
and array 3 is at offset 3. The tails in the Lorenz system are drawn
using drawElements()
with exactly this sort of array.
Like before, I was very careful to produce zero heap allocations in the
main loop. The FPS counter generates some garbage in the DOM due to
reflow, but this goes away if you hide the help menu (?). This
was long enough ago that destructuring assignment wasn’t available,
but the Lorenz system and its rendering were so simple that using
pre-allocated
objects worked fine.
Beyond just the programming, I’ve gotten hours of entertainment
playing with each of these systems. This was also the first time I’ve
used WebGL in over a year, and this project was a reminder of just
how pleasurable working with it is. The specification is
superbly written and serves perfectly as its own reference.
nullprogram.com/blog/2018/02/14/
Russian translation by ClipArtMag.
Ukrainian translation by Open Source Initiative.
So your Emacs package has grown beyond a dozen or so lines of code, and
the data it manages is now structured and heterogeneous. Informal plain
old lists, the bread and butter of any lisp, are no longer cutting it.
You really need to cleanly abstract this structure, both for your own
organizational sake and for anyone reading your code.
With informal lists as structures, you might regularly ask questions
like, “Was the ‘name’ slot stored in the third list element, or was
it the fourth element?” A plist or alist helps with this problem, but
those are better suited for informal, externally-supplied data, not
for internal structures with fixed slots. Occasionally someone
suggests using hash tables as structures, but Emacs Lisp’s hash tables
are much too heavy for this. Hash tables are more appropriate when
keys themselves are data.
Defining a data structure from scratch
Imagine a refrigerator package that manages a collection of food in a
refrigerator. A food item could be structured as a plain old list,
with slots at specific positions.
(defun fridge-item-create (name expiry weight)
  (list name expiry weight))
A function that computes the mean weight of a list of food items might
look like this:
(defun fridge-mean-weight (items)
  (if (null items)
      0.0
    (let ((sum 0.0)
          (count 0))
      (dolist (item items (/ sum count))
        (setf count (1+ count)
              sum (+ sum (nth 2 item)))))))
Note the use of (nth 2 item)
at the end, used to get the item’s
weight. That magic number 2 is easy to mess up. Even worse, if lots of
code accesses “weight” this way, then future extensions will be
inhibited. Defining some accessor functions solves this problem.
(defsubst fridge-item-name (item)
  (nth 0 item))

(defsubst fridge-item-expiry (item)
  (nth 1 item))

(defsubst fridge-item-weight (item)
  (nth 2 item))
The defsubst defines an inline function, so there’s effectively no
additional run-time costs for these accessors compared to a bare nth.
Since these only cover getting slots, we should also define
some setters using the built-in gv (generalized variable) package.
(require 'gv)

(gv-define-setter fridge-item-name (value item)
  `(setf (nth 0 ,item) ,value))

(gv-define-setter fridge-item-expiry (value item)
  `(setf (nth 1 ,item) ,value))

(gv-define-setter fridge-item-weight (value item)
  `(setf (nth 2 ,item) ,value))
This makes each slot setf-able. Generalized variables are great for
simplifying APIs, since otherwise there would need to be an equal
number of setter functions (fridge-item-set-name, etc.). With
generalized variables, both are at the same entrypoint:
(setf (fridge-item-name item) "Eggs")
There are still two more significant improvements.
- As far as Emacs Lisp is concerned, this isn’t a real type. The
  type-ness of it is just a fiction created by the conventions of the
  package. It would be easy to make the mistake of passing an
  arbitrary list to these fridge-item functions, and the mistake
  wouldn’t be caught so long as that list has at least three items. A
  common solution is to add a type tag: a symbol at the beginning of
  the structure that identifies it.
- It’s still a linked list, and nth has to walk the list (i.e. O(n))
  to retrieve items. It would be much more efficient to use a vector,
  turning this into an efficient O(1) operation.
Addressing both of these at once:
(defun fridge-item-create (name expiry weight)
  (vector 'fridge-item name expiry weight))

(defsubst fridge-item-p (object)
  (and (vectorp object)
       (= (length object) 4)
       (eq 'fridge-item (aref object 0))))

(defsubst fridge-item-name (item)
  (unless (fridge-item-p item)
    (signal 'wrong-type-argument (list 'fridge-item item)))
  (aref item 1))

(defsubst fridge-item-name--set (item value)
  (unless (fridge-item-p item)
    (signal 'wrong-type-argument (list 'fridge-item item)))
  (setf (aref item 1) value))

(gv-define-setter fridge-item-name (value item)
  `(fridge-item-name--set ,item ,value))

;; And so on for expiry and weight...
As long as fridge-mean-weight
uses the fridge-item-weight
accessor, it continues to work unmodified across all these changes.
But, whew, that’s quite a lot of boilerplate to write and maintain
for each data structure in our package! Boilerplate code generation is
a perfect candidate for a macro definition. Luckily for us, Emacs
already defines a macro to generate all this code: cl-defstruct.
(require 'cl-lib)

(cl-defstruct fridge-item
  name expiry weight)
In Emacs 25 and earlier, this innocent looking definition expands into
essentially all the above code. The code it generates is expressed in
the most optimal form for its version of Emacs, and it
exploits many of the available optimizations by using function
declarations such as side-effect-free and error-free. It’s
configurable, too, allowing for the exclusion of a type tag (:named)
— discarding all the type checks — or using a list rather than a
vector as the underlying structure (:type). As a crude form of
structural inheritance, it even allows for directly embedding other
structures (:include).
Two pitfalls
There are a couple of pitfalls, though. First, for historical
reasons, the macro will define two namespace-unfriendly functions:
make-NAME and copy-NAME. I always override these, preferring the
-create
convention for the constructor, and tossing the copier since it’s
either useless or, worse, semantically wrong.
(cl-defstruct (fridge-item (:constructor fridge-item-create)
                           (:copier nil))
  name expiry weight)
If the constructor needs to be more sophisticated than just setting
slots, it’s common to define a “private” constructor (double dash in
the name) and wrap it with a “public” constructor that has some
behavior.
(cl-defstruct (fridge-item (:constructor fridge-item--create)
                           (:copier nil))
  name expiry weight entry-time)

(cl-defun fridge-item-create (&rest args)
  (apply #'fridge-item--create :entry-time (float-time) args))
The other pitfall is related to printing. In Emacs 25 and earlier,
types defined by cl-defstruct are still only types by convention.
They’re really just vectors as far as Emacs Lisp is concerned. One
benefit from this is that printing and reading these structures is
“free” because vectors are printable. It’s trivial to serialize
cl-defstruct structures out to a file. This is exactly how the Elfeed
database works.
The pitfall is that once a structure has been serialized, there’s no
more changing the cl-defstruct definition. It’s now a file format
definition, so the slots are locked in place. Forever.
Emacs 26 throws a wrench in all this, though it’s worth it in the long
run. There’s a new primitive type in Emacs 26 with its own reader
syntax: records. This is similar to hash tables becoming first class
in the reader in Emacs 23.2. In Emacs 26, cl-defstruct uses records
instead of vectors.
;; Emacs 25:
(fridge-item-create :name "Eggs" :weight 11.1)
;; => [cl-struct-fridge-item "Eggs" nil 11.1]
;; Emacs 26:
(fridge-item-create :name "Eggs" :weight 11.1)
;; => #s(fridge-item "Eggs" nil 11.1)
So far slots are still accessed using aref, and all the type checking
still happens in Emacs Lisp. The only practical change is that the
record function is used in place of the vector function when
allocating a structure. But it does pave the way for more interesting
things in the future.
The major short-term downside is that this breaks printed compatibility
across the Emacs 25/26 boundary. The cl-old-struct-compat-mode
function can be used for some degree of backwards, but not forwards,
compatibility. Emacs 26 can read and use some structures printed by
Emacs 25 and earlier, but the reverse will never be true. This issue
initially tripped up Emacs’ built-in packages, and when Emacs 26 is
released we’ll see more of these issues arise in external packages.
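For example, an Emacs 26 user who needs to read data printed by Emacs
25 could enable the compatibility mode up front. A minimal sketch:
;; Accept old-style (vector-based) structures printed by Emacs 25
;; and earlier.
(cl-old-struct-compat-mode 1)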
Dynamic dispatch
Prior to Emacs 25, the major built-in package for dynamic dispatch —
functions that specialize on the run-time type of their arguments — was
EIEIO, though it only supported single dispatch (specializing on a
single argument). EIEIO brought much of the Common Lisp Object System
(CLOS) to Emacs Lisp, including classes and methods.
Emacs 25 introduced a more sophisticated dynamic dispatch package
called cl-generic. It focuses only on dynamic dispatch and supports
multiple dispatch, completely replacing the dynamic dispatch portion
of EIEIO. Since cl-defstruct does inheritance and cl-generic does
dynamic dispatch, there’s not really much left for EIEIO — besides bad
ideas like multiple inheritance and method combination.
Without either of these packages, the most direct way to build single
dispatch on top of cl-defstruct would be to shove a function in one of
the slots. Then the “method” is just a wrapper that calls this
function.
;; Base "class"
(cl-defstruct greeter
greeting)
(defun greet (thing)
(funcall (greeter-greeting thing) thing))
;; Cow "class"
(cl-defstruct (cow (:include greeter)
(:constructor cow--create)))
(defun cow-create ()
(cow--create :greeting (lambda (_) "Moo!")))
;; Bird "class"
(cl-defstruct (bird (:include greeter)
(:constructor bird--create)))
(defun bird-create ()
(bird--create :greeting (lambda (_) "Chirp!")))
;; Usage:
(greet (cow-create))
;; => "Moo!"
(greet (bird-create))
;; => "Chirp!"
Since cl-generic is aware of the types created by cl-defstruct,
functions can specialize on them as if they were native types. It’s a
lot simpler to let cl-generic do all the hard work. The people reading
your code will appreciate it, too:
(require 'cl-generic)
(cl-defgeneric greet (greeter))
(cl-defstruct cow)
(cl-defmethod greet ((_ cow))
"Moo!")
(cl-defstruct bird)
(cl-defmethod greet ((_ bird))
"Chirp!")
(greet (make-cow))
;; => "Moo!"
(greet (make-bird))
;; => "Chirp!"
The majority of the time a simple cl-defstruct will fulfill your
needs, keeping in mind the gotcha with the constructor and copier
names. Its use should feel almost as natural as defining functions.
nullprogram.com/blog/2018/02/07/
This article is an expanded email I wrote in response to a question
from Frank Muller. He asked how I arrived at my solution to a
branchless UTF-8 decoder:
I mean, when you started, I’m pretty [sure] the initial solution was
using branches, right? Then, you’ve applied some techniques to
eliminate them.
A bottom-up approach that begins with branches and then proceeds to
eliminate them one at a time sounds like a plausible story. However,
this story is the inverse of how it actually played out. It began when I
noticed a branchless decoder could probably be done, then I put together
the pieces one at a time without introducing any branches. But what
sparked that initial idea?
The two prior posts reveal my train of thought at the time: a look at
the Blowfish cipher and a 64-bit PRNG shootout. My
layman’s study of Blowfish was actually part of an examination of a
number of different block ciphers. For example, I also read the NSA’s
Speck and Simon paper and then implemented the 128/128 variant
of Speck — a 128-bit key and 128-bit block. I didn’t take the
time to write an article about it, but note how the entire cipher —
key schedule, encryption, and decryption — is just 40 lines of code:
struct speck {
uint64_t k[32];
};
void
speck_init(struct speck *ctx, uint64_t x, uint64_t y)
{
ctx->k[0] = y;
for (uint64_t i = 0; i < 31; i++) {
x = (x >> 8) | (x << 56);
x += y;
x ^= i;
y = (y << 3) | (y >> 61);
y ^= x;
ctx->k[i + 1] = y;
}
}
void
speck_encrypt(const struct speck *ctx, uint64_t *x, uint64_t *y)
{
for (int i = 0; i < 32; i++) {
*x = (*x >> 8) | (*x << 56);
*x += *y;
*x ^= ctx->k[i];
*y = (*y << 3) | (*y >> 61);
*y ^= *x;
}
}
static void
speck_decrypt(const struct speck *ctx, uint64_t *x, uint64_t *y)
{
for (int i = 0; i < 32; i++) {
*y ^= *x;
*y = (*y >> 3) | (*y << 61);
*x ^= ctx->k[31 - i];
*x -= *y;
*x = (*x << 8) | (*x >> 56);
}
}
Isn’t that just beautiful? It’s so tiny and fast. Other than the
not-very-arbitrary selection of 32 rounds, and the use of 3-bit and
8-bit rotations, there are no magic values. One could fairly
reasonably commit this cipher to memory if necessary, similar to the
late RC4. Speck is probably my favorite block cipher right now,
except that I couldn’t figure out the key schedule for any of the
other key/block size variants.
Another cipher I studied, though in less depth, was RC5 (1994),
a block cipher by (obviously) Ron Rivest. The most novel part of RC5
is its use of data dependent rotations. This was a very deliberate
decision, and the paper makes this clear:
RC5 should highlight the use of data-dependent rotations, and
encourage the assessment of the cryptographic strength data-dependent
rotations can provide.
What’s a data-dependent rotation? In the Speck cipher shown above,
notice how the right-hand side of all the rotation operations is a
constant (3, 8, 56, and 61). Suppose instead that these operands were
not constant, but based on some part of the value of the block:
int r = *y & 0x0f;
*x = (*x >> r) | (*x << (64 - r));
Such “random” rotations “frustrate differential cryptanalysis” according
to the paper, increasing the strength of the cipher.
Another algorithm that uses data-dependent shifts is the PCG family of
PRNGs. Honestly, the data-dependent “permutation” shift is the
defining characteristic of PCG. As a reminder, here’s the simplified
PCG from my shootout:
uint32_t
spcg32(uint64_t s[1])
{
uint64_t m = 0x9b60933458e17d7d;
uint64_t a = 0xd737232eeccdf7ed;
*s = *s * m + a;
int shift = 29 - (*s >> 61);
return *s >> shift;
}
Notice how the final shift depends on the high order bits of the PRNG
state. (This one weird trick from Melissa O’Neil will significantly
improve your PRNG. Xorshift experts hate her.)
I think this raises a really interesting question: Why did it take until
2014 for someone to apply a data-dependent shift to a PRNG? Similarly,
why are data-dependent rotations not used in many ciphers?
My own theory is that this is because many older instruction set
architectures can’t perform data-dependent shift operations
efficiently. Many instruction sets only have a fixed shift (e.g.
1-bit), or can only shift by an immediate (e.g. a constant). In these
cases, a data-dependent shift would require a loop. These loops would
be a ripe source of side channel attacks in ciphers due to the
difficulty of making them operate in constant time. They would also be
relatively slow for video game PRNGs, which often needed to run on
constrained hardware with limited instruction sets. For example, the
6502 (Atari, Apple II, NES, Commodore 64) and the Z80 (too many to
list) can only shift/rotate one bit at a time.
Even on an architecture with an instruction for data-dependent shifts,
such as the x86, those shifts will be slower than constant shifts, at
least in part due to the additional data dependency.
It turns out there are also some patent issues (ex. 1, 2).
Fortunately most of these patents have now expired, and one in
particular is set to expire this June. I still like my theory better.
To branchless decoding
So I was thinking about data-dependent shifts, and I had also noticed I
could trivially check the length of a UTF-8 code point using a small
lookup table — the first step in my decoder. What if I combined these:
a data-dependent shift based on a table lookup? This would become the
last step in my decoder. The idea for a branchless UTF-8 decoder was
essentially born out of connecting these two thoughts, and then
filling in the middle.
nullprogram.com/blog/2018/01/17/
Update: This article was featured on BSD Now 233 (starting
at 21:38).
For some time Elfeed was experiencing a strange, spurious
failure. Every so often users were seeing an error (spoiler
warning) when updating feeds: “error in process sentinel: Search
failed.” If you use Elfeed, you might have even seen this yourself.
On the surface it appeared that curl, tasked with the responsibility
for downloading feed data, was producing incomplete output despite
reporting a successful run. Since the run was successful, Elfeed
assumed certain data was in curl’s output buffer, but, since it
wasn’t, it failed hard.
Unfortunately this issue was not reproducible. Manually running curl
outside of Emacs never revealed any issues. Asking Elfeed to retry
fetching the feeds would work fine. The issue would only randomly rear
its head when Elfeed was fetching many feeds in parallel, under
stress. By the time the error was discovered, the curl process had
exited and vital debugging information was lost. Considering that
this was likely to be a bug in Emacs itself, there really wasn’t a
reliable way to capture the necessary debugging information from
within Emacs Lisp. And, indeed, this later proved to be the case.
A quick-and-dirty workaround is to use condition-case to catch and
swallow the error. When the bizarre issue shows up, rather than fail
badly in front of the user, Elfeed could attempt to swallow the error
— assuming it can be reliably detected — and treat the fetch as simply
a failure. That didn’t sit comfortably with me. Elfeed had done its
due diligence checking for errors already. Someone was lying to
Elfeed, and I intended to catch them with their pants on fire.
Someday.
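For illustration, the workaround would have looked something like this
sketch, where both functions are hypothetical stand-ins for Elfeed’s
internals:
(condition-case nil
    (elfeed-process-curl-output)    ; hypothetical: parse curl's buffer
  (search-failed
   (elfeed-record-fetch-failure)))  ; hypothetical: count it as a failed fetch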
I’d just need to witness the bug on one of my own machines. Elfeed is
part of my daily routine, so surely I’d have to experience this issue
myself someday. My plan was, should that day come, to run a modified
Elfeed, instrumented to capture extra data. I would have also routinely
run Emacs under GDB so that I could inspect the failure more deeply.
For now I just had to wait to hunt that zebra.
Bryan Cantrill, DTrace, and FreeBSD
Over the holidays I re-discovered Bryan Cantrill, a systems
software engineer who worked for Sun between 1996 and 2010, and is
best known for DTrace. My first exposure to him was in a BSD
Now interview in 2015. I had re-watched that interview and decided
there was a lot more I had to learn from him. He’s become a personal
hero to me. So I scoured the internet for more of his writing and
talks. Besides what I’ve already linked in this article, here
are a couple more great presentations:
You can also find some of his writing scattered around the DTrace
blog.
Some interesting operating system technology came out of Sun during
its final 15 or so years — most notably DTrace and ZFS — and Bryan
speaks about it passionately. Almost as a matter of luck, most of it
survived the Oracle acquisition thanks to Sun releasing it as open
source in just the nick of time. Otherwise it would have been lost
forever. The scattered ex-Sun employees, still passionate about their
prior work at Sun, along with some of their old customers have since
picked up the pieces and kept going as a community under the name
illumos. It’s like an open source flotilla.
Naturally I wanted to get my hands on this stuff to try it out for
myself. Is it really as good as they say? Normally I stick to Linux,
but it (generally) doesn’t have these Sun technologies. The main
reason is license incompatibility. Sun released its code under the
CDDL, which is incompatible with the GPL. Ubuntu does
infamously include ZFS, but other distributions are
unwilling to take that risk. Porting DTrace is a serious undertaking
since it’s got its fingers throughout the kernel, which also makes the
licensing issues even more complicated.
(Update February 2018: DTrace has been released under the GPLv2,
allowing it to be legally integrated with Linux.)
Linux has a reputation for Not Invented Here (NIH) syndrome, and these
licensing issues certainly contribute to that. Rather than adopt ZFS
and DTrace, they’ve been reinvented from scratch: btrfs instead of
ZFS, and a slew of partial options instead of DTrace.
Normally I’m most interested in system call tracing, and my go-to is
strace, though it certainly has its limitations — including this
situation of debugging curl under Emacs. Another famous example of NIH
is Linux’s epoll(2), which is a broken version of BSD kqueue(2).
So, if I want to try these for myself, I’ll need to install a
different operating system. I’ve dabbled with OmniOS, an OS
built on illumos, in virtual machines, using it as an alien
environment to test some of my software (e.g. enchive).
OmniOS has a philosophy called Keep Your Software To Yourself
(KYSTY), which is really just code for “we don’t do packaging.”
Honestly, you can’t blame them since they’re a tiny community.
The best solution to this is probably pkgsrc, which is
essentially a universal packaging system. Otherwise you’re on your
own.
There’s also openindiana, which is a more friendly
desktop-oriented illumos distribution. Still, the short of it is that
you’re very much on your own when things don’t work. The situation is
like running Linux a couple decades ago, when it was still difficult
to do.
If you’re interested in trying DTrace, the easiest option these days is
probably FreeBSD. It’s got a big, active community, thorough
documentation, and a huge selection of packages. Its license (the BSD
license, duh) is compatible with the CDDL, so both ZFS and DTrace have
been ported to FreeBSD.
What is DTrace?
I’ve done all this talking but haven’t yet described what DTrace
really is. I won’t pretend to write my own tutorial, but I’ll
provide enough information to follow along. DTrace is a tracing
framework for debugging production systems in real time, both for
the kernel and for applications. The “production systems” part means
it’s stable and safe — using DTrace won’t put your system at risk of
crashing or damaging data. The “real time” part means it has little
impact on performance. You can use DTrace on live, active systems with
little impact. Both of these core design principles are vital for
troubleshooting those really tricky bugs that only show up in
production.
There are DTrace probes scattered all throughout the system: on
system calls, scheduler events, networking events, process events,
signals, virtual memory events, etc. Using a specialized language
called D (unrelated to the general purpose programming language D),
you can dynamically add behavior at these instrumentation points.
Generally the behavior is to capture information, but it can also
manipulate the event being traced.
Each probe is fully identified by a 4-tuple delimited by colons:
provider, module, function, and probe name. An empty element denotes a
sort of wildcard. For example, syscall::open:entry is a probe at the
beginning (i.e. “entry”) of open(2). syscall:::entry matches all
system call entry probes.
Unlike strace on Linux which monitors a specific process, DTrace
applies to the entire system when active. To run curl under strace
from Emacs, I’d have to modify Emacs’ behavior to do so. With DTrace I
can instrument every curl process without making a single change to
Emacs, and with negligible impact to Emacs. That’s a big deal.
So, when it comes to this Elfeed issue, FreeBSD is much better poised
for debugging the problem. All I have to do is catch it in the act.
However, it’s been months since that bug report and I’m not really
making this connection yet. I’m just hoping I eventually find an
interesting problem where I can apply DTrace.
FreeBSD on a Raspberry Pi 2
So I’ve settled on FreeBSD as the playground for these technologies; I
just have to decide where to run it. I could always run it in a
virtual machine,
but it’s always more interesting to try things out on real hardware.
FreeBSD supports the Raspberry Pi 2 as a Tier 2 system, and
I had a Raspberry Pi 2 sitting around collecting dust, so I put it to
use.
I wrote the image to an SD card, and for a few days I stretched my
legs on this new system. I cloned a couple dozen of my own git
repositories, ran the builds and the tests, and just got a feel for
things. I tried out the ports system for the first time, mainly to
discover that the low-powered Raspberry Pi 2 takes days to build some
of the packages I want to try.
I mostly program in Vim these days, so it’s several days before I
even set up Emacs. Eventually I do build Emacs, clone my
configuration, fire it up, and give Elfeed a spin.
And that’s when the “search failed” bug strikes! Not just once, but
dozens of times. Perfect! This low-powered platform is the jackpot for
this particular bug, triggering it left and right. Given that I’ve got
DTrace at my disposal, it’s the perfect place to debug this.
Something is lying to Elfeed and DTrace will play the judge.
Before I dive in I see three possibilities:
- curl is reporting success but truncating its output.
- Emacs is quietly truncating curl’s output.
- Emacs is misinterpreting curl’s exit status.
With DTrace I can observe what every curl process writes to Emacs, and
I can also double check curl’s exit status. I come up with the
following (newbie) DTrace script:
syscall::write:entry
/execname == "curl"/
{
printf("%d WRITE %d \"%s\"\n",
pid, arg2, stringof(copyin(arg1, arg2)));
}
syscall::exit:entry
/execname == "curl"/
{
printf("%d EXIT %d\n", pid, arg0);
}
The /execname == "curl"/ is a predicate that (obviously) causes the
behavior to only fire for curl processes. The first probe has DTrace
print a line for every write(2) from curl. arg0, arg1, and arg2
correspond to the arguments of write(2): fd, buf, count. It logs the
process ID (pid) of the write, the length of the write, and the
actual contents written. Remember that these curl processes are run
in parallel by Emacs, so the pid allows me to associate the separate
writes and the exit status.
The second probe prints the pid and the exit status (the first
argument to exit(2)).
I also want to compare this to exactly what is delivered to Elfeed
when curl exits, so I modify the process sentinel — the callback that
handles a subprocess exiting — to call write-file before any action is
taken. I can compare these buffer dumps to the logs produced by
DTrace.
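The instrumentation amounted to something like this sketch (not
Elfeed’s actual sentinel):
(defun my-instrumented-sentinel (process _event)
  ;; Dump the subprocess buffer to a unique file before taking
  ;; any other action on it.
  (with-current-buffer (process-buffer process)
    (write-file (make-temp-file "elfeed-curl-" nil ".log"))))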
There are two important findings.
First, when the “search failed” bug occurs, the buffer was completely
empty (95% of the time) or truncated at the end of the HTTP headers
(5% of the time), right at the blank line. DTrace indicates that curl
did its job to the full, so it’s Emacs who’s the liar. It’s not
delivering all of curl’s data to Elfeed. That’s pretty annoying.
Second, curl was line-buffered. Each line was a separate, independent
write(2). I was certainly not expecting this. Normally the C library
only does line buffering when the output is a terminal. That’s because
it’s guessing a user may be watching, expecting the output to arrive a
line at a time.
Here’s a sample of what it looked like in the log:
88188 WRITE 32 "Server: Apache/2.4.18 (Ubuntu)
"
88188 WRITE 46 "Location: https://blog.plover.com/index.atom
"
88188 WRITE 21 "Content-Length: 299
"
88188 WRITE 45 "Content-Type: text/html; charset=iso-8859-1
"
88188 WRITE 2 "
"
Why would curl think Emacs is a terminal?
Oh. That’s right. This is the same problem I ran into four years ago
when writing EmacSQL. By default Emacs connects to subprocesses
through a pseudo-terminal (pty). I called this a mistake in Emacs back
then, and I still stand by that claim. The pty causes weird, annoying
problems for little benefit:
- Interpreting control characters. Hope you weren’t transferring binary
data!
- Subprocesses will generally get line buffered. This makes them
slower, though in some situations it might be desirable.
- Stdout and stderr get mixed together. (Optional since Emacs 25.)
- New! There’s a bug somewhere in Emacs that causes truncation when
ptys are used heavily in parallel.
Just from eyeballing the DTrace log I knew what to do: dump the pty
and switch to a pipe. This is controlled with the
process-connection-type variable, and fixing it is a one-liner.
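The gist of the fix, sketched here with a made-up process name and
URL:
;; Bind process-connection-type to nil so the subprocess is
;; connected through a pipe rather than a pty.
(let ((process-connection-type nil))
  (start-process "elfeed-curl" (generate-new-buffer " *curl*")
                 "curl" "--silent" "https://example.com/feed.xml"))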
Not only did this completely resolve the truncation issue, Elfeed is
noticeably faster at fetching feeds on all machines. It’s no longer
receiving mountains of XML one line at a time, like sucking pudding
through a straw. It’s now quite zippy even on my Raspberry Pi 2, which
had never been the case before (without the “search failed” bug).
Even if you were never affected by this bug, you will benefit from the
fix.
I haven’t officially reported this as an Emacs bug yet because
reproducibility is still an issue. It needs something better than
“fire off a bunch of HTTP requests across the internet in parallel
from a Raspberry Pi.”
The fix reminds me of that old boilermaker story about
charging a lot of money just to swing a hammer. Once the problem
arose, DTrace quickly helped to identify the place to hit Emacs with
the hammer.
Finally, a big thanks to alphapapa for originally taking the time to
report this bug months ago.
nullprogram.com/blog/2017/12/14/
There was recently some interesting discussion about correctly using
backquotes to express a mixture of data and code. Since lambda
expressions seem to evaluate to themselves, does quoting them make any
difference?
For example, an association list of operations:
'((add . (lambda (a b) (+ a b)))
(sub . (lambda (a b) (- a b)))
(mul . (lambda (a b) (* a b)))
(div . (lambda (a b) (/ a b))))
It looks like it would work, and indeed it does work in this case.
However, there are good reasons to actually evaluate those lambda
expressions. Eventually invoking the lambda expressions in the quoted
form above is equivalent to using eval. So, instead, prefer the
backquote form:
`((add . ,(lambda (a b) (+ a b)))
(sub . ,(lambda (a b) (- a b)))
(mul . ,(lambda (a b) (* a b)))
(div . ,(lambda (a b) (/ a b))))
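Under lexical binding, the backquoted version stores real function
objects in the cdrs, which can be looked up and called directly. A
quick sketch:
;;; -*- lexical-binding: t; -*-
(let ((ops `((add . ,(lambda (a b) (+ a b))))))
  (funcall (cdr (assq 'add ops)) 1 2))
;; => 3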
There are a lot of interesting things to say about this, but let’s
first reduce it to two very simple cases:
(lambda (x) x)
'(lambda (x) x)
What’s the difference between these two forms? The first is a lambda
expression, and it evaluates to a function object. The other is a quoted
list that looks like a lambda expression, and it evaluates to a list —
a piece of data.
A naive evaluation of these expressions in *scratch* (C-x C-e)
suggests they are identical, and so it would seem that quoting a
lambda expression doesn’t really matter:
(lambda (x) x)
;; => (lambda (x) x)
'(lambda (x) x)
;; => (lambda (x) x)
However, there are two common situations where this is not the case:
byte compilation and lexical scope.
Lambda under byte compilation
It’s a little trickier to evaluate these forms byte compiled in the
scratch buffer since that doesn’t happen automatically. But if it did,
it would look like this:
;;; -*- lexical-binding: nil; -*-
(lambda (x) x)
;; => #[(x) "\010\207" [x] 1]
'(lambda (x) x)
;; => (lambda (x) x)
The #[...] is the syntax for a byte-code function object. As discussed
in detail in my byte-code internals article, it’s a special vector
object that contains byte-code, and other metadata, for evaluation by
Emacs’ virtual stack machine. Elisp is one of very few languages with
readable function objects, and this feature is core to its
ahead-of-time byte compilation.
The quote, by definition, prevents evaluation, and so inhibits byte
compilation of the lambda expression. It’s vital that the byte compiler
does not try to guess the programmer’s intent and compile the expression
anyway, since that would interfere with lists that just so happen to
look like lambda expressions — i.e. any list containing the lambda
symbol.
There are three reasons you want your lambda expressions to get byte
compiled:
- Byte-compiled functions are significantly faster. That’s the main
purpose for byte compilation after all.
- The compiler performs static checks, producing warnings and errors
ahead of time. This lets you spot certain classes of problems before
they occur. The static analysis is even better under lexical scope due
to its tighter semantics.
- Under lexical scope, byte-compiled closures may use less memory. More
specifically, they won’t accidentally keep objects alive longer than
necessary. I’ve never seen a name for this implementation issue, but I
call it overcapturing. More on this later.
While it’s common for personal configurations to skip byte compilation,
Elisp should still generally be written as if it were going to be byte
compiled. General rule of thumb: ensure your lambda expressions are
actually evaluated.
Lambda in lexical scope
As I’ve stressed many times, you should always use lexical
scope. There’s no practical disadvantage or trade-off involved.
Just do it.
Once lexical scope is enabled, the two expressions diverge even without
byte compilation:
;;; -*- lexical-binding: t; -*-
(lambda (x) x)
;; => (closure (t) (x) x)
'(lambda (x) x)
;; => (lambda (x) x)
Under lexical scope, lambda expressions evaluate to closures. Closures
capture their lexical environment in their closure object — nothing in
this particular case. It’s a type of function object, making it a
valid first argument to funcall.
Since the quote prevents the second expression from being evaluated,
semantically it evaluates to a list that just so happens to look like
a (non-closure) function object. Invoking a data object as a function
is like using eval — i.e. executing data as code. Everyone already
knows eval should not be used lightly.
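A sketch of how that difference shows up in practice: the evaluated
lambda captures y in a closure, while the quoted list, invoked as
code, looks y up dynamically and finds nothing.
;;; -*- lexical-binding: t; -*-
(let ((y :outer))
  (funcall (lambda () y)))   ; => :outer
(let ((y :outer))
  (funcall '(lambda () y)))  ; error: (void-variable y)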
It’s a little more interesting to look at a closure that actually
captures a variable, so here’s a definition for constantly, a
higher-order function that returns a closure that accepts any number
of arguments and returns a particular constant:
(defun constantly (x)
(lambda (&rest _) x))
Without byte compiling it, here’s an example of its return value:
(constantly :foo)
;; => (closure ((x . :foo) t) (&rest _) x)
The environment has been captured as an association list (with a
trailing t), and we can plainly see that the variable x is bound to
the symbol :foo in this closure. Consider that we could manipulate
this data structure (e.g. setcdr or setf) to change the binding of x
for this closure. This is essentially how closures mutate their own
environment. Moreover, closures from the same environment share
structure, so such mutations are also shared. More on this later.
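For instance, here’s a sketch of changing that binding by hand. This
works only on interpreted closures, so don’t byte compile it:
;; Rebind x in the closure's captured environment, then call it.
(let ((fn (constantly :foo)))
  (setf (cdr (assq 'x (cadr fn))) :bar)
  (funcall fn))
;; => :bar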
Semantically, closures are distinct objects (via eq), even if the
variables they close over are bound to the same value. This is because
they each have a distinct environment attached to them, even if in
some invisible way.
(eq (constantly :foo) (constantly :foo))
;; => nil
Without byte compilation, this is true even when there’s no lexical
environment to capture:
(defun dummy ()
(lambda () t))
(eq (dummy) (dummy))
;; => nil
The byte compiler is smart, though. As an optimization, it reuses the
same closure object when possible, avoiding unnecessary work,
including multiple object allocations. This is a bit of an abstraction
leak: a function can (ab)use it to introspect whether it’s been byte
compiled:
(defun have-i-been-compiled-p ()
(let ((funcs (vector nil nil)))
(dotimes (i 2)
(setf (aref funcs i) (lambda ())))
(eq (aref funcs 0) (aref funcs 1))))
(have-i-been-compiled-p)
;; => nil
(byte-compile 'have-i-been-compiled-p)
(have-i-been-compiled-p)
;; => t
The trick here is to evaluate the exact same non-capturing lambda
expression twice, which requires a loop (or at least some sort of
branch). Semantically we should think of these closures as being
distinct objects, but, if we squint our eyes a bit, we can see the
effects of the behind-the-scenes optimization.
Don’t actually do this in practice, of course. That’s what
byte-code-function-p is for, and it doesn’t rely on a subtle
implementation detail.
Overcapturing
I mentioned before that one of the potential gotchas of not byte
compiling your lambda expressions is overcapturing closure variables in
the interpreter.
To evaluate lisp code, Emacs has both an interpreter and a virtual
machine. The interpreter evaluates code in list form: cons cells,
numbers, symbols, etc. The byte compiler is like the interpreter, but
instead of directly executing those forms, it emits byte-code that, when
evaluated by the virtual machine, produces identical visible results to
the interpreter — in theory.
What this means is that Emacs contains two different implementations
of Emacs Lisp, one in the interpreter and one in the byte compiler.
The Emacs developers have been maintaining and expanding these
implementations side-by-side for decades. A pitfall to this approach
is that the implementations can, and do, diverge in their behavior.
We saw this above with that introspective function, and it comes up
in practice with advice.
Another way they diverge is in closure variable capture. For example:
;;; -*- lexical-binding: t; -*-
(defun overcapture (x y)
(when y
(lambda () x)))
(overcapture :x :some-big-value)
;; => (closure ((y . :some-big-value) (x . :x) t) nil x)
Notice that the closure captured y even though it’s unnecessary. This
is because the interpreter doesn’t, and shouldn’t, take the time to
analyze the body of the lambda to determine which variables should be
captured. That would need to happen at run-time each time the lambda
is evaluated, which would make the interpreter much slower.
Overcapturing can get pretty messy if macros are introducing their own
hidden variables.
On the other hand, the byte compiler can do this analysis just once at
compile-time. And it’s already doing the analysis as part of its job.
It can avoid this problem easily:
(overcapture :x :some-big-value)
;; => #[0 "\300\207" [:x] 1]
It’s clear that :some-big-value isn’t present in the closure.
But… how does this work?
How byte compiled closures are constructed
Recall from the internals article that the four core elements of a
byte-code function object are:
- Parameter specification
- Byte-code string (opcodes)
- Constants vector
- Maximum stack usage
While it may seem as though each evaluation of a lambda expression
compiles a whole new function, there’s actually not that much to it!
The behavior of the function remains the same; only the closed-over
environment changes.
What this means is that closures produced by a common lambda
expression can all share the same byte-code string (second element).
Their bodies are identical, so they compile to the same byte-code.
Where they differ is in their constants vector (third element), which
gets filled out according to the closed-over environment. It’s clear
just from examining the outputs:
(constantly :a)
;; => #[128 "\300\207" [:a] 2]
(constantly :b)
;; => #[128 "\300\207" [:b] 2]
constantly has three of the four components of the closure in its own
constant pool. Its job is to construct the constants vector, and then
assemble the whole thing into a byte-code function object (#[...]).
Here it is with M-x disassemble:
0 constant make-byte-code
1 constant 128
2 constant "\300\207"
4 constant vector
5 stack-ref 4
6 call 1
7 constant 2
8 call 4
9 return
(Note: since the byte compiler doesn’t produce perfectly optimal code,
I’ve simplified it for this discussion.)
It pushes most of its constants on the stack. Then the stack-ref (5)
puts x on the stack. Then it calls vector to create the constants
vector (6). Finally, it constructs the function object (#[...]) by
calling make-byte-code (8).
Since this might be clearer, here’s the same thing expressed back in
terms of Elisp:
(defun constantly (x)
(make-byte-code 128 "\300\207" (vector x) 2))
To see the disassembly of the closure’s byte-code:
(disassemble (constantly :x))
The result isn’t very surprising, just a push of the constant and a
return:
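0 constant :x
1 return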
Things get a little more interesting when mutation is involved. Consider
this adder closure generator, which mutates its environment every time
it’s called:
(defun adder ()
(let ((total 0))
(lambda () (cl-incf total))))
(let ((count (adder)))
(funcall count)
(funcall count)
(funcall count))
;; => 3
(adder)
;; => #[0 "\300\211\242T\240\207" [(0)] 2]
The adder essentially works like this:
(defun adder ()
(make-byte-code 0 "\300\211\242T\240\207" (vector (list 0)) 2))
In theory, this closure could operate by mutating its constants vector
directly. But that wouldn’t be much of a constants vector, now would
it!? Instead, mutated variables are boxed inside a cons cell. Closures
don’t share constants vectors, so the main reason for boxing is to
share variables between closures from the same environment. That is,
they have the same cons in each of their constants vectors.
There’s no equivalent Elisp for the closure in adder, so here’s the
disassembly:
0 constant (0)
1 dup
2 car-safe
3 add1
4 setcar
5 return
It puts two references to the boxed integer on the stack (constant,
dup), unboxes the top one (car-safe), increments that unboxed integer,
and stores it back in the box (setcar) via the bottom reference,
leaving the incremented value behind to be returned.
This all gets a little more interesting when closures interact:
(defun fancy-adder ()
(let ((total 0))
`(:add ,(lambda () (cl-incf total))
:set ,(lambda (v) (setf total v))
:get ,(lambda () total))))
(let ((counter (fancy-adder)))
(funcall (plist-get counter :set) 100)
(funcall (plist-get counter :add))
(funcall (plist-get counter :add))
(funcall (plist-get counter :get)))
;; => 102
(fancy-adder)
;; => (:add #[0 "\300\211\242T\240\207" [(0)] 2]
;; :set #[257 "\300\001\240\207" [(0)] 3]
;; :get #[0 "\300\242\207" [(0)] 1])
This is starting to resemble object oriented programming, with methods
acting upon fields stored in a common, closed-over environment.
All three closures share a common variable, total. Since I didn’t use
print-circle, this isn’t obvious from the last result, but each of
those (0) conses is the same object. When one closure mutates the box,
they all see the change. Here’s essentially how fancy-adder is
transformed by the byte compiler:
(defun fancy-adder ()
(let ((box (list 0)))
(list :add (make-byte-code 0 "\300\211\242T\240\207" (vector box) 2)
:set (make-byte-code 257 "\300\001\240\207" (vector box) 3)
:get (make-byte-code 0 "\300\242\207" (vector box) 1))))
The backquote in the original fancy-adder brings this article full
circle. This final example wouldn’t work correctly if those lambdas
weren’t evaluated properly.