Blast from the Past: Borland C++ on Windows 98

My first exposure to C and C++ was a little over 20 years ago. I remember it being some version of Borland C++, either 4.x or 5.x, running on Windows 95. I didn’t have a mentor, so I did the best I could slowly working through what was probably a poorly written beginner C++ book, typing out the examples and exercises with little understanding. Since I didn’t learn much from the experience, there was a 7 or 8 year gap before I’d revisit C and C++ in college.

I thought it would be interesting to revisit this software, to reevaluate it from a far more experienced perspective. Keep in mind that C++ wasn’t even standardized yet, and the most recent C standard was from 1989. Given this, what was it like to be a professional software developer using a Borland toolchain on Windows 20 years ago? Was it miserable, made bearable only by ignorance of how much better the tooling could be? Or maybe it actually wasn’t so bad, and these tools are better than I expect?

Ultimately my conclusion is that it’s a little bit of both. There are some significant capability gaps compared to today, but the core toolchain itself is actually quite reasonable, especially for the mid 1990s.

The setup

Before getting into the evaluation, let’s discuss how I got it all up and running. While it’s technically possible to run Windows 95 on a modern x86-64 machine thanks to the architecture’s extreme backwards compatibility, it’s more compatible, simpler, and safer to virtualize it. Most importantly, I can emulate older hardware that will have better driver support.

Despite that early start in Windows all those years ago, I’m primarily a Linux user. The premier virtualization solution on Linux these days is KVM, a kernel module that turns Linux into a hypervisor and makes efficient use of hardware virtualization extensions. Unfortunately pre-XP Windows doesn’t work well on KVM, so instead I’m using QEmu (with KVM disabled), a hardware emulator closely associated with KVM. Since it doesn’t take advantage of hardware virtualization extensions, it will be slower. This is fine since my goal is to emulate slow, 20+ year old hardware anyway.

There’s very little practical difference between Windows 95 and Windows 98. Since Windows 98 runs a lot smoother virtualized, I decided to go with that instead. This will be perfectly sufficient for my toolchain evaluation.

Software

To get started, I’ll need an installer for Windows 98. I thought this would be difficult to find, but there’s a copy available on the Internet Archive. I don’t know how “legitimate” this is, but it works. Since it’s running in a virtual machine without network access, I also don’t really care if this copy is somehow infected with malware.

Internet Archive: Windows 98 Second Edition

Also on the Internet Archive is a complete copy of Borland C++ 5.02, with the same caveats of legitimacy. It works, which is good enough for my purposes.

Internet Archive: Borland C++ 5.02

Thank you Internet Archive!

Hardware

I’ve got my software, now to set up the virtualized hardware. First I create a drive image:

$ qemu-image create -fqcow2 win98.img 8G

I gave it 8GB, which is actually a bit overkill. Giving Windows 98 a virtual hard drive with modern sizes would probably break the installer. This sort of issue is a common theme among old software, where there may be complaints about negative available disk space due to signed integer overflow.

I decided to give the machine 256MB of memory (-m 256). This is also a little excessive, but I wanted to be sure memory didn’t limit Borland’s capabilities. This amount of memory is close to the upper bound, and going much beyond will likely cause problems with Windows 98.

For the CPU I settled on a Pentium (-cpu pentium). My original goal was to go a little simpler with a 486 (-cpu 486), but the Windows 98 installer kept crashing when I tried this.

I experimented with different configurations for the network card, but I couldn’t get anything to work. So I’ve disabled networking (-net none). The only reason I’d want this is that it would be easier to move files in and out of the virtual machine.

Finally, here’s how I ran QEmu. The last two lines are only needed when installing.

$ qemu-system-x86_64 \
    -localtime \
    -cpu pentium \
    -no-acpi \
    -no-hpet \
    -m 256 \
    -hda win98.img \
    -soundhw sb16 \
    -vga cirrus \
    -net none \
    -cdrom "Windows 98 Second Edition.iso" \
    -boot d

Installation

Installation is just a matter of following the instructions. You’ll need that product key listed on the Internet Archive site.

That copy of Borland is just a big .zip file. This presents two problems.

  1. Without network access, I’ll need to figure out how to get this inside the virtual machine.

  2. This version of Windows doesn’t come with software to unzip this file. I’d need to find and install an unzip tool first.

Fortunately I can kill two birds with one stone by converting that .zip archive into a .iso and mounting it in the virtual machine.

unzip "BORLAND C++.zip"
genisoimage -R -J -o borland.iso "BORLAND C++"

Then in the QEmu console (C-A-2) I attach it:

change ide1-cd0 borland.iso

This little trick of generating .iso files and mounting them is how I will be moving all the other files into the virtual machine.

Borland C++

The first thing I did was play around with with Borland IDE. This is what I would have been using 20 years ago.

Despite being Borland C++, I’m personally most interested in its ANSI C compiler. As I already pointed out, this software pre-dates C++’s standardization, and a lot has changed over the past two decades. On the other hand, C hasn’t really changed all that much. The 1999 update to the C standard (e.g. “C99”) was big and important, but otherwise little has changed. The biggest drawback is the lack of “declare anywhere” variables, including in for-loop initializers. Otherwise it’s the same as writing C today.

To test drive the IDE, I made a couple of test projects, built and ran them with different options, and poked around with the debugger. The debugger is actually pretty decent, especially for the 1990s. It can be operated via the IDE or standalone, so I could use it without firing up the IDE and making a project.

The toolchain includes an assembler, and I can inspect the compiler’s assembly output. To nobody’s surprise this is Intel-flavored assembly, which is very welcome. Imagining myself as a software developer in the mid 1990s, this means I can see exactly what the compiler’s doing as well as write some of the performance sensitive parts in assembly if necessary.

The built-in editor is the worst part of the IDE, which is unfortunate since it really spoils the whole experience. It’s easy to jump between warnings and errors, it has incremental search, and it has good syntax highlighting. But these are the only positive things I can say about it. If I had to work with this editor full-time, I’d spend my days pretty irritated.

Switch to command line tools

Like with the debugger, the Borland people did a good job modularizing their development tools. As part of the installation process, all of the Borland command line tools are added to the system PATH (reminder: this is a single-user system). This includes compiler, linker, assembler, debugger, and even an incomplete implementation of make.

With this, I can essentially pretend the IDE doesn’t exist and replace that crummy editor with something better: Vim.

The last version of Vim to support MS-DOS and Windows 95/98 is Vim 7.3, released in 2010. I download those binaries, trim a few things from my .vimrc, and smuggle it all into my virtual machine via a virtual CD. I’ve now got a powerful text editor in Windows 98 and my situation has drastically improved.

Since I hardly use features added since Vim 7.3, this feels right at home to me. I can invoke the build from Vim, and it can populate the quickfix list from Borland’s output, so I could actually be fairly productive in these circumstances! I’m honestly really impressed with how well this all works together.

At this point I only have two significant annoyances:

  1. Borland’s command line tools belong to that category of irritating programs that print their version banner on every invocation. There’s not even a command line switch to turn this off. All this noise is quickly tiresome. The Visual Studio toolchain does the same thing by default, though it can be turned off (-nologo). I dislike that some GNU tools also commit this sin, but at least GNU limits this to interactive programs.

  2. The Windows/DOS command shell and console is even worse than it is today. I didn’t think that was possible. This is back when it was still genuinely DOS and not just pretending to be (e.g. in NT). The worst part by far is the lack of command history. There’s no using the up-arrow to get previous commands. There’s no tab completion. Forward slash is not a substitute for backslash in paths. If I wanted to improve my productivity, replacing this console and shell would be the first priority.

Update: In an email, Aristotle Pagaltzis informed me that Windows 98 comes with DOSKEY.COM, which provides command history for COMMAND.EXE. Alternatively there’s Enhanced DOSKEY.com, an open source, alternative implementation that also provides tab completion for commands and filesnames. This makes the console a lot more usable (and, honestly, in some ways better than the modern defaults).

Building Enchive with Borland

Last year I wrote a backup encryption tool called Enchive, and I still use it regularly. One of my design goals was high portability since it may be needed to decrypt something important in the distant future. It should be as bit-rot-proof as possible. In software, the best way to future-proof is to past-proof.

If I had a time machine that could send source code back in time, and I sent Enchive to a competant developer 20 years ago, would they be able to compile it and run it? If the answer is yes, then that means Enchive already has 20 years of future-proofing built into it.

To accomplish this, Enchive is 3,300 lines of strict ANSI C, 1989-style, with no dependencies other than the C standard library and a handful of operating system functions — e.g. functionality not in the C standard library. In practice, any ANSI C compiler targeting either POSIX, or Windows 95 or later, should be able to compile it.

My Windows 98 virtual machine includes an ANSI C compiler, and can be used to simulate this time machine. I generated an “amalgamation” build (make amalgamation) — essentially a concatenation of all the source files — and sent this into the virtual machine. Before Borland was able to compile it, I needed to make three small changes.

First, Enchive includes stdint.h to get fixed-width integers needed for the encryption routines. This header comes from C99, and C89 has no equivalent. I anticipated this problem from the beginning and made it easy for the person performing the build to correct it. This header is included exactly once, in config.h, and this is placed at the top of the amalgamation build. The include only needs to be replaced with a handful of manual typedefs. For Borland that looks like this:

typedef unsigned char    uint8_t;
typedef unsigned short   uint16_t;
typedef unsigned long    uint32_t;
typedef unsigned __int64 uint64_t;

typedef long             int32_t;
typedef __int64          int64_t;

#define INT8_C(n)   (n)
#define INT16_C(n)  (n)
#define INT32_C(n)  (n##U)

Second, in more recent versions of Windows, GetFileAttributes() can return the value INVALID_FILE_ATTRIBUTES. Checking for an error that cannot happen is harmless, but this value isn’t defined in Borland’s SDK. I only had to eliminate that check.

Third, the CryptGenRandom() interface isn’t defined in Borland’s SDK. This is used by Enchive to generate keys. MSDN reports this function wasn’t available until Windows XP, but it’s definitely there in Windows 98, exported by ADVAPI32.dll. I’m able to call it, though it always reports an error. Perhaps it’s been disabled in this version due to cryptographic export restrictions?

Regardless of what’s wrong, I ripped this out and replaced it with a fatal error. This version of Enchive can’t generate new keys — unless derived from a passphrase — nor encrypt files, including the use of a protection key to encrypt the secret key. However, it can decrypt files, which is the important part that needs to be future-proofed.

With this three changes — which took me about 10 minutes to sort out — Enchive builds and runs, and it correctly decrypts files I encrypted on Linux. So Enchive has at least 20 years of past-proofing! The screenshot at the top of this article shows it running successfully in an MS-DOS console window.

What’s wrong? What’s missing?

I mentioned that there were some gaps. The most obvious is the lack of the standard POSIX utilities, especially a decent shell. I don’t know if any had been ported to Windows in the mid 1990s. But that could be solved one way or another without too much trouble, even if it meant doing some of that myself.

No, the biggest capability I’d miss, and which wouldn’t be easily obtained, is Git, or a least a decent source control system. I really don’t want to work without proper source control. Git’s support for Windows is second tier, and the port to modern Windows is already a bit of a hack. Getting it to run in Windows 98 would probably be a challenge, especially if I had to compile it with Borland.

The other major issue is the lack of stability. In this experiment, I’ve been seeing this screen a lot:

I remember Windows crashing a lot back in those days, and it certainly had a bad reputation for being unstable, but this is far worse than I remembered. While the hardware emulator may be somewhat at fault here, keep in mind that I never installed third party drivers. Most of these crashes are Windows’ fault. I found I can reliably bring the whole system down with a single GetProcAddress() call on a system DLL. The only way I can imagine this instability was so tolerated back then was general ignorance that computing could be so much better.

I was tempted to write this article in Vim on Windows 98, but all this crashing made me too nervous. I didn’t want some stupid filesystem corruption to wipe out my work. Too risky.

A better alternative

If I was stuck working in Windows 98 — or was at least targeting it as a platform — but had access to a modern tooling ecosystem, could I do better than Borland? Yes! Programs built by Mingw-w64 can be run even as far back as Windows 95.

Now, there’s a catch. I thought it would be this simple:

$ i686-w64-mingw32-gcc -Os hello.c

But when I brought the resulting binary into the virtual machine it crashed when ran it: illegal instruction. Turns out it contained a conditional move (cmov) which is an instruction not available until the Pentium Pro (686). The “pentium” emulation is just a 586.

I tried to disable cmov by picking the specific architecture:

$ i686-w64-mingw32-gcc -march=pentium -Os hello.c

This still didn’t work because the statically-linked part of the CRT contained the cmov. I’d have to recompile that as well.

I could have switched the QEmu options to “upgrade” to a Pentium Pro, but remember that my goal was really the 486. Fortunately this was easy to fix: compile my own Mingw-w64 cross-compiler. I’ve done this a number of times before, so I knew it wouldn’t be difficult.

I could go step by step, but it’s all fairly well documented in the Mingw-64 “howto-build” document. I used GCC 7.3 (the latest version), and for the target I picked “i486-w64-mingw32”. When it was done I could compile binaries on Linux to run in my Windows 98 virtual machine:

$ i486-w64-mingw32-gcc -Os hello.c

This should enable quite a bit of modern software to run inside my virtual machine if I so wanted. I didn’t actually try this (yet?), but, to take this concept all the way, I could use this cross-compiler to cross-compile Mingw-w64 itself to run inside the virtual machine, directly replacing Borland C++.

And the only thing I’d miss about Borland is its debugger.

A Crude Personal Package Manager

For the past couple of months I’ve been using a custom package manager to manage a handful of software packages within various unix-like environments. Packages are installed in my home directory under ~/.local/bin, and the package manager itself is just a 110 line Bourne shell script. It’s is not intended to replace the system’s package manager but, instead, compliment it in some cases where I need more flexibility. I use it to run custom versions of specific pieces of software — newer or older than the system-installed versions, or with my own patches and modifications — without interfering with the rest of system, and without a need for root access. It’s worked out really well so far and I expect to continue making heavy use of it in the future.

It’s so simple that I haven’t even bothered putting the script in its own repository. It sits unadorned within my dotfiles repository with the name qpkg (“quick package”):

Sitting alongside my dotfiles means it’s always there when I need it, just as if it was a built-in command.

I say it’s crude because its “install” (-I) procedure is little more than a wrapper around tar. It doesn’t invoke libtool after installing a library, and there’s no post-install script — or postinst as Debian calls it. It doesn’t check for conflicts between packages, though there’s a command for doing so manually ahead of time. It doesn’t manage dependencies, nor even have them as a concept. That’s all on the user to screw up.

In other words, it doesn’t attempt solve most of the hard problems tackled by package managers… except for three important issues:

  1. It provides a clean, guaranteed-to-work uninstall procedure. Some Makefiles do have a token “uninstall” target, but it’s often unreliable.

  2. Unlike blindly using a Makefile “install” target, I can check for conflicts before installing the software. I’ll know if and how a package clobbers an already-installed package, and I can manage, or ignore, that conflict manually as needed.

  3. It produces a compact, reusable package file that I can reinstall later, even on a different machine (with a couple of caveats). I don’t need to keep around the original source and build directories should I want to install or uninstall later. I can also rapidly switch back and forth between different builds of the same software.

The first caveat is that the package will be configured for exactly my own home directory, so I usually can’t share it with other users, or install it on machines where I have a different home directory. Though I could still create packages for different installation prefixes.

The second caveat is that some builds tailor themselves by default to the host (e.g. -march=native). If care isn’t taken, those packages may not be very portable. This is more common than I had expected and has mildly annoyed me.

Birth of a package manager

While the package manager is new, I’ve been building and installing software in my home directory for years. I’d follow the normal process of setting the install prefix to $HOME/.local, running the build, and then letting the “install” target do its thing.

$ tar xzf name-version.tar.gz
$ cd name-version/
$ ./configure --prefix=$HOME/.local
$ make -j$(nproc)
$ make install

This worked well enough for years. However, I’ve come to rely a lot on this technique, and I’m using it for increasingly sophisticated purposes, such as building custom cross-compiler toolchains.

A common difficulty has been handling the release of new versions of software. I’d like to upgrade to the new version, but lack a way to cleanly uninstall the previous version. Simply clobbering the old version by installing it on top usually works. Occasionally it wouldn’t, and I’d have to blow away ~/.local and start all over again. With more and more software installed in my home directory, restarting has become more and more of a chore that I’d like to avoid.

What I needed was a way to track exactly which files were installed so that I could remove them later when I needed to uninstall. Fortunately there’s a widely-used convention for exactly this purpose: DESTDIR.

It’s expected that when a Makefile provides an “install” target, it prefixes the installation path with the DESTDIR macro, which is assigned to the empty string by default. This allows the user to install the software to a temporary location for the purposes of packaging. Unlike the installation prefix (--prefix) configured before the build began, the software is not expected to function properly when run in the DESTDIR location.

$ DESTDIR=_destdir
$ mkdir $DESTDIR
$ make DESTDIR=destdir install

A different tool will used to copy these files into place and actually install it. This tool can track what files were installed, allowing them to be removed later when uninstalling. My package manager uses the tar program for both purposes. First it creates a package by packing up the DESTDIR (at the root of the actual install prefix):

$ tar czf package.tgz -C $DESTDIR$HOME/.local .

So a package is nothing more than a gzipped tarball. To install, it unpacks the tarball in ~/.local.

$ cd $HOME/.local
$ tar xzf ~/package.tgz

But how does it uninstall a package? It didn’t keep track of what was installed. Easy! The tarball itself contains the package list, and it’s printed with tar’s t mode.

cd $HOME/.local
for file in $(tar tzf package.tgz | grep -v '/$'); do
    rm -f "$file"
done

I’m using grep to skip directories, which are conveniently listed with a trailing slash. Note that in the example above, there are a couple of issues with file names containing whitespace. If the file contains a space character, it will word split incorrectly in the for loop. A Makefile couldn’t handle such a file in the first place, but, in case it’s still necessary, my package manager sets IFS to just a newline.

If the file name contains a newline, then my package manager relies on a cosmic ray striking just the right bit at just the right instant to make it all work out, because no version of tar can unambiguously print such file names. Crossing your fingers during this process may help.

Commands

There are five commands, each assigned to a capital letter: -B, -C, -I, -V, and -U. It’s an interface pattern inspired by Ted Unangst’s signify (see signify(1)). I also used this pattern with Blowpipe and, in retrospect, wish I had also used with Enchive.

Build (-B)

Unlike the other three commands, the “build” command isn’t essential, and is just for convenience. It assumes the build uses an Autoconfg-like configure script and runs it automatically, followed by make with the appropriate -j (jobs) option. It automatically sets the --prefix argument when running the configure script.

If the build uses something other and an Autoconf-like configure script, such as CMake, then you can’t use the “build” command and must perform the build yourself. For example, I must do this when building LLVM and Clang.

Before using the “build” command, the package must first be unpacked and patched if necessary. Then the package manager can take over to run the build.

$ tar xzf name-version.tar.gz
$ cd name-version/
$ patch -p1 < ../0001.patch
$ patch -p1 < ../0002.patch
$ patch -p1 < ../0003.patch
$ cd ..
$ mkdir build
$ cd build/
$ qpkg -B ../name-version/

In this example I’m doing an out-of-source build by invoking the configure script from a different directory. Did you know Autoconf scripts support this? I didn’t know until recently! Unfortunately some hand-written Autoconf-like scripts don’t, though this will be immediately obvious.

Once qpkg returns, the program will be fully built — or stuck on a build error if you’re unlucky. If you need to pass custom configure options, just tack them on the qpkg command:

$ qpkg -B ../name-version/ --without-libxml2 --with-ncurses

Since the second and third steps — creating the build directory and moving into it — is so common, there’s an optional switch for it: -d. This option’s argument is the build directory. qpkg creates that directory and runs the build inside it. In practice I just use “x” for the build directory since it’s so quick to add “dx” to the command.

$ tar xzf name-version.tar.gz
$ qpkg -Bdx ../name-version/

With the software compiled, the next step is creating the package.

Create (-C)

The “create” command creates the DESTDIR (_destdir in the working directory) and runs the “install” Makefile target to fill it with files. Continuing with the example above and its x/ build directory:

$ qpkg -Cdx name

Where “name” is the name of the package, without any file name extension. Like with “build”, extra arguments after the package name are passed to make in case there needs to be any additional tweaking.

When the “create” command finishes, there will be new package named name.tgz in the working directory. At this point the source and build directories are no longer needed, assuming everything went fine.

$ rm -rf name-version/
$ rm -rf x/

This package is ready to install, though you may want to verify it first.

Verify (-V)

The “verify” command checks for collisions against installed packages. It works like uninstallation, but rather than deleting files, it checks if any of the files already exist. If they do, it means there’s a conflict with an existing package. These file names are printed.

$ qpkg -V name.tgz

The most common conflict I’ve seen is in the info index (info/dir) file, which is safe to ignore since I don’t care about it.

If the package has already been installed, there will of course be tons of conflicts. This is the easiest way to check if a package has been installed.

Install (-I)

The “install” command is just the dumb tar xzf explained above. It will clobber anything in its way without warning, which is why, if that matters, “verify” should be used first.

$ qpkg -I name.tgz

When qpkg returns, the package has been installed and is probably ready to go. A lot of packages complain that you need to run libtool to finalize an installation, but I’ve never had a problem skipping it. This dumb unpacking generally works fine.

Uninstall (-U)

Obviously the last command is “uninstall”. As explained above, this needs the original package that was given to the “install” command.

$ qpkg -U name.tgz

Just as “install” is dumb, so is “uninstall,” blindly deleting anything listed in the tarball. One thing I like about dumb tools is that there are no surprises.

I typically suffix the package name with the version number to help keep the packages organized. When upgrading to a new version of a piece of software, I build the new package, which, thanks to the version suffix, will have a distinct name. Then I uninstall the old package, and, finally, install the new one in its place. So far I’ve been keeping the old package around in case I still need it, though I could always rebuild it in a pinch.

Package by accumulation

Building a GCC cross-compiler toolchain is a tricky case that doesn’t fit so well with the build, create, and install process illustrated above. It would be nice for the cross-compiler to be a single, big package, but due to the way it’s built, it would need to be five or so packages, a couple of which will conflict (one being a subset of another):

  1. binutils
  2. C headers
  3. core GCC
  4. C runtime
  5. rest of GCC

Each step needs to be installed before the next step will work. (I don’t even want to think about cross-compiling a cross-compiler.)

To deal with this, I added a “keep” (-k) option that leaves the DESTDIR around after creating the package. To keep things tidy, the intermediate packages exist and are installed, but the final, big cross-compiler package accumulates into the DESTDIR. The final package at the end is actually the whole cross compiler in one package, a superset of them all.

Complicated situations like these are where I can really understand the value of Debian’s fakeroot tool.

My use case, and an alternative

The role filled by my package manager is actually pretty well suited for pkgsrc, which is NetBSD’s ports system made available to other unix-like systems. However, I just need something really lightweight that gives me absolute control — even more than I get with pkgsrc — in the dozen or so cases where I really need it.

All I need is a standard C toolchain in a unix-like environment (even a really old one), the source tarballs for the software I need, my 110 line shell script package manager, and one to two cans of elbow grease. From there I can bootstrap everything I might need without root access, even in a disaster. If the software I need isn’t written in C, it can ultimately get bootstrapped from some crusty old C compiler, which might even involve building some newer C compilers in between. After a certain point it’s C all the way down.

Emacs Lisp Lambda Expressions Are Not Self-Evaluating

This week I made a mistake that ultimately enlightened me about the nature of function objects in Emacs Lisp. There are three kinds of function objects, but they each behave very differently when evaluated as objects.

But before we get to that, let’s talk about one of Emacs’ embarrassing, old missteps: eval-after-load.

Taming an old dragon

One of the long-standing issues with Emacs is that loading Emacs Lisp files (.el and .elc) is a slow process, even when those files have been byte compiled. There are a number of dirty hacks in place to deal with this issue, and the biggest and nastiest of them all is the dumper, also known as unexec.

The Emacs you routinely use throughout the day is actually a previous instance of Emacs that’s been resurrected from the dead. Your undead Emacs was probably created months, if not years, earlier, back when it was originally compiled. The first stage of compiling Emacs is to compile a minimal C core called temacs. The second stage is loading a bunch of Emacs Lisp files, then dumping a memory image in an unportable, platform-dependent way. On Linux, this actually requires special hooks in glibc. The Emacs you know and love is this dumped image loaded back into memory, continuing from where it left off just after it was compiled. Regardless of your own feelings on the matter, you have to admit this is a very lispy thing to do.

There are two notable costs to Emacs’ dumper:

  1. The dumped image contains hard-coded memory addresses. This means Emacs can’t be a Position Independent Executable (PIE). It can’t take advantage of a security feature called Address Space Layout Randomization (ASLR), which would increase the difficulty of exploiting some classes of bugs. This might be important to you if Emacs processes untrusted data, such as when it’s used as a mail client, a web server or generally parses data downloaded across the network.

  2. It’s not possible to cross-compile Emacs since it can only be dumped by running temacs on its target platform. As an experiment I’ve attempted to dump the Windows version of Emacs on Linux using Wine, but was unsuccessful.

The good news is that there’s a portable dumper in the works that makes this a lot less nasty. If you’re adventurous, you can already disable dumping and run temacs directly by setting CANNOT_DUMP=yes at compile time. Be warned, though, that a non-dumped Emacs takes several seconds, or worse, to initialize before it even begins loading your own configuration. It’s also somewhat buggy since it seems nobody ever runs it this way productively.

The other major way Emacs users have worked around slow loading is aggressive use of lazy loading, generally via autoloads. The major package interactive entry points are defined ahead of time as stub functions. These stubs, when invoked, load the full package, which overrides the stub definition, then finally the stub re-invokes the new definition with the same arguments.

To further assist with lazy loading, an evaluated defvar form will not override an existing global variable binding. This means you can, to a certain extent, configure a package before it’s loaded. The package will not clobber any existing configuration when it loads. This also explains the bizarre interfaces for the various hook functions, like add-hook and run-hooks. These accept symbols — the names of the variables — rather than values of those variables as would normally be the case. The add-to-list function does the same thing. It’s all intended to cooperate with lazy loading, where the variable may not have been defined yet.

eval-after-load

Sometimes this isn’t enough and you need some some configuration to take place after the package has been loaded, but without forcing it to load early. That is, you need to tell Emacs “evaluate this code after this particular package loads.” That’s where eval-after-load comes into play, except for its fatal flaw: it takes the word “eval” completely literally.

The first argument to eval-after-load is the name of a package. Fair enough. The second argument is a form that will be passed to eval after that package is loaded. Now hold on a minute. The general rule of thumb is that if you’re calling eval, you’re probably doing something seriously wrong, and this function is no exception. This is completely the wrong mechanism for the task.

The second argument should have been a function — either a (sharp quoted) symbol or a function object. And then instead of eval it would be something more sensible, like funcall. Perhaps this improved version would be named call-after-load or run-after-load.

The big problem with passing an s-expression is that it will be left uncompiled due to being quoted. I’ve talked before about the importance of evaluating your lambdas. eval-after-load not only encourages badly written Emacs Lisp, it demands it.

;;; BAD!
(eval-after-load 'simple-httpd
                 '(push '("c" . "text/plain") httpd-mime-types))

This was all corrected in Emacs 25. If the second argument to eval-after-load is a function — the result of applying functionp is non-nil — then it uses funcall. There’s also a new macro, with-eval-after-load, to package it all up nicely.

;;; Better (Emacs >= 25 only)
(eval-after-load 'simple-httpd
  (lambda ()
    (push '("c" . "text/plain") httpd-mime-types)))

;;; Best (Emacs >= 25 only)
(with-eval-after-load 'simple-httpd
  (push '("c" . "text/plain") httpd-mime-types))

Though in both of these examples the compiler will likely warn about httpd-mime-types not being defined. That’s a problem for another day.

A workaround

But what if you need to use Emacs 24, as was the situation that sparked this article? What can we do with the bad version of eval-after-load? We could situate a lambda such that it’s evaluated, but then smuggle the resulting function object into the form passed to eval-after-load, all using a backquote.

;;; Note: this is subtly broken
(eval-after-load 'simple-httpd
  `(funcall
    ,(lambda ()
       (push '("c" . "text/plain") httpd-mime-types)))

When everything is compiled, the backquoted form evalutes to this:

(funcall #[0 <bytecode> [httpd-mime-types ("c" . "text/plain")] 2])

Where the second value (#[...]) is a byte-code object. However, as the comment notes, this is subtly broken. A cleaner and correct way to solve all this is with a named function. The damage caused by eval-after-load will have been (mostly) minimized.

(defun my-simple-httpd-hook ()
  (push '("c" . "text/plain") httpd-mime-types))

(eval-after-load 'simple-httpd
  '(funcall #'my-simple-httpd-hook))

But, let’s go back to the anonymous function solution. What was broken about it? It all has to do with evaluating function objects.

Evaluating function objects

So what happens when we evaluate an expression like the one above with eval? Here’s what it looks like again.

(funcall #[...])

First, eval notices it’s been given a non-empty list, so it’s probably a function call. The first argument is the name of the function to be called (funcall) and the remaining elements are its arguments. But each of these elements must be evaluated first, and the result of that evaluation becomes the arguments.

Any value that isn’t a list or a symbol is self-evaluating. That is, it evaluates to its own value:

(eval 10)
;; => 10

If the value is a symbol, it’s treated as a variable. If the value is a list, it goes through the function call process I’m describing (or one of a number of other special cases, such as macro expansion, lambda expressions, and special forms).

So, conceptually eval recurses on the function object #[...]. A function object is not a list or a symbol, so it’s self-evaluating. No problem.

;; Byte-code objects are self-evaluating

(let ((x (byte-compile (lambda ()))))
  (eq x (eval x)))
;; => t

What if this code wasn’t compiled? Rather than a byte-code object, we’d have some other kind of function object for the interpreter. Let’s examine the dynamic scope (shudder) case. Here, a lambda appears to evaluate to itself, but appearances can be deceiving:

(eval (lambda ())
;; => (lambda ())

However, this is not self-evaluation. Lambda expressions are not self-evaluating. It’s merely coincidence that the result of evaluating a lambda expression looks like the original expression. This is just how the Emacs Lisp interpreter is currently implemented and, strictly speaking, it’s an implementation detail that just so happens to be mostly compatible with byte-code objects being self-evaluating. It would be a mistake to rely on this.

Instead, dynamic scope lambda expression evaluation is idempotent. Applying eval to the result will return an equal, but not identical (eq), expression. In contrast, a self-evaluating value is also idempotent under evaluation, but with eq results.

;; Not self-evaluating:

(let ((x '(lambda ())))
  (eq x (eval x)))
;; => nil

;; Evaluation is idempotent:

(let ((x '(lambda ())))
  (equal x (eval x)))
;; => t

(let ((x '(lambda ())))
  (equal x (eval (eval x))))
;; => t

So, with dynamic scope, the subtly broken backquote example will still work, but only by sheer luck. Under lexical scope, the situation isn’t so lucky:

;;; -*- lexical-scope: t; -*-

(lambda ())
;; => (closure (t) nil)

These interpreted lambda functions are neither self-evaluating nor idempotent. Passing t as the second argument to eval tells it to use lexical scope, as shown below:

;; Not self-evaluating:

(let ((x '(lambda ())))
  (eq x (eval x t)))
;; => nil

;; Not idempotent:

(let ((x '(lambda ())))
  (equal x (eval x t)))
;; => nil

(let ((x '(lambda ())))
  (equal x (eval (eval x t) t)))
;; error: (void-function closure)

I can imagine an implementation of Emacs Lisp where dynamic scope lambda expressions are in the same boat, where they’re not even idempotent. For example:

;;; -*- lexical-binding: nil; -*-

(lambda ())
;; => (totally-not-a-closure ())

Most Emacs Lisp would work just fine under this change, and only code that makes some kind of logical mistake — where there’s nested evaluation of lambda expressions — would break. This essentially already happened when lots of code was quietly switched over to lexical scope after Emacs 24. Lambda idempotency was lost and well-written code didn’t notice.

There’s a temptation here for Emacs to define a closure function or special form that would allow interpreter closure objects to be either self-evaluating or idempotent. This would be a mistake. It would only serve as a hack that covers up logical mistakes that lead to nested evaluation. Much better to catch those problems early.

Solving the problem with one character

So how do we fix the subtly broken example? With a strategically placed quote right before the comma.

(eval-after-load 'simple-httpd
  `(funcall
    ',(lambda ()
        (push '("c" . "text/plain") httpd-mime-types)))

So the form passed to eval-after-load becomes:

;; Compiled:
(funcall (quote #[...]))

;; Dynamic scope:
(funcall (quote (lambda () ...)))

;; Lexical scope:
(funcall (quote (closure (t) () ...)))

The quote prevents eval from evaluating the function object, which would be either needless or harmful. There’s also an argument to be made that this is a perfect situation for a sharp-quote (#'), which exists to quote functions.

Two Chaotic Motion Demos

I’ve put together two online, interactive, demonstrations of chaotic motion. One is 2D and the other is 3D, but both are rendered using WebGL — which, for me, is the most interesting part. Both are governed by ordinary differential equations. Both are integrated using the Runge–Kutta method, specifically RK4.

Far more knowledgeable people have already written introductions for chaos theory, so here’s just a quick summary. A chaotic system is deterministic but highly sensitive to initial conditions. Tweaking a single bit of the starting state of either of my demos will quickly lead to two arbitrarily different results. Both demonstrations have features that aim to show this in action.

This ain’t my first chaotic system rodeo. About eight years ago I made water wheel Java applet, and that was based on some Matlab code I collaborated on some eleven years ago. I really hope you’re not equipped to run a crusty old Java applet in 2018, though. (Update: now upgraded to HTML5 Canvas.)

If you want to find either of these demos again in the future, you don’t need to find this article first. They’re both listed in my Showcase page, linked from the header of this site.

Double pendulum

First up is the classic double pendulum. This one’s more intuitive than my other demo since it’s modeling a physical system you could actually build and observe in the real world.

Source: https://github.com/skeeto/double-pendulum

I lifted the differential equations straight from the Wikipedia article (derivative() in my code). Same for the Runge–Kutta method (rk4() in my code). It’s all pretty straightforward. RK4 may not have been the best choice for this system since it seems to bleed off energy over time. If you let my demo run over night, by morning there will obviously be a lot less activity.

I’m not a fan of buttons and other fancy GUI widgets — neither designing them nor using them myself — prefering more cryptic, but easy-to-use keyboard-driven interfaces. (Hey, it works well for mpv and MPlayer.) I haven’t bothered with a mobile interface, so sorry if you’re reading on your phone. You’ll just have to enjoy watching a single pendulum.

Here are the controls:

To witness chaos theory in action:

  1. Start with a single pendulum (the default).
  2. Pause the simulation (SPACE).
  3. Make a dozen or so clones (press c for awhile).
  4. Unpause.

At first it will appear as single pendulum, but they’re actually all stacked up, each starting from slightly randomized initial conditions. Within a minute you’ll witness the pendulums diverge, and after a minute they’ll all be completely different. It’s pretty to watch them come apart at first.

It might appear that the m key doesn’t actually do anything. That’s because the HTML5 Canvas rendering — which is what I actually implemented first — is really close to the WebGL rendering. I’m really proud of this. There are just three noticeable differences. First, there’s a rounded line cap in the Canvas rendering where the pendulum is “attached.” Second, the tail line segments aren’t properly connected in the Canvas rendering. The segments are stroked separately in order to get that gradient effect along its path. Third, it’s a lot slower, particularly when there are many pendulums to render.

In WebGL the two “masses” are rendered using that handy old circle rasterization technique on a quad. Either a triangle fan or pre-rendering the circle as a texture would probably have been a better choices. The two bars are the same quad buffers, just squeezed and rotated into place. Both were really simple to create. It’s the tail that was tricky to render.

When I wrote the original Canvas renderer, I set the super convenient lineWidth property to get a nice, thick tail. In my first cut at rendering the tail I used GL_LINE_STRIP to draw a line primitive. The problem with the line primitive is that an OpenGL implementation is only required to support single pixel wide lines. If I wanted wider, I’d have to generate the geometry myself. So I did.

Like before, I wasn’t about to dirty my hands manipulating a graphite-filled wooden stick on a piece of paper to solve this problem. No, I lifted the math from something I found on the internet again. In this case it was a forum post by paul.houx, which provides a few vector equations to compute a triangle strip from a line strip. My own modification was to add a miter limit, which keeps sharp turns under control. You can find my implementation in polyline() in my code. Here’s a close-up with the skeleton rendered on top in black:

For the first time I’m also using ECMAScript’s new template literals to store the shaders inside the JavaScript source. These string literals can contain newlines, but, even cooler, I it does string interpolation, meaning I can embed JavaScript variables directly into the shader code:

let massRadius = 0.12;
let vertexShader = `
attribute vec2 a_point;
uniform   vec2 u_center;
varying   vec2 v_point;

void main() {
    v_point = a_point;
    gl_Position = vec4(a_point * ${massRadius} + u_center, 0, 1);
}`;

Allocation avoidance

If you’ve looked at my code you might have noticed something curious. I’m using a lot of destructuring assignments, which is another relatively new addition to ECMAScript. This was part of a little experiment.

function normalize(v0, v1) {
    let d = Math.sqrt(v0 * v0 + v1 * v1);
    return [v0 / d, v1 / d];
}

/* ... */

let [nx, ny] = normalize(-ly, lx);

One of my goals for this project was zero heap allocations in the main WebGL rendering loop. There are no garbage collector hiccups if there’s no garbage to collect. This sort of thing is trivial in a language with manual memory management, such as C and C++. Just having value semantics for aggregates would be sufficient.

But with JavaScript I don’t get to choose how my objects are allocated. I either have to pre-allocate everything, including space for all the intermediate values (e.g. an object pool). This would be clunky and unconventional. Or I can structure and access my allocations in such a way that the JIT compiler can eliminate them (via escape analysis, scalar replacement, etc.).

In this case, I’m trusting that JavaScript implementations will flatten these destructuring assignments so that the intermediate array never actually exists. It’s like pretending the array has value semantics. This seems to work as I expect with V8, but not so well with SpiderMonkey (yet?), at least in Firefox 52 ESR.

Single precision

I briefly considered using Math.fround() to convince JavaScript to compute all the tail geometry in single precision. The double pendulum system would remain double precision, but the geometry doesn’t need all that precision. It’s all rounded to single precision going out to the GPU anyway.

Normally when pulling values from a Float32Array, they’re cast to double precision — JavaScript’s only numeric type — and all operations are performed in double precision, even if the result is stored back in a Float32Array. This is because the JIT compiler is required to correctly perform all the intermediate rounding. To relax this requirement, surround each operation with a call to Math.fround(). Since the result of doing each operation in double precision with this rounding step in between is equivalent to doing each operation in single precision, the JIT compiler can choose to do the latter.

let x = new Float32Array(n);
let y = new Float32Array(n);
let d = new Float32Array(n);
// ...
for (let i = 0; i < n; i++) {
    let xprod = Math.fround(x[i] * x[i]);
    let yprod = Math.fround(y[i] * y[i]);
    d[i] = Math.sqrt(Math.fround(xprod + yprod));
}

I ultimately decided not to bother with this since it would significantly obscures my code for what is probably a minuscule performance gain (in this case). It’s also really difficult to tell if I did it all correctly. So I figure this is better suited for compilers that target JavaScript rather than something to do by hand.

Lorenz system

The other demo is a Lorenz system with its famous butterfly pattern. I actually wrote this one a year and a half ago but never got around to writing about it. You can tell it’s older because I’m still using var.

Source: https://github.com/skeeto/lorenz-webgl

Like before, the equations came straight from the Wikipedia article (Lorenz.lorenz() in my code). They math is a lot simpler this time, too.

This one’s a bit more user friendly with a side menu displaying all your options. The keys are basically the same. This was completely by accident, I swear. Here are the important ones:

Witnessing chaos theory in action is the same process as before: clear it down to a single solution (C then a), then add a bunch of randomized clones (c).

There is no Canvas renderer for this one. It’s pure WebGL. The tails are drawn using GL_LINE_STRIP, but in this case it works fine that they’re a single pixel wide. If heads are turned on, those are just GL_POINT. The geometry is threadbare for this one.

There is one notable feature: The tails are stored exclusively in GPU memory. Only the “head” is stored CPU-side. After it computes the next step, it updates a single spot of the tail with glBufferSubData(), and the VBO is actually a circular buffer. OpenGL doesn’t directly support rendering from circular buffers, but it does have element arrays. An element array is an additional buffer of indices that tells OpenGL what order to use the elements in the other buffers.

Naively would mean for a tail of 4 segments, I need 4 different element arrays, one for each possible rotation:

array 0: 0 1 2 3
array 1: 1 2 3 0
array 2: 2 3 0 1
array 3: 3 0 1 2

With the knowledge that element arrays can start at an offset, and with a little cleverness, you might notice these can all overlap in a single, 7-element array:

0 1 2 3 0 1 2

Array 0 is at offset 0, array 1 is at offset 1, array 2 is at offset 2, and array 3 is at offset 3. The tails in the Lorenz system are drawn using drawElements() with exactly this sort of array.

Like before, I was very careful to produce zero heap allocations in the main loop. The FPS counter generates some garbage in the DOM due to reflow, but this goes away if you hide the help menu (?). This was long enough ago that destructuring assignment wasn’t available, but Lorenz system and rendering it were so simple that using pre-allocated objects worked fine.

Beyond just the programming, I’ve gotten hours of entertainment playing with each of these systems. This was also the first time I’ve used WebGL in over a year, and this project was a reminder of just how working with it is so pleasurable. The specification is superbly written and serves perfectly as its own reference.

Options for Structured Data in Emacs Lisp

Russian translation by ClipArtMag.
Ukrainian translation by Open Source Initiative.

So your Emacs package has grown beyond a dozen or so lines of code, and the data it manages is now structured and heterogeneous. Informal plain old lists, the bread and butter of any lisp, are not longer cutting it. You really need to cleanly abstract this structure, both for your own organizational sake any for anyone reading your code.

With informal lists as structures, you might regularly ask questions like, “Was the ‘name’ slot stored in the third list element, or was it the fourth element?” A plist or alist helps with this problem, but those are better suited for informal, externally-supplied data, not for internal structures with fixed slots. Occasionally someone suggests using hash tables as structures, but Emacs Lisp’s hash tables are much too heavy for this. Hash tables are more appropriate when keys themselves are data.

Defining a data structure from scratch

Imagine a refrigerator package that manages a collection of food in a refrigerator. A food item could be structured as a plain old list, with slots at specific positions.

(defun fridge-item-create (name expiry weight)
  (list name expiry weight))

A function that computes the mean weight of a list of food items might look like this:

(defun fridge-mean-weight (items)
  (if (null items)
      0.0
    (let ((sum 0.0)
          (count 0))
      (dolist (item items (/ sum count))
        (setf count (1+ count)
              sum (+ sum (nth 2 item)))))))

Note the use of (nth 2 item) at the end, used to get the item’s weight. That magic number 2 is easy to mess up. Even worse, if lots of code accesses “weight” this way, then future extensions will be inhibited. Defining some accessor functions solves this problem.

(defsubst fridge-item-name (item)
  (nth 0 item))

(defsubst fridge-item-expiry (item)
  (nth 1 item))

(defsubst fridge-item-weight (item)
  (nth 2 item))

The defsubst defines an inline function, so there’s effectively no additional run-time costs for these accessors compared to a bare nth. Since these only cover getting slots, we should also define some setters using the built-in gv (generalized variable) package.

(require 'gv)

(gv-define-setter fridge-item-name (value item)
  `(setf (nth 0 ,item) ,value))

(gv-define-setter fridge-item-expiry (value item)
  `(setf (nth 1 ,item) ,value))

(gv-define-setter fridge-item-weight (value item)
  `(setf (nth 2 ,item) ,value))

This makes each slot setf-able. Generalized variables are great for simplifying APIs, since otherwise there would need to be an equal number of setter functions (fridge-item-set-name, etc.). With generalized variables, both are at the same entrypoint:

(setf (fridge-item-name item) "Eggs")

There are still two more significant improvements.

  1. As far as Emacs Lisp is concerned, this isn’t a real type. The type-ness of it is just a fiction created by the conventions of the package. It would be easy to make the mistake of passing an arbitrary list to these fridge-item functions, and the mistake wouldn’t be caught so long as that list has at least three items. An common solution is to add a type tag: a symbol at the beginning of the structure that identifies it.

  2. It’s still a linked list, and nth has to walk the list (i.e. O(n)) to retrieve items. It would be much more efficient to use a vector, turning this into an efficient O(1) operation.

Addressing both of these at once:

(defun fridge-item-create (name expiry weight)
  (vector 'fridge-item name expiry weight))

(defsubst fridge-item-p (object)
  (and (vectorp object)
       (= (length object) 4)
       (eq 'fridge-item (aref object 0))))

(defsubst fridge-item-name (item)
  (unless (fridge-item-p item)
    (signal 'wrong-type-argument (list 'fridge-item item)))
  (aref item 1))

(defsubst fridge-item-name--set (item value)
  (unless (fridge-item-p item)
    (signal 'wrong-type-argument (list 'fridge-item item)))
  (setf (aref item 1) value))

(gv-define-setter fridge-item-name (value item)
  `(fridge-item-name--set ,item ,value))

;; And so on for expiry and weight...

As long as fridge-mean-weight uses the fridge-item-weight accessor, it continues to work unmodified across all these changes. But, whew, that’s quite a lot of boilerplate to write and maintain for each data structure in our package! Boilerplate code generation is a perfect candidate for a macro definition. Luckily for us, Emacs already defines a macro to generate all this code: cl-defstruct.

(require 'cl)

(cl-defstruct fridge-item
  name expiry weight)

In Emacs 25 and earlier, this innocent looking definition expands into essentially all the above code. The code it generates is expressed in the most optimal form for its version of Emacs, and it exploits many of the available optimizations by using function declarations such as side-effect-free and error-free. It’s configurable, too, allowing for the exclusion of a type tag (:named) — discarding all the type checks — or using a list rather than a vector as the underlying structure (:type). As a crude form of structural inheritance, it even allows for directly embedding other structures (:include).

Two pitfalls

There a couple pitfalls, though. First, for historical reasons, the macro will define two namespace-unfriendly functions: make-NAME and copy-NAME. I always override these, preferring the -create convention for the constructor, and tossing the copier since it’s either useless or, worse, semantically wrong.

(cl-defstruct (fridge-item (:constructor fridge-item-create)
                           (:copier nil))
  name expiry weight)

If the constructor needs to be more sophisticated than just setting slots, it’s common to define a “private” constructor (double dash in the name) and wrap it with a “public” constructor that has some behavior.

(cl-defstruct (fridge-item (:constructor fridge-item--create)
                           (:copier nil))
  name expiry weight entry-time)

(cl-defun fridge-item-create (&rest args)
  (apply #'fridge-item--create :entry-time (float-time) args))

The other pitfall is related to printing. In Emacs 25 and earlier, types defined by cl-defstruct are still only types by convention. They’re really just vectors as far as Emacs Lisp is concerned. One benefit from this is that printing and reading these structures is “free” because vectors are printable. It’s trivial to serialize cl-defstruct structures out to a file. This is exactly how the Elfeed database works.

The pitfall is that once a structure has been serialized, there’s no more changing the cl-defstruct definition. It’s now a file format definition, so the slots are locked in place. Forever.

Emacs 26 throws a wrench in all this, though it’s worth it in the long run. There’s a new primitive type in Emacs 26 with its own reader syntax: records. This is similar to hash tables becoming first class in the reader in Emacs 23.2. In Emacs 26, cl-defstruct uses records instead of vectors.

;; Emacs 25:
(fridge-item-create :name "Eggs" :weight 11.1)
;; => [cl-struct-fridge-item "Eggs" nil 11.1]

;; Emacs 26:
(fridge-item-create :name "Eggs" :weight 11.1)
;; => #s(fridge-item "Eggs" nil 11.1)

So far slots are still accessed using aref, and all the type checking still happens in Emacs Lisp. The only practical change is the record function is used in place of the vector function when allocating a structure. But it does pave the way for more interesting things in the future.

The major short-term downside is that this breaks printed compatibility across the Emacs 25/26 boundary. The cl-old-struct-compat-mode function can be used for some degree of backwards, but not forwards, compatibility. Emacs 26 can read and use some structures printed by Emacs 25 and earlier, but the reverse will never be true. This issue initially tripped up Emacs’ built-in packages, and when Emacs 26 is released we’ll see more of these issues arise in external packages.

Dynamic dispatch

Prior to Emacs 25, the major built-in package for dynamic dispatch — functions that specialize on the run-time type of their arguments — was EIEIO, though it only supported single dispatch (specializing on a single argument). EIEIO brought much of the Common Lisp Object System (CLOS) to Emacs Lisp, including classes and methods.

Emacs 25 introduced a more sophisticated dynamic dispatch package called cl-generic. It focuses only on dynamic dispatch and supports multiple dispatch, completely replacing the dynamic dispatch portion of EIEIO. Since cl-defstruct does inheritance and cl-generic does dynamic dispatch, there’s not really much left for EIEIO — besides bad ideas like multiple inheritance and method combination.

Without either of these packages, the most direct way to build single dispatch on top of cl-defstruct would be to shove a function in one of the slots. Then the “method” is just a wrapper that call this function.

;; Base "class"

(cl-defstruct greeter
  greeting)

(defun greet (thing)
  (funcall (greeter-greeting thing) thing))

;; Cow "class"

(cl-defstruct (cow (:include greeter)
                   (:constructor cow--create)))

(defun cow-create ()
  (cow--create :greeting (lambda (_) "Moo!")))

;; Bird "class"

(cl-defstruct (bird (:include greeter)
                    (:constructor bird--create)))

(defun bird-create ()
  (bird--create :greeting (lambda (_) "Chirp!")))

;; Usage:

(greet (cow-create))
;; => "Moo!"

(greet (bird-create))
;; => "Chirp!"

Since cl-generic is aware of the types created by cl-defstruct, functions can specialize on them as if they were native types. It’s a lot simpler to let cl-generic do all the hard work. The people reading your code will appreciate it, too:

(require 'cl-generic)

(cl-defgeneric greet (greeter))

(cl-defstruct cow)

(cl-defmethod greet ((_ cow))
  "Moo!")

(cl-defstruct bird)

(cl-defmethod greet ((_ bird))
  "Chirp!")

(greet (make-cow))
;; => "Moo!"

(greet (make-bird))
;; => "Chirp!"

The majority of the time a simple cl-defstruct will fulfill your needs, keeping in mind the gotcha with the constructor and copier names. Its use should feel almost as natural as defining functions.

Inspiration from Data-dependent Rotations

This article is an expanded email I wrote in response to a question from Frank Muller. He asked how I arrived at my solution to a branchless UTF-8 decoder:

I mean, when you started, I’m pretty the initial solution was using branches, right? Then, you’ve applied some techniques to eliminate them.

A bottom-up approach that begins with branches and then proceeds to eliminate them one at a time sounds like a plausible story. However, this story is the inverse of how it actually played out. It began when I noticed a branchless decoder could probably be done, then I put together the pieces one at a time without introducing any branches. But what sparked that initial idea?

The two prior posts reveal my train of thought at the time: a look at the Blowfish cipher and a 64-bit PRNG shootout. My layman’s study of Blowfish was actually part of an examination of a number of different block ciphers. For example, I also read the NSA’s Speck and Simon paper and then implemented the 128/128 variant of Speck — a 128-bit key and 128-bit block. I didn’t take the time to write an article about it, but note how the entire cipher — key schedule, encryption, and decryption — is just 40 lines of code:

struct speck {
    uint64_t k[32];
};

void
speck_init(struct speck *ctx, uint64_t x, uint64_t y)
{
    ctx->k[0] = y;
    for (uint64_t i = 0; i < 31; i++) {
        x = (x >> 8) | (x << 56);
        x += y;
        x ^= i;
        y = (y << 3) | (y >> 61);
        y ^= x;
        ctx->k[i + 1] = y;
    }
}

void
speck_encrypt(const struct speck *ctx, uint64_t *x, uint64_t *y)
{
    for (int i = 0; i < 32; i++) {
        *x = (*x >> 8) | (*x << 56);
        *x += *y;
        *x ^= ctx->k[i];
        *y = (*y << 3) | (*y >> 61);
        *y ^= *x;
    }
}

static void
speck_decrypt(const struct speck *ctx, uint64_t *x, uint64_t *y)
{
    for (int i = 0; i < 32; i++) {
        *y ^= *x;
        *y = (*y >> 3) | (*y << 61);
        *x ^= ctx->k[31 - i];
        *x -= *y;
        *x = (*x << 8) | (*x >> 56);
    }
}

Isn’t that just beautiful? It’s so tiny and fast. Other than the not-very-arbitrary selection of 32 rounds, and the use of 3-bit and 8-bit rotations, there are no magic values. One could fairly reasonably commit this cipher to memory if necessary, similar to the late RC4. Speck is probably my favorite block cipher right now, except that I couldn’t figure out the key schedule for any of the other key/block size variants.

Another cipher I studied, though in less depth, was RC5 (1994), a block cipher by (obviously) Ron Rivest. The most novel part of RC5 is its use of data dependent rotations. This was a very deliberate decision, and the paper makes this clear:

RC5 should highlight the use of data-dependent rotations, and encourage the assessment of the cryptographic strength data-dependent rotations can provide.

What’s a data-dependent rotation. In the Speck cipher shown above, notice how the right-hand side of all the rotation operations is a constant (3, 8, 56, and 61). Suppose that these operands were not constant, instead they were based on some part of the value of the block:

    int r = *y & 0x0f;
    *x = (*x >> r) | (*x << (64 - r));

Such “random” rotations “frustrate differential cryptanalysis” according to the paper, increasing the strength of the cipher.

Another algorithm that uses data-dependent shift is the PCG family of PRNGs. Honestly, the data-dependent “permutation” shift is the defining characteristic of PCG. As a reminder, here’s the simplified PCG from my shootout:

uint32_t
spcg32(uint64_t s[1])
{
    uint64_t m = 0x9b60933458e17d7d;
    uint64_t a = 0xd737232eeccdf7ed;
    *s = *s * m + a;
    int shift = 29 - (*s >> 61);
    return *s >> shift;
}

Notice how the final shift depends on the high order bits of the PRNG state. (This one weird trick from Melissa O’Neil will significantly improve your PRNG. Xorshift experts hate her.)

I think this raises a really interesting question: Why did it take until 2014 for someone to apply a data-dependent shift to a PRNG? Similarly, why are data-dependent rotations not used in many ciphers?

My own theory is that this is because many older instruction set architectures can’t perform data-dependent shift operations efficiently.

Many instruction sets only have a fixed shift (e.g. 1-bit), or can only shift by an immediate (e.g. a constant). In these cases, a data-dependent shift would require a loop. These loops would be a ripe source of side channel attacks in ciphers due to the difficultly of making them operate in constant time. It would also be relatively slow for video game PRNGs, which often needed to run on constrained hardware with limited instruction sets. For example, the 6502 (Atari, Apple II, NES, Commodore 64) and the Z80 (too many to list) can only shift/rotate one bit at a time.

Even on an architecture with an instruction for data-dependent shifts, such as the x86, those shifts will be slower than constant shifts, at least in part due to the additional data dependency.

It turns out there are also some patent issues (ex. 1, 2). Fortunately most of these patents have now expired, and one in particular is set to expire this June. I still like my theory better.

To branchless decoding

So I was thinking about data-dependent shifts, and I had also noticed I could trivially check the length of a UTF-8 code point using a small lookup table — the first step in my decoder. What if I combined these: a data-dependent shift based on a table lookup. This would become the last step in my decoder. The idea for a branchless UTF-8 decoder was essentially borne out of connecting these two thoughts, and then filling in the middle.

Debugging Emacs or: How I Learned to Stop Worrying and Love DTrace

Update: This article was featured on BSD Now 233 (starting at 21:38).

For some time Elfeed was experiencing a strange, spurious failure. Every so often users were seeing an error (spoiler warning) when updating feeds: “error in process sentinel: Search failed.” If you use Elfeed, you might have even seen this yourself. From the surface it appeared that curl, tasked with the responsibility for downloading feed data, was producing incomplete output despite reporting a successful run. Since the run was successful, Elfeed assumed certain data was in curl’s output buffer, but, since it wasn’t, it failed hard.

Unfortunately this issue was not reproducible. Manually running curl outside of Emacs never revealed any issues. Asking Elfeed to retry fetching the feeds would work fine. The issue would only randomly rear its head when Elfeed was fetching many feeds in parallel, under stress. By the time the error was discovered, the curl process had exited and vital debugging information was lost. Considering that this was likely to be a bug in Emacs itself, there really wasn’t a reliable way to capture the necessary debugging information from within Emacs Lisp. And, indeed, this later proved to be the case.

A quick-and-dirty work around is to use condition-case to catch and swallow the error. When the bizarre issue shows up, rather than fail badly in front of the user, Elfeed could attempt to swallow the error — assuming it can be reliably detected — and treat the fetch as simply a failure. That didn’t sit comfortably with me. Elfeed had done its due diligence checking for errors already. Someone was lying to Elfeed, and I intended to catch them with their pants on fire. Someday.

I’d just need to witness the bug on one of my own machines. Elfeed is part of my daily routine, so surely I’d have to experience this issue myself someday. My plan was, should that day come, to run a modified Elfeed, instrumented to capture extra data. I would have also routinely run Emacs under GDB so that I could inspect the failure more deeply.

For now I just had to wait to hunt that zebra.

Bryan Cantrill, DTrace, and FreeBSD

Over the holidays I re-discovered Bryan Cantrill, a systems software engineer who worked for Sun between 1996 and 2010, and is most well known for DTrace. My first exposure to him was in a BSD Now interview in 2015. I had re-watched that interview and decided there was a lot more I had to learn from him. He’s become a personal hero to me. So I scoured the internet for more of his writing and talks. Besides what I’ve already linked in this article, here are a couple more great presentations:

You can also find some of his writing scattered around the DTrace blog.

Some interesting operating system technology came out of Sun during its final 15 or so years — most notably DTrace and ZFS — and Bryan speaks about it passionately. Almost as a matter of luck, most of it survived the Oracle acquisition thanks to Sun releasing it as open source in just the nick of time. Otherwise it would have been lost forever. The scattered ex-Sun employees, still passionate about their prior work at Sun, along with some of their old customers have since picked up the pieces and kept going as a community under the name illumos. It’s like an open source flotilla.

Naturally I wanted to get my hands on this stuff to try it out for myself. Is it really as good as they say? Normally I stick to Linux, but it (generally) doesn’t have these Sun technologies. The main reason is license incompatibility. Sun released its code under the CDDL, which is incompatible with the GPL. Ubuntu does infamously include ZFS, but other distributions are unwilling to take that risk. Porting DTrace is a serious undertaking since it’s got its fingers throughout the kernel, which also makes the licensing issues even more complicated.

(Update Feburary 2018: DTrace has been released under the GPLv2, allowing it to be legally integrated with Linux.)

Linux has a reputation for Not Invented Here (NIH) syndrome, and these licensing issues certainly contribute to that. Rather than adopt ZFS and DTrace, they’ve been reinvented from scratch: btrfs instead of ZFS, and a slew of partial options instead of DTrace. Normally I’m most interested in system call tracing, and my go to is strace, though it certainly has its limitations — including this situation of debugging curl under Emacs. Another famous example of NIH is Linux’s epoll(2), which is a broken version of BSD kqueue(2).

So, if I want to try these for myself, I’ll need to install a different operating system. I’ve dabbled with OmniOS, an OS built on illumos, in virtual machines, using it as an alien environment to test some of my software (e.g. enchive). OmniOS has a philosophy called Keep Your Software To Yourself (KYSTY), which is really just code for “we don’t do packaging.” Honestly, you can’t blame them since they’re a tiny community. The best solution to this is probably pkgsrc, which is essentially a universal packaging system. Otherwise you’re on your own.

There’s also openindiana, which is a more friendly desktop-oriented illumos distribution. Still, the short of it is that you’re very much on your own when things don’t work. The situation is like running Linux a couple decades ago, when it was still difficult to do.

If you’re interested in trying DTrace, the easiest option these days is probably FreeBSD. It’s got a big, active community, thorough documentation, and a huge selection of packages. Its license (the BSD license, duh) is compatible with the CDDL, so both ZFS and DTrace have been ported to FreeBSD.

What is DTrace?

I’ve done all this talking but haven’t yet described what DTrace really is. I won’t pretend to write my own tutorial, but I’ll provide enough information to follow along. DTrace is a tracing framework for debugging production systems in real time, both for the kernel and for applications. The “production systems” part means it’s stable and safe — using DTrace won’t put your system at risk of crashing or damaging data. The “real time” part means it has little impact on performance. You can use DTrace on live, active systems with little impact. Both of these core design principles are vital for troubleshooting those really tricky bugs that only show up in production.

There are DTrace probes scattered all throughout the system: on system calls, scheduler events, networking events, process events, signals, virtual memory events, etc. Using a specialized language called D (unrelated to the general purpose programming language D), you can dynamically add behavior at these instrumentation points. Generally the behavior is to capture information, but it can also manipulate the event being traced.

Each probe is fully identified by a 4-tuple delimited by colons: provider, module, function, and probe name. An empty element denotes a sort of wildcard. For example, syscall::open:entry is a probe at the beginning (i.e. “entry”) of open(2). syscall:::entry matches all system call entry probes.

Unlike strace on Linux which monitors a specific process, DTrace applies to the entire system when active. To run curl under strace from Emacs, I’d have to modify Emacs’ behavior to do so. With DTrace I can instrument every curl process without making a single change to Emacs, and with negligible impact to Emacs. That’s a big deal.

So, when it comes to this Elfeed issue, FreeBSD is much better poised for debugging the problem. All I have to do is catch it in the act. However, it’s been months since that bug report and I’m not really making this connection yet. I’m just hoping I eventually find an interesting problem where I can apply DTrace.

FreeBSD on a Raspberry Pi 2

So I’ve settled in FreeBSD as the playground for these technologies, I just have to decide where. I could always run it in a virtual machine, but it’s always more interesting to try things out on real hardware. FreeBSD supports the Raspberry Pi 2 as a Tier 2 system, and I had a Raspberry Pi 2 sitting around collecting dust, so I put it to use.

I wrote the image to an SD card, and for a few days I stretched my legs on this new system. I cloned a couple dozen of my own git repositories, ran the builds and the tests, and just got a feel for things. I tried out the ports system for the first time, mainly to discover that the low-powered Raspberry Pi 2 takes days to build some of the packages I want to try.

I mostly program in Vim these days, so it’s some days before I even set up Emacs. Eventually I do build Emacs, clone my configuration, fire it up, and give Elfeed a spin.

And that’s when the “search failed” bug strikes! Not just once, but dozens of times. Perfect! This low-powered platform is the jackpot for this particular bug, triggering it left and right. Given that I’ve got DTrace at my disposal, it’s the perfect place to debug this. Something is lying to Elfeed and DTrace will play the judge.

Before I dive in I see three possibilities:

  1. curl is reporting success but truncating its output.
  2. Emacs is quietly truncating curl’s output.
  3. Emacs is misinterpreting curl’s exit status.

With Dtrace I can observe what every curl process writes to Emacs, and I can also double check curl’s exit status. I come up with the following (newbie) DTrace script:

syscall::write:entry
/execname == "curl"/
{
    printf("%d WRITE %d \"%s\"\n",
           pid, arg2, stringof(copyin(arg1, arg2)));
}

syscall::exit:entry
/execname == "curl"/
{
    printf("%d EXIT  %d\n", pid, arg0);
}

The /execname == "curl"/ is a predicate that (obviously) causes the behavior to only fire for curl processes. The first probe has DTrace print a line for every write(2) from curl. arg0, arg1, and arg2 correspond to the arguments of write(2): fd, buf, count. It logs the process ID (pid) of the write, the length of the write, and the actual contents written. Remember that these curl processes are run in parallel by Emacs, so the pid allows me to associate the separate writes and the exit status.

The second probe prints the pid and the exit status (the first argument to exit(2)).

I also want to compare this to exactly what is delivered to Elfeed when curl exits, so I modify the process sentinel — the callback that handles a subprocess exiting — to call write-file before any action is taken. I can compare these buffer dumps to the logs produced by DTrace.

There are two important findings.

First, when the “search failed” bug occurs, the buffer was completely empty (95% of the time) or truncated at the end of the HTTP headers (5% of the time), right at the blank line. DTrace indicates that curl did its job to the full, so it’s Emacs who’s the liar. It’s not delivering all of curl’s data to Elfeed. That’s pretty annoying.

Second, curl was line-buffered. Each line was a separate, independent write(2). I was certainly not expecting this. Normally the C library only does line buffering when the output is a terminal. That’s because it’s guessing a user may be watching, expecting the output to arrive a line at a time.

Here’s a sample of what it looked like in the log:

88188 WRITE 32 "Server: Apache/2.4.18 (Ubuntu)
"
88188 WRITE 46 "Location: https://blog.plover.com/index.atom
"
88188 WRITE 21 "Content-Length: 299
"
88188 WRITE 45 "Content-Type: text/html; charset=iso-8859-1
"
88188 WRITE 2 "
"

Why would curl think Emacs is a terminal?

Oh. That’s right. This is the same problem I ran into four years ago when writing EmacSQL. By default Emacs connects to subprocesses through a psuedo-terminal (pty). I called this a mistake in Emacs back then, and I still stand by that claim. The pty causes weird, annoying problems for little benefit:

Just from eyeballing the DTrace log I knew what to do: dump the pty and switch to a pipe. This is controlled with the process-connection-type variable, and fixing it is a one-liner.

Not only did this completely resolve the truncation issue, Elfeed is noticeably faster at fetching feeds on all machines. It’s no longer receiving mountains of XML one line at a time, like sucking pudding through a straw. It’s now quite zippy even on my Raspberry Pi 2, which had never been the case before (without the “search failed” bug). Even if you were never affected by this bug, you will benefit from the fix.

I haven’t officially reported this as an Emacs bug yet because reproducibility is still an issue. It needs something better than “fire off a bunch of HTTP requests across the internet in parallel from a Raspberry Pi.”

The fix reminds me of that old boilermaker story about charging a lot of money just to swing a hammer. Once the problem arose, DTrace quickly helped to identify the place to hit Emacs with the hammer.

Finally, a big thanks to alphapapa for originally taking the time to report this bug months ago.

What's in an Emacs Lambda

There was recently some interesting discussion about correctly using backquotes to express a mixture of data and code. Since lambda expressions seem to evaluate to themselves, what’s the difference? For example, an association list of operations:

'((add . (lambda (a b) (+ a b)))
  (sub . (lambda (a b) (- a b)))
  (mul . (lambda (a b) (* a b)))
  (div . (lambda (a b) (/ a b))))

It looks like it would work, and indeed it does work in this case. However, there are good reasons to actually evaluate those lambda expressions. Eventually invoking the lambda expressions in the quoted form above are equivalent to using eval. So, instead, prefer the backquote form:

`((add . ,(lambda (a b) (+ a b)))
  (sub . ,(lambda (a b) (- a b)))
  (mul . ,(lambda (a b) (* a b)))
  (div . ,(lambda (a b) (/ a b))))

There are a lot of interesting things to say about this, but let’s first reduce it to two very simple cases:

(lambda (x) x)

'(lambda (x) x)

What’s the difference between these two forms? The first is a lambda expression, and it evaluates to a function object. The other is a quoted list that looks like a lambda expression, and it evaluates to a list — a piece of data.

A naive evaluation of these expressions in *scratch* (C-x C-e) suggests they are are identical, and so it would seem that quoting a lambda expression doesn’t really matter:

(lambda (x) x)
;; => (lambda (x) x)

'(lambda (x) x)
;; => (lambda (x) x)

However, there are two common situations where this is not the case: byte compilation and lexical scope.

Lambda under byte compilation

It’s a little trickier to evaluate these forms byte compiled in the scratch buffer since that doesn’t happen automatically. But if it did, it would look like this:

;;; -*- lexical-binding: nil; -*-

(lambda (x) x)
;; => #[(x) "\010\207" [x] 1]

'(lambda (x) x)
;; => (lambda (x) x)

The #[...] is the syntax for a byte-code function object. As discussed in detail in my byte-code internals article, it’s a special vector object that contains byte-code, and other metadata, for evaluation by Emacs’ virtual stack machine. Elisp is one of very few languages with readable function objects, and this feature is core to its ahead-of-time byte compilation.

The quote, by definition, prevents evaluation, and so inhibits byte compilation of the lambda expression. It’s vital that the byte compiler does not try to guess the programmer’s intent and compile the expression anyway, since that would interfere with lists that just so happen to look like lambda expressions — i.e. any list containing the lambda symbol.

There are three reasons you want your lambda expressions to get byte compiled:

While it’s common for personal configurations to skip byte compilation, Elisp should still generally be written as if it were going to be byte compiled. General rule of thumb: Ensure your lambda expressions are actually evaluated.

Lambda in lexical scope

As I’ve stressed many times, you should always use lexical scope. There’s no practical disadvantage or trade-off involved. Just do it.

Once lexical scope is enabled, the two expressions diverge even without byte compilation:

;;; -*- lexical-binding: t; -*-

(lambda (x) x)
;; => (closure (t) (x) x)

'(lambda (x) x)
;; => (lambda (x) x)

Under lexical scope, lambda expressions evaluate to closures. Closures capture their lexical environment in their closure object — nothing in this particular case. It’s a type of function object, making it a valid first argument to funcall.

Since the quote prevents the second expression from being evaluated, semantically it evaluates to a list that just so happens to look like a (non-closure) function object. Invoking a data object as a function is like using eval — i.e. executing data as code. Everyone already knows eval should not be used lightly.

It’s a little more interesting to look at a closure that actually captures a variable, so here’s a definition for constantly, a higher-order function that returns a closure that accepts any number of arguments and returns a particular constant:

(defun constantly (x)
  (lambda (&rest _) x))

Without byte compiling it, here’s an example of its return value:

(constantly :foo)
;; => (closure ((x . :foo) t) (&rest _) x)

The environment has been captured as an association list (with a trailing t), and we can plainly see that the variable x is bound to the symbol :foo in this closure. Consider that we could manipulate this data structure (e.g. setcdr or setf) to change the binding of x for this closure. This is essentially how closures mutate their own environment. Moreover, closures from the same environment share structure, so such mutations are also shared. More on this later.

Semantically, closures are distinct objects (via eq), even if the variables they close over are bound to the same value. This is because they each have a distinct environment attached to them, even if in some invisible way.

(eq (constantly :foo) (constantly :foo))
;; => nil

Without byte compilation, this is true even when there’s no lexical environment to capture:

(defun dummy ()
  (lambda () t))

(eq (dummy) (dummy))
;; => nil

The byte compiler is smart, though. As an optimization, the same closure object is reused when possible, avoiding unnecessary work, including multiple object allocations. Though this is a bit of an abstraction leak. A function can (ab)use this to introspect whether it’s been byte compiled:

(defun have-i-been-compiled-p ()
  (let ((funcs (vector nil nil)))
    (dotimes (i 2)
      (setf (aref funcs i) (lambda ())))
    (eq (aref funcs 0) (aref funcs 1))))

(have-i-been-compiled-p)
;; => nil

(byte-compile 'have-i-been-compiled-p)

(have-i-been-compiled-p)
;; => t

The trick here is to evaluate the exact same non-capturing lambda expression twice, which requires a loop (or at least some sort of branch). Semantically we should think of these closures as being distinct objects, but, if we squint our eyes a bit, we can see the effects of the behind-the-scenes optimization.

Don’t actually do this in practice, of course. That’s what byte-code-function-p is for, which won’t rely on a subtle implementation detail.

Overcapturing

I mentioned before that one of the potential gotchas of not byte compiling your lambda expressions is overcapturing closure variables in the interpreter.

To evaluate lisp code, Emacs has both an interpreter and a virtual machine. The interpreter evaluates code in list form: cons cells, numbers, symbols, etc. The byte compiler is like the interpreter, but instead of directly executing those forms, it emits byte-code that, when evaluated by the virtual machine, produces identical visible results to the interpreter — in theory.

What this means is that Emacs contains two different implementations of Emacs Lisp, one in the interpreter and one in the byte compiler. The Emacs developers have been maintaining and expanding these implementations side-by-side for decades. A pitfall to this approach is that the implementations can, and do, diverge in their behavior. We saw this above with that introspective function, and it comes up in practice with advice.

Another way they diverge is in closure variable capture. For example:

;;; -*- lexical-binding: t; -*-

(defun overcapture (x y)
  (when y
    (lambda () x)))

(overcapture :x :some-big-value)
;; => (closure ((y . :some-big-value) (x . :x) t) nil x)

Notice that the closure captured y even though it’s unnecessary. This is because the interpreter doesn’t, and shouldn’t, take the time to analyze the body of the lambda to determine which variables should be captured. That would need to happen at run-time each time the lambda is evaluated, which would make the interpreter much slower. Overcapturing can get pretty messy if macros are introducing their own hidden variables.

On the other hand, the byte compiler can do this analysis just once at compile-time. And it’s already doing the analysis as part of its job. It can avoid this problem easily:

(overcapture :x :some-big-value)
;; => #[0 "\300\207" [:x] 1]

It’s clear that :some-big-value isn’t present in the closure.

But… how does this work?

How byte compiled closures are constructed

Recall from the internals article that the four core elements of a byte-code function object are:

  1. Parameter specification
  2. Byte-code string (opcodes)
  3. Constants vector
  4. Maximum stack usage

While a closure seems like compiling a whole new function each time the lambda expression is evaluated, there’s actually not that much to it! Namely, the behavior of the function remains the same. Only the closed-over environment changes.

What this means is that closures produced by a common lambda expression can all share the same byte-code string (second element). Their bodies are identical, so they compile to the same byte-code. Where they differ are in their constants vector (third element), which gets filled out according to the closed over environment. It’s clear just from examining the outputs:

(constantly :a)
;; => #[128 "\300\207" [:a] 2]

(constantly :b)
;; => #[128 "\300\207" [:b] 2]

constantly has three of the four components of the closure in its own constant pool. Its job is to construct the constants vector, and then assemble the whole thing into a byte-code function object (#[...]). Here it is with M-x disassemble:

0       constant  make-byte-code
1       constant  128
2       constant  "\300\207"
4       constant  vector
5       stack-ref 4
6       call      1
7       constant  2
8       call      4
9       return

(Note: since byte compiler doesn’t produce perfectly optimal code, I’ve simplified it for this discussion.)

It pushes most of its constants on the stack. Then the stack-ref 5 (5) puts x on the stack. Then it calls vector to create the constants vector (6). Finally, it constructs the function object (#[...]) by calling make-byte-code (8).

Since this might be clearer, here’s the same thing expressed back in terms of Elisp:

(defun constantly (x)
  (make-byte-code 128 "\300\207" (vector x) 2))

To see the disassembly of the closure’s byte-code:

(disassemble (constantly :x))

The result isn’t very surprising:

0       constant  :x
1       return

Things get a little more interesting when mutation is involved. Consider this adder closure generator, which mutates its environment every time it’s called:

(defun adder ()
  (let ((total 0))
    (lambda () (cl-incf total))))

(let ((count (adder)))
  (funcall count)
  (funcall count)
  (funcall count))
;; => 3

(adder)
;; => #[0 "\300\211\242T\240\207" [(0)] 2]

The adder essentially works like this:

(defun adder ()
  (make-byte-code 0 "\300\211\242T\240\207" (vector (list 0)) 2))

In theory, this closure could operate by mutating its constants vector directly. But that wouldn’t be much of a constants vector, now would it!? Instead, mutated variables are boxed inside a cons cell. Closures don’t share constant vectors, so the main reason for boxing is to share variables between closures from the same environment. That is, they have the same cons in each of their constant vectors.

There’s no equivalent Elisp for the closure in adder, so here’s the disassembly:

0       constant  (0)
1       dup
2       car-safe
3       add1
4       setcar
5       return

It puts two references to boxed integer on the stack (constant, dup), unboxes the top one (car-safe), increments that unboxed integer, stores it back in the box (setcar) via the bottom reference, leaving the incremented value behind to be returned.

This all gets a little more interesting when closures interact:

(defun fancy-adder ()
  (let ((total 0))
    `(:add ,(lambda () (cl-incf total))
      :set ,(lambda (v) (setf total v))
      :get ,(lambda () total))))

(let ((counter (fancy-adder)))
  (funcall (plist-get counter :set) 100)
  (funcall (plist-get counter :add))
  (funcall (plist-get counter :add))
  (funcall (plist-get counter :get)))
;; => 102

(fancy-adder)
;; => (:add #[0 "\300\211\242T\240\207" [(0)] 2]
;;     :set #[257 "\300\001\240\207" [(0)] 3]
;;     :get #[0 "\300\242\207" [(0)] 1])

This is starting to resemble object oriented programming, with methods acting upon fields stored in a common, closed-over environment.

All three closures share a common variable, total. Since I didn’t use print-circle, this isn’t obvious from the last result, but each of those (0) conses are the same object. When one closure mutates the box, they all see the change. Here’s essentially how fancy-adder is transformed by the byte compiler:

(defun fancy-adder ()
  (let ((box (list 0)))
    (list :add (make-byte-code 0 "\300\211\242T\240\207" (vector box) 2)
          :set (make-byte-code 257 "\300\001\240\207" (vector box) 3)
          :get (make-byte-code 0 "\300\242\207" (vector box) 1))))

The backquote in the original fancy-adder brings this article full circle. This final example wouldn’t work correctly if those lambdas weren’t evaluated properly.

null program

Chris Wellons