Articles tagged lang at null program

Unintuitive JSON Parsing

2019-12-28T17:23:09Z

This article was discussed on Hacker News and on reddit.

Despite the goal of JSON being a subset of JavaScript — which it failed to achieve (update: this was fixed) — parsing JSON is quite unlike parsing a programming language. For invalid inputs, the specific cause of error is often counter-intuitive. Normally this doesn’t matter, but I recently ran into a case where it does.

Consider this invalid input to a JSON parser:

[01]

To a human this might be interpreted as an array containing a number. Either the leading zero is ignored, or it indicates octal, as it does in many languages, including JavaScript. In either case the number in the array would be 1.

However, JSON does not support leading zeros, neither ignoring them nor supporting octal notation. Here’s the railroad diagram for numbers from the JSON specification:

Or in regular expression form:

-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?

If a token starts with 0 then it can only be followed by ., e, or E. It cannot be followed by a digit. So, the natural human response to mentally parsing [01] is: This input is invalid because it contains a number with a leading zero, and leading zeros are not accepted. But this is not actually why parsing fails!

A simple model for the parser is as consuming tokens from a lexer. The lexer’s job is to read individual code points (characters) from the input and group them into tokens. The possible tokens are string, number, left brace, right brace, left bracket, right bracket, comma, true, false, and null. The lexer skips over insignificant whitespace, and it doesn’t care about structure, like matching braces and brackets. That’s the parser’s job.

In some instances the lexer can fail to parse a token. For example, if while looking for a new token the lexer reads the character %, then the input must be invalid. No token starts with this character. So in some cases invalid input will be detected by the lexer.

The parser consumes tokens from the lexer and, using some state, ensures the sequence of tokens is valid. For example, arrays must be a well-formed sequence of left bracket, value, comma, value, comma, etc., right bracket. One way to reject input with trailing garbage is for the lexer to also produce an EOF (end of file/input) token when there are no more tokens, and for the parser to specifically check for that token before accepting the input as valid.

Getting back to the input [01], a JSON parser receives a left bracket token, then updates its bookkeeping to track that it’s parsing an array. When looking for the next token, the lexer sees the character 0 followed by 1. According to the railroad diagram, this is a number token (starts with 0), but 1 cannot be part of this token, so it produces a number token with the contents “0”. Everything is still fine.

Next the lexer sees 1 followed by ]. Since ] cannot be part of a number, it produces another number token with the contents “1”. The parser receives this token but, since it’s parsing an array, it expects either a comma token or a right bracket. Since this is neither, the parser fails with an error about an unexpected number. The parser will not complain about leading zeros because JSON has no concept of leading zeros. Human intuition is right, but for the wrong reasons.
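The walk-through above can be reproduced with a very small lexer. This is only a sketch under stated assumptions: it handles brackets, commas, numbers, and the three literals, and the names lex, NUMBER, and LITERALS are made up for illustration rather than taken from any real parser.

```javascript
// Sketch of a JSON lexer covering only what the examples need:
// brackets, commas, numbers, and the literals true/false/null.
// The names `lex`, `NUMBER`, and `LITERALS` are hypothetical.

// The number grammar from the regular expression above. The sticky flag (y)
// anchors each match at NUMBER.lastIndex.
const NUMBER = /-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?/y;
const LITERALS = ["true", "false", "null"];

function lex(input) {
    const tokens = [];
    let i = 0;
    while (i < input.length) {
        const c = input[i];
        if (" \t\r\n".includes(c)) {    // insignificant whitespace
            i++;
            continue;
        }
        if ("[],".includes(c)) {        // single-character tokens
            tokens.push({type: c});
            i++;
            continue;
        }
        const word = LITERALS.find(w => input.startsWith(w, i));
        if (word) {
            tokens.push({type: word});
            i += word.length;
            continue;
        }
        NUMBER.lastIndex = i;
        const match = NUMBER.exec(input);
        if (match) {
            tokens.push({type: "number", text: match[0]});
            i = NUMBER.lastIndex;
            continue;
        }
        // Nothing matched, so this is a lexer error.
        throw new SyntaxError(`invalid token at offset ${i}`);
    }
    return tokens;
}

console.log(lex("[01]"));
```

For [01] this yields a left bracket, a number token "0", a number token "1", and a right bracket. The lexer never complains. Rejecting the second number is left to whatever consumes the tokens.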

Try this for yourself in your favorite JSON parser. Or even just pop up the JavaScript console in your browser and try it out:

JSON.parse('[01]');

Firefox reports:

SyntaxError: JSON.parse: expected ‘,’ or ‘]’ after array element

Chromium reports:

SyntaxError: Unexpected number in JSON

Edge reports (note it says “number” not “digit”):

Error: Invalid number at position:3

In all cases the parsers accepted a zero as the first array element, then rejected the input after the second number token for being a bad sequence of tokens. In other words, this is a parser error rather than a lexer error, as a human might intuit.

My JSON parser comes with a testing tool that shows the token stream up until the parser rejects the input, useful for understanding these situations:

$ echo '[01]' | tests/stream
struct expect seq[] = {
    {JSON_ARRAY},
    {JSON_NUMBER, "0"},
    {JSON_ERROR},
};

There’s an argument to be made here that perhaps the human readable error message should mention leading zeros, since that’s likely the cause of the invalid input. That is, a human probably thought JSON allowed leading zeros, and so the clearer message would tell the human that JSON does not allow leading zeros. This is the “more art than science” part of parsing.

It’s the same story with this invalid input:

[truefalse]

From this input, the lexer unambiguously produces left bracket, true, false, right bracket. It’s still up to the parser to reject this input. The only reason we never see truefalse in valid JSON is that the overall structure never allows these tokens to be adjacent, not because they’d be ambiguous. Programming languages have identifiers, and in a programming language this would parse as the identifier truefalse rather than true followed by false. From this point of view, JSON seems quite strange.

Just as before, Firefox reports:

SyntaxError: JSON.parse: expected ‘,’ or ‘]’ after array element

Chromium reports the same error as it does for [true false]:

SyntaxError: Unexpected token f in JSON

Edge’s message is probably a minor bug in their JSON parser:

Error: Expected ‘]’ at position:10

Position 10 is the last character in false. The lexer consumed false from the input, produced a “false” token, then the parser rejected the input. When it reported the error, it chose the end of the invalid token as the error position rather than the start, despite the fact that the only two valid tokens (comma, right bracket) are both a single character. It should also say “Expected ‘]’ or ‘,’” (as Firefox does) rather than just “]”.

Concatenated JSON

That’s all pretty academic. Except for producing nice error messages, nobody really cares so much why the input was rejected. The mismatch between intuition and reality isn’t important.

However, it does come up with concatenated JSON. Some parsers, including mine, will optionally consume multiple JSON values, one after another, from the same input. Here’s an example from one of my favorite command line tools, jq:

echo '{"x":0,"y":1}{"x":2,"y":3}{"x":4,"y":5}' | jq '.x + .y'
1
5
9

The input contains three unambiguously-concatenated JSON objects, so the parser produces three distinct objects. Now consider this input, this time outside of the context of an array:

01

Is this invalid, one number, or two numbers? According to the lexer and parser model described above, this is valid and unambiguously two concatenated numbers. Here’s what my parser says:

$ echo '01' | tests/stream
struct expect seq[] = {
    {JSON_NUMBER, "0"},
    {JSON_DONE},
    {JSON_NUMBER, "1"},
    {JSON_DONE},
    {JSON_ERROR},
};

Note: The JSON_DONE “token” indicates acceptance, and the JSON_ERROR token is an EOF indicator, not a hard error. Since jq allows leading zeros in its JSON input, it’s ambiguous and parses this as the number 1, so asking its opinion on this input isn’t so interesting. I surveyed some other JSON parsers that accept concatenated JSON:

Jackson: Reject as leading zero.
Noggit: Reject as leading zero.
yajl: Accept as two numbers.

For my parser it’s the same story for truefalse:

$ echo 'truefalse' | tests/stream
struct expect seq[] = {
    {JSON_TRUE, "true"},
    {JSON_DONE},
    {JSON_FALSE, "false"},
    {JSON_DONE},
    {JSON_ERROR},
};

Neither rejecting nor accepting this input is wrong, per se. Concatenated JSON is outside of the scope of JSON itself, and concatenating arbitrary JSON objects without a whitespace delimiter can lead to weird and ill-formed input. This is all a great argument in favor of Newline Delimited JSON, and its two simple rules:

Line separator is '\n'
Each line is a valid JSON value

This solves the concatenation issue, and, even more, it works well with parsers not supporting concatenation: Split the input on newlines and pass each line to your JSON parser.
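As a rough sketch of that last point, here is the split-and-parse approach using the JSON.parse function from earlier. The helper name parseLines is hypothetical, and skipping empty lines (to tolerate a trailing newline) is an assumption of this sketch, not one of the two rules.

```javascript
// Sketch: parse newline-delimited JSON with a parser that knows nothing
// about concatenation. `parseLines` is a hypothetical helper name.
function parseLines(text) {
    return text
        .split("\n")
        // Assumption: tolerate a trailing newline by skipping empty lines.
        .filter(line => line !== "")
        .map(line => JSON.parse(line));
}

const input = '{"x":0,"y":1}\n{"x":2,"y":3}\n{"x":4,"y":5}\n';
for (const {x, y} of parseLines(input)) {
    console.log(x + y);
}
```

This prints 1, 5, and 9, matching the jq example, and a malformed line fails on its own instead of bleeding into its neighbors.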

No, PHP Doesn't Have Closures

2019-09-25T21:10:43Z

The PHP programming language is bizarre and, if nothing else, worthy of anthropological study. The only consistent property of PHP is how badly it’s designed, yet it somehow remains widely popular. There’s a social dynamic at play here that science has yet to unlock.

I don’t say this because I hate PHP. There’s no reason for that: I don’t write programs in PHP, never had to use it, and don’t expect to ever need it. Despite this, I just can’t look away from PHP in the same way I can’t look away from a car accident.

I recently came across a link to the PHP manual, and morbid curiosity caused me to look through it. It’s fun to pick an arbitrary section of the manual and see how many crazy design choices I can spot, or at least see what sort of strange terminology the manual has invented to describe a common concept. This time around, one such section was on anonymous functions, including closures. It was even worse than I expected.

In some circumstances, closures can be a litmus test. Closure semantics are not complex, but they’re subtle and a little tricky until you get the hang of them. If you’re interviewing a candidate, toss in a question or two about closures. Either they’re familiar and get it right away, or they’re unfamiliar and get nothing right. The latter is when it’s most informative. PHP itself falls clearly into the latter. Not only that, the example of a “closure” in the manual demonstrates a “closure” closing over a global variable!

I’d been told for years that PHP has closures, and I took that claim at face value. In fact, PHP has had “closures” since 5.3.0, released in June 2009, so I’m over a decade late in investigating it. However, as far as I can tell, nobody’s ever pointed out that PHP “closures” are, in fact, not actually closures.

Anonymous functions and closures

Before getting into why they’re not closures, let’s go over how it works, starting with a plain old anonymous function. PHP does have anonymous functions — the easy part.

function foo() {
    return function() {
        return 1;
    };
}

The function foo returns a function that returns 1. In PHP 7 you can call the returned function immediately like so:

$r = foo()();  // $r = 1

In PHP 5 this is a syntax error because, well, it’s PHP and its parser is about as clunky as Matlab’s.

In a well-designed language, you’d expect that this could also be a closure. That is, it closes over local variables, and the function may continue to access those variables later. For example:

function bar($n) {
    return function() {
        return $n;
    };
}

bar(1)();  // error: Undefined variable: n

This fails because you must explicitly tell PHP what variables you intend to access inside the anonymous function with use:

function bar($n) {
    return function() use ($n) {
        return $n;
    };
}

bar(1)();  // 1

If this actually closed over $n, this would be a legitimate closure. Having to tell the language exactly which variables are being closed over would be pretty dumb, but it still meets the definition of a closure.

But here’s the catch: It’s not actually closing over any variables. The names listed in use are actually extra, hidden parameters bound to the current value of those variables. In other words, this is nothing more than partial function evaluation.

function bar($n) {
    $f = function() use ($n) {
        return $n;
    };
    $n++;  // never used!
    return $f;
}

$r = bar(1)();  // $r = 1

Here’s the equivalent in JavaScript using the bind() method:

function bar(n) {
    let f = function(m) {
        return m;
    };
    return f.bind(null, n);
}

This is actually more powerful than PHP’s “closures” since any arbitrary expression can be used for the bound argument. In PHP it’s limited to a couple of specific forms. If JavaScript didn’t have proper closures, and instead we all had to rely on bind(), nobody would claim that JavaScript had closures. It shouldn’t be different for PHP.
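To make the difference concrete, here is a small side-by-side sketch in JavaScript. The function names barClosure and barBind are made up for this illustration. Both increment n after creating the inner function, mirroring the PHP example with $n++.

```javascript
// A real closure captures the variable itself.
function barClosure(n) {
    const f = function () {
        return n;
    };
    n++;    // visible to f: both refer to the same binding
    return f;
}

// bind() captures the current value, like PHP's `use ($n)`.
function barBind(n) {
    const f = function (m) {
        return m;
    }.bind(null, n);
    n++;    // no effect on the already-bound argument
    return f;
}

console.log(barClosure(1)());  // 2
console.log(barBind(1)());     // 1
```

The closure observes the later increment because it shares the variable with its enclosing function. The bound function only ever sees the value that was current when bind() was called.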

References

PHP does have references, and binding a reference to an anonymous function is kinda, sorta like a closure. But that’s still just partial function evaluation, only with a reference as the bound argument.

Here’s how to tell these reference captures aren’t actually closures: They work equally well for global variables as local variables. So it’s still not closing over a lexical environment, just binding a reference to a parameter.

$counter = 0;

function bar($n) {
    global $counter;
    $f = function() use (&$n, &$counter) {
        $counter++;
        return $n;
    };
    $n++;  // now has an effect
    return $f;
}

$r = bar(1)();  // $r = 2, $counter = 1

In the example above, there’s no difference between $n, a local variable, and $counter, a global variable. It wouldn’t make sense for a closure to close over a global variable.

Emacs Lisp partial function application

Emacs Lisp famously didn’t get lexical scope, and therefore closures, until fairly recently. It was — and still is by default — a dynamic scope oddball. However, it’s long had an apply-partially function for partial function application. It returns a closure-like object, and did so when the language didn’t have proper closures. So it can be used to create a “closure” just like PHP:

(defun bar (n)
  (apply-partially (lambda (m) m) n))

This works regardless of lexical or dynamic scope, which is because this construct isn’t really a closure, just like PHP’s isn’t a closure. In PHP, its partial function evaluation is built directly into the language with special use syntax.

Monkey see, monkey do

Why does the shell command language use sigils? Because it’s built atop interactive command line usage, where bare words are taken literally and variables are the exception. Why does Perl use sigils? Because it was originally designed as an alternative to shell scripts, so it mimicked that syntax. Why does PHP use sigils? Because Perl did.

The situation with closures follows that pattern, and it comes up all over PHP. Its designers see a feature in another language, but don’t really understand its purpose or semantics. So when they attempt to add that feature to PHP, they get it disastrously wrong.

UTF-8 String Indexing Strategies

2019-05-29T21:52:06Z

This article was discussed on Hacker News.

When designing or, in some cases, implementing a programming language with built-in support for Unicode strings, an important decision must be made about how to represent or encode those strings in memory. Not all representations are equal, and there are trade-offs between different choices.

One issue to consider is that strings typically feature random access indexing of code points with a time complexity resembling constant time (O(1)). However, not all string representations actually support this well. Strings using variable length encoding, such as UTF-8 or UTF-16, have O(n) time complexity indexing, ignoring special cases (discussed below). The most obvious choice to achieve O(1) time complexity — an array of 32-bit values, as in UCS-4 — makes very inefficient use of memory, especially with typical strings.
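To see where the O(n) comes from, here is a minimal sketch of code point indexing over UTF-8 bytes. It assumes the bytes are valid UTF-8, and the name byteOffsetOf is hypothetical. Since code points vary in width, the only way to find the nth one is to scan from the start and count the bytes that begin a code point.

```javascript
// Return the byte offset of the code point at `index`, or -1 if the
// index is past the end. Assumes `bytes` holds valid UTF-8.
function byteOffsetOf(bytes, index) {
    let seen = 0;
    for (let offset = 0; offset < bytes.length; offset++) {
        // Continuation bytes look like 10xxxxxx. Any other byte begins
        // a new code point.
        if ((bytes[offset] & 0xC0) !== 0x80) {
            if (seen === index) return offset;
            seen++;
        }
    }
    return -1;
}

const bytes = new TextEncoder().encode("π ≈ 3.14");
console.log(byteOffsetOf(bytes, 2));  // 3
```

The third code point (≈) starts at byte 3 because π occupies two bytes and the space one. With a fixed-width encoding the same lookup would be a single multiplication.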

Despite this, UTF-8 is still chosen in a number of programming languages, or at least in their implementations. In this article I’ll discuss three examples — Emacs Lisp, Julia, and Go — and how each takes a slightly different approach.

Emacs Lisp

Emacs Lisp has two different types of strings that generally can be used interchangeably: unibyte and multibyte. In fact, the difference between them is so subtle that I bet that most people writing Emacs Lisp don’t even realize there are two kinds of strings.

Emacs Lisp uses UTF-8 internally to encode all “multibyte” strings and buffers. To fully support arbitrary sequences of bytes in the files being edited, Emacs uses its own extension of Unicode to precisely and unambiguously represent raw bytes intermixed with text. Any arbitrary sequence of bytes can be decoded into Emacs’ internal representation, then losslessly re-encoded back into the exact same sequence of bytes.

Unibyte strings and buffers are really just byte-strings. In practice, they’re essentially ISO/IEC 8859-1, a.k.a. Latin-1. It’s a Unicode string where all code points are below 256. Emacs prefers the smallest and simplest string representation when possible, similar to CPython 3.3+.

(multibyte-string-p "hello")
;; => nil

(multibyte-string-p "π ≈ 3.14")
;; => t

Emacs Lisp strings are mutable, and therein lies the kicker: As soon as you insert a code point above 255, Emacs quietly converts the string to multibyte.

(defvar fish "fish")

(multibyte-string-p fish)
;; => nil

(setf (aref fish 2) ?ŝ
      (aref fish 3) ?o)

fish
;; => "fiŝo"

(multibyte-string-p fish)
;; => t

Constant time indexing into unibyte strings is straightforward, and Emacs does the obvious thing when indexing into unibyte strings. It helps that most strings in Emacs are probably unibyte, even when the user isn’t working in English.

Most buffers are multibyte, even if those buffers are generally just ASCII. Since Emacs uses gap buffers it generally doesn’t matter: Nearly all accesses are tightly clustered around the point, so O(n) indexing doesn’t often matter.

That leaves multibyte strings. Consider these idioms for iterating across a string in Emacs Lisp:

(dotimes (i (length string))
  (let ((c (aref string i)))
    ...))

(cl-loop for c being the elements of string
         ...)

The latter expands into essentially the same as the former: An incrementing index that uses aref to index to that code point. So is iterating over a multibyte string — a common operation — an O(n^2) operation?

The good news is that, at least in this case, no! It’s essentially just as efficient as iterating over a unibyte string. Before going over why, consider this little puzzle. Here’s a little string comparison function that compares two strings a code point at a time, returning their first difference:

(defun compare (string-a string-b)
  (cl-loop for a being the elements of string-a
           for b being the elements of string-b
           unless (eql a b)
           return (cons a b)))

Let’s examine benchmarks with some long strings (100,000 code points):

(benchmark-run
    (let ((a (make-string 100000 0))
          (b (make-string 100000 0)))
      (compare a b)))
;; => (0.012568031 0 0.0)

Using two zeroed unibyte strings, it takes 13ms. How about changing the last code point in one of them to 256, converting it to a multibyte string:

(benchmark-run
    (let ((a (make-string 100000 0))
          (b (make-string 100000 0)))
      (setf (aref a (1- (length a))) 256)
      (compare a b)))
;; => (0.012680513 0 0.0)

Same running time, so that multibyte string cost nothing more to iterate across. Let’s try making them both multibyte:

(benchmark-run
    (let ((a (make-string 100000 0))
          (b (make-string 100000 0)))
      (setf (aref a (1- (length a))) 256
            (aref b (1- (length b))) 256)
      (compare a b)))
;; => (2.327959762 0 0.0)

That took 2.3 seconds: about 180x longer to run! Iterating over two multibyte strings concurrently seems to have broken an optimization. Can you reason about what’s happened?

To avoid the O(n) cost on this common indexing operation, Emacs keeps a “bookmark” for the last indexing location into a multibyte string. If the next access is nearby, it can start looking from this bookmark, forwards or backwards. Like a gap buffer, this gives a big advantage to clustered accesses, including iteration.

However, this string bookmark is global, one per Emacs instance, not one per string. In the last benchmark, the two multibyte strings are constantly fighting over a single string bookmark, and indexing in the comparison function is reduced to O(n^2) time complexity.
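The effect is easy to reproduce with a toy model. The sketch below is not how Emacs implements its bookmark. It is a hypothetical JavaScript model, with made-up names, that assumes valid UTF-8 and assumes a bookmark belonging to a different string forces a rescan from the start. It counts the bytes walked so the cost is visible.

```javascript
// Toy model of bookmark-assisted indexing into UTF-8 bytes.
// Assumes valid UTF-8. All names here are hypothetical.
const startsCodePoint = b => (b & 0xC0) !== 0x80;

function makeIndexer() {
    // One bookmark shared by every string, as described above.
    let mark = {bytes: null, index: 0, offset: 0};
    let steps = 0;  // bytes walked, to make the cost visible

    function offsetOf(bytes, index) {
        // Assumption of this model: a bookmark for another string is
        // useless, so start over from the beginning.
        if (mark.bytes !== bytes) mark = {bytes, index: 0, offset: 0};
        let {index: i, offset} = mark;
        while (i < index) {  // walk forwards one code point at a time
            do { offset++; steps++; }
            while (offset < bytes.length && !startsCodePoint(bytes[offset]));
            i++;
        }
        while (i > index) {  // or backwards
            do { offset--; steps++; }
            while (!startsCodePoint(bytes[offset]));
            i--;
        }
        mark = {bytes, index, offset};
        return offset;
    }
    return {offsetOf, steps: () => steps};
}

const a = new TextEncoder().encode("é".repeat(1000));
const b = new TextEncoder().encode("é".repeat(1000));

// Iterating one string: every access is next to the bookmark.
const alone = makeIndexer();
for (let i = 0; i < 1000; i++) alone.offsetOf(a, i);

// Iterating two strings in lockstep: each access evicts the other's bookmark.
const both = makeIndexer();
for (let i = 0; i < 1000; i++) {
    both.offsetOf(a, i);
    both.offsetOf(b, i);
}

console.log(alone.steps(), both.steps());
```

In this model the lockstep loop walks on the order of n times more bytes than the single-string loop, which is the linear versus quadratic gap seen in the benchmarks. Keeping one bookmark per string would restore the cheap case at the cost of extra state in every string.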

So, Emacs pretends it has constant time access into its UTF-8 text data, but it’s only faking it with some simple optimizations. This usually works out just fine.

Julia

Another approach is to not pretend at all, and to make this limitation of UTF-8 explicit in the interface. Julia took this approach, and it was one of my complaints about the language. I don’t think this is necessarily a bad choice, but I do still think it’s inappropriate considering Julia’s target audience (i.e. Matlab users).

Julia strings are explicitly byte strings containing valid UTF-8 data. All indexing occurs on bytes, which is trivially constant time, and always decodes the multibyte code point starting at that byte. But it is an error to index to a byte that doesn’t begin a code point. That error is also trivially checked in constant time.

s = "π"

s[1]
# => 'π'

s[2]
# ERROR: UnicodeError: invalid character index
#  in getindex at ./strings/basic.jl:37

Slices are still over bytes, but they “round up” to the end of the current code point:

s[1:1]
# => "π"

Iterating over a string requires helper functions which keep an internal “bookmark” so that each access is constant time:

for i in eachindex(string)
    c = string[i]
    # ...
end

So Julia doesn’t pretend, it makes the problem explicit.

Go

Go is very similar to Julia, but takes an even more explicit view of strings. All strings are byte strings and there are no restrictions on their contents. Conventionally strings contain UTF-8 encoded text, but this is not strictly required. There’s a unicode/utf8 package for working with strings containing UTF-8 data.

Beyond convention, the range clause also assumes the string contains UTF-8 data, and it’s not an error if it does not. Bytes not containing valid UTF-8 data appear as a REPLACEMENT CHARACTER (U+FFFD).

package main

import "fmt"

func main() {
    s := "π\xff"
    for _, r := range s {
        fmt.Printf("U+%04x\n", r)
    }
}

// U+03c0
// U+fffd

A further case of the language favoring UTF-8 is that casting a string to []rune decodes strings into code points, like UCS-4, again using REPLACEMENT CHARACTER:

package main

import "fmt"

func main() {
    s := "π\xff"
    r := []rune(s)
    fmt.Printf("U+%04x\n", r[0])
    fmt.Printf("U+%04x\n", r[1])
}

// U+03c0
// U+fffd

So, like Julia, there’s no pretending, and the programmer explicitly must consider the problem.

Preferences

All-in-all I probably prefer how Julia and Go are explicit with UTF-8’s limitations, rather than Emacs Lisp’s attempt to cover it up with an internal optimization. Since the abstraction is leaky, it may as well be made explicit.

An Async / Await Library for Emacs Lisp

2019-03-10T20:57:03Z

As part of building my Python proficiency, I’ve learned how to use asyncio. This new language feature first appeared in Python 3.5 (PEP 492, September 2015). JavaScript grew a nearly identical feature in ES2017 (June 2017). An async function can pause to await on an asynchronously computed result, much like a generator pausing when it yields a value.

In fact, both Python and JavaScript async functions are essentially just fancy generator functions with some specialized syntax and semantics. That is, they’re stackless coroutines. Both languages already had generators, so their generator-like async functions are a natural extension that — unlike stackful coroutines — do not require significant, new runtime plumbing.

Emacs officially got generators in 25.1 (September 2016), though, unlike Python and JavaScript, it didn’t require any additional support from the compiler or runtime. It’s implemented entirely using Lisp macros. In other words, it’s just another library, not a core language feature. In theory, the generator library could be easily backported to the first Emacs release to properly support lexical closures, Emacs 24.1 (June 2012).

For the same reason, stackless async/await coroutines can also be implemented as a library. So that’s what I did, letting Emacs’ generator library do most of the heavy lifting. The package is called aio:

https://github.com/skeeto/emacs-aio

It’s modeled more closely on JavaScript’s async functions than Python’s asyncio, with the core representation being promises rather than coroutine objects. I just have an easier time reasoning about promises than coroutines.

I’m definitely not the first person to realize this was possible, and was beaten to the punch by two years. Wanting to avoid fragmentation, I set aside all formality in my first iteration on the idea, not even bothering with namespacing my identifiers. It was to be only an educational exercise. However, I got quite attached to my little toy. Once I got my head wrapped around the problem, everything just sort of clicked into place so nicely.

In this article I will show step-by-step one way to build async/await on top of generators, laying out one concept at a time and then building upon each. But first, some examples to illustrate the desired final result.

aio example

Ignoring all its problems for a moment, suppose you want to use url-retrieve to fetch some content from a URL and return it. To keep this simple, I’m going to omit error handling. Also assume that lexical-binding is t for all examples. Besides, lexical scope is required by the generator library, and therefore also required by aio.

The most naive approach is to fetch the content synchronously:

(defun fetch-fortune-1 (url)
  (let ((buffer (url-retrieve-synchronously url)))
    (with-current-buffer buffer
      (prog1 (buffer-string)
        (kill-buffer)))))

The result is returned directly, and errors are communicated by an error signal (i.e. Emacs’ version of exceptions). This is convenient, but the function will block the main thread, locking up Emacs until the result has arrived. This is obviously very undesirable, so, in practice, everyone nearly always uses the asynchronous version:

(defun fetch-fortune-2 (url callback)
  (url-retrieve url (lambda (_status)
                      (funcall callback (buffer-string)))))

The main thread no longer blocks, but it’s a whole lot less convenient. The result isn’t returned to the caller, and instead the caller supplies a callback function. The result, whether success or failure, will be delivered via callback, so the caller must split itself into two pieces: the part before the callback and the callback itself. Errors cannot be delivered using an error signal because of the inverted flow control.

The situation gets worse if, say, you need to fetch results from two different URLs. You either fetch results one at a time (inefficient), or you manage two different callbacks that could be invoked in any order, and therefore have to coordinate.

Wouldn’t it be nice for the function to work like the first example, but be asynchronous like the second example? Enter async/await:

(aio-defun fetch-fortune-3 (url)
  (let ((buffer (aio-await (aio-url-retrieve url))))
    (with-current-buffer buffer
      (prog1 (buffer-string)
        (kill-buffer)))))

A function defined with aio-defun is just like defun except that it can use aio-await to pause and wait on any other function defined with aio-defun or, more specifically, any function that returns a promise. Borrowing Python parlance: Returning a promise makes a function awaitable. If there’s an error, it’s delivered as an error signal from aio-url-retrieve, just like the first example. When called, this function returns immediately with a promise object that represents a future result. The caller might look like this:

(defcustom fortune-url ...)

(aio-defun display-fortune ()
  (interactive)
  (message "%s" (aio-await (fetch-fortune-3 fortune-url))))

How wonderfully clean that looks! And, yes, it even works with interactive like that. I can M-x display-fortune and a fortune is printed in the minibuffer as soon as the result arrives from the server. In the meantime Emacs doesn’t block and I can continue my work.

You can’t do anything you couldn’t already do before. It’s just a nicer way to organize the same callbacks: implicit rather than explicit.

Promises, simplified

The core object at play is the promise. Promises are already a rather simple concept, but aio promises have been distilled to their essence, as they’re only needed for this singular purpose. More on this later.

As I said, a promise represents a future result. In practical terms, a promise is just an object to which one can subscribe with a callback. When the result is ready, the callbacks are invoked. Another way to put it is that promises reify the concept of callbacks. A callback is no longer just the idea of an extra argument to a function. It’s a first-class thing that itself can be passed around as a value.

Promises have two slots: the final promise result and a list of subscribers. A nil result means the result hasn’t been computed yet. It’s so simple I’m not even bothering with cl-struct.

(defun aio-promise ()
  "Create a new promise object."
  (record 'aio-promise nil ()))

(defsubst aio-promise-p (object)
  (and (eq 'aio-promise (type-of object))
       (= 3 (length object))))

(defsubst aio-result (promise)
  (aref promise 1))

To subscribe to a promise, use aio-listen:

(defun aio-listen (promise callback)
  (let ((result (aio-result promise)))
    (if result
        (run-at-time 0 nil callback result)
      (push callback (aref promise 2)))))

If the result isn’t ready yet, add the callback to the list of subscribers. If the result is ready, call the callback in the next event loop turn using run-at-time. This is important because it keeps all the asynchronous components isolated from one another. They won’t see each other’s frames on the call stack, nor frames from aio. This is so important that the Promises/A+ specification is explicit about it.

The other half of the equation is resolving a promise, which is done with aio-resolve. Unlike other promises, aio promises don’t care whether the promise is being fulfilled (success) or rejected (error). Instead a promise is resolved using a value function — or, usually, a value closure. Subscribers receive this value function and extract the value by invoking it with no arguments.

Why? This lets the promise’s resolver decide the semantics of the result. Instead of returning a value, this function can instead signal an error, propagating an error signal that terminated an async function. Because of this, the promise doesn’t need to know how it’s being resolved.

When a promise is resolved, subscribers are each scheduled in their own event loop turns in the same order that they subscribed. If a promise has already been resolved, nothing happens. (Thought: Perhaps this should be an error in order to catch API misuse?)

(defun aio-resolve (promise value-function)
  (unless (aio-result promise)
    (let ((callbacks (nreverse (aref promise 2))))
      (setf (aref promise 1) value-function
            (aref promise 2) ())
      (dolist (callback callbacks)
        (run-at-time 0 nil callback value-function)))))

If you’re not an async function, you might subscribe to a promise like so:

(aio-listen promise (lambda (v)
                      (message "%s" (funcall v))))
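For readers more at home in JavaScript, the same promise design might be sketched roughly as follows. This is a loose translation for illustration, not part of aio: the names makePromise, listen, and resolve are hypothetical, and setTimeout with a zero delay stands in for run-at-time.

```javascript
// A promise is just a result slot plus a list of subscribers.
function makePromise() {
    return {result: null, subscribers: []};
}

// Subscribe a callback. It always runs in a later event loop turn.
function listen(promise, callback) {
    if (promise.result) {
        setTimeout(callback, 0, promise.result);
    } else {
        promise.subscribers.push(callback);
    }
}

// Resolve with a value function. Resolving a second time does nothing.
function resolve(promise, valueFunction) {
    if (promise.result) return;
    const callbacks = promise.subscribers;
    promise.result = valueFunction;
    promise.subscribers = [];
    for (const callback of callbacks) {
        setTimeout(callback, 0, valueFunction);
    }
}

const promise = makePromise();
listen(promise, v => console.log(v()));
resolve(promise, () => "hello");
```

Subscribers receive the value function and call it to extract the value, so a resolver could just as well pass a function that throws. The array keeps subscription order on its own, which is why there is no counterpart to nreverse here.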

The simplest example of a non-async function that creates and delivers on a promise is a “sleep” function:

(defun aio-sleep (seconds &optional result)
  (let ((promise (aio-promise))
        (value-function (lambda () result)))
    (prog1 promise
      (run-at-time seconds nil
                   #'aio-resolve promise value-function))))

Similarly, here’s a “timeout” promise that delivers a special timeout error signal at a given time in the future.

(defun aio-timeout (seconds)
  (let ((promise (aio-promise))
        (value-function (lambda () (signal 'aio-timeout nil))))
    (prog1 promise
      (run-at-time seconds nil
                   #'aio-resolve promise value-function))))

That’s all there is to promises.

Evaluate in the context of a promise

Before we get into pausing functions, let’s deal with the slightly simpler matter of delivering their return values using a promise. What we need is a way to evaluate a “body” and capture its result in a promise. If the body exits due to a signal, we want to capture that as well.

Here’s a macro that does just this:

(defmacro aio-with-promise (promise &rest body)
  `(aio-resolve ,promise
                (condition-case error
                    (let ((result (progn ,@body)))
                      (lambda () result))
                  (error (lambda ()
                           (signal (car error) ; rethrow
                                   (cdr error)))))))

The body result is captured in a closure and delivered to the promise. If there’s an error signal, it’s “rethrown” into subscribers by the promise’s value function.

This is where Emacs Lisp has a serious weak spot. There’s not really a concept of rethrowing a signal. Unlike a language with explicit exception objects that can capture a snapshot of the backtrace, the original backtrace is completely lost where the signal is caught. There’s no way to “reattach” it to the signal when it’s rethrown. This is unfortunate because it would greatly help debugging if you got to see the full backtrace on the other side of the promise.
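For contrast, here is a rough Python rendering of the same capture step. It is a sketch, not a port: `capture` is a hypothetical name, and it takes the body as a zero-argument callable since Python has no macros. Python exception objects carry their traceback (`__traceback__`), so the re-raise keeps the frames that the Emacs Lisp version loses.

```python
def capture(body):
    """Run body() and return a value function describing the outcome."""
    try:
        result = body()
    except Exception as error:
        # Python unbinds `error` when the except block ends, so keep a
        # separate reference for the closure to re-raise later.
        captured = error
        def value_function():
            raise captured
        return value_function
    return lambda: result

good = capture(lambda: 6 * 7)
bad = capture(lambda: 1 // 0)

print(good())  # 42
try:
    bad()
except ZeroDivisionError as error:
    print("re-raised:", type(error).__name__)
```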

Async functions

So we have promises and we want to pause a function on a promise. Generators have iter-yield for pausing an iterator’s execution. To tackle this problem:

Yield the promise to pause the iterator.
Subscribe a callback on the promise that continues the generator (iter-next) with the promise’s result as the yield result.

All the hard work is done on either side of the yield, so aio-await is just a simple wrapper around iter-yield:

(defmacro aio-await (expr)
  `(funcall (iter-yield ,expr)))

Remember that the funcall is there to extract the promise value from the value function. If it signals an error, this propagates directly into the iterator just as if it had been a direct call, minus an accurate backtrace.

So aio-lambda / aio-defun needs to wrap the body in a generator (iter-lambda), invoke it to produce an iterator, then drive the iterator using callbacks. Here’s a simplified, unhygienic definition of aio-lambda:

(defmacro aio-lambda (arglist &rest body)
  `(lambda (&rest args)
     (let ((promise (aio-promise))
           (iter (apply (iter-lambda ,arglist
                          (aio-with-promise promise
                            ,@body))
                        args)))
       (prog1 promise
         (aio--step iter promise nil)))))

The body is evaluated inside aio-with-promise with the result delivered to the promise returned directly by the async function.

Before returning, the iterator is handed to aio--step, which drives the iterator forward until it delivers its first promise. When the iterator yields a promise, aio--step attaches a callback back to itself on the promise as described above. Immediately driving the iterator up to the first yielded promise “primes” it, which is important for getting the ball rolling on any asynchronous operations.

If the iterator ever yields something other than a promise, it’s delivered right back into the iterator.

(defun aio--step (iter promise yield-result)
  (condition-case _
      (cl-loop for result = (iter-next iter yield-result)
               then (iter-next iter (lambda () result))
               until (aio-promise-p result)
               finally (aio-listen result
                                   (lambda (value)
                                     (aio--step iter promise value))))
    (iter-end-of-sequence)))

When the iterator is done, nothing more needs to happen since the iterator resolves its own return value promise.
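The whole mechanism (yield a promise, subscribe a callback that resumes the generator) translates almost directly to Python generators, which may make the control flow easier to follow. The sketch below is an illustration under assumptions rather than a port of aio: `Promise`, `step`, `async_function`, `run_pending`, and the `pending` queue standing in for the event loop are all hypothetical names.

```python
from collections import deque

pending = deque()  # stand-in for the event loop

class Promise:
    def __init__(self):
        self.value_function = None
        self.callbacks = []

    def listen(self, callback):
        if self.value_function is not None:
            pending.append((callback, self.value_function))
        else:
            self.callbacks.append(callback)

    def resolve(self, value_function):
        if self.value_function is None:
            self.value_function = value_function
            callbacks, self.callbacks = self.callbacks, []
            for callback in callbacks:
                pending.append((callback, value_function))

def run_pending():
    while pending:
        callback, value_function = pending.popleft()
        callback(value_function)

def step(gen, value_function):
    """Advance gen until it yields a promise, then subscribe to it."""
    try:
        result = gen.send(value_function)
        while not isinstance(result, Promise):
            # Not a promise: hand it straight back, wrapped as a value function.
            result = gen.send(lambda value=result: value)
    except StopIteration:
        return  # the body has already resolved its own promise
    result.listen(lambda vf: step(gen, vf))

def async_function(genfunc):
    """Turn a generator function into a function that returns a promise."""
    def wrapper(*args):
        promise = Promise()
        def body():
            try:
                result = yield from genfunc(*args)
            except Exception as error:
                captured = error
                def rethrow():
                    raise captured
                promise.resolve(rethrow)
            else:
                promise.resolve(lambda: result)
        step(body(), None)  # prime: run up to the first yielded promise
        return promise
    return wrapper

@async_function
def add_later(a_promise, b_promise):
    a = (yield a_promise)()  # "await": yield the promise, call what comes back
    b = (yield b_promise)()
    return a + b

a, b = Promise(), Promise()
total = add_later(a, b)
a.resolve(lambda: 1)
b.resolve(lambda: 2)
run_pending()
print(total.value_function())  # 3
```

Here `(yield promise)()` plays the role of aio-await: the yield pauses the function, and calling the value function that comes back either returns the result or raises inside the paused function.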

The definition of aio-defun just uses aio-lambda with defalias. There’s nothing to it.

That’s everything you need! Everything else in the package is merely useful, awaitable functions like aio-sleep and aio-timeout.

Composing promises

Unfortunately url-retrieve doesn’t support timeouts. We can work around this by composing two promises: a url-retrieve promise and an aio-timeout promise. First define a promise-returning function, aio-select, that takes a list of promises and returns (as another promise) the first promise to resolve:

(defun aio-select (promises)
  (let ((result (aio-promise)))
    (prog1 result
      (dolist (promise promises)
        (aio-listen promise (lambda (_)
                              (aio-resolve
                               result
                               (lambda () promise))))))))

We give aio-select both our url-retrieve and timeout promises, and it tells us which resolved first:

(aio-defun fetch-fortune-4 (url timeout)
  (let* ((promises (list (aio-url-retrieve url)
                         (aio-timeout timeout)))
         (fastest (aio-await (aio-select promises)))
         (buffer (aio-await fastest)))
    (with-current-buffer buffer
      (prog1 (buffer-string)
        (kill-buffer)))))

Cool! Note: This will not actually cancel the URL request, just move the async function forward earlier and prevent it from getting the result.
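The select idea is independent of Emacs Lisp. As a rough Python sketch (with a hypothetical minimal `Promise` class and a `pending` queue standing in for the event loop), the result promise resolves to whichever input promise fires first, and later resolutions are ignored because a promise only resolves once:

```python
from collections import deque

pending = deque()  # stand-in for the event loop

class Promise:
    def __init__(self):
        self.value_function = None
        self.callbacks = []

    def listen(self, callback):
        if self.value_function is not None:
            pending.append((callback, self.value_function))
        else:
            self.callbacks.append(callback)

    def resolve(self, value_function):
        if self.value_function is None:
            self.value_function = value_function
            callbacks, self.callbacks = self.callbacks, []
            for callback in callbacks:
                pending.append((callback, value_function))

def run_pending():
    while pending:
        callback, value_function = pending.popleft()
        callback(value_function)

def select(promises):
    """Return a promise that resolves to the first input promise to resolve."""
    result = Promise()
    for promise in promises:
        # Bind the loop variable as a default argument; a plain closure
        # would see only its final value.
        promise.listen(
            lambda _vf, winner=promise: result.resolve(lambda: winner))
    return result

slow, fast = Promise(), Promise()
first = select([slow, fast])
fast.resolve(lambda: "fast result")
slow.resolve(lambda: "slow result")
run_pending()
winner = first.value_function()
print(winner is fast, winner.value_function())  # True fast result
```

As in the Emacs Lisp version, the select promise delivers the winning promise itself rather than its value, so the caller decides whether and when to extract the result.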

Threads

Despite aio being entirely about managing concurrent, asynchronous operations, it has nothing at all to do with threads — as in Emacs 26’s support for kernel threads. All async functions and promise callbacks are expected to run only on the main thread. That’s not to say an async function can’t await on a result from another thread. It just must be done very carefully.

Processes

The package also includes two functions for realizing promises on processes, whether they be subprocesses or network sockets.

aio-process-filter
aio-process-sentinel

For example, this function loops over each chunk of output (typically 4kB) from the process, as delivered to a filter function:

(aio-defun process-chunks (process)
  (cl-loop for chunk = (aio-await (aio-process-filter process))
           while chunk
           do (... process chunk ...)))

Exercise for the reader: Write an awaitable function that returns a line at a time rather than a chunk at a time. You can build it on top of aio-process-filter.

I considered wrapping functions like start-process so that their aio versions would return a promise representing some kind of result from the process. However there are so many different ways to create and configure processes that I would have ended up duplicating all the process functions. Focusing on the filter and sentinel, and letting the caller create and configure the process is much cleaner.

Unfortunately Emacs has no asynchronous API for writing output to a process. Both process-send-string and process-send-region will block if the pipe or socket is full. There is no callback, so you cannot await on writing output. Maybe there’s a way to do it with a dedicated thread?

Another issue is that the process-send-* functions are preemptible, made necessary because they block. The aio-process-* functions leave a gap (i.e. between filter awaits) where no filter or sentinel function is attached. It’s a consequence of promises being single-fire. The gap is harmless so long as the async function doesn’t await something else or get preempted. This needs some more thought.

Update: These process functions no longer exist and have been replaced by a small framework for building chains of promises. See aio-make-callback.

Testing aio

The test suite for aio is a bit unusual. Emacs’ built-in test suite, ERT, doesn’t support asynchronous tests. Furthermore, tests are generally run in batch mode, where Emacs invokes a single function and then exits rather than pump an event loop. Batch mode can only handle asynchronous process I/O, not the async functions of aio. So it’s not possible to run the tests in batch mode.

Instead I hacked together a really crude callback-based test suite. It runs in non-batch mode and writes the test results into a buffer (run with make check). Not ideal, but it works.

One of the tests is a sleep sort (with reasonable tolerances). It’s a pretty neat demonstration of what you can do with aio:

(aio-defun sleep-sort (values)
  (let ((promises (mapcar (lambda (v) (aio-sleep v v)) values)))
    (cl-loop while promises
             for next = (aio-await (aio-select promises))
             do (setf promises (delq next promises))
             collect (aio-await next))))

To see it in action (M-x sleep-sort-demo):

(aio-defun sleep-sort-demo ()
  (interactive)
  (let ((values '(0.1 0.4 1.1 0.2 0.8 0.6)))
    (message "%S" (aio-await (sleep-sort values)))))

Async/await is pretty awesome

I’m quite happy with how this all came together. Once I had the concepts straight — particularly resolving to value functions — everything made sense and all the parts fit together well, and mostly by accident. That feels good.

Python Decorators: Syntactic Artificial Sweetener

2019-03-08T23:00:49Z

Python has a feature called function decorators. With a little bit of syntax, the behavior of a function or class can be modified in useful ways. Python comes with a few decorators, but most of the useful ones are found in third-party libraries.

PEP 318 suggests a very simple, but practical decorator called synchronized, though it doesn’t provide a concrete example. Consider this function that increments a global counter:

counter = 0

def increment():
    global counter
    counter = counter + 1

If this function is called from multiple threads, there’s a race condition — though, at least for CPython, it’s not a data race thanks to the Global Interpreter Lock (GIL). Incrementing the counter is not an atomic operation, as illustrated by its byte code:

LOAD_GLOBAL              0 (counter)
LOAD_CONST               1 (1)
BINARY_ADD
STORE_GLOBAL             0 (counter)
LOAD_CONST               0 (None)
RETURN_VALUE

The variable is loaded, operated upon, and stored. Another thread could be scheduled between any of these instructions and cause an undesired result. It’s easy to see that in practice:

from threading import Thread

def worker():
    for i in range(200000):
        increment()

threads = [Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)

The increment function is called exactly 1.6 million times, but on my system I get different results on each run:

$ python3 example.py 
1306205
$ python3 example.py 
1162418
$ python3 example.py 
1076801

I could change the definition of increment() to use synchronization, but wouldn’t it be nice if I could just tell Python to synchronize this function? This is where a function decorator shines:

from threading import Lock

def synchronized(f):
    lock = Lock()
    def wrapper():
        with lock:
            return f()
    return wrapper

The synchronized function is a higher order function that accepts a function and returns a function — or, more specifically, a callable. The purpose is to wrap and decorate the function it’s given. In this case the function is wrapped in a mutual exclusion lock. Note: This implementation is very simple and only works for functions that accept no arguments.

To use it, I just add a single line to increment:

@synchronized
def increment():
    global counter
    counter = counter + 1

With this change my program now always prints 1600000.

Syntactic “sugar”

Everyone is quick to point out that this is just syntactic sugar, and that you can accomplish this without the @ syntax. For example, the last definition of increment is equivalent to:

def increment():
    ...

increment = synchronized(increment)

Decorators can also be parameterized. For example, Python’s functools module has an lru_cache decorator for memoizing a function:

@lru_cache(maxsize=32)
def expensive(n):
    ...

Which is equivalent to this very direct source transformation:

def expensive(n):
    ...

expensive = lru_cache(maxsize=32)(expensive)

So what comes after the @ isn’t just a name. In fact, it looks like it can be any kind of expression that evaluates to a function decorator. Or is it?

Syntactic artificial sweetener

Reality is often disappointing. Let’s try using an “identity” decorator defined using lambda. This decorator will accomplish nothing, but it will test if we can decorate a function using a lambda expression.

@lambda f: f
def foo():
    pass

But Python complains:

    @lambda f: f
          ^
SyntaxError: invalid syntax

Maybe Python is absolutely literal about the syntax sugar thing, and it’s more like a kind of macro replacement. Let’s try wrapping it in parentheses:

@(lambda f: f)
def foo(n):
    pass

Nope, same error, but now pointing at the opening parenthesis. Getting desperate now:

@[synchronized][0]
def foo():
    pass

Again, syntax error. What’s going on?

Pattern matching

The problem is that Python’s grammar, as given in the language reference, doesn’t parse an expression after @. It matches a very specific pattern that just so happens to look like a Python expression. It’s not syntactic sugar, it’s syntactic artificial sweetener!

decorator ::= "@" dotted_name ["(" [argument_list [","]] ")"] NEWLINE

In a way, this puts Python in the ranks of PHP 5 and Matlab: two languages with completely screwed up grammars that can only parse specific constructions that the developers had anticipated. For example, in PHP 5 (fixed in PHP 7):

function foo() {
    return function() {
        return 0;
    };
}

foo()();

That is a syntax error:

PHP Parse error:  syntax error, unexpected '(', expecting ',' or ';'

Or in any version of Matlab:

    magic(4)(:)

That is a syntax error:

Unbalanced or unexpected parenthesis or bracket

In Python’s defense, this strange, limited syntax is only in a single place rather than everywhere, but I still wonder why it was defined that way.
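Since the pattern admits only a dotted name with an optional call, the rejected forms can still be used by binding them to a name first, or by hiding them behind a call. A minimal sketch, assuming the grammar quoted above (newer Python versions may accept more); the names `identity`, `decorators`, `first`, and `pick` are made up for illustration:

```python
identity = lambda f: f     # a lambda, bound to a name
decorators = [identity]
first = decorators[0]      # a subscript, bound to a name

@identity
def foo():
    return "foo"

@first
def bar():
    return "bar"

def pick(index):
    """Hypothetical helper: a call with arguments is allowed after @."""
    return decorators[index]

@pick(0)
def baz():
    return "baz"

print(foo(), bar(), baz())  # foo bar baz
```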

Update: Clément Pit-Claudel pointed out the explanation in the PEP, which references a 2004 email by Guido van Rossum:

I have a gut feeling about this one. I’m not sure where it comes from, but I have it. It may be that I want the compiler to be able to recognize certain decorators.

So while it would be quite easy to change the syntax to @test in the future, I’d like to stick with the more restricted form unless a real use case is presented where allowing @test would increase readability. (@foo().bar() doesn’t count because I don’t expect you’ll ever need that).

The CPython Bytecode Compiler is Dumb

2019-02-24T21:56:35Z

This article was discussed on Hacker News.

Due to sheer coincidence of several unrelated tasks converging on Python at work, I recently needed to brush up on my Python skills. So far for me, Python has been little more than a fancy extension language for BeautifulSoup, though I also used it to participate in the recent tradition of writing one’s own static site generator, in this case for my wife’s photo blog. I’ve been reading through Fluent Python by Luciano Ramalho, and it’s been quite effective at getting me up to speed.

As I write Python, like with Emacs Lisp, I can’t help but consider what exactly is happening inside the interpreter. I wonder if the code I’m writing is putting undue constraints on the bytecode compiler and limiting its options. Ultimately I’d like the code I write to drive the interpreter efficiently and effectively. The Zen of Python says there should be “only one obvious way to do it,” but in practice there’s a lot of room for expression. Given multiple ways to express the same algorithm or idea, I tend to prefer the one that compiles to the more efficient bytecode.

Fortunately CPython, the main and most widely used implementation of Python, is very transparent about its bytecode. It’s easy to inspect and reason about its bytecode. The disassembly listing is easy to read and understand, and I can always follow it without consulting the documentation. This contrasts sharply with modern JavaScript engines and their opaque use of JIT compilation, where performance is guided by obeying certain patterns (hidden classes, etc.), helping the compiler understand my program’s types, and being careful not to unnecessarily constrain the compiler.

So, besides just catching up with Python the language, I’ve been studying the bytecode disassembly of the functions that I write. One fact has become quite apparent: the CPython bytecode compiler is pretty dumb. With a few exceptions, it’s a very literal translation of a Python program, and there is almost no optimization. Below I’ll demonstrate a case where it’s possible to detect one of the missed optimizations without inspecting the bytecode disassembly thanks to a small abstraction leak in the optimizer.

To be clear: This isn’t to say CPython is bad, or even that it should necessarily change. In fact, as I’ll show, dumb bytecode compilers are par for the course. In the past I’ve lamented how the Emacs Lisp compiler could do a better job, but CPython and Lua are operating at the same level. There are benefits to a dumb and straightforward bytecode compiler: the compiler itself is simpler, easier to maintain, and more amenable to modification (e.g. as Python continues to evolve). It’s also easier to debug Python (pdb) because it’s such a close match to the source listing.

Update: Darius Bacon points out that Guido van Rossum himself said, “Python is about having the simplest, dumbest compiler imaginable.” So this is all very much by design.

The consensus seems to be that if you want or need better performance, use something other than Python. (And if you can’t do that, at least use PyPy.) That’s a fairly reasonable and healthy goal. Still, if I’m writing Python, I’d like to do the best I can, which means exploiting the optimizations that are available when possible.

Disassembly examples

I’m going to compare three bytecode compilers in this article: CPython 3.7, Lua 5.3, and Emacs 26.1. Each of these languages is dynamically typed and primarily executed on a bytecode virtual machine, and each makes it easy to access its disassembly listings. One caveat: CPython and Emacs use a stack-based virtual machine while Lua uses a register-based virtual machine.

For CPython I’ll be using the dis module. For Emacs Lisp I’ll use M-x disassemble, and all code will use lexical scoping. In Lua I’ll use luac -l on the command line.
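For anyone following along in CPython, a listing can be produced with two calls from the dis module. This is a small sketch using a throwaway function (`square` is just an example name). Opcode names and offsets vary between CPython versions, so the output may not match the 3.7 listings shown in this article.

```python
import dis

def square(n):
    return n * n

# Human-readable listing, in the same general format as the listings here.
dis.dis(square)

# The same information as data, handy for checks in scripts or tests.
opnames = [instruction.opname for instruction in dis.get_instructions(square)]
print(opnames)
```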

Local variable elimination

Will the bytecode compiler eliminate local variables? Keeping the variable around potentially involves allocating memory for it, assigning to it, and accessing it. Take this example:

def foo():
    x = 0
    y = 1
    return x

This function is equivalent to:

def foo():
    return 0

Despite this, CPython completely misses this optimization for both x and y:

  2           0 LOAD_CONST               1 (0)
              2 STORE_FAST               0 (x)
  3           4 LOAD_CONST               2 (1)
              6 STORE_FAST               1 (y)
  4           8 LOAD_FAST                0 (x)
             10 RETURN_VALUE

It assigns both variables, and even loads again from x for the return. Missed optimizations, but, as I said, by keeping these variables around, debugging is more straightforward. Users can always inspect variables.

How about Lua?

function foo()
    local x = 0
    local y = 1
    return x
end

It also misses this optimization, though it matters a little less due to its architecture (the return instruction references a register regardless of whether or not that register is allocated to a local variable):

        1       [2]     LOADK           0 -1    ; 0
        2       [3]     LOADK           1 -2    ; 1
        3       [4]     RETURN          0 2
        4       [5]     RETURN          0 1

Emacs Lisp also misses it:

(defun foo ()
  (let ((x 0)
        (y 1))
    x))

Disassembly:

constant  0
constant  1
stack-ref 1
return

All three are on the same page.

Constant folding

Does the bytecode compiler evaluate simple constant expressions at compile time? This is simple and everyone does it.

def foo():
    return 1 + 2 * 3 / 4

Disassembly:

  2           0 LOAD_CONST               1 (2.5)
              2 RETURN_VALUE

Lua:

function foo()
    return 1 + 2 * 3 / 4
end

Disassembly:

        1       [2]     LOADK           0 -1    ; 2.5
        2       [2]     RETURN          0 2
        3       [3]     RETURN          0 1

Emacs Lisp:

(defun foo ()
  (+ 1 (/ (* 2 3) 4.0)))

Disassembly:

constant  2.5
return

That’s something we can count on so long as the operands are all numeric literals (or also, for Python, string literals) that are visible to the compiler. Don’t count on your operator overloads to work here, though.
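Folding can also be checked without reading a disassembly, because a folded result ends up in the function's constants table (`__code__.co_consts`). A quick sketch, where `folded` and `not_folded` are example names; if the behavior described in this article holds for your interpreter, 2.5 should appear only in the first table:

```python
def folded():
    return 1 + 2 * 3 / 4

def not_folded():
    x = 2  # routing an operand through a variable
    return 1 + x * 3 / 4

print(folded.__code__.co_consts)
print(not_folded.__code__.co_consts)
```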

Allocation optimization

Optimizers often perform escape analysis, to determine if objects allocated in a function ever become visible outside of that function. If they don’t then these objects could potentially be stack-allocated (instead of heap-allocated) or even be eliminated entirely.

None of the bytecode compilers are this sophisticated. However CPython does have a trick up its sleeve: tuple optimization. Since tuples are immutable, in certain circumstances CPython will reuse them and avoid both the constructor and the allocation.

def foo():
    return (1, 2, 3)

Check it out, the tuple is used as a constant:

  2           0 LOAD_CONST               1 ((1, 2, 3))
              2 RETURN_VALUE

Which we can detect by evaluating foo() is foo(), which is True. Deviate from this too much, though, and the optimization is disabled. Remember how CPython can’t optimize away variables, and that they break constant folding? They break this, too:

def foo():
    x = 1
    return (x, 2, 3)

Disassembly:

  2           0 LOAD_CONST               1 (1)
              2 STORE_FAST               0 (x)
  3           4 LOAD_FAST                0 (x)
              6 LOAD_CONST               2 (2)
              8 LOAD_CONST               3 (3)
             10 BUILD_TUPLE              3
             12 RETURN_VALUE

This function might document that it always returns a simple tuple, but we can tell if it’s being optimized or not using is like before: foo() is foo() is now False! In some future version of Python with a cleverer bytecode compiler, that expression might evaluate to True. (Unless the Python language specification is specific about this case, which I didn’t check.)
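Both cases fit in one short script. The function names here are just for illustration. On CPython the first comparison should print True, and by the reasoning above the second prints False, though nothing guarantees that for other implementations or future versions:

```python
def constant_tuple():
    return (1, 2, 3)

def built_tuple():
    x = 1
    return (x, 2, 3)

# Identity reveals whether the same tuple object is reused across calls.
print(constant_tuple() is constant_tuple())
print(built_tuple() is built_tuple())

# Equality is unaffected either way.
print(constant_tuple() == built_tuple())
```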

Note: Curiously PyPy replicates this exact behavior when examined with is. Was that deliberate? I’m impressed that PyPy matches CPython’s semantics so closely here.

Putting a mutable value, such as a list, in the tuple will also break this optimization. But that’s not the compiler being dumb. That’s a hard constraint on the compiler: the caller might change the mutable component of the tuple, so it must always return a fresh copy.

Neither Lua nor Emacs Lisp has a language-level equivalent of an immutable tuple, so there’s nothing to compare.

Other than the tuples situation in CPython, none of the bytecode compilers eliminate unnecessary intermediate objects.

def foo():
    return [1024][0]

Disassembly:

  2           0 LOAD_CONST               1 (1024)
              2 BUILD_LIST               1
              4 LOAD_CONST               2 (0)
              6 BINARY_SUBSCR
              8 RETURN_VALUE

Lua:

function foo()
    return ({1024})[1]
end

Disassembly:

        1       [2]     NEWTABLE        0 1 0
        2       [2]     LOADK           1 -1    ; 1024
        3       [2]     SETLIST         0 1 1   ; 1
        4       [2]     GETTABLE        0 0 -2  ; 1
        5       [2]     RETURN          0 2
        6       [3]     RETURN          0 1

Emacs Lisp:

(defun foo ()
  (car (list 1024)))

Disassembly:

constant  1024
list1
car
return

Don’t expect too much

I could go on with lots of examples, looking at loop optimizations and so on, and each case is almost certainly unoptimized. The general rule of thumb is to simply not expect much from these bytecode compilers. They’re very literal in their translation.

Working so much in C has put me in the habit of expecting all obvious optimizations from the compiler. This frees me to be more expressive in my code. Lots of things are cost-free thanks to these optimizations, such as breaking a complex expression up into several variables, naming my constants, or not using a local variable to manually cache memory accesses. I’m confident the compiler will optimize away my expressiveness. The catch is that clever compilers can take things too far, so I’ve got to be mindful of how it might undermine my intentions — i.e. when I’m doing something unusual or not strictly permitted.

These bytecode compilers will never truly surprise me. The cost is that being more expressive in Python, Lua, or Emacs Lisp may reduce performance at run time because it shows in the bytecode. Usually this doesn’t matter, but sometimes it does.

A JavaScript Typed Array Gotcha

2019-01-23T02:50:30Z

JavaScript’s prefix increment and decrement operators can be surprising when applied to typed arrays. It caught me by surprise when I was porting some C code over to JavaScript. Just using your brain to execute this code, what do you believe is the value of r?

let array = new Uint8Array([255]);
let r = ++array[0];

The increment and decrement operators originated in the B programming language. Its closest living relative today is C, and, as far as these operators are concerned, C can be considered an ancestor of JavaScript. So what is the value of r in this similar C code?

uint8_t array[] = {255};
int r = ++array[0];

Of course, if they were the same then there would be nothing to write about, so that should make it easier to guess if you aren’t sure. The answer: In JavaScript, r is 256. In C, r is 0.

What happened to me was that I wrote an 80-bit integer increment routine in C like this:

uint8_t array[10];
/* ... */
for (int i = 9; i >= 0; i--)
    if (++array[i])
        break;

But I was getting the wrong result over in JavaScript from essentially the same code:

let array = new Uint8Array(10);
/* ... */
for (let i = 9; i >= 0; i--)
    if (++array[i])
        break;

So what’s going on here?

JavaScript specification

The ES5 specification says this about the prefix increment operator:

Let expr be the result of evaluating UnaryExpression.

Throw a SyntaxError exception if the following conditions are all true: [omitted]

Let oldValue be ToNumber(GetValue(expr)).

Let newValue be the result of adding the value 1 to oldValue, using the same rules as for the + operator (see 11.6.3).

Call PutValue(expr, newValue).

Return newValue.

So, oldValue is 255. This is a double precision float because all numbers in JavaScript (outside of the bitwise operations) are double precision floating point. Add 1 to this value to get 256, which is newValue. When newValue is stored in the array via PutValue(), it’s converted to an unsigned 8-bit integer, which truncates it to 0.

However, newValue is returned, not the value that was actually stored in the array!

Since JavaScript is dynamically typed, this difference did not actually matter until typed arrays were involved. I suspect if typed arrays were in JavaScript from the beginning, the specified behavior would be more in line with C.

This behavior isn’t limited to the prefix operators. Consider assignment:

let array = new Uint8Array([255]);
let r = (array[0] = array[0] + 1);
let s = (array[0] += 1);

Both r and s will still be 256. The result of the assignment operators is a similar story:

LeftHandSideExpression = AssignmentExpression is evaluated as follows:

Let lref be the result of evaluating LeftHandSideExpression.

Let rref be the result of evaluating AssignmentExpression.

Let rval be GetValue(rref).

Throw a SyntaxError exception if the following conditions are all true: [omitted]

Call PutValue(lref, rval).

Return rval.

Again, the result of the expression is independent of how it was stored with PutValue().

C specification

I’ll be referencing the original C89/C90 standard. The C specification requires a little more work to get to the bottom of the issue. Starting with 3.3.3.1 (Prefix increment and decrement operators):

The value of the operand of the prefix ++ operator is incremented. The result is the new value of the operand after incrementation. The expression ++E is equivalent to (E+=1).

Later in 3.3.16.2 (Compound assignment):

A compound assignment of the form E1 op = E2 differs from the simple assignment expression E1 = E1 op (E2) only in that the lvalue E1 is evaluated only once.

Then finally in 3.3.16 (Assignment operators):

An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment, but is not an lvalue.

So the result is explicitly the value after assignment. Let’s look at this step by step after rewriting the expression.

int r = (array[0] = array[0] + 1);

In C, all integer operations are performed with at least int precision. Smaller integers are implicitly promoted to int before the operation. The value of array[0] is 255, and, since uint8_t is smaller than int, it gets promoted to int. Additionally, the literal constant 1 is also an int, so there are actually two reasons for this promotion.

So since these are int values, the result of the addition is 256, like in JavaScript. To store the result, this value is then demoted to uint8_t and truncated to 0. Finally, this post-assignment 0 is the result of the expression, not the right-hand result as in JavaScript.
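The two rules can be put side by side in a small model. This is Python standing in for both languages, so treat it as a sketch of the semantics described above rather than real JavaScript or C; the function names are made up, and `& 0xFF` plays the part of the conversion to an unsigned 8-bit integer.

```python
def js_prefix_increment(cells, i):
    """Model of JavaScript's ++ on a Uint8Array element."""
    new_value = cells[i] + 1
    cells[i] = new_value & 0xFF  # PutValue: the store wraps to 8 bits
    return new_value             # ...but the unwrapped sum is the result

def c_prefix_increment(cells, i):
    """Model of C's ++ on a uint8_t element."""
    cells[i] = (cells[i] + 1) & 0xFF  # demoted to uint8_t on assignment
    return cells[i]                   # the result is the value after the store

def increment_80bit(cells, prefix_increment):
    """The big-endian carry loop from earlier, parameterized by the ++ rule."""
    for i in range(9, -1, -1):
        if prefix_increment(cells, i):
            break  # nonzero result: no carry into the next byte

c_cells = [0] * 8 + [0, 255]
js_cells = [0] * 8 + [0, 255]
increment_80bit(c_cells, c_prefix_increment)
increment_80bit(js_cells, js_prefix_increment)
print(c_cells[-2:], js_cells[-2:])  # [1, 0] [0, 0]
```

Under the C rule the wrapped 0 lets the carry propagate into the next byte. Under the JavaScript rule the loop sees 256, stops early, and the carry is lost.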

Specifications are useful

These situations are why I prefer programming languages that have a formal and approachable specification. If there’s no specification and I’m observing undocumented, idiosyncratic behavior, is this just some subtle quirk of the current implementation — e.g. something that might change without notice in the future — or is it intended behavior that I can rely upon for correctness?

Emacs 26 Brings Generators and Threads

2018-05-31T17:45:16Z

Emacs 26.1 was recently released. As you would expect from a major release, it comes with lots of new goodies. Being a bit of an Emacs Lisp enthusiast, I find the two most interesting new features to be generators (iter) and native threads (thread).

Correction: Generators were actually introduced in Emacs 25.1 (Sept. 2016), not Emacs 26.1. Doh!

Update: ThreadSanitizer (TSan) quickly shows that Emacs’ threading implementation has many data races, making it completely untrustworthy. Until this is fixed, nobody should use Emacs threads for any purpose, and threads should be disabled at compile time.

Generators

Generators are one of those cool language features that provide a lot of power at a small implementation cost. They’re like a constrained form of coroutines, but, unlike coroutines, they’re typically built entirely on top of first-class functions (e.g. closures). This means no additional run-time support is needed in order to add generators to a language. The only complications are the changes to the compiler. Generators are not compiled the same way as normal functions despite looking so similar.

What’s perhaps coolest of all about lisp-family generators, including Emacs Lisp, is that the compiler component can be implemented entirely with macros. The compiler need not be modified at all, making generators no more than a library, and not actually part of the language. That’s exactly how they’ve been implemented in Emacs Lisp (emacs-lisp/generator.el).

So what’s a generator? It’s a function that returns an iterator object. When an iterator object is invoked (e.g. iter-next) it evaluates the body of the generator. Each iterator is independent. What makes them unusual (and useful) is that the evaluation is paused in the middle of the body to return a value, saving all the internal state in the iterator. Normally pausing in the middle of functions isn’t possible, which is what requires the special compiler support.

Emacs Lisp generators appear to be most closely modeled after Python generators, though they also share some similarities with JavaScript generators. What makes them most like Python is the use of signals for flow control (something I’m not personally enthused about). When a Python generator completes, it throws a StopIteration exception. In Emacs Lisp, it’s an iter-end-of-sequence signal. A signal is out-of-band and avoids the issue of relying on some special in-band value to communicate the end of iteration.

In contrast, JavaScript’s solution is to return a “rich” object wrapping the actual yield value. This object has a done field that communicates whether iteration has completed. This avoids the use of exceptions for flow control, but the caller has to unpack the rich object.
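To make the signal-based style concrete, here is a minimal sketch of what a caller writes when it handles the end of iteration by hand. The names my-count-to and my-drain are hypothetical, invented for this illustration; only iter-defun, iter-yield, iter-next, and the iter-end-of-sequence signal come from the generator library. It assumes lexical binding, which iter-defun requires.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'generator)

;; Hypothetical generator: yields 1, 2, ..., N.
(iter-defun my-count-to (n)
  (let ((i 0))
    (while (< i n)
      (setq i (1+ i))
      (iter-yield i))))

;; Hypothetical helper: collect every value from ITERATOR into a list.
;; The end of iteration arrives out-of-band as a signal, so the loop
;; is wrapped in `condition-case' instead of testing a return value.
(defun my-drain (iterator)
  (let ((values '()))
    (condition-case nil
        (while t
          (push (iter-next iterator) values))
      (iter-end-of-sequence nil))
    (nreverse values)))

(my-drain (my-count-to 3))
;; => (1 2 3)
```

Because the terminating condition never travels through the return value, any Lisp object, including nil, can be yielded without ambiguity.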

Fortunately the flow control issue isn’t normally exposed to Emacs Lisp code. Most of the time you’ll use the iter-do macro or (my preference) the new cl-loop keyword iter-by.

To illustrate how a generator works, here’s a really simple iterator that iterates over a list:

(iter-defun walk (list)
  (while list
    (iter-yield (pop list))))

Here’s how it might be used:

(setf i (walk '(:a :b :c)))

(iter-next i)  ; => :a
(iter-next i)  ; => :b
(iter-next i)  ; => :c
(iter-next i)  ; error: iter-end-of-sequence
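For comparison, here is a sketch of the same traversal using the iter-do macro and the cl-loop iter-by keyword mentioned earlier, neither of which exposes the signal to the caller. The generator is repeated under the hypothetical name my-walk so the block stands on its own, and the two collector functions are likewise made up for illustration.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'generator)
(require 'cl-lib)

;; Same generator as the `walk' example, under a hypothetical name.
(iter-defun my-walk (list)
  (while list
    (iter-yield (pop list))))

;; `iter-do' binds each yielded value to X and stops quietly when the
;; iterator signals the end of the sequence.
(defun my-collect-with-iter-do (list)
  (let ((result '()))
    (iter-do (x (my-walk list))
      (push x result))
    (nreverse result)))

;; The `iter-by' keyword expresses the same loop in `cl-loop' notation.
(defun my-collect-with-cl-loop (list)
  (cl-loop for x iter-by (my-walk list)
           collect x))

(my-collect-with-iter-do '(:a :b :c))  ; => (:a :b :c)
(my-collect-with-cl-loop '(:a :b :c))  ; => (:a :b :c)
```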

The iterator object itself is opaque and you shouldn’t rely on any part of its structure. That being said, I’m a firm believer that we should understand how things work underneath the hood so that we can make the most effective use of them. No program should rely on the particulars of the iterator object internals for correctness, but a well-written program should employ them in a way that best exploits their expected implementation.

Currently iterator objects are closures, and iter-next invokes the closure with its own internal protocol. It asks the closure to return the next value (:next operation), and iter-close asks it to clean itself up (:close operation).

Since they’re just closures, another really cool thing about Emacs Lisp generators is that iterator objects are generally readable. That is, you can serialize them out with print and bring them back to life with read, even in another instance of Emacs. They exist independently of the original generator function. This will not work if one of the values captured in the iterator object is not readable (e.g. buffers).

How does pausing work? Well, one of the other exciting new features of Emacs 26 is the introduction of a jump table opcode, switch. I’d lamented in the past that large cond and cl-case expressions could be a lot more efficient if Emacs’ byte code supported jump tables. It turns an O(n) sequence of comparisons into an O(1) lookup and jump. It’s essentially the perfect foundation for a generator since it can be used to jump straight back to the position where evaluation was paused.

Buuut, generators do not currently use jump tables. The generator library predates the new switch opcode, and, being independent of it, its author, Daniel Colascione, went with the best option at the time. Chunks of code between yields are packaged as individual closures. These closures are linked together a bit like nodes in a graph, creating a sort of state machine. To get the next value, the iterator object invokes the closure representing the next state.

I’ve manually macro expanded the walk generator above into a form that roughly resembles the expansion of iter-defun:

(defun walk (list)
  (let (state)
    (cl-flet* ((state-2 ()
                 (signal 'iter-end-of-sequence nil))
               (state-1 ()
                 (prog1 (pop list)
                   (when (null list)
                     (setf state #'state-2))))
               (state-0 ()
                 (if (null list)
                     (state-2)
                   (setf state #'state-1)
                   (state-1))))
      (setf state #'state-0)
      (lambda ()
        (funcall state)))))

This omits the protocol I mentioned, and it doesn’t have yield results (values passed to the iterator). The actual expansion is a whole lot messier and less optimal than this, but hopefully my hand-rolled generator is illustrative enough. Without the protocol, this iterator is stepped using funcall rather than iter-next.

The state variable keeps track of where in the body of the generator this iterator is currently “paused.” Continuing the iterator is therefore just a matter of invoking the closure that represents this state. Each state closure may update state to point to a new part of the generator body. The terminal state is obviously state-2. Notice how state transitions occur around branches.

I had said generators can be implemented as a library in Emacs Lisp. Unfortunately there’s a hole in this: unwind-protect. It’s not valid to yield inside an unwind-protect form. Unlike, say, a throw-catch, there’s no mechanism to trap an unwinding stack so that it can be restarted later. The state closure needs to return and fall through the unwind-protect.

A jump table version of the generator might look like the following. I’ve used cl-labels since it allows for recursion.

(defun walk (list)
  (let ((state 0))
    (cl-labels
        ((closure ()
           (cl-case state
             (0 (if (null list)
                    (setf state 2)
                  (setf state 1))
                (closure))
             (1 (prog1 (pop list)
                  (when (null list)
                    (setf state 2))))
             (2 (signal 'iter-end-of-sequence nil)))))
      #'closure)))

When byte compiled on Emacs 26, that cl-case is turned into a jump table. This “switch” form is closer to how generators are implemented in other languages.

Iterator objects can share state between themselves if they close over a common environment (or, of course, use the same global variables).

(setf foo
      (let ((list '(:a :b :c)))
        (list
         (funcall
          (iter-lambda ()
            (while list
              (iter-yield (pop list)))))
         (funcall
          (iter-lambda ()
            (while list
              (iter-yield (pop list))))))))

(iter-next (nth 0 foo))  ; => :a
(iter-next (nth 1 foo))  ; => :b
(iter-next (nth 0 foo))  ; => :c

For years there has been a very crude way to “pause” a function and allow other functions to run: accept-process-output. It only works in the context of processes, but five years ago this was sufficient for me to build primitives on top of it. Unlike this old process function, generators do not block threads, including the user interface, which is really important.

Threads

Emacs 26 also brings us threads, which have been attached in a very bolted-on fashion. It’s not much more than a subset of pthreads: shared memory threads, recursive mutexes, and condition variables. The interfaces look just like they do in pthreads, and there hasn’t been much done to integrate more naturally into the Emacs Lisp ecosystem.

This is also only the first step in bringing threading to Emacs Lisp. Right now there’s effectively a global interpreter lock (GIL), and threads only run one at a time cooperatively. Like with generators, the Python influence is obvious. In theory, sometime in the future this interpreter lock will be removed, making way for actual concurrency.

This is, again, where I think it’s useful to contrast with JavaScript, which was also initially designed to be single-threaded. Low-level threading primitives weren’t exposed — though mostly because JavaScript typically runs sandboxed and there’s no safe way to expose those primitives. Instead it got a web worker API that exposes concurrency at a much higher level, along with an efficient interface for thread coordination.

For Emacs Lisp, I’d prefer something safer, more like the JavaScript approach. Low-level pthreads are now a great way to wreck Emacs with deadlocks (with no C-g escape). Playing around with the new threading API for just a few days, I’ve already had to restart Emacs a bunch of times. Bugs in Emacs Lisp are normally a lot more forgiving.

One important detail that has been designed well is that dynamic bindings are thread-local. This is really essential for correct behavior. This is also an easy way to create thread-local storage (TLS): dynamically bind variables in the thread’s entrance function.

;;; -*- lexical-binding: t; -*-

(defvar foo-counter-tls)
(defvar foo-path-tls)

(defun foo-make-thread (path)
  (make-thread
   (lambda ()
     (let ((foo-counter-tls 0)
           (foo-path-tls path))
       ...))))
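As a quick check of the thread-local claim, the following sketch binds a special variable in the calling thread and records what a freshly created thread sees. The names my-demo-var and my-demo-thread-sees are hypothetical, and the block assumes an Emacs built with thread support.

```elisp
;;; -*- lexical-binding: t; -*-

;; Hypothetical special variable used only for this demonstration.
(defvar my-demo-var :global)

(defun my-demo-thread-sees ()
  "Return (VALUE-SEEN-BY-NEW-THREAD VALUE-SEEN-BY-THIS-THREAD)."
  (let ((my-demo-var :bound-in-caller)  ; dynamic binding, this thread only
        (seen nil))
    (thread-join
     (make-thread
      (lambda ()
        ;; SEEN is a lexical variable shared through the closure, so
        ;; the new thread can report back what it observed.
        (setq seen my-demo-var))))
    (list seen my-demo-var)))

(my-demo-thread-sees)
;; => (:global :bound-in-caller)
```

The new thread observes the global value rather than the caller’s let binding, while the caller still sees its own binding after the join.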

However, cl-letf “bindings” are not thread-local, which makes this otherwise incredibly useful macro quite dangerous in the presence of threads. This is one way that the new threading API feels bolted on.

Building generators on threads

In my stack clashing article I showed a few different ways to add coroutine support to C. One method spawned per-coroutine threads, and coordinated using semaphores. With the new threads API in Emacs, it’s possible to do exactly the same thing.

Since generators are just a limited form of coroutines, this means threads offer another, very different way to implement them. The threads API doesn’t provide semaphores, but condition variables can fill in for them. To “pause” in the middle of the generator, just wait on a condition variable.
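Here is a minimal sketch of how a counting semaphore might be assembled from a mutex and a condition variable. The my-sem names are hypothetical and this is not the code thriter uses; only make-mutex, make-condition-variable, with-mutex, condition-wait, and condition-notify come from the threads API.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; Hypothetical counting semaphore: a mutex, a condition variable tied
;; to that mutex, and a counter protected by the mutex.
(cl-defstruct (my-sem (:constructor my-sem--make))
  mutex condvar count)

(defun my-sem-create (count)
  "Return a semaphore with an initial COUNT."
  (let ((mutex (make-mutex)))
    (my-sem--make :mutex mutex
                  :condvar (make-condition-variable mutex)
                  :count count)))

(defun my-sem-post (sem)
  "Increment SEM and wake one waiter, if any."
  (with-mutex (my-sem-mutex sem)
    (cl-incf (my-sem-count sem))
    (condition-notify (my-sem-condvar sem))))

(defun my-sem-wait (sem)
  "Block until the count of SEM is positive, then decrement it."
  (with-mutex (my-sem-mutex sem)
    ;; Re-check in a loop: being woken does not guarantee the count is
    ;; still positive once this thread holds the mutex again.
    (while (<= (my-sem-count sem) 0)
      (condition-wait (my-sem-condvar sem)))
    (cl-decf (my-sem-count sem))))

(let ((sem (my-sem-create 0)))
  (my-sem-post sem)
  (my-sem-wait sem)                     ; does not block: count was 1
  (my-sem-count sem))
;; => 0
```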

So, naturally, I just had to see if I could make it work. I call it a “thread iterator” or “thriter.” The API is very similar to iter:

https://github.com/skeeto/thriter

This is merely a proof of concept so don’t actually use this library for anything. These thread-based generators are about 5x slower than iter generators, and they’re a lot more heavy-weight, needing an entire thread per iterator object. This makes thriter-close all the more important. On the other hand, these generators have no problem yielding inside unwind-protect.

Originally this article was going to dive into the details of how these thread-iterators worked, but thriter turned out to be quite a bit more complicated than I anticipated, especially as I worked towards feature matching iter.

The gist of it is that each side of a next/yield transaction gets its own condition variable, but both share a common mutex. Values are passed between the threads using slots on the iterator object. The side that isn’t currently running waits on a condition variable until the other side frees it, after which the releaser waits on its own condition variable for the result. This is similar to asynchronous requests in Emacs dynamic modules.

Rather than use signals to indicate completion, I modeled it after JavaScript generators. Iterators return a cons cell. The car indicates continuation and the cdr holds the yield result. To terminate an iterator early (thriter-close or garbage collection), thread-signal is used to essentially “cancel” the thread and knock it off the condition variable.
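The cons-cell convention can be sketched without any threads at all. The following uses plain closures with hypothetical names (my-rich-walk, my-rich-drain); it illustrates the return protocol only and is not thriter’s actual interface.

```elisp
;;; -*- lexical-binding: t; -*-

;; Hypothetical iterator: each call returns (CONTINUE . VALUE).
(defun my-rich-walk (list)
  (lambda ()
    (if list
        (cons t (pop list))
      (cons nil nil))))

;; Hypothetical consumer: unpack each cons, stopping when the car
;; says iteration has finished.
(defun my-rich-drain (next)
  (let ((values '())
        (result (funcall next)))
    (while (car result)
      (push (cdr result) values)
      (setq result (funcall next)))
    (nreverse values)))

(my-rich-drain (my-rich-walk '(:a nil :c)))
;; => (:a nil :c)
```

Since completion is carried in the car, a yielded nil in the cdr is not mistaken for the end of the sequence.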

Since threads aren’t (and shouldn’t be) garbage collected, failing to run a thread-iterator to completion would normally cause a memory leak, as the thread sits there forever waiting on a “next” that will never come. To deal with this, a finalizer is attached to the iterator object in such a way that it’s not visible to the thread. A lost iterator is eventually cleaned up by the garbage collector, but, as usual with finalizers, this is only a last resort.

The future of threads

This thread-iterator project was my initial, little experiment with Emacs Lisp threads, similar to why I connected a joystick to Emacs using a dynamic module. While I don’t expect the current thread API to go away, it’s not really suitable for general use in its raw form. Bugs in Emacs Lisp programs should virtually never bring down Emacs and require a restart. Outside of threads, the few situations that break this rule are very easy to avoid (and very obvious that something dangerous is happening). Dynamic modules are dangerous by necessity, but concurrency doesn’t have to be.

There really needs to be a safe, high-level API with clean thread isolation. Perhaps this higher-level API will eventually build on top of the low-level threading API.

Emacs Lisp Lambda Expressions Are Not Self-Evaluating

2018-02-22T21:30:57Z

This week I made a mistake that ultimately enlightened me about the nature of function objects in Emacs Lisp. There are three kinds of function objects, but they each behave very differently when evaluated as objects.

But before we get to that, let’s talk about one of Emacs’ embarrassing, old missteps: eval-after-load.

Taming an old dragon

One of the long-standing issues with Emacs is that loading Emacs Lisp files (.el and .elc) is a slow process, even when those files have been byte compiled. There are a number of dirty hacks in place to deal with this issue, and the biggest and nastiest of them all is the dumper, also known as unexec.

The Emacs you routinely use throughout the day is actually a previous instance of Emacs that’s been resurrected from the dead. Your undead Emacs was probably created months, if not years, earlier, back when it was originally compiled. The first stage of compiling Emacs is to compile a minimal C core called temacs. The second stage is loading a bunch of Emacs Lisp files, then dumping a memory image in an unportable, platform-dependent way. On Linux, this actually requires special hooks in glibc. The Emacs you know and love is this dumped image loaded back into memory, continuing from where it left off just after it was compiled. Regardless of your own feelings on the matter, you have to admit this is a very lispy thing to do.

There are two notable costs to Emacs’ dumper:

The dumped image contains hard-coded memory addresses. This means Emacs can’t be a Position Independent Executable (PIE). It can’t take advantage of a security feature called Address Space Layout Randomization (ASLR), which would increase the difficulty of exploiting some classes of bugs. This might be important to you if Emacs processes untrusted data, such as when it’s used as a mail client, a web server or generally parses data downloaded across the network.
It’s not possible to cross-compile Emacs since it can only be dumped by running temacs on its target platform. As an experiment I’ve attempted to dump the Windows version of Emacs on Linux using Wine, but was unsuccessful.

The good news is that there’s a portable dumper in the works that makes this a lot less nasty. If you’re adventurous, you can already disable dumping and run temacs directly by setting CANNOT_DUMP=yes at compile time. Be warned, though, that a non-dumped Emacs takes several seconds, or worse, to initialize before it even begins loading your own configuration. It’s also somewhat buggy since it seems nobody ever runs it this way productively.

The other major way Emacs users have worked around slow loading is aggressive use of lazy loading, generally via autoloads. The major package interactive entry points are defined ahead of time as stub functions. These stubs, when invoked, load the full package, which overrides the stub definition, then finally the stub re-invokes the new definition with the same arguments.

To further assist with lazy loading, an evaluated defvar form will not override an existing global variable binding. This means you can, to a certain extent, configure a package before it’s loaded. The package will not clobber any existing configuration when it loads. This also explains the bizarre interfaces for the various hook functions, like add-hook and run-hooks. These accept symbols — the names of the variables — rather than values of those variables as would normally be the case. The add-to-list function does the same thing. It’s all intended to cooperate with lazy loading, where the variable may not have been defined yet.
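A small sketch of that cooperation, using a made-up package prefix my-pkg: the configuration runs first, and the defvar forms the package would evaluate at load time leave it untouched.

```elisp
;; Configuration that runs before the (hypothetical) package is loaded.
(setq my-pkg-greeting "hello")          ; variable not defined yet
(add-hook 'my-pkg-mode-hook #'ignore)   ; hook variable not defined yet

;; What the package itself would evaluate later, when it loads.
(defvar my-pkg-greeting "default greeting")
(defvar my-pkg-mode-hook nil)

(list my-pkg-greeting my-pkg-mode-hook)
;; => ("hello" (ignore))
```

Note that add-hook is given the symbol my-pkg-mode-hook, not its value, which is what lets it work on a variable that doesn’t exist yet.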

eval-after-load

Sometimes this isn’t enough and you need some configuration to take place after the package has been loaded, but without forcing it to load early. That is, you need to tell Emacs “evaluate this code after this particular package loads.” That’s where eval-after-load comes into play, except for its fatal flaw: it takes the word “eval” completely literally.

The first argument to eval-after-load is the name of a package. Fair enough. The second argument is a form that will be passed to eval after that package is loaded. Now hold on a minute. The general rule of thumb is that if you’re calling eval, you’re probably doing something seriously wrong, and this function is no exception. This is completely the wrong mechanism for the task.

The second argument should have been a function — either a (sharp quoted) symbol or a function object. And then instead of eval it would be something more sensible, like funcall. Perhaps this improved version would be named call-after-load or run-after-load.

The big problem with passing an s-expression is that it will be left uncompiled due to being quoted. I’ve talked before about the importance of evaluating your lambdas. eval-after-load not only encourages badly written Emacs Lisp, it demands it.

;;; BAD!
(eval-after-load 'simple-httpd
                 '(push '("c" . "text/plain") httpd-mime-types))

This was all corrected in Emacs 25. If the second argument to eval-after-load is a function — the result of applying functionp is non-nil — then it uses funcall. There’s also a new macro, with-eval-after-load, to package it all up nicely.

;;; Better (Emacs >= 25 only)
(eval-after-load 'simple-httpd
  (lambda ()
    (push '("c" . "text/plain") httpd-mime-types)))

;;; Best (Emacs >= 25 only)
(with-eval-after-load 'simple-httpd
  (push '("c" . "text/plain") httpd-mime-types))

Though in both of these examples the compiler will likely warn about httpd-mime-types not being defined. That’s a problem for another day.

A workaround

But what if you need to use Emacs 24, as was the situation that sparked this article? What can we do with the bad version of eval-after-load? We could situate a lambda such that it’s evaluated, but then smuggle the resulting function object into the form passed to eval-after-load, all using a backquote.

;;; Note: this is subtly broken
(eval-after-load 'simple-httpd
  `(funcall
    ,(lambda ()
       (push '("c" . "text/plain") httpd-mime-types))))

When everything is compiled, the backquoted form evaluates to this:

(funcall #[0  [httpd-mime-types ("c" . "text/plain")] 2])

Where the second value (#[...]) is a byte-code object. However, as the comment notes, this is subtly broken. A cleaner and correct way to solve all this is with a named function. The damage caused by eval-after-load will have been (mostly) minimized.

(defun my-simple-httpd-hook ()
  (push '("c" . "text/plain") httpd-mime-types))

(eval-after-load 'simple-httpd
  '(funcall #'my-simple-httpd-hook))

But, let’s go back to the anonymous function solution. What was broken about it? It all has to do with evaluating function objects.

Evaluating function objects

So what happens when we evaluate an expression like the one above with eval? Here’s what it looks like again.

(funcall #[...])

First, eval notices it’s been given a non-empty list, so it’s probably a function call. The first element is the name of the function to be called (funcall) and the remaining elements are its arguments. But each of these elements must be evaluated first, and the results of that evaluation become the arguments.

Any value that isn’t a list or a symbol is self-evaluating. That is, it evaluates to its own value:

(eval 10)
;; => 10

If the value is a symbol, it’s treated as a variable. If the value is a list, it goes through the function call process I’m describing (or one of a number of other special cases, such as macro expansion, lambda expressions, and special forms).
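Here are the three cases side by side in a short sketch. The variable my-eval-demo is hypothetical and exists only to give the symbol case something to look up.

```elisp
(defvar my-eval-demo 42)   ; hypothetical variable for the symbol case

(list (eval "text")          ; string: self-evaluating
      (eval [a b])           ; vector: self-evaluating
      (eval 'my-eval-demo)   ; symbol: variable lookup
      (eval '(+ 1 2)))       ; list: function call
;; => ("text" [a b] 42 3)
```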

So, conceptually eval recurses on the function object #[...]. A function object is not a list or a symbol, so it’s self-evaluating. No problem.

;; Byte-code objects are self-evaluating

(let ((x (byte-compile (lambda ()))))
  (eq x (eval x)))
;; => t

What if this code wasn’t compiled? Rather than a byte-code object, we’d have some other kind of function object for the interpreter. Let’s examine the dynamic scope (shudder) case. Here, a lambda appears to evaluate to itself, but appearances can be deceiving:

(eval (lambda ()))
;; => (lambda ())

However, this is not self-evaluation. Lambda expressions are not self-evaluating. It’s merely coincidence that the result of evaluating a lambda expression looks like the original expression. This is just how the Emacs Lisp interpreter is currently implemented and, strictly speaking, it’s an implementation detail that just so happens to be mostly compatible with byte-code objects being self-evaluating. It would be a mistake to rely on this.

Instead, dynamic scope lambda expression evaluation is idempotent. Applying eval to the result will return an equal, but not identical (eq), expression. In contrast, a self-evaluating value is also idempotent under evaluation, but with eq results.

;; Not self-evaluating:

(let ((x '(lambda ())))
  (eq x (eval x)))
;; => nil

;; Evaluation is idempotent:

(let ((x '(lambda ())))
  (equal x (eval x)))
;; => t

(let ((x '(lambda ())))
  (equal x (eval (eval x))))
;; => t

So, with dynamic scope, the subtly broken backquote example will still work, but only by sheer luck. Under lexical scope, the situation isn’t so lucky:

;;; -*- lexical-binding: t; -*-

(lambda ())
;; => (closure (t) nil)

These interpreted lambda functions are neither self-evaluating nor idempotent. Passing t as the second argument to eval tells it to use lexical scope, as shown below:

;; Not self-evaluating:

(let ((x '(lambda ())))
  (eq x (eval x t)))
;; => nil

;; Not idempotent:

(let ((x '(lambda ())))
  (equal x (eval x t)))
;; => nil

(let ((x '(lambda ())))
  (equal x (eval (eval x t) t)))
;; error: (void-function closure)

I can imagine an implementation of Emacs Lisp where dynamic scope lambda expressions are in the same boat, where they’re not even idempotent. For example:

;;; -*- lexical-binding: nil; -*-

(lambda ())
;; => (totally-not-a-closure ())

Most Emacs Lisp would work just fine under this change, and only code that makes some kind of logical mistake — where there’s nested evaluation of lambda expressions — would break. This essentially already happened when lots of code was quietly switched over to lexical scope after Emacs 24. Lambda idempotency was lost and well-written code didn’t notice.

There’s a temptation here for Emacs to define a closure function or special form that would allow interpreter closure objects to be either self-evaluating or idempotent. This would be a mistake. It would only serve as a hack that covers up logical mistakes that lead to nested evaluation. Much better to catch those problems early.

Solving the problem with one character

So how do we fix the subtly broken example? With a strategically placed quote right before the comma.

(eval-after-load 'simple-httpd
  `(funcall
    ',(lambda ()
        (push '("c" . "text/plain") httpd-mime-types))))

So the form passed to eval-after-load becomes:

;; Compiled:
(funcall (quote #[...]))

;; Dynamic scope:
(funcall (quote (lambda () ...)))

;; Lexical scope:
(funcall (quote (closure (t) () ...)))

The quote prevents eval from evaluating the function object, which would be either needless or harmful. There’s also an argument to be made that this is a perfect situation for a sharp-quote (#'), which exists to quote functions.
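A sketch of the sharp-quote variant, evaluated directly rather than through eval-after-load so that the result is visible. The variable my-form is a hypothetical name for illustration.

```elisp
;;; -*- lexical-binding: t; -*-

;; Build the form with #' in front of the comma instead of a plain
;; quote.  The reader turns #',X into (function ,X), so the function
;; object ends up wrapped in a `function' form.
(defvar my-form
  `(funcall #',(lambda () :called)))

;; Evaluating the form calls the function object; the object itself
;; is never evaluated as an expression.
(eval my-form t)
;; => :called
```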

What's in an Emacs Lambda

2017-12-14T18:18:57Z

There was recently some interesting discussion about correctly using backquotes to express a mixture of data and code. Since lambda expressions seem to evaluate to themselves, what’s the difference? For example, an association list of operations:

'((add . (lambda (a b) (+ a b)))
  (sub . (lambda (a b) (- a b)))
  (mul . (lambda (a b) (* a b)))
  (div . (lambda (a b) (/ a b))))

It looks like it would work, and indeed it does work in this case. However, there are good reasons to actually evaluate those lambda expressions. Eventually invoking the lambda expressions in the quoted form above is equivalent to using eval. So, instead, prefer the backquote form:

`((add . ,(lambda (a b) (+ a b)))
  (sub . ,(lambda (a b) (- a b)))
  (mul . ,(lambda (a b) (* a b)))
  (div . ,(lambda (a b) (/ a b))))
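For context, here is a sketch of how such a table might be consumed. The names my-ops and my-apply-op are hypothetical; the table itself is the backquoted form from above.

```elisp
;;; -*- lexical-binding: t; -*-

;; Hypothetical operations table using the backquote form, so each
;; entry holds a real function object.
(defvar my-ops
  `((add . ,(lambda (a b) (+ a b)))
    (sub . ,(lambda (a b) (- a b)))
    (mul . ,(lambda (a b) (* a b)))
    (div . ,(lambda (a b) (/ a b)))))

;; Hypothetical dispatcher: look up NAME and call its function.
(defun my-apply-op (name a b)
  (let ((entry (assq name my-ops)))
    (unless entry
      (error "Unknown operation: %S" name))
    (funcall (cdr entry) a b)))

(my-apply-op 'mul 6 7)  ; => 42
(my-apply-op 'div 7 2)  ; => 3 (integer division)
```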

There are a lot of interesting things to say about this, but let’s first reduce it to two very simple cases:

(lambda (x) x)

'(lambda (x) x)

What’s the difference between these two forms? The first is a lambda expression, and it evaluates to a function object. The other is a quoted list that looks like a lambda expression, and it evaluates to a list — a piece of data.

A naive evaluation of these expressions in *scratch* (C-x C-e) suggests they are identical, and so it would seem that quoting a lambda expression doesn’t really matter:

(lambda (x) x)
;; => (lambda (x) x)

'(lambda (x) x)
;; => (lambda (x) x)

However, there are two common situations where this is not the case: byte compilation and lexical scope.

Lambda under byte compilation

It’s a little trickier to evaluate these forms byte compiled in the scratch buffer since that doesn’t happen automatically. But if it did, it would look like this:

;;; -*- lexical-binding: nil; -*-

(lambda (x) x)
;; => #[(x) "\010\207" [x] 1]

'(lambda (x) x)
;; => (lambda (x) x)

The #[...] is the syntax for a byte-code function object. As discussed in detail in my byte-code internals article, it’s a special vector object that contains byte-code, and other metadata, for evaluation by Emacs’ virtual stack machine. Elisp is one of very few languages with readable function objects, and this feature is core to its ahead-of-time byte compilation.

The quote, by definition, prevents evaluation, and so inhibits byte compilation of the lambda expression. It’s vital that the byte compiler does not try to guess the programmer’s intent and compile the expression anyway, since that would interfere with lists that just so happen to look like lambda expressions — i.e. any list containing the lambda symbol.
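One way to observe the difference without compiling a whole file is to byte-compile two small functions and inspect what each returns. This is a sketch with hypothetical function names, my-evaluated and my-quoted.

```elisp
;;; -*- lexical-binding: t; -*-

;; Returns the result of evaluating a lambda expression.
(defun my-evaluated ()
  (lambda (x) x))

;; Returns a quoted list that merely looks like a lambda expression.
(defun my-quoted ()
  '(lambda (x) x))

(byte-compile 'my-evaluated)
(byte-compile 'my-quoted)

(byte-code-function-p (my-evaluated))  ; => t
(consp (my-quoted))                    ; => t, still a plain list
```

The compiler descended into the first lambda and compiled it, but left the quoted list alone as ordinary data.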

There are three reasons you want your lambda expressions to get byte compiled:

Byte-compiled functions are significantly faster. That’s the main purpose for byte compilation after all.
The compiler performs static checks, producing warnings and errors ahead of time. This lets you spot certain classes of problems before they occur. The static analysis is even better under lexical scope due to its tighter semantics.
Under lexical scope, byte-compiled closures may use less memory. More specifically, they won’t accidentally keep objects alive longer than necessary. I’ve never seen a name for this implementation issue, but I call it overcapturing. More on this later.

While it’s common for personal configurations to skip byte compilation, Elisp should still generally be written as if it were going to be byte compiled. General rule of thumb: Ensure your lambda expressions are actually evaluated.

Lambda in lexical scope

As I’ve stressed many times, you should always use lexical scope. There’s no practical disadvantage or trade-off involved. Just do it.

Once lexical scope is enabled, the two expressions diverge even without byte compilation:

;;; -*- lexical-binding: t; -*-

(lambda (x) x)
;; => (closure (t) (x) x)

'(lambda (x) x)
;; => (lambda (x) x)

Under lexical scope, lambda expressions evaluate to closures. Closures capture their lexical environment in their closure object — nothing in this particular case. It’s a type of function object, making it a valid first argument to funcall.

Since the quote prevents the second expression from being evaluated, semantically it evaluates to a list that just so happens to look like a (non-closure) function object. Invoking a data object as a function is like using eval — i.e. executing data as code. Everyone already knows eval should not be used lightly.

It’s a little more interesting to look at a closure that actually captures a variable, so here’s a definition for constantly, a higher-order function that returns a closure that accepts any number of arguments and returns a particular constant:

(defun constantly (x)
  (lambda (&rest _) x))

Without byte compiling it, here’s an example of its return value:

(constantly :foo)
;; => (closure ((x . :foo) t) (&rest _) x)

The environment has been captured as an association list (with a trailing t), and we can plainly see that the variable x is bound to the symbol :foo in this closure. Consider that we could manipulate this data structure (e.g. setcdr or setf) to change the binding of x for this closure. This is essentially how closures mutate their own environment. Moreover, closures from the same environment share structure, so such mutations are also shared. More on this later.
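The shared-mutation behavior can be observed through ordinary closures, without poking at the closure representation. In this sketch, my-make-counter is a hypothetical function returning two closures over the same binding.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-make-counter ()
  "Return (INCREMENT . READ), two closures over the same variable."
  (let ((n 0))
    (cons (lambda () (setq n (1+ n)))
          (lambda () n))))

(let ((counter (my-make-counter)))
  (funcall (car counter))    ; mutate through the first closure
  (funcall (car counter))
  (funcall (cdr counter)))   ; observe through the second
;; => 2
```

Each call to my-make-counter creates a fresh environment, so separate counters do not interfere with one another.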

Semantically, closures are distinct objects (via eq), even if the variables they close over are bound to the same value. This is because they each have a distinct environment attached to them, even if in some invisible way.

(eq (constantly :foo) (constantly :foo))
;; => nil

Without byte compilation, this is true even when there’s no lexical environment to capture:

(defun dummy ()
  (lambda () t))

(eq (dummy) (dummy))
;; => nil

The byte compiler is smart, though. As an optimization, the same closure object is reused when possible, avoiding unnecessary work, including multiple object allocations. Though this is a bit of an abstraction leak. A function can (ab)use this to introspect whether it’s been byte compiled:

(defun have-i-been-compiled-p ()
  (let ((funcs (vector nil nil)))
    (dotimes (i 2)
      (setf (aref funcs i) (lambda ())))
    (eq (aref funcs 0) (aref funcs 1))))

(have-i-been-compiled-p)
;; => nil

(byte-compile 'have-i-been-compiled-p)

(have-i-been-compiled-p)
;; => t

The trick here is to evaluate the exact same non-capturing lambda expression twice, which requires a loop (or at least some sort of branch). Semantically we should think of these closures as being distinct objects, but, if we squint our eyes a bit, we can see the effects of the behind-the-scenes optimization.

Don’t actually do this in practice, of course. That’s what byte-code-function-p is for, which won’t rely on a subtle implementation detail.

Overcapturing

I mentioned before that one of the potential gotchas of not byte compiling your lambda expressions is overcapturing closure variables in the interpreter.

To evaluate lisp code, Emacs has both an interpreter and a virtual machine. The interpreter evaluates code in list form: cons cells, numbers, symbols, etc. The byte compiler is like the interpreter, but instead of directly executing those forms, it emits byte-code that, when evaluated by the virtual machine, produces identical visible results to the interpreter — in theory.

What this means is that Emacs contains two different implementations of Emacs Lisp, one in the interpreter and one in the byte compiler. The Emacs developers have been maintaining and expanding these implementations side-by-side for decades. A pitfall to this approach is that the implementations can, and do, diverge in their behavior. We saw this above with that introspective function, and it comes up in practice with advice.

Another way they diverge is in closure variable capture. For example:

;;; -*- lexical-binding: t; -*-

(defun overcapture (x y)
  (when y
    (lambda () x)))

(overcapture :x :some-big-value)
;; => (closure ((y . :some-big-value) (x . :x) t) nil x)

Notice that the closure captured y even though it’s unnecessary. This is because the interpreter doesn’t, and shouldn’t, take the time to analyze the body of the lambda to determine which variables should be captured. That would need to happen at run-time each time the lambda is evaluated, which would make the interpreter much slower. Overcapturing can get pretty messy if macros are introducing their own hidden variables.

On the other hand, the byte compiler can do this analysis just once at compile-time. And it’s already doing the analysis as part of its job. It can avoid this problem easily:

(overcapture :x :some-big-value)
;; => #[0 "\300\207" [:x] 1]

It’s clear that :some-big-value isn’t present in the closure.

But… how does this work?

How byte compiled closures are constructed

Recall from the internals article that the four core elements of a byte-code function object are:

Parameter specification
Byte-code string (opcodes)
Constants vector
Maximum stack usage

While a closure seems like compiling a whole new function each time the lambda expression is evaluated, there’s actually not that much to it! Namely, the behavior of the function remains the same. Only the closed-over environment changes.

What this means is that closures produced by a common lambda expression can all share the same byte-code string (second element). Their bodies are identical, so they compile to the same byte-code. Where they differ are in their constants vector (third element), which gets filled out according to the closed over environment. It’s clear just from examining the outputs:

(constantly :a)
;; => #[128 "\300\207" [:a] 2]

(constantly :b)
;; => #[128 "\300\207" [:b] 2]

constantly has three of the four components of the closure in its own constant pool. Its job is to construct the constants vector, and then assemble the whole thing into a byte-code function object (#[...]). Here it is with M-x disassemble:

     constant  make-byte-code
     constant  128
     constant  "\300\207"
     constant  vector
     stack-ref 4
     call      1
     constant  2
     call      4
     return

(Note: since the byte compiler doesn’t produce perfectly optimal code, I’ve simplified it for this discussion.)

It pushes most of its constants on the stack. Then stack-ref 4 puts x on the stack. Then it calls vector to create the constants vector (call 1). Finally, it constructs the function object (#[...]) by calling make-byte-code (call 4).

Since this might be clearer, here’s the same thing expressed back in terms of Elisp:

(defun constantly (x)
  (make-byte-code 128 "\300\207" (vector x) 2))

To see the disassembly of the closure’s byte-code:

(disassemble (constantly :x))

The result isn’t very surprising:

0       constant  :x
1       return

Things get a little more interesting when mutation is involved. Consider this adder closure generator, which mutates its environment every time it’s called:

(defun adder ()
  (let ((total 0))
    (lambda () (cl-incf total))))

(let ((count (adder)))
  (funcall count)
  (funcall count)
  (funcall count))
;; => 3

(adder)
;; => #[0 "\300\211\242T\240\207" [(0)] 2]

The adder essentially works like this:

(defun adder ()
  (make-byte-code 0 "\300\211\242T\240\207" (vector (list 0)) 2))

In theory, this closure could operate by mutating its constants vector directly. But that wouldn’t be much of a constants vector, now would it!? Instead, mutated variables are boxed inside a cons cell. Closures don’t share constant vectors, so the main reason for boxing is to share variables between closures from the same environment. That is, they have the same cons in each of their constant vectors.

There’s no equivalent Elisp for the closure in adder, so here’s the disassembly:

     constant  (0)
     dup
     car-safe
     add1
     setcar
     return

It puts two references to the boxed integer on the stack (constant, dup), unboxes the top one (car-safe), increments that unboxed integer (add1), and stores it back in the box (setcar) via the bottom reference, leaving the incremented value behind to be returned.
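To make the stack behavior concrete, here is a minimal Python sketch of a toy stack machine running those six instructions. It is not how Emacs implements its virtual machine: the run function and the tuple-based opcode encoding are invented for illustration, and a one-element list stands in for the cons cell box.

```python
def run(ops, constants):
    """Interpret a tiny subset of opcodes against a constants vector."""
    stack = []
    for op, *args in ops:
        if op == "constant":
            stack.append(constants[args[0]])  # push a constant by index
        elif op == "dup":
            stack.append(stack[-1])           # second reference to the same box
        elif op == "car-safe":
            stack.append(stack.pop()[0])      # unbox the top reference
        elif op == "add1":
            stack.append(stack.pop() + 1)
        elif op == "setcar":
            value = stack.pop()
            box = stack.pop()
            box[0] = value                    # store back into the shared box
            stack.append(value)               # setcar leaves the new value behind
        elif op == "return":
            return stack.pop()
    raise ValueError("byte-code ended without return")


ADDER_OPS = [("constant", 0), ("dup",), ("car-safe",),
             ("add1",), ("setcar",), ("return",)]

box = [0]              # stands in for the cons cell (0)
constants = [box]      # the closure's constants vector
results = [run(ADDER_OPS, constants) for _ in range(3)]
print(results, box)
```

The constants vector itself is never modified. Only the box it points to changes, which is the point of the boxing.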

This all gets a little more interesting when closures interact:

(defun fancy-adder ()
  (let ((total 0))
    `(:add ,(lambda () (cl-incf total))
      :set ,(lambda (v) (setf total v))
      :get ,(lambda () total))))

(let ((counter (fancy-adder)))
  (funcall (plist-get counter :set) 100)
  (funcall (plist-get counter :add))
  (funcall (plist-get counter :add))
  (funcall (plist-get counter :get)))
;; => 102

(fancy-adder)
;; => (:add #[0 "\300\211\242T\240\207" [(0)] 2]
;;     :set #[257 "\300\001\240\207" [(0)] 3]
;;     :get #[0 "\300\242\207" [(0)] 1])

This is starting to resemble object oriented programming, with methods acting upon fields stored in a common, closed-over environment.

All three closures share a common variable, total. Since I didn’t use print-circle, this isn’t obvious from the last result, but each of those (0) conses is the same object. When one closure mutates the box, they all see the change. Here’s essentially how fancy-adder is transformed by the byte compiler:

(defun fancy-adder ()
  (let ((box (list 0)))
    (list :add (make-byte-code 0 "\300\211\242T\240\207" (vector box) 2)
          :set (make-byte-code 257 "\300\001\240\207" (vector box) 3)
          :get (make-byte-code 0 "\300\242\207" (vector box) 1))))
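The same sharing can be sketched in Python, purely as an analogy. The name fancy_adder and the one-element list used as the box are my stand-ins here. Python closures have their own cell mechanism, so this mirrors the shape of the byte compiler’s transformation rather than reproducing it.

```python
def fancy_adder():
    box = [0]  # one box shared by all three closures, like (list 0)

    def add():
        box[0] += 1
        return box[0]

    def set_(value):
        box[0] = value
        return value

    def get():
        return box[0]

    return {"add": add, "set": set_, "get": get}


counter = fancy_adder()
counter["set"](100)
counter["add"]()
counter["add"]()
total = counter["get"]()
print(total)  # 102
```

Each call to fancy_adder makes a new box, so separate counters don’t interfere with one another.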

The backquote in the original fancy-adder brings this article full circle. This final example wouldn’t work correctly if those lambdas weren’t evaluated properly.

The Adversarial Implementation

2017-05-03T17:51:53Z

When coding against a standard, whether it’s a programming language specification or an open API with multiple vendors, a common concern is the conformity of a particular construct to the standard. This cannot be determined simply by experimentation, since a piece of code may work correctly due only to the specifics of a particular implementation. It works today with this implementation, but it may not work tomorrow or with a different implementation. Sometimes an implementation will warn about the use of non-standard behavior, but this isn’t always the case.

When I’m reasoning about whether or not something is allowed, I like to imagine an adversarial implementation. If the standard allows some freedom, this implementation takes an imaginative or unique approach. It chooses non-obvious interpretations with possibly unexpected, but valid, results. This is nearly the opposite of djb’s hypothetical boringcc, though some of the ideas are similar.

Many argue that this is already the case with modern C and C++ optimizing compilers. Compiler writers are already creative with the standard in order to squeeze out more performance, even if it’s at odds with the programmer’s actual intentions. The most prominent example in C and C++ is strict aliasing, where the optimizer is deliberately blinded to certain kinds of aliasing because the standard allows it to be, eliminating some (possibly important) loads. This happens despite the compiler’s ability to trivially prove that two particular objects really do alias.

I want to be clear that I’m not talking about the nasal daemon kind of creativity. That’s not a helpful thought experiment. What I mean is this: Can I imagine a conforming implementation that breaks any assumptions made by the code?

In practice, compilers typically have to bridge multiple specifications: the language standard, the platform ABI, and the operating system interface (process startup, syscalls, etc.). This really ties their hands on how creative they can be with any one of the specifications. Depending on the situation, the imaginary adversarial implementation isn’t necessarily running on any particular platform. If our program is expected to have a long life, useful for many years to come, we should avoid making too many assumptions about future computers and imagine an adversarial compiler with few limitations.

C example

Take this bit of C:

printf("%d", sizeof(foo));

The printf function is variadic, and it relies entirely on the format string in order to correctly handle all its arguments. The %d specifier means that its matching argument is of type int. The result of the sizeof operator is an integer of type size_t, which has a different sign and may even be a different size.
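As a quick check of how far apart the two types are on a given machine, here is a small Python sketch using the standard ctypes module. It only reports what the local platform’s C ABI looks like, so the sizes it prints will vary, and it says nothing about what some other implementation is permitted to do.

```python
import ctypes

int_size = ctypes.sizeof(ctypes.c_int)
size_t_size = ctypes.sizeof(ctypes.c_size_t)

# Storing -1 exposes the difference in signedness: c_int keeps it negative,
# while the unsigned c_size_t wraps around to its maximum value.
int_minus_one = ctypes.c_int(-1).value
size_t_minus_one = ctypes.c_size_t(-1).value

print("sizeof(int)    =", int_size)
print("sizeof(size_t) =", size_t_size)
print("same size?     ", int_size == size_t_size)
print("(int)-1        =", int_minus_one)
print("(size_t)-1     =", size_t_minus_one)
```

In the C code itself, the conversion specifier that matches size_t is %zu (C99 and later).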

Typically this code will work just fine. An int and size_t are generally passed the same way, the actual value probably fits in an int, and two’s complement means the signedness isn’t an issue due to the value being positive. From the printf point of view, it typically can’t detect that the type is wrong, so everything works by chance. In fact, it’s hard to imagine a real situation where this wouldn’t work fine.

However, this is still undefined behavior, a scenario where a creative adversarial implementation can break things. In this case there are a few options for an adversarial implementation:

1. Arguments of type int and size_t are passed differently, so printf will load the argument from the wrong place.
2. The implementation doesn’t use two’s complement and even small positive values have different bit representations.
3. The type of foo is given crazy padding for arbitrary reasons that makes it so large it doesn’t fit in an int.

What’s interesting about #1 is that this has actually happened. For example, here’s a C source file.

float foo(float x, int y);

float
bar(int y)
{
    return foo(0.0f, y);
}

And in another source file:

float
foo(int x, int y)
{
    (void)x;  // ignore x
    return y * 2.0f;
}

The type of argument x differs between the prototype and the definition, which is undefined behavior. However, since this argument is ignored, this code will still work correctly on many different real-world computers, particularly where float and int arguments are passed the same way (i.e. on the stack).

However, in 2003 the x86-64 CPU arrived with its new System V ABI. Floating point and integer arguments were now passed differently, and the types of preceding arguments mattered when deciding which register to use. Some constructs that worked fine, by chance, prior to 2003 would soon stop working due to what may have seemed like an adversarial implementation years before.

Python example

Let’s look at some Python. This snippet opens a file a million times without closing any handles.

for i in range(1, 1000000):
    f = open("/dev/null", "r")

Assuming you have a /dev/null, this code will work fine without throwing any exceptions on CPython, the most widely used Python implementation. CPython uses a deterministic reference counting scheme, and the handle is automatically closed as soon as its variable falls out of scope. It’s like having an invisible f.close() at the end of the block.

However, this code is incorrect. The deterministic handle closing is an implementation behavior, not part of the specification. The operating system limits the number of files a process can have open at once, and there’s a risk that this resource will run out even though none of those handles are reachable. Imagine an adversarial Python implementation trying to break this code. It could sufficiently delay garbage collection, or even have infinite memory, omitting garbage collection altogether.

Like before, such an implementation eventually did come about: PyPy, a Python implementation written in Python with a JIT compiler. It uses (by default) something closer to mark-and-sweep, not reference counting, and those handles are left open until the next collection.

>>>> for i in range(1, 1000000):
....     f = open("/dev/null", "r")
.... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IOError: [Errno 24] Too many open files: '/dev/null'
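A version that doesn’t depend on when the garbage collector runs closes each handle explicitly. This is a minimal sketch using a with block. The open_many function name and the smaller iteration count are mine, and os.devnull stands in for the literal /dev/null path.

```python
import os


def open_many(count):
    """Open the null device `count` times, closing each handle promptly."""
    last = None
    for _ in range(count):
        with open(os.devnull, "r") as f:
            last = f  # the handle is closed when the with block exits
    return last


handle = open_many(5000)
print(handle.closed)  # True
```

At most one handle is open at any moment, so the loop behaves the same under reference counting or any other collection strategy.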

A tool for understanding specifications

This fits right in with a broader method of self-improvement: Occasionally put yourself in the implementor’s shoes. Think about what it would take to correctly implement the code that you write, either as a language or the APIs that you call. On reflection, you may find that some of those things that seem cheap may not be. Your assumptions may be reasonable, but not guaranteed. (Though it may be that “reasonable” is perfectly sufficient for your situation.)

An adversarial implementation is one that challenges an assumption you’ve taken for granted by turning it on its head.

The Vulgarness of Abbreviated Function Templates

2016-10-02T23:59:59Z

The auto keyword has been a part of C and C++ since the very beginning, originally as one of the four storage class specifiers: auto, register, static, and extern. An auto variable has “automatic storage duration,” meaning it is automatically allocated at the beginning of its scope and deallocated at the end. It’s the default storage class for any variable without external linkage or without static storage, so the vast majority of variables in a typical C program are automatic.

In C and C++ prior to C++11, the following definitions are equivalent because the auto is implied.

int
square(int x)
{
    int x2 = x * x;
    return x2;
}

int
square(int x)
{
    auto int x2 = x * x;
    return x2;
}

As a holdover from really old school C, unspecified types in C are implicitly int, and even today you can get away with weird stuff like this:

/* C only */
square(x)
{
    auto x2 = x * x;
    return x2;
}

By “get away with” I mean in terms of the compiler accepting this as valid input. Your co-workers, on the other hand, may become violent.

Like register, as a storage class auto is an historical artifact without direct practical use in modern code. However, as a concept it’s indispensable for the specification. In practice, automatic storage means the variable lives on “the” stack (or one of the stacks), but the specifications make no mention of a stack. In fact, the word “stack” doesn’t appear even once. Instead it’s all described in terms of “automatic storage,” rightfully leaving the details to the implementations. A stack is the most sensible approach the vast majority of the time, particularly because it’s both thread-safe and re-entrant.

C++11 Type Inference

One of the major changes in C++11 was repurposing the auto keyword, moving it from a storage class specifier to a type specifier. In C++11, the compiler infers the type of an auto variable from its initializer. In C++14, it’s also permitted for a function’s return type, inferred from the return statement.

This new specifier is very useful in idiomatic C++ with its ridiculously complex types. Transient variables, such as variables bound to iterators in a loop, don’t need a redundant type specification. It keeps code DRY (“Don’t Repeat Yourself”). Also, templates are easier to write, since it makes the compiler do more of the work. The necessary type information is already semantically present, and the compiler is a lot better at dealing with it.

With this change, the following is valid in both C and C++11, and, by sheer coincidence, has the same meaning, but for entirely different reasons.

int
square(int x)
{
    auto x2 = x * x;
    return x2;
}

In C the type is implied as int, and in C++11 the type is inferred from the type of x * x, which, in this case, is int. The prior example with auto int x2, valid in C++98 and C++03, is no longer valid in C++11 since auto and int are redundant type specifiers.

Occasionally I wish I had something like auto in C. If I’m writing a for loop from 0 to n, I’d like the loop variable to be the same type as n, even if I decide to change the type of n in the future. For example,

struct foo *foo = foo_create();
for (int i = 0; i < foo->n; i++)
    /* ... */;

The loop variable i should be the same type as foo->n. If I decide to change the type of foo->n in the struct definition, I’d have to find and update every loop. The idiomatic C solution is to typedef the integer, using the new type both in the struct and in loops, but I don’t think that’s much better.

Abbreviated Function Templates

Why is all this important? Well, I was recently reviewing some C++ and came across this odd specimen. I’d never seen anything like it before. Notice the use of auto for the parameter types.

void
set_odd(auto first, auto last, const auto &x)
{
    bool toggle = false;
    for (; first != last; first++, toggle = !toggle)
        if (toggle)
            *first = x;
}

Given the other uses of auto as a type specifier, this kind of makes sense, right? The compiler infers the type from the input argument. But, as you should often do, put yourself in the compiler’s shoes for a moment. Given this function definition in isolation, can you generate any code? Nope. The compiler needs to see the call site before it can infer the type. Even more, different call sites may use different types. That sounds an awful lot like a template, eh?

template<typename T, typename V>
void
set_odd(T first, T last, const V &x)
{
    bool toggle = false;
    for (; first != last; first++, toggle = !toggle)
        if (toggle)
            *first = x;
}

This is a proposed feature called abbreviated function templates, part of C++ Extensions for Concepts. It’s intended to be shorthand for the template version of the function. GCC 4.9 implements it as an extension, which is why the author was unaware of its unofficial status. In March 2016 it was established that abbreviated function templates would not be part of C++17, but may still appear in a future revision.

Personally, I find this use of auto to be vulgar. It overloads the keyword with a third definition. This isn’t unheard of — static also serves a number of unrelated purposes — but while similar to the second form of auto (type inference), this proposed third form is very different in its semantics (far more complex) and overhead (potentially very costly). I’m glad it’s been rejected so far. Templates better reflect the nature of this sort of code.

Makefile Assignments are Turing-Complete

2016-04-30T03:01:22Z

For over a decade now, GNU Make has almost exclusively been my build system of choice, either directly or indirectly. Unfortunately this means I unnecessarily depend on some GNU extensions — an annoyance when porting to the BSDs. In an effort to increase the portability of my Makefiles, I recently read the POSIX make specification. I learned two important things: 1) ~~POSIX make is so barren it’s not really worth striving for~~ (update: I’ve changed my mind), and 2) make’s macro assignment mechanism is Turing-complete.

If you want to see it in action for yourself before reading further, here’s a Makefile that implements Conway’s Game of Life (40x40) using only macro assignments.

life.mak (174kB) [or generate your own]

Run it with any make program in an ANSI terminal. It must literally be named life.mak. Beware: if you run it longer than a few minutes, your computer may begin thrashing.

make -f life.mak

It’s 100% POSIX-compatible except for the sleep 0.1 (fractional sleep), which is only needed for visual effect.

A POSIX workaround

Unlike virtually every real world implementation, POSIX make doesn’t support conditional parts. For example, you might want your Makefile’s behavior to change depending on the value of certain variables. In GNU Make it looks like this:

ifdef USE_FOO
    EXTRA_FLAGS = -ffoo -lfoo
else
    EXTRA_FLAGS = -Wbar
endif

Or BSD-style:

.ifdef USE_FOO
    EXTRA_FLAGS = -ffoo -lfoo
.else
    EXTRA_FLAGS = -Wbar
.endif

If the goal is to write a strictly POSIX Makefile, how could I work around the lack of conditional parts and maintain a similar interface? The selection of macro/variable to evaluate can be dynamically selected, allowing for some useful tricks. First define the option’s default:

USE_FOO = 0

Then define both sets of flags:

EXTRA_FLAGS_0 = -Wbar
EXTRA_FLAGS_1 = -ffoo -lfoo

Now dynamically select one of these macros for assignment to EXTRA_FLAGS.

EXTRA_FLAGS = $(EXTRA_FLAGS_$(USE_FOO))

The assignment on the command line overrides the assignment in the Makefile, so the user gets to override USE_FOO.

$ make              # EXTRA_FLAGS = -Wbar
$ make USE_FOO=0    # EXTRA_FLAGS = -Wbar
$ make USE_FOO=1    # EXTRA_FLAGS = -ffoo -lfoo
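To see why the nested reference works, here is a toy model of macro expansion in Python. It is a sketch, not a description of any real make: the expand function is hypothetical, it assumes balanced parentheses, treats undefined macros as empty, and models only $(NAME) references.

```python
def expand(text, macros):
    """Expand $(NAME) references; the name itself is expanded first."""
    out = []
    i = 0
    while i < len(text):
        if not text.startswith("$(", i):
            out.append(text[i])
            i += 1
            continue
        # Find the ")" that closes this reference, skipping nested ones.
        depth, j = 1, i + 2
        while j < len(text) and depth:
            if text.startswith("$(", j):
                depth += 1
                j += 2
            else:
                if text[j] == ")":
                    depth -= 1
                j += 1
        if depth:
            raise ValueError("unbalanced macro reference")
        name = expand(text[i + 2 : j - 1], macros)       # inner $(...) first
        out.append(expand(macros.get(name, ""), macros))  # then the value
        i = j
    return "".join(out)


makefile = {
    "USE_FOO": "0",
    "EXTRA_FLAGS_0": "-Wbar",
    "EXTRA_FLAGS_1": "-ffoo -lfoo",
    "EXTRA_FLAGS": "$(EXTRA_FLAGS_$(USE_FOO))",
}

default_flags = expand("$(EXTRA_FLAGS)", makefile)
# A command-line assignment overrides the one in the Makefile.
foo_flags = expand("$(EXTRA_FLAGS)", {**makefile, "USE_FOO": "1"})
print(default_flags)  # -Wbar
print(foo_flags)      # -ffoo -lfoo
```

The macro name is computed before the lookup happens, which is all the conditional trick relies on.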

Before reading the POSIX specification, I didn’t realize that the left side of an assignment can get the same treatment. If I really want the “if defined” behavior back, I can use the macro to mangle the left-hand side. For example,

EXTRA_FLAGS = -O0 -g3
EXTRA_FLAGS$(DEBUG) = -O3 -DNDEBUG

Caveat: If DEBUG is set to empty, it may still result in true for ifdef depending on which make flavor you’re using, but will always appear to be unset in this hack.

$ make             # EXTRA_FLAGS = -O3 -DNDEBUG
$ make DEBUG=yes   # EXTRA_FLAGS = -O0 -g3

This last case had me thinking: This is very similar to the (ab)use of the x86 mov instruction in mov is Turing-complete. These macro assignments alone should be enough to compute any algorithm.

Macro Operations

Macro names are just keys to a global associative array. This can be used to build lookup tables. Here’s a Makefile to “compute” the square root of integers between 0 and 10.

sqrt_0  = 0.000000
sqrt_1  = 1.000000
sqrt_2  = 1.414214
sqrt_3  = 1.732051
sqrt_4  = 2.000000
sqrt_5  = 2.236068
sqrt_6  = 2.449490
sqrt_7  = 2.645751
sqrt_8  = 2.828427
sqrt_9  = 3.000000
sqrt_10 = 3.162278
result := $(sqrt_$(n))

The BSD flavors of make have a -V option for printing variables, which is an easy way to retrieve output. I used an “immediate” assignment (:=) for result since some versions of make won’t evaluate the expression before -V printing.

$ make -f sqrt.mak -V result n=8
2.828427

Without -V, a default target could be used instead:

output :
        @printf "$(result)\n"

There are no math operators, so performing arithmetic requires some creativity. For example, integers could be represented as a series of x characters. The number 4 is xxxx, the number 6 is xxxxxx, etc. Addition is concatenation (note: macros can have + in their names):

A      = xxx
B      = xxxx
A+B    = $(A)$(B)

However, since there’s no way to “slice” a value, subtraction isn’t possible. A more realistic approach to arithmetic would require lookup tables.

Branching

Branching could be achieved through more lookup tables. For example,

square_0  = 0
square_1  = 1
square_2  = 4
# ...
result := $($(op)_$(n))

And called as:

$ make n=5 op=sqrt    # 2.236068
$ make n=5 op=square  # 25

Or using the DEBUG trick above, use the condition to mask out the results of the unwanted branch. This is similar to the mov paper.

result           := $(op)($(n)) = $($(op)_$(n))
result$(verbose) := $($(op)_$(n))

And its usage:

$ make n=5 op=square             # 25
$ make n=5 op=square verbose=1   # square(5) = 25

What about loops?

Looping is a tricky problem. However, one of the most common build (anti?)patterns is the recursive Makefile. Borrowing from the mov paper, which used an unconditional jump to restart the program from the beginning, for Makefile Turing-completeness I can invoke the Makefile recursively, restarting the program with a new set of inputs.

Remember the output target above? I can loop by invoking make again with new inputs in this target:

output :
    @printf "$(result)\n"
    @$(MAKE) $(args)

Before going any further, now that loops have been added, the natural next question is halting. In reality, the operating system will take care of that after some millions of make processes have carelessly been invoked by this horribly inefficient scheme. However, we can do better. The program can clobber the MAKE variable when it’s ready to halt. Let’s formalize it.

loop = $(MAKE) $(args)
output :
    @printf "$(result)\n"
    @$(loop)

To halt, the program just needs to clear loop.

Suppose we want to count down to 0. There will be an initial count:

count = 6

A decrement table:

6 = 5
5 = 4
4 = 3
3 = 2
2 = 1
1 = 0
0 = loop

The last line will be used to halt by clearing the name on the right side. This is three star territory.

$($($(count))) =

The result (current iteration) loop value is computed from the lookup table.

result = $($(count))

The next loop value is passed via args. If loop was cleared above, this result will be discarded.

args = count=$(result)

With all that in place, invoking the Makefile will print a countdown from 5 to 0 and quit. This is the general structure for the Game of Life macro program.
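The control flow is easier to follow when simulated outside of make. Below is a Python sketch under a few assumptions of mine: the countdown function is hypothetical, each pass through the while loop stands for one fresh make process, and the table maps each count to the next one down, ending with 0 mapping to loop.

```python
TABLE = {"6": "5", "5": "4", "4": "3", "3": "2", "2": "1", "1": "0", "0": "loop"}


def countdown(start="6"):
    """Simulate the recursive make invocations and return what gets printed."""
    printed = []
    count = start
    while True:
        # Each make process re-reads the Makefile, so the table starts fresh.
        macros = dict(TABLE, loop="$(MAKE) $(args)", count=count)

        def lookup(name):
            return macros.get(name, "")  # undefined macros expand to nothing

        macros[lookup(lookup(lookup("count")))] = ""  # $($($(count))) =
        result = lookup(lookup("count"))              # result = $($(count))
        printed.append(result)
        if not lookup("loop"):                        # loop was cleared: halt
            return printed
        count = result                                # args = count=$(result)


print(countdown())  # ['5', '4', '3', '2', '1', '0']
```

On most iterations the three-star assignment clears some table entry that the current process never reads again. Only when count is 1 does it resolve to loop and stop the recursion.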

Game of Life

A universal Turing machine has been implemented in Conway’s Game of Life. With all that heavy lifting done, one of the easiest methods today to prove a language’s Turing-completeness is to implement Conway’s Game of Life. Ignoring the criminal inefficiency of it, the Game of Life Turing machine could be run on the Game of Life simulation running on make’s macro assignments.

In the Game of Life program — the one linked at the top of this article — each cell is stored in a macro named xxyy, after its position. The top-left most cell is named 0000, then going left to right, 0100, 0200, etc. Providing input is a matter of assigning each of these macros. I chose X for alive and - for dead, but, as you’ll see, any two characters permitted in macro names would work as well.

$ make 0000=X 0100=- 0200=- 0300=X ...

The next part should be no surprise: The rules of the Game of Life are encoded as a 512-entry lookup table. The key is formed by concatenating the cell’s value along with all its neighbors, with itself in the center.

The “beginning” of the table looks like this:

--------- = -
X-------- = -
-X------- = -
XX------- = -
--X------ = -
X-X------ = -
-XX------ = -
XXX------ = X
---X----- = -
X--X----- = -
-X-X----- = -
XX-X----- = X
# ...

Note: The two right-hand X values here are the cell coming to life (exactly three living neighbors). Computing the next value (n0101) for 0101 is done like so:

n0101 = $($(0000)$(0100)$(0200)$(0001)$(0101)$(0201)$(0002)$(0102)$(0202))
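The lookup table is mechanical enough to generate with a short script. Here is a Python sketch of one way to do it. It is not the generator linked above: the life_table function is my own, and it assumes the key order shown in the excerpt, with the cell itself as the fifth of nine characters.

```python
def life_table(alive="X", dead="-"):
    """Map every 9-cell neighborhood string to the center cell's next state."""
    table = {}
    for bits in range(512):
        cells = [(bits >> i) & 1 for i in range(9)]  # leftmost cell is bit 0
        key = "".join(alive if cell else dead for cell in cells)
        center = cells[4]
        neighbors = sum(cells) - center
        # Birth on exactly three neighbors; survival on two or three.
        lives = neighbors == 3 or (center == 1 and neighbors == 2)
        table[key] = alive if lives else dead
    return table


table = life_table()
for key in list(table)[:8]:
    print(key, "=", table[key])
```

Each printed line has the same form as a macro assignment, so the full output could be pasted into a Makefile as the rule table.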

Given these results, constructing the input to the next loop is simple:

args = 0000=$(n0000) 0100=$(n0100) 0200=$(n0200) ...

The display output, to be given to printf, is built similarly:

output = $(n0000)$(n0100)$(n0200)$(n0300)...

In the real version, this is decorated with an ANSI escape code that clears the terminal. The printf interprets the escape byte (\033) so that it doesn’t need to appear literally in the source.

And that’s all there is to it: Conway’s Game of Life running in a Makefile. Life, uh, finds a way.

Per Loop vs. Per Iteration Bindings

2014-06-06T20:18:58Z

The April 5th, 2014 draft of the ECMA-262 6th Edition specification (a.k.a. the next major version of JavaScript/ECMAScript) contained a subtle, though very significant, change to the semantics of the for loop (13.6.3.3). Loop variables are now fresh bindings for each iteration of the loop: a per-iteration binding. Previously loop variables were established once for the entire loop, a per-loop binding. The purpose is an attempt to fix an old gotcha that affects many languages.

If you couldn’t already tell, this is going to be another language lawyer post!

Backup to C

To try to explain what this all means in plain English, let’s step back a moment and discuss what a for loop really is. I can’t find a source for this, but I’m pretty confident the three-part for loop originated in K&R C.

for (INITIALIZATION; CONDITION; ITERATION) {
    BODY;
}

1. Evaluate INITIALIZATION.
2. Evaluate CONDITION. If zero (false), exit the for.
3. Evaluate BODY.
4. Evaluate ITERATION and go to 2.

In the original C, and all the way up to C89, no variable declarations were allowed in the initialization expression. I can understand why: there’s a subtle complication, though it’s harmless in C. We’ll get to that soon. Here’s a typical C89 for loop.

int count = 10;
/* ... */
int i;
for (i = 0; i < count; i++) {
    double foo;
    /* ... */
}

The variable i is established independent of the loop, in the scope outside the for loop, alongside count. This isn’t even a per-loop binding. As far as the language is concerned, it’s just a variable that the loop happens to access and mutate. It’s very assembly-language-like. Because C has block scoping, the body of the for loop is another nested scope. The variable foo is in this scope, reestablished on each iteration of the loop (per-iteration).

As an implementation detail, foo will reside at the same location on the stack each time around the loop. If it’s accessed before being initialized, it will probably hold the value from the previous iteration, but, as far as the language is concerned, this is just a happy, though undefined, coincidence.

C99 Loops

Fast forward to the end of the 20th century. At this point, other languages have allowed variable declarations in the initialization part for years, so it’s time for C to catch up with C99.

int count = 10;
/* ... */
for (int i = 0; i < count; i++) {
    double foo;
    /* ... */
}

Now consider this: in what scope is the variable i? The outer scope as before? The iteration scope with foo? The answer is neither. In order to make this work, a whole new loop scope is established in between: a per-loop binding. This scope holds for the entire duration of the loop.

The variable i is constrained to the for loop without being limited to the iteration scope. This is important because i is what keeps track of the loop’s progress. The semantic equivalent in C89 makes the additional scope explicit with a block.

int count = 10;
/* ... */
{
    int i;
    for (i = 0; i < count; i++) {
        double foo;
        /* ... */
    }
}

This, ladies and gentlemen, is the C-style 3-part for loop. Every language that has this statement, and has block scope, follows these semantics. This included JavaScript up until two months ago, where the draft now gives it its own unique behavior.

JavaScript’s Let

As it exists today in its practical form, little of the above is relevant to JavaScript. JavaScript has no block scope, just function scope. A three-part for-loop doesn’t establish all these scopes, because scopes like these are absent from the language.

An important change coming with 6th edition is the introduction of let declarations. Variables declared with let will have block scope.

let count = 10;
// ...
for(let i = 0; i < count; i++) {
    let foo;
    // ...
}
console.log(foo); // error
console.log(i);   // error

If these variables had been declared with var, the last two lines wouldn’t be errors (or worse, global references). count, i, and foo would all be in the same function-level scope. This is really great! I look forward to using let exclusively someday.

The Closure Trap

I mentioned a subtle complication. Most of the time programmers don’t need to consider or even be aware of this middle scope. However, when combined with closures it suddenly becomes an issue. Here’s an example with Perl,

my @closures;
for (my $i = 0; $i < 2; $i++) {
    push(@closures, sub { return $i; });
}
$closures[0]();  # => 2
$closures[1]();  # => 2

Here’s one with Python. Python lacks a three-part for loop, but its standard for loop has similar semantics.

closures = []
for i in xrange(2):
    closures.append(lambda: i)
closures[0]()  # => 1
closures[1]()  # => 1

And now Ruby.

closures = []
for i in (0..1)
  closures << lambda { i }
end
closures[0].call  # => 1
closures[1].call  # => 1

In all three cases, one closure is created per iteration. Each closure captures the loop variable i. It’s easy to make the mistake of thinking each closure will return a unique value. However, as pointed out above, this is a per-loop variable, existing in a middle scope. The closures all capture the same variable, merely bound to different values at the time of capture. The solution is to establish a new variable in the iteration scope and capture that instead. Below, I’ve established a $value variable for this.

my @closures;
for (my $i = 0; $i < 2; $i++) {
    my $value = $i;
    push(@closures, sub { return $value; });
}
$closures[0]();  # => 0
$closures[1]();  # => 1
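The same trap and the same kind of fix can be sketched in Python 3. Python has no block scope, so the fresh variable has to come from a function call. The bind helper and both function names below are hypothetical, chosen only for this illustration.

```python
def shared_binding():
    closures = []
    for i in range(2):
        closures.append(lambda: i)  # every closure sees the one loop variable
    return [closure() for closure in closures]


def fresh_binding():
    def bind(value):
        return lambda: value        # value is a new variable on each call

    closures = []
    for i in range(2):
        closures.append(bind(i))
    return [closure() for closure in closures]


print(shared_binding())  # [1, 1]
print(fresh_binding())   # [0, 1]
```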

This is something that newbies easily get tripped up on. Because they’re still trying to wrap their heads around the closure concept, this looks like some crazy bug in the interpreter/compiler. I can understand why the ECMA-262 draft was changed to accommodate this situation.

The JavaScript Workaround

The language in the new draft has two items called perIterationBindings and CreatePerIterationEnvironment (in case you’re searching for the relevant part of the spec). Like the $value example above, for loops in JavaScript with “lexical” (i.e. let) loop bindings will implicitly mask the loop variable with a variable of the same name in the iteration scope.

let closures = [];
for (let i = 0; i < 2; i++) {
    closures.push(function() { return i; });
}

/* Before the change: */
closures[0]();  // => 2
closures[1]();  // => 2

/* After the change: */
closures[0]();  // => 0
closures[1]();  // => 1

Note: If you try to run this yourself, at the time of this writing the only JavaScript implementation I could find that had updated to the latest draft was Traceur. You’ll probably see the “before” behavior for now.

You can’t see it (I said it’s implicit!), but under an updated JavaScript implementation there are actually two i variables here. The closures capture the innermost i, the per-iteration version of i. Let’s go back to the original example, JavaScript-style.

let count = 10;
// ...
for (let i = 0; i < count; i++) {
   let foo;
   // ...
}

Here’s what the scope looks like for the latest draft. Notice the second i in the iteration scope. The inner i is initially assigned to the value of the outer i.

We could emulate this in an older edition. Imagine writing a macro to do this.

let count = 10;
// ...
for (let i = 0; i < count; i++) {
    let __i = i;  // (possible name collision)
    {
        let i = __i;
        let foo;
        // ...
    }
}

I have to use __i to smuggle the value across scopes without having i reference itself. Unlike Lisp’s let, the assignment value for var and let is evaluated in the nested scope, not the outer scope.

Each iteration gets its own i. But what happens when the loop modifies i? Simple, it’s copied back out at the end of the body.

let count = 10;
// ...
for (let i = 0; i < count; i++) {
    let __i = i;
    {
        let i = __i;
        let foo;
        // ...
        __i = i;
    }
    i = __i;
}

Now all the expected for semantics work — the body can also update the loop variable — but we still get the closure-friendly per-iteration variables.
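To watch the copy-out step do its job, here is a small sketch for an engine that implements the newer semantics. The function names and the count of 6 are made up for illustration, and declaring i outside the loop header is my stand-in for the older single-binding behavior.

```javascript
// One binding shared by every iteration (the old behavior): declaring
// the variable outside the loop header means no per-iteration copies.
function sharedBinding(count) {
    const closures = [];
    let i;
    for (i = 0; i < count; i++) {
        closures.push(() => i);
        i++;  // the body also modifies the loop variable
    }
    return closures.map(f => f());
}

// The hand-written expansion, applied to that same shared binding.
function emulatedPerIteration(count) {
    const closures = [];
    let i;
    for (i = 0; i < count; i++) {
        let __i = i;
        {
            let i = __i;  // fresh variable for this iteration
            closures.push(() => i);
            i++;
            __i = i;      // copy the final value back out
        }
        i = __i;
    }
    return closures.map(f => f());
}

// The built-in per-iteration binding of a let-declared loop variable.
function nativePerIteration(count) {
    const closures = [];
    for (let i = 0; i < count; i++) {
        closures.push(() => i);
        i++;
    }
    return closures.map(f => f());
}

console.log(sharedBinding(6));         // [ 6, 6, 6 ]
console.log(emulatedPerIteration(6));  // [ 1, 3, 5 ]
console.log(nativePerIteration(6));    // [ 1, 3, 5 ]
```

The emulated and native versions agree: each closure sees the value its own iteration ended with, and the loop still advances by two per pass because the body's increment is copied back out.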

Conclusion

I’m still not sure if I really like this change. It’s a clean fix, but the gotcha hasn’t been eliminated. Instead it’s been inverted. Sometime someone will have the unusual circumstance of wanting to capture the loop variable, and he will run into some surprising behavior. Because the semantics are a lot more complicated, it’s hard to reason about what’s not working unless you already know JavaScript has magical for loops.

Perl and C# each also gained per-iteration bindings in their history, but rather than complicate or change their standard for loops, they instead introduced it as a new syntactic construction: foreach.

my @closures;
foreach my $i (0, 1) {
    push(@closures, sub { return $i; });
}
$closures[0]();  # => 0
$closures[1]();  # => 1

In this case, per-iteration bindings definitely make sense. The variable $i is established and bound to each value in turn. As far as control flow goes, it’s very functional. The binding is never actually mutated.

I think it could be argued that Python and Ruby’s for ... in forms should behave like this foreach. These were probably misdesigned early on, but it’s not possible to change their semantics at this point. Because JavaScript’s var was improperly designed from the beginning, let offers the opportunity to fix more than just var. We’re seeing this right now with these new for semantics.

Three Dimensions of Type Systems

2014-04-25T22:03:01Z

I occasionally come across articles, and even some books, that get terminology mixed up when discussing type systems. The author might say “strong” when what they’re talking about is “lexical.” In this article I’ll define three orthogonal properties of type systems. A new programming language design could select from each category independently.

Static vs Dynamic

Static versus dynamic typing is probably the most obvious type system property when glancing at an example of a language’s source code. This refers to whether or not variables have types.

In a statically typed language, variables have a compile-time determined type. At run-time, a variable will only ever hold a value of this type. Except where type information can be inferred, variable and function declarations are generally accompanied by their types (manifest typing). Type violations are generally discovered early, at compile-time, at the cost of additional up-front planning.

In a dynamically typed language, only values have types. A variable may be bound to any type of value, though a smart compiler may reason about a program enough to know that certain variables are only ever bound to a limited set of types. Type violations are generally discovered late, at run-time, but this allows for more ad hoc design.

Statically typed languages: C, C++, Java, C#, Haskell

Dynamically typed languages: Python, Ruby, JavaScript, Lisp, Clojure

To give a quick comparison, here’s the same function definition in C and JavaScript.

double distance(struct point a, struct point b) {
    return sqrt(pow(a.x - b.x, 2) + pow(a.y - b.y, 2));
}

function distance(a, b) {
    return Math.sqrt(Math.pow(a.x - b.x, 2) + Math.pow(a.y - b.y, 2));
}

Dynamic type systems are more apt for duck typing, though there are exceptions such as with C++ templates. In the JavaScript example above, anything that has numeric x and y properties can be passed to distance. The actual type doesn’t matter. In the C version, only that very specific type, struct point, may be passed to distance.
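As a quick illustration of that last point, here is the JavaScript distance called with three unrelated kinds of objects. The Pixel class, the city record, and the frozen origin are all hypothetical; only distance comes from the example above.

```javascript
function distance(a, b) {
    return Math.sqrt(Math.pow(a.x - b.x, 2) + Math.pow(a.y - b.y, 2));
}

// Three unrelated kinds of objects that happen to have x and y.
class Pixel {
    constructor(x, y) { this.x = x; this.y = y; }
}
const city = {name: "Springfield", x: 3, y: 4};
const origin = Object.freeze({x: 0, y: 0});

console.log(distance(origin, city));             // 5
console.log(distance(new Pixel(6, 8), origin));  // 10

// Nothing checks the arguments ahead of time. A value missing a numeric
// y property is only noticed at run-time, here as a NaN result.
console.log(distance(origin, {x: 1}));           // NaN
```

The flip side of the flexibility shows in the last call: the mistake produces a quiet NaN rather than a compile-time diagnostic.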

The C example could be made more generic, circumventing its type system through a trick called type punning. This is where a value is accessed by the program as though it was a different type of value. This requires up-front planning and may potentially violate strict aliasing.

Lexical vs. Dynamic Scope

Lexical versus dynamic scope refers not to any values or objects themselves, but rather to how variables are accessed. Virtually all popular programming languages in use today use lexical scope. This is because dynamic scope has serious performance and correctness problems. In fact, it was likely invented entirely by accident.

However, it’s still useful to use dynamic scope on a careful, opt-in process. Perl, Common Lisp, Clojure, and Emacs Lisp all permit selective dynamic scope. It’s a clean method for temporarily masking a global variable, such as the default stream for reading/writing input/output.

Under lexical scope, the scope of a variable is determined statically at compile-time. The compiler knows about all accesses to a particular variable. This is sometimes called static scope but I’m using the word lexical here to help differentiate from static typing (above).

In dynamic scope, all variables are essentially global variables. A new binding masks any existing global binding for any functions called from within that binding’s extent. If a function accesses a free variable, it’s not known until run-time from where that value may come. When the last binding is removed, such as when a local variable’s scope is exited, that global variable is then said to be unbound. Dynamic scope is incompatible with closures.

Lexically scoped languages: C, C++, Java, JavaScript, Python, Ruby, (many more)

Dynamically scoped languages: Emacs Lisp, bash

As of Emacs 24, lexical scope can be enabled by default for a file/buffer by setting lexical-binding to true. I imagine this will some day become the default, making Emacs Lisp a lexically scoped language. This is also a perfect example of lexical scope having better performance: turning on lexical scope makes Elisp programs run faster.

Here’s an example of dynamic scope in Emacs Lisp.

(defun reveal ()
  x) ; free variable

(defun foo ()
  (let ((x :foo))
    (reveal)))

(defun bar ()
  (let ((x :bar))
    (reveal)))

(foo)
;; => :foo

(bar)
;; => :bar

The value of x as seen by reveal depends on which function called it, since the binding leaks through. Running the exact same code in Common Lisp, where it’s lexically scoped, would result in a run-time error: reveal always tries to access x as a global variable, because the scope of each x is strictly limited to foo or bar.
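The same experiment can be sketched in JavaScript, which is lexically scoped. The first half shows the free variable failing to resolve. The second half emulates an opt-in dynamic binding with a hypothetical withBinding helper and a dynamicVars table, both names of my own invention, that mask a global for the extent of a call.

```javascript
// Lexical scope: the x inside foo is invisible to reveal.
function reveal() {
    return x;  // free variable; no x is in scope where reveal is defined
}

function foo() {
    let x = "foo";  // local to foo, never seen by reveal
    return reveal();
}

let lexicalError = null;
try {
    foo();
} catch (e) {
    lexicalError = e;
}
console.log(lexicalError instanceof ReferenceError);  // true

// Emulated dynamic binding: mask a global for the extent of a call,
// then restore the previous value no matter how the call exits.
const dynamicVars = {x: undefined};

function withBinding(name, value, body) {
    const saved = dynamicVars[name];
    dynamicVars[name] = value;
    try {
        return body();
    } finally {
        dynamicVars[name] = saved;
    }
}

function revealDynamic() {
    return dynamicVars.x;
}

console.log(withBinding("x", "foo", revealDynamic));  // foo
console.log(withBinding("x", "bar", revealDynamic));  // bar
console.log(revealDynamic());                         // undefined
```

The try/finally is what gives the binding an extent rather than a scope: the masking value is visible to anything called during body and is gone afterwards.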

Strong vs. Weak

Strong versus weak is probably the least understood property. Strong typing is often mixed up with static typing despite being an orthogonal concept. A language can be strongly, dynamically typed (Python) — or weakly, statically typed (C). Strong/weak is also sometimes confused with type safety.

This aspect refers to the tendency of values to be implicitly coerced into another type depending on how they are used. Unlike the previous two type system properties, this one isn’t bimodal. There’s a degree to just how much implicit coercion occurs.

Strongly typed languages: Python, Common Lisp, Java, Ruby

Weakly typed languages: JavaScript, PHP, Perl, C

For example, take the following expression.

"8" - 5

In strongly typed languages this will generally be an error: strings have no definition with the subtraction operator. In weakly typed languages, the “8” is likely to be parsed into a number as part of being handled by the operator, with the expression evaluating to 3.

If a language has a triple equality operator (===), that’s a dead giveaway that it’s weakly typed.
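JavaScript makes a convenient specimen. The snippet below, with variable names chosen only for illustration, shows the coercion from the example expression and why the language needs a second, stricter equality operator.

```javascript
const difference = "8" - 5;     // the string is parsed into a number
const concatenated = "8" + 5;   // here the number becomes a string instead
const product = "8" * "2";      // both operands are coerced

console.log(difference);        // 3
console.log(concatenated);      // 85 (a string)
console.log(product);           // 16

// == coerces before comparing; === compares without coercion.
console.log("8" == 8);          // true
console.log("8" === 8);         // false

// Converting explicitly states the intent.
console.log(Number("8") - 5);   // 3
```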

In the case of C, its pointers are what make it weakly typed. It’s easy to make a pointer to a value, then dereference it as a different type (usually leading to undefined behavior).

This is another trade-off between safety and convenience. Modern languages tend towards strong typing.

Duck Typing vs. Type Erasure

2014-04-01T21:07:31Z

Consider the following C++ class.

#include <iostream>

template <typename T>
struct Caller {
  const T callee_;
  Caller(const T callee) : callee_(callee) {}
  void go() { callee_.call(); }
};

Caller can be parameterized to any type so long as it has a call() method. For example, introduce two types, Foo and Bar.

struct Foo {
  void call() const { std::cout << "Foo"; }
};

struct Bar {
  void call() const { std::cout << "Bar"; }
};

int main() {
  Caller<Foo> foo{Foo()};
  Caller<Bar> bar{Bar()};
  foo.go();
  bar.go();
  std::cout << std::endl;
  return 0;
}

This code compiles cleanly and, when run, emits “FooBar”. This is an example of duck typing, i.e., “If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.” Foo and Bar are unrelated types. They have no common inheritance, but by providing the expected interface, they both work with Caller. This is a special case of polymorphism.

Duck typing is normally only found in dynamically typed languages. Thanks to templates, a statically, strongly typed language like C++ can have duck typing without sacrificing any type safety.
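For contrast, here is a rough JavaScript rendering of the same program. It is only a sketch: the methods return strings instead of printing so the result is easy to inspect. The shape is identical, but a callee lacking call() is only discovered when go() actually runs, whereas the C++ template rejects it at compile time.

```javascript
class Caller {
    constructor(callee) { this.callee = callee; }
    go() { return this.callee.call(); }
}

// Unrelated classes: no shared base class, just a call() method.
class Foo {
    call() { return "Foo"; }
}

class Bar {
    call() { return "Bar"; }
}

const output = new Caller(new Foo()).go() + new Caller(new Bar()).go();
console.log(output);  // FooBar

// An object without call() is accepted by the constructor and only
// fails once go() tries to invoke the missing method.
let lateError = null;
try {
    new Caller({}).go();
} catch (e) {
    lateError = e;
}
console.log(lateError instanceof TypeError);  // true
```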

Java Duck Typing

Let’s try the same thing in Java using generics.

class Caller<T> {
    final T callee;
    Caller(T callee) {
        this.callee = callee;
    }
    public void go() {
        callee.call();  // compiler error: cannot find symbol call
    }
}

class Foo {
    public void call() { System.out.print("Foo"); }
}

class Bar {
    public void call() { System.out.print("Bar"); }
}

public class Main {
    public static void main(String args[]) {
        Caller<Foo> f = new Caller<>(new Foo());
        Caller<Bar> b = new Caller<>(new Bar());
        f.go();
        b.go();
        System.out.println();
    }
}

The program is practically identical, but this will fail with a compile-time error. This is the result of type erasure. Unlike C++’s templates, there will only ever be one compiled version of Caller, and T will become Object. Since Object has no call() method, compilation fails. The generic type is only for enabling additional compiler checks later on.

C++ templates behave like macros, expanded by the compiler once for each different type of applied parameter. The call symbol is looked up later, after the type has been fully realized, not when the template is defined.

To fix this, Foo and Bar need a common ancestry. Let’s make this Callee.

interface Callee {
    void call();
}

Caller needs to be redefined such that T is a subtype of Callee.

class Caller<T extends Callee> {
    // ...
}

This now compiles cleanly because call() will be found in Callee. Finally, implement Callee.

class Foo implements Callee {
    // ...
}

class Bar implements Callee {
    // ...
}

This is no longer duck typing, just plain old polymorphism. Type erasure prohibits duck typing in Java (outside of dirty reflection hacks).

Signals and Slots and Events! Oh My!

Duck typing is useful for implementing the observer pattern without as much boilerplate. A class can participate in the observer pattern without inheriting from some specialized class or interface. For example, see the various signal and slots systems for C++. In contrast, Java has an EventListener type for everything:

KeyListener
MouseListener
MouseMotionListener
FocusListener
ActionListener, etc.

A class concerned with many different kinds of events, such as an event logger, would need to implement a large number of interfaces.

The Julia Programming Language

2014-03-06T23:55:44Z

Update 2020: This is an old, outdated review. With the benefit of more experience, I no longer agree with my criticisms in this article.

Julia is a new programming language primarily intended for scientific computing. It’s attempting to take on roles that are currently occupied by Matlab, its clones, and R. “Matlab done right” could very well be its tag-line, but it’s more than that. It has a beautiful type system, it’s homoiconic, and its generic function support would make a Lisp developer jealous. It still has a long ways to go, but, except for some unfortunate issues, it’s off to a great start.

Speaking strictly in terms of the language, doing better than Matlab isn’t really a significant feat. Among major programming languages, Matlab’s awfulness and bad design is second only to PHP. Octave fixes a lot of the Matlab language, but it can only go so far.

For both Matlab and R, the real strength is the enormous library of toolboxes and functionality available to help solve seemingly any scientific computing task. Plus the mindshare and the community. Julia has none of this yet. The language is mostly complete, but it will take years to build up its own package library to similar standards.

If you’re curious about learning more, the Julia manual covers the entire language as it currently exists. Unfortunately anything outside the language proper and its standard library is under-documented at this time.

A Beautiful Type System

One of the first things you’ll be told is that Julia is dynamically typed. That is, statically typed (C++, Java, Haskell) versus dynamically typed (Lisp, Python, JavaScript). However, Julia has the rather unique property that it straddles between these, and it could be argued to belong to one or the other.

The defining characteristic of static typing is that bindings (i.e. variables) have types. In dynamic typing, only values and objects have types. In Julia, all bindings have a type, making it like a statically typed language. If no type is explicitly declared, that type is Any, an abstract supertype of all types. This comes into play with generic functions.

Both abstract and concrete types can be parameterized by other types, and certain values. The :: syntax is used to declare a type.

type Point {T}
  x::T
  y::T
end

This creates a Point constructor function. When calling the constructor, the parameter type can be implicit, derived from the type of its arguments, or explicit. Because both x and y have the same type, so must the constructor’s arguments.

# Implicit type:
Point(1, -1)
# => Point{Int64}(1,-1)

# Explicit type:
Point{Float64}(1.1, -1.0)
# => Point{Float64}(1.1,-1.0)

Point(1, 1.0)
# ERROR: no method Point{T}(Int64,Float64)

The type can be constrained using <:. If Point is declared like the following it is restricted to real numbers. This is just like Java’s Point<T extends Number>.

type Point {T <: Real}
  x::T
  y::T
end

Unlike most languages, arrays aren’t built directly into the language. They’re implemented almost entirely in Julia itself using this type system. The special part is that they get literal syntax.

[1, 2, 3]
# => Array{Int64,1}

[1.0 2.0; 3.0 4.0]
# => Array{Float64,2}

Each Array is parameterized by the type of value it holds and by an integer, indicating its rank.

The Billion Dollar Mistake

Julia has avoided what some call The Billion Dollar Mistake: null references. In languages such as Java, null is allowed in place of any object of any type. This allowance has led to many run-time bugs that, if null didn’t exist, would have been caught at compile time.

Julia has no null and so there’s no way to make this mistake, though some kinds of APIs are harder to express without it.

Generic Functions

All of Julia’s functions are generic, including that Point constructor above. Different methods can be defined for the same function name, but for different types. In Common Lisp and Clojure, generic functions are an opt-in feature, so most functions are not generic.

Note that this is significantly different than function overloading, where the specific function to call is determined at compile time. In multimethods, the method chosen is determined by the run-time type of its arguments. One of Julia’s notable achievements is that its multimethods have very high performance. There’s usually more of a trade-off.
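To make the distinction concrete, here is a toy multimethod in JavaScript. It is only a sketch of the idea, not how Julia implements dispatch: defgeneric, defmethod, and typeName are hypothetical helpers, matching is by exact constructor name with no notion of subtypes, and there is none of Julia's performance work. The point is that the method is picked from the values present at each call.

```javascript
// Name of a value's run-time type, taken from its constructor.
function typeName(value) {
    return value === null || value === undefined
        ? String(value)
        : value.constructor.name;
}

// Create a generic function. Methods are stored under a key built from
// the argument types, and the key is recomputed on every call.
function defgeneric() {
    const methods = new Map();
    function generic(...args) {
        const key = args.map(typeName).join(",");
        const method = methods.get(key);
        if (!method) {
            throw new TypeError("no method for (" + key + ")");
        }
        return method(...args);
    }
    generic.defmethod = (types, fn) => {
        methods.set(types.map(t => t.name).join(","), fn);
        return generic;
    };
    return generic;
}

class Point {
    constructor(x, y) { this.x = x; this.y = y; }
}

const add = defgeneric()
    .defmethod([Number, Number], (a, b) => a + b)
    .defmethod([Point, Point], (p, q) => new Point(p.x + q.x, p.y + q.y))
    .defmethod([Point, Number], (p, n) => new Point(p.x + n, p.y + n));

console.log(add(3, 4));                               // 7
console.log(add(new Point(1, 1), new Point(1, 2)).y); // 3
console.log(add(new Point(1, 1), 10).x);              // 11
```

Dispatch here considers every argument, not just the first, which is what separates multimethods from ordinary single-dispatch method calls.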

Julia’s operators are functions with special syntax. For example, the + function,

+(3, 4)
# => 7

A big advantage is that operators can be passed around as first-class values.

map(-, [1, 2, 3])
# => [-1, -2, -3]

Because all functions are generic, operators can have methods defined for specific types, effectively becoming operator overloading (but better!).

function +(p1::Point, p2::Point)
  return Point(p1.x + p2.x, p1.y + p2.y)
end

Point(1,1) + Point(1, 2)
# => Point{Int64}(2,3)

(Note that to write this method correctly, either Point or the method should probably promote its arguments.)

Foreign Function Interface

Julia has a really slick foreign function interface (FFI). Libraries don’t need to be explicitly loaded and call interfaces don’t have to be declared ahead of time. That’s all taken care of automatically.

I’m not going to dive into the details, but basically all you have to do is indicate the library, the function, the return type, and then pass the arguments.

ccall((:clock, "libc"), Int32, ())
# => 2292761

Generally this would be wrapped up nicely in a regular function and the caller would have no idea an FFI is being used. Unfortunately structs aren’t yet supported.

Julia’s Problems

Not everything is elegant, though. There are some strange design decisions. The two big ones for me are strings and modules.

Confused Strings

Julia has a Char type that represents a Unicode code point. It’s a 32-bit value. So far so good. However, a String is not a sequence of these. A Julia string is a byte-array of UTF-8 encoded characters.

Indexing into a string operates on bytes rather than characters. Attempting to index into the middle of a character results in an error. Yuck!

"naïvety"[4]
# ERROR: invalid UTF-8 character index
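To see why index 4 in particular is a problem, the UTF-8 bytes can be inspected from another language. This JavaScript sketch assumes a runtime with a global TextEncoder (modern browsers and recent Node.js) and spells the ï as an escape to guarantee the precomposed code point; isContinuationByte is a helper name of my own.

```javascript
const text = "na\u00efvety";  // "naïvety" with a precomposed ï (U+00EF)
const bytes = new TextEncoder().encode(text);

console.log(text.length);   // 7 (UTF-16 code units in JavaScript)
console.log(bytes.length);  // 8 (the ï takes two bytes in UTF-8)

// UTF-8 continuation bytes have the bit pattern 10xxxxxx.
function isContinuationByte(b) {
    return (b & 0xC0) === 0x80;
}

// Julia indexes from 1, so its byte index 4 is bytes[3] here.
console.log(isContinuationByte(bytes[2]));  // false: lead byte 0xC3
console.log(isContinuationByte(bytes[3]));  // true: second byte 0xAF
```

Byte index 4 lands on the second byte of the two-byte ï, so there is no character that starts there.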

I don’t understand why this behavior was chosen. This would make sense if Julia was an old language and this was designed before Unicode was established (e.g. C). But, no, this is a brand new language. There’s no excuse not to get this right the first time. I suspect it has to do with Julia’s FFI.

Clunky, Closed Modules

Julia’s module system looks like it was taken right out of Scheme’s R6RS. This isn’t a good thing.

The module definition wraps the entire module up in a single syntactic unit. Here’s an example from the documentation. According to the style guide, the body of the module is not indented.

module MyModule
using Lib
export MyType, foo

type MyType
  x
end

bar(x) = 2x
foo(a::MyType) = bar(a.x) + 1

import Base.show
show(io, a::MyType) = print(io, "MyType $(a.x)")
end

That final end seals the module for good. There’s no opening the module back up to define or redefine new functions or types. If you want to change something you have to reload the entire module, which will obsolete any type instances.

Compare this to Clojure, where the module isn’t wrapped up in a syntactical construct.

(ns my.module
  (:require [clojure.set :refer [rename-keys]]))

Common Lisp’s defpackage also works like this. At any time you can jump into a namespace and make new definitions.

(in-ns 'my.module)

This is absolutely essential to interactive development. The lack of this makes Julia far less dynamic than it should be. Combined with the lack of a printer, Julia is not currently suitable as an interactive interpreter subprocess (Slime, Cider, Skewer, etc.).

This is a real shame, because I’d like to start playing around with Julia, but right now it feels like a chore. It’s needlessly restricted to a C++/Java style workflow.

I’ll probably revisit Julia once it’s had a few more years to mature. Then we’ll see if things have improved enough for real use.

Emacs Byte-code Internals

2014-01-04T05:07:26Z

Byte-code compilation is an underdocumented — and in the case of the recent lexical binding updates, undocumented — part of Emacs. Most users know that Elisp is usually compiled into a byte-code saved to .elc files, and that byte-code loads and runs faster than uncompiled Elisp. That’s all users really need to know, and the GNU Emacs Lisp Reference Manual specifically discourages poking around too much.

People do not write byte-code; that job is left to the byte compiler. But we provide a disassembler to satisfy a cat-like curiosity.

Screw that! What if I want to handcraft some byte-code myself? :-) The purpose of this article is to introduce the internals of the Elisp byte-code interpreter. I will explain how it works, why lexically scoped code is faster, and demonstrate writing some byte-code by hand.

The Humble Stack Machine

The byte-code interpreter is a simple stack machine. The stack holds arbitrary lisp objects. The interpreter is backwards compatible but not forwards compatible (old versions can’t run new byte-code). Each instruction is between 1 and 3 bytes. The first byte is the opcode and the second and third bytes are either a single operand or a single immediate value. Some operands are packed into the opcode byte.

As of this writing (Emacs 24.3) there are 142 opcodes, 6 of which have been declared obsolete. Most opcodes refer to commonly used built-in functions for fast access. (Looking at the selection, Elisp really is geared towards text!) Considering packed operands, there are up to 27 potential opcodes unused, reserved for the future.

opcodes 48 - 55
opcode 97
opcode 128
opcodes 169 - 174
opcodes 180 - 181
opcodes 183 - 191

The easiest place to access the opcode listing is in bytecomp.el. Beware that some of the opcode comments are currently out of date.

Segmentation Fault Warning

Byte-code does not offer the same safety as normal Elisp. Bad byte-code can, and will, cause Emacs to crash. You can try it out for yourself right now,

emacs -batch -Q --eval '(print (#[0 "\300\207" [] 0]))'

Or evaluate the code manually in a buffer (save everything first!),

(#[0 "\300\207" [] 0])

This segfault, caused by referencing beyond the end of the constants vector, is not an Emacs bug. Doing a boundary test would slow down the byte-code interpreter. Not performing this test at run-time is a practical engineering decision. The Emacs developers have instead chosen to rely on valid byte-code output from the compiler, making a disclaimer to anyone wanting to write their own byte-code,

You should not try to come up with the elements for a byte-code function yourself, because if they are inconsistent, Emacs may crash when you call the function. Always leave it to the byte compiler to create these objects; it makes the elements consistent (we hope).

You’ve been warned. Now it’s time to start playing with firecrackers.

The Byte-code Object

A byte-code object is functionally equivalent to a normal Elisp vector except that it can be evaluated as a function. Elements are accessed in constant time, the syntax is similar to vector syntax ([...] vs. #[...]), and it can be of any length, though valid functions must have at least 4 elements.

There are two ways to create a byte-code object: using a byte-code object literal or with make-byte-code. Like vector literals, byte-code literals don’t need to be quoted.

(make-byte-code 0 "" [] 0)
;; => #[0 "" [] 0]

#[1 2 3 4]
;; => #[1 2 3 4]

(#[0 "" [] 0])
;; error: Invalid byte opcode

The elements of an object literal are:

Function parameter (lambda) list
Unibyte string of byte-code
Constants vector
Maximum stack usage
Docstring (optional, nil for none)
Interactive specification (optional)

Parameter List

The parameter list takes on two different forms depending on if the function is lexically or dynamically scoped. If the function is dynamically scoped, the argument list is exactly what appears in lisp code.

(byte-compile (lambda (a b &optional c)))
;; => #[(a b &optional c) "\300\207" [nil] 1]

There’s really no shorter way to represent the parameter list because preserving the argument names is critical. Remember that, in dynamic scope, while the function body is being evaluated these variables are globally bound (eww!) to the function’s arguments.

When the function is lexically scoped, the parameter list is packed into an Elisp integer, indicating the counts of the different kinds of parameters: required, &optional, and &rest.

The least significant 7 bits indicate the number of required arguments. Notice that this limits compiled, lexically-scoped functions to 127 required arguments. The 8th bit is the number of &rest arguments (up to 1). The remaining bits indicate the total number of optional and required arguments (not counting &rest). It’s really easy to parse these in your head when viewed as hexadecimal because each portion almost always fits inside its own “digit.”

(byte-compile-make-args-desc '())
;; => #x000  (0 args, 0 rest, 0 required)

(byte-compile-make-args-desc '(a b))
;; => #x202  (2 args, 0 rest, 2 required)

(byte-compile-make-args-desc '(a b &optional c))
;; => #x302  (3 args, 0 rest, 2 required)

(byte-compile-make-args-desc '(a b &optional c &rest d))
;; => #x382  (3 args, 1 rest, 2 required)

The names of the arguments don’t matter in lexical scope: they’re purely positional. This tighter argument specification is one of the reasons lexical scope is faster: the byte-code interpreter doesn’t need to parse the entire lambda list and assign all of the variables on each function invocation.
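The packing described above is simple enough to reproduce outside of Emacs. Here is a sketch in JavaScript; makeArgsDesc and parseArgsDesc are names I made up, and only the bit layout is taken from the description.

```javascript
// Pack parameter counts into the integer descriptor:
//   bits 0-6: number of required arguments
//   bit  7:   1 if there is a &rest parameter
//   bits 8+:  required + optional (not counting &rest)
function makeArgsDesc(required, optional, hasRest) {
    if (required > 127) {
        throw new RangeError("at most 127 required arguments");
    }
    return required | (hasRest ? 0x80 : 0) | ((required + optional) << 8);
}

// Unpack a descriptor back into its three fields.
function parseArgsDesc(desc) {
    return {
        required: desc & 0x7f,
        rest: (desc & 0x80) !== 0,
        total: desc >> 8,
    };
}

console.log(makeArgsDesc(0, 0, false).toString(16));  // 0
console.log(makeArgsDesc(2, 0, false).toString(16));  // 202
console.log(makeArgsDesc(2, 1, false).toString(16));  // 302
console.log(makeArgsDesc(2, 1, true).toString(16));   // 382
console.log(parseArgsDesc(0x382));
```

The four descriptors match the byte-compile-make-args-desc results listed above.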

Unibyte String Byte-code

The second element is a unibyte string — it strictly holds octets and is not to be interpreted as any sort of Unicode encoding. These strings should be created with unibyte-string because string may return a multibyte string. To disambiguate the string type to the lisp reader when higher values are present (> 127), the strings are printed in an escaped octal notation, keeping the string literal inside the ASCII character set.

(unibyte-string 100 200 250)
;; => "d\310\372"

It’s unusual to see a byte-code string that doesn’t end with 135 (#o207, byte-return). Perhaps this should have been implicit? I’ll talk more about the byte-code below.

Constants Vector

The byte-code has very limited operands. Most operands are only a few bits, some fill an entire byte, and occasionally two bytes. The meat of the function that holds all the constants, function symbols, and variable symbols is the constants vector. It’s a normal Elisp vector and can be created with vector or a vector literal. Operands reference either this vector or they index into the stack itself.

(byte-compile (lambda (a b) (my-func b a)))
;; => #[(a b) "\302\010\011\042\207" [b a my-func] 3]

Note that the constants vector lists the variable symbols as well as the external function symbol. If this was a lexically scoped function the constants vector wouldn’t have the variables listed, being only [my-func].

Maximum Stack Usage

This is the maximum stack space used by this byte-code. This value can be derived from the byte-code itself, but it’s pre-computed so that the byte-code interpreter can quickly check for stack overflow. Under-reporting this value is probably another way to crash Emacs.

Docstring

The simplest component and completely optional. It’s either the docstring itself, or if the docstring is especially large it’s a cons cell indicating a compiled .elc and a position for lazy access. Only one position, the start, is needed because the lisp reader is used to load it and it knows how to recognize the end.

Interactive Specification

If this element is present and non-nil then the function is an interactive function. It holds the exact contents of interactive in the uncompiled function definition.

(byte-compile (lambda (n) (interactive "nNumber: ") n))
;; => #[(n) "\010\207" [n] 1 nil "nNumber: "]

(byte-compile (lambda (n) (interactive (list (read))) n))
;; => #[(n) "\010\207" [n] 1 nil (list (read))]

The interactive expression is always interpreted, never byte-compiled. This is usually fine because, by definition, this code is going to be waiting on user input. However, it slows down keyboard macro playback.

Opcodes

The bulk of the established opcode bytes is for variable, stack, and constant access opcodes, most of which use packed operands.

0 - 7 : (stack-ref) stack reference
8 - 15 : (varref) variable reference (from constants vector)
16 - 23 : (varset) variable set (from constants vector)
24 - 31 : (varbind) variable binding (from constants vector)
32 - 39 : (call) function call (immediate = number of arguments)
40 - 47 : (unbind) variable unbinding (immediate = number of bindings to undo)
129, 192-255 : (constant) direct constants vector access

Except for the last item, each kind of instruction comes in sets of 8. The nth such instruction means access the nth thing. For example, the instruction “2” copies the third stack item to the top of the stack. An instruction of “9” pushes onto the stack the value of the variable named by the second element listed in the constants vector.

However, the 7th and 8th such instructions in each set take an operand byte or two. The 7th instruction takes a 1-byte operand and the 8th takes a 2-byte operand. A 2-byte operand is written in little-endian byte-order regardless of the host platform.
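Those rules are mechanical enough to turn into a tiny disassembler. The JavaScript sketch below covers only the six sets of eight, the directly packed constant opcodes (192-255), and return (135). Any other opcode is printed as a bare number, and since some of those take operand bytes, the output after one of them may be misaligned. The function name and output format are my own.

```javascript
// The six opcode families that come in sets of 8, in opcode order.
const SETS = ["stack-ref", "varref", "varset", "varbind", "call", "unbind"];

function disassemble(bytes) {
    const out = [];
    let pc = 0;
    while (pc < bytes.length) {
        const op = bytes[pc++];
        if (op < 48) {
            const name = SETS[op >> 3];  // which set of 8
            const n = op & 7;            // position within the set
            let operand;
            if (n < 6) {
                operand = n;                  // packed into the opcode
            } else if (n === 6) {
                operand = bytes[pc++];        // 1-byte operand
            } else {
                operand = bytes[pc] | (bytes[pc + 1] << 8);  // little-endian
                pc += 2;
            }
            out.push(name + " " + operand);
        } else if (op >= 192) {
            out.push("constant " + (op - 192));
        } else if (op === 135) {
            out.push("return");
        } else {
            out.push("op " + op);  // not handled by this sketch
        }
    }
    return out;
}

// Push a constant, reference two variables, call with two arguments.
console.log(disassemble([0o302, 0o010, 0o011, 0o042, 0o207]));
// [ 'constant 2', 'varref 0', 'varref 1', 'call 2', 'return' ]

// 7th form of varref with a 1-byte operand.
console.log(disassemble([8 + 6, 9, 135]));
// [ 'varref 9', 'return' ]
```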

For example, let’s manually craft an instruction that returns the value of the global variable foo. Each opcode has a named constant of byte-X so we don’t have to worry about their actual byte-code number.

(require 'bytecomp)  ; named opcodes

(defvar foo "hello")

(defalias 'get-foo
  (make-byte-code
    #x000                 ; no arguments
    (unibyte-string
      (+ 0 byte-varref)   ; ref variable under first constant
      byte-return)        ; pop and return
    [foo]                 ; constants
    1))                   ; only using 1 stack space

(get-foo)
;; => "hello"

Ta-da! That’s a handcrafted byte-code function. I left a “+ 0” in there so that I can change the offset. This function has the exact same behavior, it’s just less optimal,

(defalias 'get-foo
  (make-byte-code
    #x000
    (unibyte-string
      (+ 3 byte-varref)     ; 4th form of varref
      byte-return)
    [nil nil nil foo]
    1))

If foo was the 10th constant, we would need to use the 1-byte operand version. Again, the same behavior, just less optimal.

(defalias 'get-foo
  (make-byte-code
    #x000
    (unibyte-string
      (+ 6 byte-varref)     ; 7th form of varref
      9                     ; operand, (constant index 9)
      byte-return)
    [nil nil nil nil nil nil nil nil nil foo]
    1))

Dynamically-scoped code makes heavy use of varref but lexically-scoped code rarely uses it (global variables only), instead relying heavily on stack-ref, which is faster. This is where the different calling conventions come into play.

Calling Convention

Each kind of scope gets its own calling convention. Here we finally get to glimpse some of the really great work by Stefan Monnier updating the compiler for lexical scope.

Dynamic Scope Calling Convention

Remembering back to the parameter list element of the byte-code object, dynamically scoped functions keep track of all their argument names. Before executing a function the interpreter examines the lambda list and binds (varbind) every variable globally to an argument.

If the caller was byte-compiled, each argument started on the stack, was popped and bound to a variable, and, to be accessed by the function, will be pushed back right onto the stack (varref). There’s a lot of argument indirection for each function call.

Lexical Scope Calling Convention

With lexical scope, the argument names are not actually bound during evaluation of the byte-code. The names are completely gone because the compiler has converted local variables into stack offsets.

When calling a lexically-scoped function, the byte-code interpreter examines the integer parameter descriptor. It checks to make sure the appropriate number of arguments have been provided, and for each unprovided &optional argument it pushes a nil onto the stack. If the function has a &rest parameter, any extra arguments are popped off into a list and that list is pushed onto the stack.
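To make the argument handling above concrete, here is a rough simulation in JavaScript. It is only a sketch, not the interpreter's actual code: it assumes the descriptor layout documented in the Elisp manual (bits 0-6 hold the number of required arguments, bit 7 flags &rest, bits 8-14 hold required plus optional), and the names decodeDescriptor and prepareStack are made up for illustration. null plays the part of nil and an array plays the part of a list.

```javascript
// Split an integer parameter descriptor into its fields.
// Layout assumed from the Elisp manual; this is not Emacs source code.
function decodeDescriptor(descriptor) {
    return {
        required: descriptor & 0x7f,         // bits 0-6: mandatory arguments
        hasRest: (descriptor & 0x80) !== 0,  // bit 7: &rest present
        max: (descriptor >> 8) & 0x7f        // bits 8-14: mandatory + optional
    };
}

// Hypothetical helper: build the initial stack for a function call.
function prepareStack(descriptor, args) {
    var d = decodeDescriptor(descriptor);
    if (args.length < d.required || (!d.hasRest && args.length > d.max)) {
        throw new Error("wrong-number-of-arguments");
    }
    var stack = args.slice(0, d.max);
    while (stack.length < d.max) {
        stack.push(null);                // nil for each unprovided &optional
    }
    if (d.hasRest) {
        stack.push(args.slice(d.max));   // extra arguments collected in a list
    }
    return stack;
}

prepareStack(0x101, ["a"]);         // one required argument, nothing to add
prepareStack(0x201, ["a"]);         // one required, one optional left out
prepareStack(0x281, [1, 2, 3, 4]);  // extras gathered for &rest
```

The point of the sketch is that once the stack is prepared, the function body never needs the argument names again.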

From here the function can access its arguments directly on the stack without any named variable misdirection. It can even consume them directly.

;; -*- lexical-binding: t -*-
(defun foo (x) x)

(symbol-function #'foo)
;; => #[#x101 "\207" [] 2]

The byte-code for foo is a single instruction: return. The function’s argument is already on the stack so it doesn’t have to do anything. Strangely the maximum stack usage element is wrong here (2), but it won’t cause a crash.

;; (As of this writing `byte-compile' always uses dynamic scope.)

(byte-compile 'foo)
;; => #[(x) "\010\207" [x] 1]

It takes longer to set up (x is implicitly bound), it has to make an explicit variable dereference (varref), then it has to clean up by unbinding x (implicit unbind). It’s no wonder lexical scope is faster!

Note that there’s also a disassemble function for examining byte-code, but it only reveals part of the story.

(disassemble #'foo)
;; byte code:
;;   args: (x)
;; 0       varref    x
;; 1       return

Compiler Intermediate “lapcode”

The Elisp byte-compiler has an intermediate language called lapcode (“Lisp Assembly Program”), which is much easier to optimize than byte-code. It’s basically an assembly language built out of s-expressions. Opcodes are referenced by name and operands, including packed operands, are handled whole. Each instruction is a cons cell, (opcode . operand), and a program is a list of these.

Let’s rewrite our last get-foo using lapcode.

(defalias 'get-foo
  (make-byte-code
    #x000
    (byte-compile-lapcode
      '((byte-varref . 9)
        (byte-return)))
    [nil nil nil nil nil nil nil nil nil foo]
    1))

We didn’t have to worry about which form of varref we were using or even how to encode a 2-byte operand. The lapcode “assembler” took care of that detail.

Project Ideas?

The Emacs byte-code compiler and interpreter are fascinating. Having spent time studying them I’m really tempted to build a project on top of it all. Perhaps implementing a programming language that targets the byte-code interpreter, improving compiler optimization, or, for a really big project, JIT compiling Emacs byte-code.

People can write byte-code!

JavaScript Function Statements vs. Expressions

2013-05-14T00:00:00Z

The JavaScript function keyword has two meanings depending on how it’s used: as a statement or as an expression. It’s a statement when the keyword appears at the top-level of a block. This is known as a function declaration.

function foo() {
    // ...
}

This statement means declare a variable called foo in the current scope, create a closure named foo, and assign this closure to this variable. Also, this assignment is “lifted” such that it happens before any part of the body of the surrounding function is evaluated, including before any variable assignments.

Notice that the closure’s name is separate from the variable name. Except for a certain well-known JavaScript engine, closure/function objects have a read-only name property.

foo.name; // => "foo"

A name is required for function declarations, otherwise they would be no-ops. This name also appears in debugging backtraces.

A function’s name has different semantics in function expressions. The function keyword is an expression when used in an expression position of a statement.

var foo = function() {
    // ...
}

The function expression above evaluates to an anonymous closure, which is then assigned to the variable foo. This is nearly identical to the previous function declaration except for two details.

Explicit variable assignments are never lifted, unlike function declarations. This assignment will happen exactly where it appears in the code.
The resulting closure is anonymous. The name property, if available, will be an empty string. Furthermore, the lack of name affects the scope of the function. I’ll get back to that point in a moment.
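The first difference, lifting, is easy to observe directly. A small sketch (the names hoistingDemo, declared, and assigned are just for illustration): both identifiers exist for the whole function body, but only the declared function has its value from the start.

```javascript
function hoistingDemo() {
    var results = [];

    // Neither definition has been reached yet.
    results.push(typeof declared);  // the declaration's assignment was lifted
    results.push(typeof assigned);  // the variable exists but holds undefined

    function declared() {
        return "declared";
    }
    var assigned = function() {
        return "assigned";
    };

    results.push(typeof assigned);  // assigned exactly where it appears
    return results;
}

hoistingDemo(); // => ["function", "undefined", "function"]
```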

IIFEs

An immediately-invoked function expression (IIFE), used to establish a one-off local scope, is typically wrapped in parentheses. The purpose of the parentheses is to put function in an expression position so that it is a function expression rather than a function declaration.

(function() {
    // ... declare variables, etc.
}());

Another way to put function in an expression position is to precede it with a unary operator. This is an example of being clever instead of practical.

!function() {
    // ... declare variables, etc.
}();

If function is already in an expression position, the wrapping parentheses are unnecessary. For example,

var foo = function() { return "bar"; }();
foo; // => "bar"

However, it may still be a good idea to wrap the IIFE in parentheses just to help other programmers read your code. A casual glance that doesn’t notice the function invocation would assume a function is being assigned to foo. Wrapping a function expression with parentheses is a well-known idiom for IIFEs.

Function Name and Scope

What happens when a function expression is given a name? Two things.

The name will appear in the name property of the closure (if available). The name will also show up in backtraces. This makes naming closures a handy debugging technique.
The name becomes a variable in the scope of the function. This means it’s possible to write recursive function expressions!

function maths() {
    return {
        // ...
        fact: function fact(n) {
            return n === 0 ? 1 : n * fact(n - 1);
        }
    };
}

maths().fact(10); // => 3628800

The fact function is evaluated as a function expression as part of this object literal. The variable fact is established in the scope of the function fact, assigned to the function itself, allowing the function to call itself. It’s a self-contained recursive function.
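A quick way to convince yourself that the name lives only in the function's own scope is to check for it outside, then rebind the outer variable. This is a small sketch with made-up variable names (factorial, renamed, nameLeaks):

```javascript
var factorial = function fact(n) {
    return n === 0 ? 1 : n * fact(n - 1);
};

// The name fact is not visible out here.
var nameLeaks = typeof fact !== "undefined";  // => false

// The recursion goes through the inner name, so it survives
// the outer variable being reassigned.
var renamed = factorial;
factorial = null;
var result = renamed(5);  // => 120
```

Had the body called factorial instead of fact, the last call would have failed once factorial was set to null.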

Pop Quiz: Function Name and Scope

Given this, try to determine the answer to this problem in your head. What does the second invocation of foo evaluate to?

function foo() {
    foo = function() {
        return "function two";
    };
    return "function one";
}

foo(); // => "function one"
foo(); // => ???

Here’s where we come to the major difference between function declarations and function expressions. The answer is "function two". Even though function declarations create named functions, these functions do not have the implicit self-named variable in their scope. Unless this variable is declared explicitly, the name will refer to a variable in a containing scope.

This has the useful property that a function can re-define itself and be correctly named at the same time. If the function needs to perform expensive first-time initialization, such reassignment can be used to do it lazily without exposing any state and without requiring an is-initialized check on each invocation. For example, this trick is exactly how Emacs autoloading works.
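Here is a minimal sketch of that lazy initialization trick. The function getSquares and the counter initializations are invented for the example, and the loop stands in for whatever expensive setup the real function would do.

```javascript
var initializations = 0;

function getSquares() {
    // Expensive first-time setup, done only on the first call.
    initializations++;
    var squares = [];
    for (var i = 0; i < 10; i++) {
        squares.push(i * i);
    }

    // Replace this function with a cheap one that closes over the result.
    getSquares = function() {
        return squares;
    };
    return squares;
}

getSquares()[3];   // => 9, runs the setup
getSquares()[4];   // => 16, setup is skipped
initializations;   // => 1
```

The table stays private to the closure, and later calls pay for neither the setup nor an is-initialized check.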

If this function declaration is converted to what appears to be the equivalent function expression form, the difference is obvious.

var foo = function foo() {
    foo = function() {
        return "function two";
    };
    return "function one";
};

foo(); // => "function one"
foo(); // => "function one"

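Inside the function, the name foo refers to the function’s own self-named binding rather than the outer variable, so the outer scope’s assignment is left intact. (That inner binding is read-only: the assignment is silently ignored, or throws a TypeError in strict mode.) For better or worse, even ignoring assignment lifting, there’s no way to perfectly emulate function declaration using a function expression.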

JavaScript Truthiness Quiz

2012-11-28T00:00:00Z

I’ve got another quirky JavaScript quiz for you. This one has two different answers.

function foo(object) {
    object.bar = false;
    return object.bar && true;
}

foo(___); // Fill in an argument such that foo() returns true.

Obviously a normal object won’t do the job. Something more special is needed.

foo({bar: true});  // => false

The fact that foo() can return true could introduce a bug during refactoring: the code initially appears to be a tautology that could be reduced to a simpler return false. Since this quiz has solutions that’s obviously not true.

Had I reversed the booleans — assign bar to true and make this function return a falsy value — then almost any immutable object, such as a string, would do.

function foo2(object) {
    object.bar = true;  // inverse
    return object.bar && true;
}

foo2("baz");  // => undefined

The bar assignment would fail and attempting to access it would return undefined, which is falsy.

Answer

The two approaches are getters and property descriptors.

Getters

JavaScript directly supports getter and setter properties. Without special language support, such accessors could be accomplished with plain methods (like Java).

var lovesBlue = {
    _color: "blue",  // private

    getColor: function() {
        return this._color;
    },

    /* Only blue colors are allowed! */
    setColor: function(color) {
        if (/blue/.test(color)) {
            this._color = color;
        }
        return this._color;
    }
};

lovesBlue.getColor();              // => "blue"
lovesBlue.setColor("red");         // => "blue" (set fails)
lovesBlue.setColor("light blue");  // => "light blue"

JavaScript allows properties themselves to transparently run methods, such as to enforce invariants, even though it’s not an obvious call site. This is how many of the browser environment objects work. There’s a special syntax with get and set keywords. (Keep this in mind, JSON parser writers!)

var lovesBlue = {
    _color: "blue",

    get color() {
        return this._color;
    },

    set color(color) {
        if (/blue/.test(color)) {
            this._color = color;
        }
    }
};

lovesBlue.color = "red";  // => "red", but assignment fails
lovesBlue.color;          // => "blue"

This can be used to solve the quiz,

foo({get bar() { return true; }});

Because bar is a getter with no associated setter, there’s effectively no assigning values to bar and it always evaluates to true.

Property descriptors

Object properties themselves have properties, called descriptors, governing their behavior. The accessors above are examples of the descriptors get and set. For our situation there’s a writable descriptor which determines whether or not a particular property can be assigned. If you really wanted to lock this in, there’s even a metadescriptor (a metametaproperty?), configurable, that determines whether or not a property’s descriptors, including itself, can be modified.

There’s no literal syntax for them, but these descriptors can be set with Object.defineProperty(). Conveniently, this function returns the object being modified.

foo(Object.defineProperty({}, 'bar', {value: true, writable: false}));

This creates a new object, sets bar to true, and locks that in by making the property read-only.
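The descriptors can be read back with Object.getOwnPropertyDescriptor(), which is a handy way to see what was actually locked in. A short sketch (the variable names are just for illustration); note that any descriptor left out of the Object.defineProperty() call defaults to false.

```javascript
var locked = Object.defineProperty({}, "bar", {value: true, writable: false});
var descriptor = Object.getOwnPropertyDescriptor(locked, "bar");

descriptor.value;         // => true
descriptor.writable;      // => false
descriptor.configurable;  // => false (default), so this can't be undone
descriptor.enumerable;    // => false (default), so it's hidden from Object.keys()

Object.keys(locked).length;  // => 0
```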

The fact that attempting to assign a read-only property silently fails instead of throwing an exception is probably another mistake in the language’s design. While this behavior is newbie-friendly, it allows bugs to slip by undetected, only to be found much later when they’re more expensive to address. It makes JavaScript programs more brittle.
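For what it's worth, ECMAScript 5 strict mode turns this silent failure into a TypeError. The sketch below compares the two behaviors. The names are made up, and the non-strict function is built with the Function constructor only so that it stays non-strict no matter what kind of file the example is pasted into.

```javascript
var readOnlyBar = Object.defineProperty({}, "bar", {value: true, writable: false});

// Function() bodies are non-strict unless they opt in themselves.
var sloppyAssign = new Function(
    "object",
    "object.bar = false; return object.bar;"
);

function strictAssign(object) {
    "use strict";
    object.bar = false;  // throws instead of failing silently
    return object.bar;
}

var sloppyResult = sloppyAssign(readOnlyBar);  // => true, assignment ignored

var strictError = null;
try {
    strictAssign(readOnlyBar);
} catch (e) {
    strictError = e;
}
strictError instanceof TypeError;  // => true
```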

Existing objects: a third approach?

If you’re lazy and in a browser environment, you don’t even need to construct new objects to solve the problem. There are some already lying around! My favorite is HTML5’s localStorage. It stringifies all property assignments. This means that false becomes "false", which is truthy.

foo(localStorage);  // => true

This is arguably a third approach because the stringification behavior can’t be accomplished with either normal accessors or descriptors alone.

Raising the Dead with JavaScript

2012-11-20T00:00:00Z

After my last post, Gavin sent me this: Scope Cheatsheet. Besides its misleading wording, an interesting fact stood out and gave me another JavaScript challenge question. I’ll also show you how it allows JavaScript to raise the dead!

Background

Like its close cousin, Scheme, JavaScript is a Lisp-1: functions and variables share the same namespace. In Scheme, the define form defines new variables.

(define foo "Hello, world!")

Combined with the lambda form, it can be used to name functions,

(define square (lambda (x) (* x x)))

(square -4)  ;; => 16

The variable square is assigned to an anonymous function, and afterward it can be called as a function. Since this is so common, there’s a syntactic shorthand (sugar) for this,

(define (square x) (* x x))

Notice that the first argument to define is now a list rather than a symbol. This is a signal to define that a function is being defined, and that this should be expanded into the lambda example above. (Also note that the declaration mimics a function call, which is pretty neat.)

JavaScript also has syntactic sugar for the same purpose. The var statement establishes a binding in the current scope. This can be used to define both variables and functions, since they share a namespace. For convenience, in addition to defining an anonymous function, the function statement can be used to declare a variable and assign it a function. These definitions below are equivalent … most of the time.

var square = function(x) {
    return x * x;
}

function square(x) {
    return x * x;
}

The second definition is actually more magical than a syntactic shorthand, which leads into my quiz.

Quiz

function bar() {
    var foo = 0;
    function foo() {}
    return typeof foo;
}

bar(); // What does this return? Why?

function baz() {
    var foo;
    function foo() {}
    return typeof foo;
}

baz(); // How about now?

function quux() {
    var foo = 0;
    var foo = function () {}
    return typeof foo;
}

quux(); // How about now?

We have three functions, bar(), baz(), and quux(), each slightly different. Try to figure out the return value of each without running them in a JavaScript interpreter. Reading the cheatsheet should give you a good idea of the answer.

Answer

Figured it out? The first function, bar(), is the surprising one. If the special function form was merely syntactic sugar then all this means is that foo is redundantly declared (and re-assigned before accessing it, which the compiler could optimize). The final assignment is a function, so it should return 'function'.

However, this is not the case! This function returns 'number'. The first assignment listed in the code actually happens after the second assignment, the function definition. This is because functions defined using the special syntax are hoisted to the top of the function. The function assignments are evaluated before any other part of the function body. This is the extra magic behind the special function syntax.
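One way to picture the hoisting is to rewrite bar() by hand in the order it's actually evaluated. This is only a sketch of the effective ordering, and barAsEvaluated is a name I made up for the comparison.

```javascript
function bar() {
    var foo = 0;
    function foo() {}
    return typeof foo;
}

function barAsEvaluated() {
    var foo;              // the declaration applies to the whole body
    foo = function() {};  // the hoisted function assignment runs first
    foo = 0;              // the explicit assignment runs where it was written
    return typeof foo;
}

bar();             // => "number"
barAsEvaluated();  // => "number"
```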

The effect is more apparent when looking at the return value of quux(), which is 'function'. The special function syntax isn’t used so the assignments are performed in the order that they’re listed. This isn’t surprising, except for the fact that variables can be declared multiple times in a scope without any sort of warning.

The second function, baz(), returns 'function'. The function definition is still hoisted but the variable declaration performs no assignment. The function assignment is not overridden. Because of the lack of assignment, nothing actually happens at all for the variable declaration.

Now, this seems to be a cloudy concept for even skilled programmers: a variable declaration like var foo = 0 accomplishes two separate things. The merge of these two tasks into a single statement is merely one of convenience.

Declaration: declares a variable, modifying the semantics of the function’s body. It changes what place in memory an identifier in the current scope will refer to. This is a compile-time activity. Nothing happens at run time — there is no when. When function definitions are hoisted, it’s the assignment (part 2) that gets hoisted. In C, variables are initially assigned to stack garbage (globals are zeroed). In JavaScript, variables are initially assigned to undefined.
Assignment: binds a variable to a new value. This is evaluated at run time. It matters when this happens in relation to other evaluations.

Consider this,

var foo = foo;

The expression on the right-hand side is evaluated in the same scope as the variable declaration. foo is initially assigned to undefined, then it is re-assigned to undefined. This permits recursive functions to be defined with var — otherwise the identifier used to make the recursive call wouldn’t refer to the function itself.

var factorial = function(n) {
    if (n === 0)
        return 1;
    else
        return factorial(n - 1) * n;
};

In contrast, Lisp’s let does not evaluate the right-hand side within the scope of the let, so recursive definitions are not possible with a regular let. This is the purpose of letrec (Scheme) and labels (Common Lisp).

;; Compile error, x is unbound
(let ((x x))
  x)

Why function hoisting?

JavaScript’s original goal was to be easy for novices to program. I think that they wanted users to be able to define functions anywhere in a function (at the top level) without thinking about it. Novices generally don’t think of functions as values, so this is probably more intuitive for them. To accomplish this, the assignment needs to happen before the real body of the function. Unfortunately, this leads to surprising behavior, and, ultimately, it was probably a bad design choice.

Below, in any other language the function definition would be dead code, unreachable by any valid control flow, and the compiler would be free to toss it.

function foo() {
    return baz();
    function baz() { return 'Hello'; }
}

foo(); // => 'Hello'

But in JavaScript you can raise the dead!

JavaScript Debugging Challenge

2012-11-19T00:00:00Z

As I’ve been exploring JavaScript I’ve come up with a few interesting questions that expose some of JavaScript’s quirks. This is the first one, which I came up with over the weekend. Study it and try to come up with an answer before looking at the explanation below. Go ahead and use a JavaScript interpreter or debugger to poke at it if you need to.

var count = 4;

function foo() {
    var table = [count];

    /* Build the table. */
    while (count-- > 0) {
        table.push([]);
    }

    /* Fill it with numbers. */
    for (var count = 1; count < table.length; count++) {
        table[count].push(count);
    }
    return table;
}

foo(); // What does this return? And why?

When I originally came up with the problem, I just enabled impatient-mode in my editor buffer to share it with friends. It’s a really convenient alternative to pastebins!

Answer

If you’ve gotten this far either you figured it out or you gave up (hopefully not right away!). Without careful attention, the expected output would be [4, [1], [2], [3], [4]]. Create an array containing count, push on count arrays, and finally iterate over the whole thing. Seems simple.

However, the actual return value is [undefined], which at first may seem to defy logic. There’s a bit of a double-trick to this question due to the way I wrote it.

The first trick is that this might appear to be a quirk in the Array push() method. If you pass an array to push() does it actually concatenate the array, flattening out the result? If it did, pushing an empty array would result in nothing. This is not the case, fortunately.

var foo = [1, 2, 3];
foo.push([]);  // foo = [1, 2, 3, []]

The real quirk here is JavaScript’s strange scoping rules. JavaScript only has function scope ¹, not block scope like most other languages. A loop, including for, doesn’t get its own scope so the looping variables are actually hoisted into the function scope. For the first two uses of count, it isn’t actually a free variable like it appears to be. It refers to the for loop variable count even though it’s declared later in the function.

A variable doesn’t spring into existence at the place it’s declared — otherwise that would be a sort-of hidden nested scope. The binding’s extent is determined at compile-time (i.e. lexical scope). If the variable is declared anywhere in the function with var, it is bound for the entire body of the function. In contrast, C requires that variables be declared before they are used. This isn’t strictly necessary from the compiler’s point of view, but it keeps humans from making mistakes like above. A C variable “exists” (barring optimizations, it’s been allocated space on the stack) for the entire block it’s declared in, but since it can’t be referenced before the declaration that detail has no visible effect.

In the code above, because count was not assigned any value at the beginning of the function, it is initially bound to undefined, which is coerced into NaN when used as a number. Since NaN > 0 is false, the while loop body never runs. The result is that the array initially holds a single undefined, then no arrays are pushed onto it. In the final loop, the array doesn’t have any further elements to loop over so nothing happens and [undefined] is returned.
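It may help to see the function rewritten with the hoisting made explicit. This is a sketch of how the function is effectively evaluated, with fooAsEvaluated being my own name for the rewritten version.

```javascript
var count = 4;

function fooAsEvaluated() {
    var count;            // hoisted out of the for loop; shadows the global
    var table = [count];  // count is still undefined here

    /* undefined converts to NaN, and NaN > 0 is false. */
    while (count-- > 0) {
        table.push([]);
    }

    /* table.length is 1, so 1 < 1 fails immediately. */
    for (count = 1; count < table.length; count++) {
        table[count].push(count);
    }
    return table;
}

var table = fooAsEvaluated();
table.length;            // => 1
table[0] === undefined;  // => true
count;                   // => 4, the global was never touched
```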

¹ JavaScript 1.7 actually has block scope when using let, but let is not widely supported yet.

JavaScript's Quirky eval

2012-11-14T00:00:00Z

The infamous eval function is a strange beast in any language, but I think JavaScript’s is perhaps the strangest incarnation. Its very presence in a function foils any possibility of optimization, because it is capable of wreaking so much havoc.

The purpose of eval is to take an arbitrary data structure containing a program (usually a string) and evaluate it. Most of the time the use of eval indicates a bad program: its use is completely unnecessary, very slow, and probably dangerous. There are exceptions, like Skewer, where a REPL is being provided to a developer. Something needs to perform the “E” part of REPL.

If the language’s platform already has a parser and compiler/interpreter around, like an interpreted language, it’s most of the way to having an eval. eval just exposes the existing functionality directly to programs. In a brute-force, trivial approach, the string to be evaluated could be written to a file and loaded like a regular program.

Semantics

However, executing arbitrary code in an established context is non-trivial. When a program is compiled, the compiler maps out the program’s various lexical bindings at compile time. For example, when compiling C, a function’s variables become offsets from the stack pointer. As an optimization, unused variables can be discarded, saving precious stack space. If the code calling eval has been compiled like this and the evaluation is being done in the same lexical environment as the call, then eval needs to be able to access this mapping in order to map identifiers to bindings.

This complication can be avoided if the eval is explicitly done in the global context. For example, take Common Lisp’s eval.

Evaluates form in the current dynamic environment and the null lexical environment.

This means lexical bindings are not considered, only dynamic (global) bindings. In the expression below, foo is bound lexically so eval has no access to it. The compilation and optimization of this code is unaffected by the eval. It’s about as complicated as loading a new source file with load.

(let ((foo 'bar))
  (eval 'foo))  ; error, foo is unbound

Python and Ruby are similar, where eval is done in the global environment. In both cases, an evaluation environment can be passed explicitly as an additional argument.

In Perl things start to get a bit strange (string version). eval is done in the current lexical environment. ~~However, no assignments, either to change bindings or modify data structures, are visible outside of the eval.~~ (Fixed a string interpolation mistake.)

sub foo {
    my $bar = 10;
    eval '$bar = 5';
    return eval '$bar';
}

This function returns 5. The eval modified the lexically scoped $bar.

Note how short Lisp’s eval documentation is compared to Perl’s. Lisp’s eval semantics are dead simple — very important for such a dangerous function. Perl’s description is two orders of magnitude larger than Lisp’s and it still doesn’t fully document the feature.

JavaScript

JavaScript goes much further than all of this. Not only is eval done in the current lexical environment but it can introduce entirely new bindings!

function foo() {
    eval('var bar = 10');
    return bar;
}

This function returns 10. eval created a new lexical variable in foo at run time. Because the environment can be manipulated so drastically at run time, any hopes of effectively compiling foo are thrown out the window. To have an outside function modify the local environment is a severe side-effect. It essentially requires that JavaScript be interpreted rather than compiled. Along with the with statement, it’s strong evidence that JavaScript was at some point designed by novices.

eval also makes closures a lot heavier. Normally the compiler can determine at compile time which variables are being accessed by a function and minimize the environment captured by a closure. For example,

function foo(x) {
    var y = {x: x};
    return function() {
        return x * x;
    };
}

The function foo returns a closure capturing the bindings x and y. The compiler can prove that y is never accessed by the closure and omit it, freeing the object bound to y for garbage collection. However, if eval is present, anything could be accessed at any time and the compiler can prove nothing. For example,

function foo(x) {
    return function() {
        return eval('x * x');
    };
}

The variable x is never accessed lexically, but the eval can tease it out at run time. The expression foo(3)() will evaluate to 9, showing that anything exposed to the closure is not free to be garbage collected as long as the closure is accessible.

If that’s where the story ended, JavaScript optimization would look pretty bleak. Any function call could be a call to eval and so any time we call another function it may stomp all over the local environment, preventing the compiler from proving anything useful. For example,

var secretEval = eval;
function foo(string) {
    // ...
    secretEval(string);
    // ...
}

There’s good news and bad news. The good news is that this is not the case in the above example. string will be evaluated in the global environment, not the local environment. The bad news is that this is because of an obscure, complicated concept of indirect and direct evals.

In general, when eval is called by a name other than “eval” it is an indirect call and is performed in the global environment (see the linked article for a more exact description). This means the compiler can tell at compile time whether or not eval will be evaluating in the lexical environment. If not, it’s free to make optimizations that eval would otherwise prohibit. Whew!
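The difference is easy to observe by asking each kind of eval about a local variable. A small sketch, with names (indirectEval, hiddenLocal, and the two functions) invented for the example, and assuming there is no global variable called hiddenLocal:

```javascript
var indirectEval = eval;

function viaDirect() {
    var hiddenLocal = "local";
    return eval("typeof hiddenLocal");          // direct: sees the local
}

function viaIndirect() {
    var hiddenLocal = "local";
    return indirectEval("typeof hiddenLocal");  // indirect: global environment
}

viaDirect();    // => "string"
viaIndirect();  // => "undefined"
```

Only viaDirect() forces the engine to keep hiddenLocal reachable by name.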

Strict mode

To address eval’s problems a bit further, along with some other problems, ECMAScript 5 introduced strict mode. Strict mode modifies JavaScript’s semantics so that it’s a more robust and compiler-friendly language.

In strict mode, eval still uses the local environment when called directly but it gets its own nested environment. New bindings are created in this nested environment, which is discarded when evaluation is complete. JavaScript’s eval is still quirky, but less so than before.
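Here’s a sketch comparing the two modes. The names sloppyFoo and strictFoo are mine, it assumes there is no global variable named bar, and the non-strict version is built with the Function constructor purely so that it stays non-strict regardless of the surrounding file.

```javascript
// Non-strict: the binding created by eval lands in the function itself.
var sloppyFoo = new Function(
    "eval('var bar = 10'); return typeof bar;"
);

// Strict: eval gets its own nested environment, discarded afterward.
function strictFoo() {
    "use strict";
    eval("var bar = 10");
    return typeof bar;
}

sloppyFoo();  // => "number"
strictFoo();  // => "undefined"
```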

BrianScheme

2011-01-11T00:00:00Z

Remember back a year ago I tried my hand at a Lisp implementation called Wisp? Well, currently a co-worker of mine, Brian Taylor, is similarly working on his own Scheme implementation — but he knows more about what he's doing than I did, so it's more interesting. However, that expertise doesn't extend to inventing a clever name (Zing!): it's unsubtly called BrianScheme.

git clone git://github.com/netguy204/brianscheme.git

I've been hacking at it a little myself, cheering from the sidelines.

git remote add wellons git://github.com/skeeto/brianscheme.git

Like Wisp, it's written from scratch in C from the bottom up. Unlike Wisp, it has closures, lexical scoping, mark-and-sweep garbage collection, an object system, and compiles to a bytecode (in memory). Continuations are still a ways off, but planned. One of the most powerful features so far is the foreign function interface (FFI). Now that he's implemented it with libffi he's barely had to touch the C code base. In fact, thanks to the FFI, the C portion of BrianScheme will be shrinking.

For example, BrianScheme currently lacks floating point numbers, and its integers are currently just native fixnums. Sometime soon it will, like Wisp, use the GNU Multi-Precision Library (GMP) to provide bignums. Adding this will not require making any changes whatsoever to the C code. Using the object system (Tiny-CLOS), hooks in the reader and printer, and the FFI, this can be entirely implemented in the language itself.

Just-in-time compilation (JIT) has begun to be implemented without touching C. Again, this is done by pulling in libjit with the FFI.

Because I wrote Wisp to be embeddable and a library, I was able to run Wisp in BrianScheme, via the FFI, and expose some bindings. For example, I can send it s-expressions to evaluate,

> (require 'wisp)
> (wisp:eval '(expt 6 56))
37711171281396032013366321198900157303750656

BrianScheme doesn't currently support threading, mainly because the garbage collector isn't ready for it. But remember how I mentioned GNU Pth last month? Again, I was able to load Pth with the FFI to add userspace threading, which is safe for the garbage collector because it's effectively an atomic operation. (Once continuations are implemented, this could actually be implemented without Pth, just by making good use of those continuations.) The current hangup is the REPL, which doesn't know about Pth and so it never yields. To take advantage of threading you have to suspend the REPL (with pth:join).

This REPL issue should be solved with the long term goal for BrianScheme. The C component of BrianScheme will merely exist for the purposes of bootstrapping the full system. During initialization, just about everything will be redefined in BrianScheme, with the original C definitions only living long enough to load what's needed. This includes reimplementing the reader itself in BrianScheme, which enables all sorts of possibilities, like the previously mentioned bignums implemented in the language itself, inline regular expressions, and proper yielding to the userspace thread scheduler.

So go ahead and clone Brian's repository (and add mine as a remote, too! :-D) and poke around at it. To compare to Wisp again, it's not quite as stable at the moment. It exits very easily from runtime errors, due to lacking error handling, so an instance generally doesn't live very long at the moment. This will probably be resolved sometime soon. Except for that, it does play well with Emacs as an inferior-lisp.

A GNU Octave Feature

2008-08-29T00:00:00Z

At work they recently moved me to a new project. It is a Matlab-based data analysis thing. I haven't really touched Matlab in over a year (the last time I used Matlab at work), and, instead, use GNU Octave at home when the language is appropriate. I got so used to Octave that I found a pretty critical feature missing from Matlab's implementation: treat an expression as if it were of the type of its output.

Let's say we want to index into the result of a function. Take, for example, the magic square function, magic(). This spits out a magic square of the given size. In Octave we can generate a 4x4 magic square and chop out the middle 2x2 portion in one line.

octave> magic(4)(2:3,2:3)
ans =

   11   10
    7    6

Or, possibly more clearly,

octave> [magic(4)](2:3,2:3)
ans =

   11   10
    7    6

Try this in Matlab and you will get a big, fat error. You have to assign the magic square to a temporary variable to do the same thing. I kept trying to do this sort of thing in Matlab and was thinking to myself, "I know I can do this somehow!" Nope, I was just used to having Octave.

Where this really shows is when you want to reshape a matrix into a nice, simple vector. If you have a matrix M and want to count the number of NaNs it has, you can't just apply the sum() function over isnan() because sum() only sums each column, giving a row of per-column counts rather than a single total. You can get around this with a special index, (:).

So, to sum all elements in M directly,

octave> sum(M(:))

In Octave, to count NaNs with isnan(),

octave> sum(isnan(M)(:))

Again, Matlab won't let you index the result of isnan() directly. Stupid. I guess the Matlab way to do this is to apply sum() twice.

Every other language I can think of handles this properly: C, C++, Perl, Ruby, etc. It is strange that Matlab itself doesn't have it. Score one more for Octave.
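
To make that comparison concrete, here is a minimal sketch in Python, which is not one of the languages named above and is chosen only for illustration. The helper names (magic4, middle_2x2, isnan_matrix, count_nans) are hypothetical, and the magic square is hard-coded to match the magic(4) output shown earlier rather than computed. It shows the two things Octave allows: indexing the result of a call directly, and flattening the result of an element-wise NaN test in the same expression.

```python
import math


def magic4():
    # Hard-coded 4x4 magic square (assumption: same values as magic(4)).
    return [
        [16, 2, 3, 13],
        [5, 11, 10, 8],
        [9, 7, 6, 12],
        [4, 14, 15, 1],
    ]


def middle_2x2():
    # Index the call's result directly: rows 2-3, then columns 2-3 (1-based),
    # with no temporary variable holding the whole square.
    return [row[1:3] for row in magic4()[1:3]]


def isnan_matrix(m):
    # Element-wise NaN test, returning a matrix of booleans like isnan().
    return [[math.isnan(x) for x in row] for row in m]


def count_nans(m):
    # Flatten the result of isnan_matrix() in the same expression and sum it,
    # the rough equivalent of sum(isnan(M)(:)).
    return sum(flag for row in isnan_matrix(m) for flag in row)


if __name__ == "__main__":
    nan = float("nan")
    print(middle_2x2())
    print(count_nans([[1.0, nan], [nan, 4.0]]))
```

The point is only that the call expression can be used wherever a value of its result type could be used, which is what Matlab refuses to do here.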
