Articles tagged lisp at null program

An Async / Await Library for Emacs Lisp

2019-03-10T20:57:03Z

As part of building my Python proficiency, I’ve learned how to use asyncio. This new language feature first appeared in Python 3.5 (PEP 492, September 2015). JavaScript grew a nearly identical feature in ES2017 (June 2017). An async function can pause to await on an asynchronously computed result, much like a generator pausing when it yields a value.

In fact, both Python and JavaScript async functions are essentially just fancy generator functions with some specialized syntax and semantics. That is, they’re stackless coroutines. Both languages already had generators, so their generator-like async functions are a natural extension that, unlike stackful coroutines, does not require significant new runtime plumbing.

Emacs officially got generators in 25.1 (September 2016), though, unlike Python and JavaScript, it didn’t require any additional support from the compiler or runtime. It’s implemented entirely using Lisp macros. In other words, it’s just another library, not a core language feature. In theory, the generator library could easily be backported as far as Emacs 24.1 (June 2012), the first release to properly support lexical closures.

For the same reason, stackless async/await coroutines can also be implemented as a library. So that’s what I did, letting Emacs’ generator library do most of the heavy lifting. The package is called aio:

https://github.com/skeeto/emacs-aio

It’s modeled more closely on JavaScript’s async functions than Python’s asyncio, with the core representation being promises rather than coroutine objects. I just have an easier time reasoning about promises than coroutines.

I’m definitely not the first person to realize this was possible, and was beaten to the punch by two years. Wanting to avoid fragmentation, I set aside all formality in my first iteration on the idea, not even bothering with namespacing my identifiers. It was to be only an educational exercise. However, I got quite attached to my little toy. Once I got my head wrapped around the problem, everything just sort of clicked into place so nicely.

In this article I will show step-by-step one way to build async/await on top of generators, laying out one concept at a time and then building upon each. But first, some examples to illustrate the desired final result.

aio example

Ignoring all its problems for a moment, suppose you want to use url-retrieve to fetch some content from a URL and return it. To keep this simple, I’m going to omit error handling. Also assume that lexical-binding is t for all examples. Besides, lexical scope is required by the generator library, and therefore also by aio.

The most naive approach is to fetch the content synchronously:

(defun fetch-fortune-1 (url)
  (let ((buffer (url-retrieve-synchronously url)))
    (with-current-buffer buffer
      (prog1 (buffer-string)
        (kill-buffer)))))

The result is returned directly, and errors are communicated by an error signal (i.e. Emacs’ version of exceptions). This is convenient, but the function will block the main thread, locking up Emacs until the result has arrived. This is obviously very undesirable, so, in practice, everyone nearly always uses the asynchronous version:

(defun fetch-fortune-2 (url callback)
  (url-retrieve url (lambda (_status)
                      (funcall callback (buffer-string)))))

The main thread no longer blocks, but it’s a whole lot less convenient. The result isn’t returned to the caller, and instead the caller supplies a callback function. The result, whether success or failure, will be delivered via callback, so the caller must split itself into two pieces: the part before the callback and the callback itself. Errors cannot be delivered using an error signal because of the inverted flow control.

The situation gets worse if, say, you need to fetch results from two different URLs. You either fetch results one at a time (inefficient), or you manage two different callbacks that could be invoked in any order, and therefore have to coordinate.
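Here’s a minimal sketch of that coordination done by hand, assuming each fetcher is a callback-style function that takes only a callback (fetch-fortune-2 with its URL already applied, say). The helper name sketch-fetch-both is hypothetical.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

(defun sketch-fetch-both (fetch-a fetch-b done)
  "Start FETCH-A and FETCH-B, then call DONE with both results.
Each fetcher is a hypothetical callback-style function taking a single
CALLBACK argument.  The callbacks may fire in either order, so a shared
counter tracks how many results are still outstanding."
  (let ((remaining 2)
        result-a result-b)
    (funcall fetch-a (lambda (result)
                       (setq result-a result)
                       ;; Only the second callback to arrive calls DONE.
                       (when (zerop (cl-decf remaining))
                         (funcall done result-a result-b))))
    (funcall fetch-b (lambda (result)
                       (setq result-b result)
                       (when (zerop (cl-decf remaining))
                         (funcall done result-a result-b))))))
```

Every extra request means another result slot and another copy of this bookkeeping.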

Wouldn’t it be nice for the function to work like the first example, but be asynchronous like the second example? Enter async/await:

(aio-defun fetch-fortune-3 (url)
  (let ((buffer (aio-await (aio-url-retrieve url))))
    (with-current-buffer buffer
      (prog1 (buffer-string)
        (kill-buffer)))))

A function defined with aio-defun is just like defun except that it can use aio-await to pause and wait on any other function defined with aio-defun (or, more specifically, any function that returns a promise). Borrowing Python parlance: returning a promise makes a function awaitable. If there’s an error, it’s delivered as an error signal from aio-url-retrieve, just like the first example. When called, this function returns immediately with a promise object that represents a future result. The caller might look like this:

(defcustom fortune-url ...)

(aio-defun display-fortune ()
  (interactive)
  (message "%s" (aio-await (fetch-fortune-3 fortune-url))))

How wonderfully clean that looks! And, yes, it even works with interactive like that. I can M-x display-fortune and a fortune is printed in the minibuffer as soon as the result arrives from the server. In the meantime Emacs doesn’t block and I can continue my work.

You can’t do anything you couldn’t already do before. It’s just a nicer way to organize the same callbacks: implicit rather than explicit.

Promises, simplified

The core object at play is the promise. Promises are already a rather simple concept, but aio promises have been distilled to their essence, as they’re only needed for this singular purpose. More on this later.

As I said, a promise represents a future result. In practical terms, a promise is just an object to which one can subscribe with a callback. When the result is ready, the callbacks are invoked. Another way to put it is that promises reify the concept of callbacks. A callback is no longer just the idea of an extra argument to a function. It’s a first-class thing that itself can be passed around as a value.

Promises have two slots: the final promise result and a list of subscribers. A nil result means the result hasn’t been computed yet. It’s so simple I’m not even bothering with cl-struct.

(defun aio-promise ()
  "Create a new promise object."
  (record 'aio-promise nil ()))

(defsubst aio-promise-p (object)
  (and (eq 'aio-promise (type-of object))
       (= 3 (length object))))

(defsubst aio-result (promise)
  (aref promise 1))

To subscribe to a promise, use aio-listen:

(defun aio-listen (promise callback)
  (let ((result (aio-result promise)))
    (if result
        (run-at-time 0 nil callback result)
      (push callback (aref promise 2)))))

If the result isn’t ready yet, add the callback to the list of subscribers. If the result is ready, call the callback in the next event loop turn using run-at-time. This is important because it keeps all the asynchronous components isolated from one another. They won’t see each other’s frames on the call stack, nor frames from aio. This is so important that the Promises/A+ specification is explicit about it.

The other half of the equation is resolving a promise, which is done with aio-resolve. Unlike other promises, aio promises don’t care whether the promise is being fulfilled (success) or rejected (error). Instead a promise is resolved using a value function — or, usually, a value closure. Subscribers receive this value function and extract the value by invoking it with no arguments.

Why? This lets the promise’s resolver decide the semantics of the result. Instead of returning a value, this function can instead signal an error, propagating an error signal that terminated an async function. Because of this, the promise doesn’t need to know how it’s being resolved.

When a promise is resolved, subscribers are each scheduled in their own event loop turns in the same order that they subscribed. If a promise has already been resolved, nothing happens. (Thought: Perhaps this should be an error in order to catch API misuse?)

(defun aio-resolve (promise value-function)
  (unless (aio-result promise)
    (let ((callbacks (nreverse (aref promise 2))))
      (setf (aref promise 1) value-function
            (aref promise 2) ())
      (dolist (callback callbacks)
        (run-at-time 0 nil callback value-function)))))
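A quick way to see the value-function design in action is to resolve promises directly, with no subscribers and no event loop involved. This minimal sketch repeats the primitives defined above so it runs on its own, and shows that a failure is just a value function that signals, and that a second resolve is ignored. The function sketch-demo-resolve is hypothetical.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; Promise primitives, as defined in the text.
(defun aio-promise ()
  "Create a new promise object."
  (record 'aio-promise nil ()))

(defsubst aio-result (promise)
  (aref promise 1))

(defun aio-resolve (promise value-function)
  (unless (aio-result promise)
    (let ((callbacks (nreverse (aref promise 2))))
      (setf (aref promise 1) value-function
            (aref promise 2) ())
      (dolist (callback callbacks)
        (run-at-time 0 nil callback value-function)))))

(defun sketch-demo-resolve ()
  "Resolve two promises directly; return what their value functions give."
  (let ((ok (aio-promise))
        (bad (aio-promise)))
    (aio-resolve ok (lambda () 42))
    (aio-resolve ok (lambda () 99))     ; ignored: OK is already resolved
    ;; A failure is just a value function that signals when called.
    (aio-resolve bad (lambda () (signal 'arith-error nil)))
    (list (funcall (aio-result ok))
          (condition-case nil
              (funcall (aio-result bad))
            (arith-error :caught-arith-error)))))
```

Calling (sketch-demo-resolve) returns (42 :caught-arith-error). The promise itself never had to distinguish success from failure.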

If you’re not an async function, you might subscribe to a promise like so:

(aio-listen promise (lambda (v)
                      (message "%s" (funcall v))))

The simplest example of a non-async function that creates and delivers on a promise is a “sleep” function:

(defun aio-sleep (seconds &optional result)
  (let ((promise (aio-promise))
        (value-function (lambda () result)))
    (prog1 promise
      (run-at-time seconds nil
                   #'aio-resolve promise value-function))))

Similarly, here’s a “timeout” promise that delivers a special timeout error signal at a given time in the future.

(defun aio-timeout (seconds)
  (let ((promise (aio-promise))
        (value-function (lambda () (signal 'aio-timeout nil))))
    (prog1 promise
      (run-at-time seconds nil
                   #'aio-resolve promise value-function))))

That’s all there is to promises.

Evaluate in the context of a promise

Before we get into pausing functions, let’s deal with the slightly simpler matter of delivering their return values using a promise. What we need is a way to evaluate a “body” and capture its result in a promise. If the body exits due to a signal, we want to capture that as well.

Here’s a macro that does just this:

(defmacro aio-with-promise (promise &rest body)
  `(aio-resolve ,promise
                (condition-case error
                    (let ((result (progn ,@body)))
                      (lambda () result))
                  (error (lambda ()
                           (signal (car error) ; rethrow
                                   (cdr error)))))))
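Here’s a small sketch exercising aio-with-promise synchronously: one body returns normally and the other signals arith-error, which the value function then rethrows. It repeats the needed promise primitives so it stands alone; sketch-capture-demo is a made-up name.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; Promise primitives and aio-with-promise, as defined in the text.
(defun aio-promise () (record 'aio-promise nil ()))
(defsubst aio-result (promise) (aref promise 1))

(defun aio-resolve (promise value-function)
  (unless (aio-result promise)
    (let ((callbacks (nreverse (aref promise 2))))
      (setf (aref promise 1) value-function
            (aref promise 2) ())
      (dolist (callback callbacks)
        (run-at-time 0 nil callback value-function)))))

(defmacro aio-with-promise (promise &rest body)
  `(aio-resolve ,promise
                (condition-case error
                    (let ((result (progn ,@body)))
                      (lambda () result))
                  (error (lambda ()
                           (signal (car error) ; rethrow
                                   (cdr error)))))))

(defun sketch-capture-demo ()
  "Return (SUCCESS-VALUE ERROR-SYMBOL) from two captured bodies."
  (let ((good (aio-promise))
        (bad (aio-promise)))
    (aio-with-promise good (+ 1 2))
    (aio-with-promise bad (/ 10 (- 2 2)))   ; signals arith-error
    (list (funcall (aio-result good))
          ;; The value function rethrows the captured signal.
          (condition-case err
              (funcall (aio-result bad))
            (arith-error (car err))))))
```

(sketch-capture-demo) returns (3 arith-error): the rethrown signal keeps its symbol and data, but not the original backtrace.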

The body result is captured in a closure and delivered to the promise. If there’s an error signal, it’s “rethrown” into subscribers by the promise’s value function.

This is where Emacs Lisp has a serious weak spot. There’s not really a concept of rethrowing a signal. Unlike a language with explicit exception objects that can capture a snapshot of the backtrace, the original backtrace is completely lost where the signal is caught. There’s no way to “reattach” it to the signal when it’s rethrown. This is unfortunate because it would greatly help debugging if you got to see the full backtrace on the other side of the promise.

Async functions

So we have promises and we want to pause a function on a promise. Generators have iter-yield for pausing an iterator’s execution. To tackle this problem:

Yield the promise to pause the iterator.
Subscribe a callback on the promise that continues the generator (iter-next) with the promise’s result as the yield result.

All the hard work is done on either side of the yield, so aio-await is just a simple wrapper around iter-yield:

(defmacro aio-await (expr)
  `(funcall (iter-yield ,expr)))

Remember, that funcall is here to extract the promise value from the value function. If it signals an error, this propagates directly into the iterator just as if it had been a direct call — minus an accurate backtrace.
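Since aio-await is nothing more than iter-yield plus funcall, the protocol can be exercised by hand with no promises at all: run the iterator up to its yield, then resume it with a value function. The generator and the name sketch-run-by-hand are illustrative only.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)
(require 'generator)

(defmacro aio-await (expr)
  `(funcall (iter-yield ,expr)))

(defun sketch-run-by-hand ()
  "Drive a tiny generator by hand and return (YIELDED . FINAL)."
  (let* ((final nil)
         (iter (funcall
                (iter-lambda ()
                  ;; Pauses at the yield; on resumption it funcalls
                  ;; whatever iter-next passed in.
                  (setq final (1+ (aio-await :need-a-number))))))
         ;; The first iter-next runs the body up to the yield.
         (yielded (iter-next iter)))
    ;; Resume with a value function, the kind a resolved promise delivers.
    (condition-case nil
        (iter-next iter (lambda () 41))
      ;; The body then finishes, so the iterator signals end-of-sequence.
      (iter-end-of-sequence nil))
    (cons yielded final)))
```

(sketch-run-by-hand) returns (:need-a-number . 42). In aio the yielded object would be a promise, and aio--step would do the resuming.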

So aio-lambda / aio-defun needs to wrap the body in a generator (iter-lambda), invoke it to produce an iterator, then drive the iterator using callbacks. Here’s a simplified, unhygienic definition of aio-lambda:

(defmacro aio-lambda (arglist &rest body)
  `(lambda (&rest args)
     ;; let* rather than let, so the iter-lambda body can see `promise'
     (let* ((promise (aio-promise))
            (iter (apply (iter-lambda ,arglist
                           (aio-with-promise promise
                             ,@body))
                         args)))
       (prog1 promise
         (aio--step iter promise nil)))))

The body is evaluated inside aio-with-promise with the result delivered to the promise returned directly by the async function.

Before returning, the iterator is handed to aio--step, which drives the iterator forward until it delivers its first promise. When the iterator yields a promise, aio--step attaches a callback back to itself on the promise as described above. Immediately driving the iterator up to the first yielded promise “primes” it, which is important for getting the ball rolling on any asynchronous operations.

If the iterator ever yields something other than a promise, it’s delivered right back into the iterator.

(defun aio--step (iter promise yield-result)
  (condition-case _
      (cl-loop for result = (iter-next iter yield-result)
               then (iter-next iter (lambda () result))
               until (aio-promise-p result)
               finally (aio-listen result
                                   (lambda (value)
                                     (aio--step iter promise value))))
    (iter-end-of-sequence)))

When the iterator is done, nothing more needs to happen since the iterator resolves its own return value promise.

The definition of aio-defun just uses aio-lambda with defalias. There’s nothing to it.

That’s everything you need! Everything else in the package is merely useful, awaitable functions like aio-sleep and aio-timeout.
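To check that claim, here’s the code from this article gathered into one file. Two parts are my own reconstruction and should be read as assumptions: aio-defun is written out as the text describes it (aio-lambda plus defalias), and aio-lambda binds with let* so that the generator body can see promise. sketch-add-later is a hypothetical example function.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)
(require 'generator)

;; Promises
(defun aio-promise ()
  "Create a new promise object."
  (record 'aio-promise nil ()))

(defsubst aio-promise-p (object)
  (and (eq 'aio-promise (type-of object))
       (= 3 (length object))))

(defsubst aio-result (promise)
  (aref promise 1))

(defun aio-listen (promise callback)
  (let ((result (aio-result promise)))
    (if result
        (run-at-time 0 nil callback result)
      (push callback (aref promise 2)))))

(defun aio-resolve (promise value-function)
  (unless (aio-result promise)
    (let ((callbacks (nreverse (aref promise 2))))
      (setf (aref promise 1) value-function
            (aref promise 2) ())
      (dolist (callback callbacks)
        (run-at-time 0 nil callback value-function)))))

;; Capturing a body's result (or signal) in a promise
(defmacro aio-with-promise (promise &rest body)
  `(aio-resolve ,promise
                (condition-case error
                    (let ((result (progn ,@body)))
                      (lambda () result))
                  (error (lambda ()
                           (signal (car error) (cdr error)))))))

;; Async functions
(defmacro aio-await (expr)
  `(funcall (iter-yield ,expr)))

(defmacro aio-lambda (arglist &rest body)
  `(lambda (&rest args)
     (let* ((promise (aio-promise))
            (iter (apply (iter-lambda ,arglist
                           (aio-with-promise promise ,@body))
                         args)))
       (prog1 promise
         (aio--step iter promise nil)))))

(defmacro aio-defun (name arglist &rest body)
  ;; Assumed shape: "aio-lambda with defalias", per the text.
  `(defalias ',name (aio-lambda ,arglist ,@body)))

(defun aio--step (iter promise yield-result)
  (condition-case _
      (cl-loop for result = (iter-next iter yield-result)
               then (iter-next iter (lambda () result))
               until (aio-promise-p result)
               finally (aio-listen result
                                   (lambda (value)
                                     (aio--step iter promise value))))
    (iter-end-of-sequence)))

;; An awaitable
(defun aio-sleep (seconds &optional result)
  (let ((promise (aio-promise))
        (value-function (lambda () result)))
    (prog1 promise
      (run-at-time seconds nil #'aio-resolve promise value-function))))

;; Hypothetical example: two awaits in sequence.
(aio-defun sketch-add-later (a b)
  (+ (aio-await (aio-sleep 0.01 a))
     (aio-await (aio-sleep 0.01 b))))
```

Calling (sketch-add-later 1 2) returns an unresolved promise right away; its value function yields 3 once both sleeps have finished. A real caller would be another aio-defun that simply awaits it.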

Composing promises

Unfortunately url-retrieve doesn’t support timeouts. We can work around this by composing two promises: a url-retrieve promise and an aio-timeout promise. First define a promise-returning function, aio-select, that takes a list of promises and returns (as another promise) the first promise to resolve:

(defun aio-select (promises)
  (let ((result (aio-promise)))
    (prog1 result
      (dolist (promise promises)
        (aio-listen promise (lambda (_)
                              (aio-resolve
                               result
                               (lambda () promise))))))))

We give aio-select both our url-retrieve and timeout promises, and it tells us which resolved first:

(aio-defun fetch-fortune-4 (url timeout)
  (let* ((promises (list (aio-url-retrieve url)
                         (aio-timeout timeout)))
         (fastest (aio-await (aio-select promises)))
         (buffer (aio-await fastest)))
    (with-current-buffer buffer
      (prog1 (buffer-string)
        (kill-buffer)))))

Cool! Note: This will not actually cancel the URL request. It merely lets the async function move on early, without ever receiving the result.
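Since aio-select only needs the promise primitives, the timeout race in fetch-fortune-4 can be tried without network access by racing a slow aio-sleep against aio-timeout. Two assumptions: the article doesn’t show how the aio-timeout condition is declared, so I’ve added a define-error for it (its parent defaults to error), and sketch-pump-until-resolved is a hypothetical blocking helper that exists only so the usage below can wait on the event loop.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; Assumption: declare the condition so `error' handlers can catch it.
(define-error 'aio-timeout "Timeout")

(defun aio-promise () (record 'aio-promise nil ()))
(defsubst aio-result (promise) (aref promise 1))

(defun aio-listen (promise callback)
  (let ((result (aio-result promise)))
    (if result
        (run-at-time 0 nil callback result)
      (push callback (aref promise 2)))))

(defun aio-resolve (promise value-function)
  (unless (aio-result promise)
    (let ((callbacks (nreverse (aref promise 2))))
      (setf (aref promise 1) value-function
            (aref promise 2) ())
      (dolist (callback callbacks)
        (run-at-time 0 nil callback value-function)))))

(defun aio-sleep (seconds &optional result)
  (let ((promise (aio-promise))
        (value-function (lambda () result)))
    (prog1 promise
      (run-at-time seconds nil #'aio-resolve promise value-function))))

(defun aio-timeout (seconds)
  (let ((promise (aio-promise))
        (value-function (lambda () (signal 'aio-timeout nil))))
    (prog1 promise
      (run-at-time seconds nil #'aio-resolve promise value-function))))

(defun aio-select (promises)
  (let ((result (aio-promise)))
    (prog1 result
      (dolist (promise promises)
        (aio-listen promise (lambda (_)
                              ;; Later resolutions are ignored.
                              (aio-resolve result (lambda () promise))))))))

(defun sketch-pump-until-resolved (promise &optional max-seconds)
  "Run the event loop until PROMISE resolves or MAX-SECONDS pass.
Hypothetical test helper: it blocks, so it has no place in async code."
  (let ((deadline (+ (float-time) (or max-seconds 5))))
    (while (and (null (aio-result promise))
                (< (float-time) deadline))
      (accept-process-output nil 0.01))
    (aio-result promise)))
```

Racing (aio-sleep 1 :slow) against (aio-timeout 0.01), the selected promise is the timeout promise, and calling its value function signals aio-timeout, which an error handler catches.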

Threads

Despite aio being entirely about managing concurrent, asynchronous operations, it has nothing at all to do with threads — as in Emacs 26’s support for kernel threads. All async functions and promise callbacks are expected to run only on the main thread. That’s not to say an async function can’t await on a result from another thread. It just must be done very carefully.

Processes

The package also includes two functions for realizing promises on processes, whether they be subprocesses or network sockets.

aio-process-filter
aio-process-sentinel

For example, this function loops over each chunk of output (typically 4kB) from the process, as delivered to a filter function:

(aio-defun process-chunks (process)
  (cl-loop for chunk = (aio-await (aio-process-filter process))
           while chunk
           do (... process chunk ...)))

Exercise for the reader: Write an awaitable function that returns a line at a time rather than a chunk at a time. You can build it on top of aio-process-filter.
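As a hint, the part of the exercise that has nothing to do with aio is reassembling arbitrary chunks into whole lines. Here’s a minimal sketch of only that buffering step, under a hypothetical name; feeding it from an awaitable is still left to the reader.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

(defun sketch-make-line-splitter ()
  "Return a function that turns arbitrary chunks into complete lines.
Each call takes one CHUNK of output and returns the list of lines that
CHUNK completes, without their newlines.  A trailing partial line is
held back until a later chunk finishes it."
  (let ((pending ""))
    (lambda (chunk)
      (let* ((parts (split-string (concat pending chunk) "\n"))
             (complete (butlast parts)))
        ;; The last part follows the final newline (possibly ""),
        ;; so it is an unfinished line.
        (setq pending (car (last parts)))
        complete))))
```

Because an explicit separator is given and OMIT-NULLS is left nil, split-string keeps empty strings, which preserves blank lines and the empty trailing part.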

I considered wrapping functions like start-process so that their aio versions would return a promise representing some kind of result from the process. However there are so many different ways to create and configure processes that I would have ended up duplicating all the process functions. Focusing on the filter and sentinel, and letting the caller create and configure the process is much cleaner.

Unfortunately Emacs has no asynchronous API for writing output to a process. Both process-send-string and process-send-region will block if the pipe or socket is full. There is no callback, so you cannot await on writing output. Maybe there’s a way to do it with a dedicated thread?

Another issue is that the process-send-* functions are preemptible, made necessary because they block. The aio-process-* functions leave a gap (i.e. between filter awaits) where no filter or sentinel function is attached. It’s a consequence of promises being single-fire. The gap is harmless so long as the async function doesn’t await something else or get preempted. This needs some more thought.

Update: These process functions no longer exist and have been replaced by a small framework for building chains of promises. See aio-make-callback.

Testing aio

The test suite for aio is a bit unusual. Emacs’ built-in test suite, ERT, doesn’t support asynchronous tests. Furthermore, tests are generally run in batch mode, where Emacs invokes a single function and then exits rather than pump an event loop. Batch mode can only handle asynchronous process I/O, not the async functions of aio. So it’s not possible to run the tests in batch mode.

Instead I hacked together a really crude callback-based test suite. It runs in non-batch mode and writes the test results into a buffer (run with make check). Not ideal, but it works.

One of the tests is a sleep sort (with reasonable tolerances). It’s a pretty neat demonstration of what you can do with aio:

(aio-defun sleep-sort (values)
  (let ((promises (mapcar (lambda (v) (aio-sleep v v)) values)))
    (cl-loop while promises
             for next = (aio-await (aio-select promises))
             do (setf promises (delq next promises))
             collect (aio-await next))))

To see it in action (M-x sleep-sort-demo):

(aio-defun sleep-sort-demo ()
  (interactive)
  (let ((values '(0.1 0.4 1.1 0.2 0.8 0.6)))
    (message "%S" (aio-await (sleep-sort values)))))

Async/await is pretty awesome

I’m quite happy with how this all came together. Once I had the concepts straight — particularly resolving to value functions — everything made sense and all the parts fit together well, and mostly by accident. That feels good.

Emacs 26 Brings Generators and Threads

2018-05-31T17:45:16Z

Emacs 26.1 was recently released. As you would expect from a major release, it comes with lots of new goodies. As a bit of an Emacs Lisp enthusiast, I find the two most interesting new features to be generators (iter) and native threads (thread).

Correction: Generators were actually introduced in Emacs 25.1 (Sept. 2016), not Emacs 26.1. Doh!

Update: ThreadSanitizer (TSan) quickly shows that Emacs’ threading implementation has many data races, making it completely untrustworthy. Until this is fixed, nobody should use Emacs threads for any purpose, and threads should be disabled at compile time.

Generators

Generators are one of those cool language features that provide a lot of power at a small implementation cost. They’re like a constrained form of coroutines, but, unlike coroutines, they’re typically built entirely on top of first-class functions (e.g. closures). This means no additional run-time support is needed in order to add generators to a language. The only complications are the changes to the compiler. Generators are not compiled the same way as normal functions despite looking so similar.

What’s perhaps coolest of all about lisp-family generators, including Emacs Lisp, is that the compiler component can be implemented entirely with macros. The compiler need not be modified at all, making generators no more than a library, and not actually part of the language. That’s exactly how they’ve been implemented in Emacs Lisp (emacs-lisp/generator.el).

So what’s a generator? It’s a function that returns an iterator object. When an iterator object is invoked (e.g. iter-next) it evaluates the body of the generator. Each iterator is independent. What makes them unusual (and useful) is that the evaluation is paused in the middle of the body to return a value, saving all the internal state in the iterator. Normally pausing in the middle of functions isn’t possible, which is what requires the special compiler support.

Emacs Lisp generators appear to be most closely modeled after Python generators, though they also share some similarities with JavaScript generators. What makes them most like Python is the use of signals for flow control, something I’m not personally enthused about. When a Python generator completes, it raises a StopIteration exception. In Emacs Lisp, it’s an iter-end-of-sequence signal. A signal is out-of-band and avoids the issue of relying on some special in-band value to communicate the end of iteration.

In contrast, JavaScript’s solution is to return a “rich” object wrapping the actual yield value. This object has a done field that communicates whether iteration has completed. This avoids the use of exceptions for flow control, but the caller has to unpack the rich object.

Fortunately the flow control issue isn’t normally exposed to Emacs Lisp code. Most of the time you’ll use the iter-do macro or (my preference) the new cl-loop keyword iter-by.
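Here’s a quick sketch of both styles, consuming the walk generator from this article; the two wrapper function names are hypothetical. Neither caller ever sees iter-end-of-sequence.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)
(require 'generator)

(iter-defun walk (list)
  (while list
    (iter-yield (pop list))))

(defun sketch-walk-with-iter-do (list)
  "Collect LIST through the `walk' generator using `iter-do'."
  (let ((out ()))
    (iter-do (item (walk list))
      (push item out))
    (nreverse out)))

(defun sketch-walk-with-cl-loop (list)
  "Collect LIST through the `walk' generator using cl-loop's iter-by."
  (cl-loop for item iter-by (walk list)
           collect item))
```

Both return (:a :b :c) for the input (:a :b :c).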

To illustrate how a generator works, here’s a really simple iterator that iterates over a list:

(iter-defun walk (list)
  (while list
    (iter-yield (pop list))))

Here’s how it might be used:

(setf i (walk '(:a :b :c)))

(iter-next i)  ; => :a
(iter-next i)  ; => :b
(iter-next i)  ; => :c
(iter-next i)  ; error: iter-end-of-sequence

The iterator object itself is opaque and you shouldn’t rely on any part of its structure. That being said, I’m a firm believer that we should understand how things work underneath the hood so that we can make the most effective use of them. No program should rely on the particulars of the iterator object internals for correctness, but a well-written program should employ them in a way that best exploits their expected implementation.

Currently iterator objects are closures, and iter-next invokes the closure with its own internal protocol. It asks the closure to return the next value (:next operation), and iter-close asks it to clean itself up (:close operation).

Since they’re just closures, another really cool thing about Emacs Lisp generators is that iterator objects are generally readable. That is, you can serialize them out with print and bring them back to life with read, even in another instance of Emacs. They exist independently of the original generator function. This will not work if one of the values captured in the iterator object is not readable (e.g. buffers).

How does pausing work? Well, one of the other exciting new features of Emacs 26 is the introduction of a jump table opcode, switch. I’d lamented in the past that large cond and cl-case expressions could be a lot more efficient if Emacs’ byte code supported jump tables. It turns an O(n) sequence of comparisons into an O(1) lookup and jump. It’s essentially the perfect foundation for a generator since it can be used to jump straight back to the position where evaluation was paused.

Buuut, generators do not currently use jump tables. The generator library predates the new switch opcode, and, being independent of it, its author, Daniel Colascione, went with the best option at the time. Chunks of code between yields are packaged as individual closures. These closures are linked together a bit like nodes in a graph, creating a sort of state machine. To get the next value, the iterator object invokes the closure representing the next state.

I’ve manually macro expanded the walk generator above into a form that roughly resembles the expansion of iter-defun:

(defun walk (list)
  (let (state)
    (cl-flet* ((state-2 ()
                 (signal 'iter-end-of-sequence nil))
               (state-1 ()
                 (prog1 (pop list)
                   (when (null list)
                     (setf state #'state-2))))
               (state-0 ()
                 (if (null list)
                     (state-2)
                   (setf state #'state-1)
                   (state-1))))
      (setf state #'state-0)
      (lambda ()
        (funcall state)))))

This omits the protocol I mentioned, and it doesn’t have yield results (values passed to the iterator). The actual expansion is a whole lot messier and less optimal than this, but hopefully my hand-rolled generator is illustrative enough. Without the protocol, this iterator is stepped using funcall rather than iter-next.

The state variable keeps track of where in the body of the generator this iterator is currently “paused.” Continuing the iterator is therefore just a matter of invoking the closure that represents this state. Each state closure may update state to point to a new part of the generator body. The terminal state is obviously state-2. Notice how state transitions occur around branches.
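The hand-rolled walk is stepped with funcall, but it could speak the :next/:close protocol instead. This sketch assumes, as described earlier, that iter-next calls the iterator with :next and iter-close with :close; per the earlier caveat, real code shouldn’t depend on that internal detail. sketch-walk is a hypothetical name, and yield results are still ignored.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)
(require 'generator)

(defun sketch-walk (list)
  "Hand-rolled iterator over LIST speaking the :next/:close protocol."
  (let (state)
    (cl-flet* ((state-2 ()
                 (signal 'iter-end-of-sequence nil))
               (state-1 ()
                 (prog1 (pop list)
                   (when (null list)
                     (setf state #'state-2))))
               (state-0 ()
                 (if (null list)
                     (state-2)
                   (setf state #'state-1)
                   (state-1))))
      (setf state #'state-0)
      ;; OP is :next or :close; the yield result is accepted but unused.
      (lambda (op &optional _value)
        (cl-ecase op
          (:next (funcall state))
          (:close (setf state #'state-2) nil))))))
```

With the protocol in place, the standard consumers accept it: (cl-loop for x iter-by (sketch-walk '(:a :b :c)) collect x) returns (:a :b :c).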

I had said generators can be implemented as a library in Emacs Lisp. Unfortunately there’s a hole in this: unwind-protect. It’s not valid to yield inside an unwind-protect form. Unlike, say, a throw-catch, there’s no mechanism to trap an unwinding stack so that it can be restarted later. The state closure needs to return and fall through the unwind-protect.

A jump table version of the generator might look like the following. I’ve used cl-labels since it allows for recursion.

(defun walk (list)
  (let ((state 0))
    (cl-labels
        ((closure ()
           (cl-case state
             (0 (if (null list)
                    (setf state 2)
                  (setf state 1))
                (closure))
             (1 (prog1 (pop list)
                  (when (null list)
                    (setf state 2))))
             (2 (signal 'iter-end-of-sequence nil)))))
      #'closure)))

When byte compiled on Emacs 26, that cl-case is turned into a jump table. This “switch” form is closer to how generators are implemented in other languages.

Iterator objects can share state between themselves if they close over a common environment (or, of course, use the same global variables).

(setf foo
      (let ((list '(:a :b :c)))
        (list
         (funcall
          (iter-lambda ()
            (while list
              (iter-yield (pop list)))))
         (funcall
          (iter-lambda ()
            (while list
              (iter-yield (pop list))))))))

(iter-next (nth 0 foo))  ; => :a
(iter-next (nth 1 foo))  ; => :b
(iter-next (nth 0 foo))  ; => :c

For years there has been a very crude way to “pause” a function and allow other functions to run: accept-process-output. It only works in the context of processes, but five years ago this was sufficient for me to build primitives on top of it. Unlike this old process function, generators do not block threads, including the user interface, which is really important.

Threads

Emacs 26 also brings us threads, which have been attached in a very bolted-on fashion. It’s not much more than a subset of pthreads: shared memory threads, recursive mutexes, and condition variables. The interfaces look just like they do in pthreads, and there hasn’t been much done to integrate more naturally into the Emacs Lisp ecosystem.

This is also only the first step in bringing threading to Emacs Lisp. Right now there’s effectively a global interpreter lock (GIL), and threads only run one at a time cooperatively. Like with generators, the Python influence is obvious. In theory, sometime in the future this interpreter lock will be removed, making way for actual concurrency.

This is, again, where I think it’s useful to contrast with JavaScript, which was also initially designed to be single-threaded. Low-level threading primitives weren’t exposed — though mostly because JavaScript typically runs sandboxed and there’s no safe way to expose those primitives. Instead it got a web worker API that exposes concurrency at a much higher level, along with an efficient interface for thread coordination.

For Emacs Lisp, I’d prefer something safer, more like the JavaScript approach. Low-level pthreads are now a great way to wreck Emacs with deadlocks (with no C-g escape). Playing around with the new threading API for just a few days, I’ve already had to restart Emacs a bunch of times. Bugs in Emacs Lisp are normally a lot more forgiving.

One important detail that has been designed well is that dynamic bindings are thread-local. This is really essential for correct behavior. This is also an easy way to create thread-local storage (TLS): dynamically bind variables in the thread’s entrance function.

;;; -*- lexical-binding: t; -*-

(defvar foo-counter-tls)
(defvar foo-path-tls)

(defun foo-make-thread (path)
  (make-thread
   (lambda ()
     (let ((foo-counter-tls 0)
           (foo-path-tls path))
       ...))))

However, cl-letf “bindings” are not thread-local, which makes this otherwise incredibly useful macro quite dangerous in the presence of threads. This is one way that the new threading API feels bolted on.

Building generators on threads

In my stack clashing article I showed a few different ways to add coroutine support to C. One method spawned per-coroutine threads, and coordinated using semaphores. With the new threads API in Emacs, it’s possible to do exactly the same thing.

Since generators are just a limited form of coroutines, this means threads offer another, very different way to implement them. The threads API doesn’t provide semaphores, but condition variables can fill in for them. To “pause” in the middle of the generator, just wait on a condition variable.
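Here’s a minimal sketch of a counting semaphore built from a mutex and a condition variable, using only make-mutex, make-condition-variable, with-mutex, condition-wait, and condition-notify. The sketch-sem names are hypothetical, and given the data-race update near the top of this article, treat it as illustration only.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

(cl-defstruct (sketch-sem (:constructor sketch-sem--make))
  mutex condvar (count 0))

(defun sketch-sem-create (&optional count)
  "Return a counting semaphore starting at COUNT (default 0)."
  (let ((mutex (make-mutex "sketch-sem")))
    (sketch-sem--make :mutex mutex
                      :condvar (make-condition-variable mutex "sketch-sem")
                      :count (or count 0))))

(defun sketch-sem-post (sem)
  "Increment SEM and wake one waiting thread, if any."
  (with-mutex (sketch-sem-mutex sem)
    (cl-incf (sketch-sem-count sem))
    (condition-notify (sketch-sem-condvar sem))))

(defun sketch-sem-wait (sem)
  "Block until SEM is positive, then decrement it."
  (with-mutex (sketch-sem-mutex sem)
    ;; condition-wait releases the mutex while blocked.  Loop, because
    ;; waking up doesn't guarantee the count is still positive.
    (while (<= (sketch-sem-count sem) 0)
      (condition-wait (sketch-sem-condvar sem)))
    (cl-decf (sketch-sem-count sem))))
```

A coroutine-style handoff would give each side its own semaphore: one side posts the other’s and then waits on its own.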

So, naturally, I just had to see if I could make it work. I call it a “thread iterator” or “thriter.” The API is very similar to iter:

https://github.com/skeeto/thriter

This is merely a proof of concept so don’t actually use this library for anything. These thread-based generators are about 5x slower than iter generators, and they’re a lot more heavy-weight, needing an entire thread per iterator object. This makes thriter-close all the more important. On the other hand, these generators have no problem yielding inside unwind-protect.

Originally this article was going to dive into the details of how these thread-iterators worked, but thriter turned out to be quite a bit more complicated than I anticipated, especially as I worked towards feature matching iter.

The gist of it is that each side of a next/yield transaction gets its own condition variable, and both sides share a common mutex. Values are passed between the threads using slots on the iterator object. The side that isn’t currently running waits on a condition variable until the other side frees it, after which the releaser waits on its own condition variable for the result. This is similar to asynchronous requests in Emacs dynamic modules.

Rather than use signals to indicate completion, I modeled it after JavaScript generators. Iterators return a cons cell. The car indicates continuation and the cdr holds the yield result. To terminate an iterator early (thriter-close or garbage collection), thread-signal is used to essentially “cancel” the thread and knock it off the condition variable.
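The cons-cell convention is easy to mimic on top of ordinary iter generators. This sketch is a hypothetical adapter around iter-next, not thriter’s actual implementation: it returns (t . VALUE) for each yield and (nil . RETURN-VALUE) at the end, relying on iter-end-of-sequence carrying the generator’s return value as its data.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)
(require 'generator)

(defun sketch-next-cons (iterator &optional yield-result)
  "Step ITERATOR and return a JavaScript-style result as a cons.
The car is non-nil while iteration continues: (t . VALUE) for a yielded
VALUE, then (nil . RETURN-VALUE) once the generator has finished."
  (condition-case done
      (cons t (iter-next iterator yield-result))
    ;; The signal's data is the generator's return value.
    (iter-end-of-sequence (cons nil (cdr done)))))

;; Example generator: yields 1 and 2, then returns :finished.
(iter-defun sketch-two ()
  (iter-yield 1)
  (iter-yield 2)
  :finished)
```

Stepping (sketch-two) three times gives (t . 1), (t . 2), and (nil . :finished). The caller checks the car instead of catching a signal.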

Since threads aren’t (and shouldn’t be) garbage collected, failing to run a thread-iterator to completion would normally cause a memory leak, as the thread sits there forever waiting on a “next” that will never come. To deal with this, a finalizer is attached to the iterator object in such a way that it’s not visible to the thread. A lost iterator is eventually cleaned up by the garbage collector, but, as usual with finalizers, this is only a last resort.

The future of threads

This thread-iterator project was my initial, little experiment with Emacs Lisp threads, similar to why I connected a joystick to Emacs using a dynamic module. While I don’t expect the current thread API to go away, it’s not really suitable for general use in its raw form. Bugs in Emacs Lisp programs should virtually never bring down Emacs and require a restart. Outside of threads, the few situations that break this rule are very easy to avoid (and very obvious that something dangerous is happening). Dynamic modules are dangerous by necessity, but concurrency doesn’t have to be.

There really needs to be a safe, high-level API with clean thread isolation. Perhaps this higher-level API will eventually build on top of the low-level threading API.

RSA Signatures in Emacs Lisp

2015-10-30T22:35:13Z

Emacs comes with a wonderful arbitrary-precision computer algebra system called calc. I’ve discussed it previously and continue to use it on a daily basis. That’s right, people, Emacs can do calculus. Like everything Emacs, it’s programmable and extensible from Emacs Lisp. In this article, I’m going to implement the RSA public-key cryptosystem in Emacs Lisp using calc.

If you want to dive right in first, here’s the repository:

https://github.com/skeeto/emacs-rsa

This is only a toy implementation and not really intended for serious cryptographic work. It’s also far too slow when using keys of reasonable length.

Evaluation with calc

The calc package is particularly useful when considering Emacs’ limited integer type. Emacs uses a tagged integer scheme where integers are embedded within pointers. It’s a lot faster than the alternative (individually-allocated integer objects), but it means they’re always a few bits short of the platform’s native integer type.

calc has a large API, but the user-friendly porcelain for it is the under-documented calc-eval function. It evaluates an expression string with format-like argument substitutions ($n).

(calc-eval "2^16 - 1")
;; => "65535"

(calc-eval "2^$1 - 1" nil 128)
;; => "340282366920938463463374607431768211455"

Notice it returns strings, which is one of the ways calc represents arbitrary precision numbers. For arguments, it accepts regular Elisp numbers as well as strings like the ones it returns. The implicit radix is 10. To explicitly set the radix, prefix the number with the radix followed by #. This is the same as in the user interface of calc. For example:

(calc-eval "16#deadbeef")
;; => "3735928559"

The second argument (optional) to calc-eval adjusts its behavior. Given nil, it simply evaluates the string and returns the result. The manual documents the different options, but the only other relevant option for RSA is the symbol pred, which asks it to return a boolean “predicate” result.

(calc-eval "$1 < $2" 'pred "4000" "5000")
;; => t

Generating primes

RSA is founded on the difficulty of factoring large composites with large factors. Generating an RSA keypair starts with generating two prime numbers, p and q, and using these primes to compute two mathematically related composite numbers.

calc has a function, calc-next-prime, for finding the next prime number following any arbitrary number. It uses a probabilistic primality test, the Miller-Rabin primality test, to efficiently test large integers. It increments the input until it finds a result that passes enough iterations of the primality test.

(calc-eval "nextprime($1)" nil "100000000000000000")
;; => "100000000000000003"
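For intuition about what nextprime is doing, here is a small sketch of the same idea on ordinary fixnums. It is not calc’s implementation, and every name in it is hypothetical. It assumes a 64-bit Emacs and inputs below 2^30, so that the product of two residues stays within fixnum range, and it uses the fixed witness bases 2, 7, and 61, which are known to make Miller-Rabin deterministic for every n below 4,759,123,141.

```elisp
(defun my-mod-pow (base exponent modulus)
  "Return BASE^EXPONENT mod MODULUS (right-to-left binary method)."
  (let ((result 1))
    (setq base (mod base modulus))
    (while (> exponent 0)
      (when (= (logand exponent 1) 1)
        (setq result (mod (* result base) modulus)))
      (setq exponent (ash exponent -1)
            base (mod (* base base) modulus)))
    result))

(defun my-miller-rabin-witness-p (a n)
  "Return non-nil if base A proves the odd number N (> 2) composite."
  ;; Write N - 1 as D * 2^S with D odd.
  (let ((d (1- n))
        (s 0))
    (while (= (logand d 1) 0)
      (setq d (ash d -1)
            s (1+ s)))
    (let ((x (my-mod-pow a d n)))
      (if (or (= x 1) (= x (1- n)))
          nil                           ; A is not a witness
        ;; Square up to S - 1 times, looking for N - 1.
        (let ((composite t)
              (i 1))
          (while (and composite (< i s))
            (setq x (mod (* x x) n)
                  i (1+ i))
            (when (= x (1- n))
              (setq composite nil)))
          composite)))))

(defun my-prime-p (n)
  "Return non-nil if N is prime (deterministic for N < 2^30)."
  (cond ((< n 2) nil)
        ((memq n '(2 7 61)) t)          ; the bases themselves
        ((= (logand n 1) 0) nil)        ; other even numbers
        (t (let ((prime t))
             (dolist (a '(2 7 61) prime)
               (when (and prime (my-miller-rabin-witness-p a n))
                 (setq prime nil)))))))

(defun my-next-prime (n)
  "Return the smallest prime strictly greater than N."
  (let ((k (1+ n)))
    (while (not (my-prime-p k))
      (setq k (1+ k)))
    k))
```

For example, (my-next-prime 100) returns 101, and the Carmichael number 561, which fools the plain Fermat test for every base coprime to it, is correctly rejected by my-prime-p.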

So to generate a random n-bit prime, first generate a random n-bit number and then increment it until a prime number is found.

;; Generate a 128-bit prime, 10 iterations (0.000084% error rate)
(calc-eval "nextprime(random(2^$1), 10)" nil 128)
;; => "111618319598394878409654851283959105123"

Unfortunately calc’s random function is based on Emacs’ random function, which is entirely unsuitable for cryptography. In the real implementation I read n bits from /dev/urandom to generate an n-bit number.

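;; BITS is the desired size in bits (assumed to be a multiple of 8).
(with-temp-buffer
  (set-buffer-multibyte nil)                    ; keep raw bytes
  (call-process "head" "/dev/urandom" t nil "-c" (format "%d" (/ bits 8)))
  (let ((f (apply-partially #'format "%02x")))  ; byte -> two hex digits
    (concat "16#" (mapconcat f (buffer-string) ""))))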

(Note: /dev/urandom is the right choice. There’s no reason to use /dev/random for generating keys.)

Computing e and d

From here the code just follows along from the Wikipedia article. After generating the primes p and q, two composites are computed, n = p * q and i = (p - 1) * (q - 1). Lacking any reason to do otherwise, I chose 65,537 for the public exponent e.

The function rsa--inverse is just a straight Emacs Lisp + calc implementation of the extended Euclidean algorithm from the Wikipedia article pseudocode, computing d ≡ e^-1 (mod i). It’s not much use sharing it here, so take a look at the repository if you’re curious.

(defun rsa-generate-keypair (bits)
  "Generate a fresh RSA keypair plist of BITS length."
  (let* ((p (rsa-generate-prime (+ 1 (/ bits 2))))
         (q (rsa-generate-prime (+ 1 (/ bits 2))))
         (n (calc-eval "$1 * $2" nil p q))
         (i (calc-eval "($1 - 1) * ($2 - 1)" nil p q))
         (e (calc-eval "2^16+1"))
         (d (rsa--inverse e i)))
    `(:public  (:n ,n :e ,e) :private (:n ,n :d ,d))))

The public key is n and e and the private key is n and d. From here we can compute and verify cryptographic signatures.
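Since rsa--inverse isn’t reproduced here, the following is a minimal sketch of the same extended Euclidean computation on ordinary fixnums instead of calc strings, so it only suits small toy numbers. my-mod-inverse is a hypothetical name, not the repository’s code. The usage check assumes the textbook parameters p = 61 and q = 53, giving i = 60 * 52 = 3120, with e = 17.

```elisp
(require 'cl-lib)

(defun my-mod-inverse (e m)
  "Return D with (E * D) mod M = 1, or nil if E has no inverse mod M.
Extended Euclidean algorithm, tracking only the coefficient of E."
  (let ((t0 0) (t1 1)
        (r0 m) (r1 e))
    (while (/= r1 0)
      (let ((quot (/ r0 r1)))
        ;; Parallel assignment keeps the old values on the right-hand side.
        (cl-psetq t0 t1
                  t1 (- t0 (* quot t1))
                  r0 r1
                  r1 (- r0 (* quot r1)))))
    ;; r0 is now gcd(e, m); an inverse exists only when it is 1.
    (when (= r0 1)
      (mod t0 m))))
```

With those numbers, (my-mod-inverse 17 3120) returns 2753, and indeed 17 * 2753 = 46801 = 15 * 3120 + 1.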

Signatures

To compute signature s of an integer m (where m < n), compute s ≡ m^d (mod n). I chose the right-to-left binary method, again straight from the Wikipedia pseudocode (lazy!). I’ll share this one since it’s short. The backslash denotes integer division.

(defun rsa--mod-pow (base exponent modulus)
  (let ((result 1))
    (setf base (calc-eval "$1 % $2" nil base modulus))
    (while (calc-eval "$1 > 0" 'pred exponent)
      (when (calc-eval "$1 % 2 == 1" 'pred exponent)
        (setf result (calc-eval "($1 * $2) % $3" nil result base modulus)))
      (setf exponent (calc-eval "$1 \\ 2" nil exponent)
            base (calc-eval "($1 * $1) % $2" nil base modulus)))
    result))

Verifying the signature is the same process, but with the public key’s e: m ≡ s^e (mod n). If the signature is valid, m will be recovered. In theory, only someone who knows d can feasibly compute s from m. If n is small enough to factor, revealing p and q, then d can be feasibly recomputed from the public key. So mind your Ps and Qs.
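To see the sign/verify relationship with numbers small enough to check by hand, here is a fixnum-only sketch using the well-known textbook key p = 61, q = 53 (n = 3233, e = 17, d = 2753). my-mod-pow is a hypothetical port of rsa--mod-pow to plain integers, and the my-toy-* names are likewise invented. The key is absurdly insecure; it only illustrates that raising to e undoes raising to d.

```elisp
(defun my-mod-pow (base exponent modulus)
  "Return BASE^EXPONENT mod MODULUS (right-to-left binary method)."
  (let ((result 1))
    (setq base (mod base modulus))
    (while (> exponent 0)
      (when (= (logand exponent 1) 1)
        (setq result (mod (* result base) modulus)))
      (setq exponent (ash exponent -1)
            base (mod (* base base) modulus)))
    result))

(defconst my-toy-key '(:n 3233 :e 17 :d 2753)
  "Textbook RSA parameters from p = 61, q = 53. Insecure; illustration only.")

(defun my-toy-sign (m)
  "Sign integer M (< n) with the toy private exponent: s = m^d mod n."
  (my-mod-pow m (plist-get my-toy-key :d) (plist-get my-toy-key :n)))

(defun my-toy-verify (m s)
  "Return non-nil if S^e mod n recovers M."
  (= m (my-mod-pow s (plist-get my-toy-key :e) (plist-get my-toy-key :n))))
```

Here (my-toy-verify 65 (my-toy-sign 65)) returns t, while (my-toy-verify 66 (my-toy-sign 65)) returns nil.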

So that leaves one problem: generally users want to sign strings and files and such, not integers. A hash function is used to reduce an arbitrary quantity of data into an integer suitable for signing. Emacs comes with a bunch of them, accessible through secure-hash. It hashes strings and buffers.

(secure-hash 'sha224 "Hello, world!")
;; => "8552d8b7a7dc5476cb9e25dee69a8091290764b7f2a64fe6e78e9568"

Since the result is hexadecimal, just prefix 16# to turn it into a calc integer.

Here’s the signature and verification functions. Any string or buffer can be signed.

(defun rsa-sign (private-key object)
  (let ((n (plist-get private-key :n))
        (d (plist-get private-key :d))
        (hash (concat "16#" (secure-hash 'sha384 object))))
    ;; truncate hash such that hash < n
    (while (calc-eval "$1 >= $2" 'pred hash n)
      (setf hash (calc-eval "$1 \\ 2" nil hash)))
    (rsa--mod-pow hash d n)))

(defun rsa-verify (public-key object sig)
  (let ((n (plist-get public-key :n))
        (e (plist-get public-key :e))
        (hash (concat "16#" (secure-hash 'sha384 object))))
    ;; truncate hash such that hash < n
    (while (calc-eval "$1 >= $2" 'pred hash n)
      (setf hash (calc-eval "$1 \\ 2" nil hash)))
    (let* ((result (rsa--mod-pow sig e n)))
      (calc-eval "$1 == $2" 'pred result hash))))

Note the hash truncation step. If this is actually necessary, then your n is very easy to factor! It’s in there since this is just a toy and I want it to work with small keys.

Putting it all together

Here’s the whole thing in action with an extremely small, 128-bit key.

(setf message "hello, world!")

(setf keypair (rsa-generate-keypair 128))
;; => (:public  (:n "74924929503799951536367992905751084593"
;;               :e "65537")
;;     :private (:n "74924929503799951536367992905751084593"
;;               :d "36491277062297490768595348639394259869"))

(setf sig (rsa-sign (plist-get keypair :private) message))
;; => "31982247477262471348259501761458827454"

(rsa-verify (plist-get keypair :public) message sig)
;; => t

(rsa-verify (plist-get keypair :public) (capitalize message) sig)
;; => nil

Each of these operations took less than a second. For larger, secure-length keys, this implementation is painfully slow. For example, generating a 2048-bit key takes my laptop about half an hour, and computing a signature with that key (any size message) takes about a minute. That’s probably a little too slow for, say, signing ELPA packages.

Emacs Lisp Defstruct Namespace Convention

2014-03-19T01:41:52Z

One of the drawbacks of Emacs Lisp is the lack of namespaces. Every defun, defvar, defcustom, defface, defalias, defstruct, and defclass establishes one or more names in the global scope. To work around this, package authors are strongly encouraged to prefix every global name with the name of its package. That way there should never be a naming conflict between two different packages.

(defvar mypackage-foo-limit 10)

(defvar mypackage--bar-counter 0)

(defun mypackage-init ()
  ...)

(defun mypackage-compute-children (node)
  ...)

(provide 'mypackage)

While this has solved the problem for the time being, attaching the package name to almost every identifier, including private function and variable names, is quite cumbersome. Namespaces can almost be hacked into the language by using multiple obarrays, but symbols have internal linked lists that prohibit inclusion in multiple obarrays.

By convention, private names are given a double-dash after the namespace. If a “bar counter” is an implementation detail that may disappear in the future, it will be called mypackage--bar-counter to warn users and other package authors not to rely on it.

There’s been a recent push to follow this namespace-prefix policy more strictly, particularly with the deprecation of cl and introduction of cl-lib. I suspect someday when namespaces are finally introduced, packages with strictly clean namespaces will be at an advantage, somehow automatically supported. Nic Ferrier has proposed ideas for how to move forward on this.

How strict are we talking?

Over the last few years I’ve gotten much stricter in my own packages when it comes to namespace prefixes. You can see the progression going from javadoc-lookup (2010) where I was completely sloppy about it, to EmacSQL (2014) where every single global identifier is meticulously prefixed.

For a time I considered names such as make-* and with-* to be exceptions to the rule, since these names are idioms inherited from Common Lisp. The namespace comes after the expected prefix. I’ve changed my mind about this, which has caused me to change my usage of defstruct (now cl-defstruct).

Just as in Common Lisp, by default cl-defstruct defines a constructor starting with make-*. This is fine in Common Lisp, where it’s a package-private function by default, but in Emacs Lisp this pollutes the global namespace.

(require 'cl-lib)

;; Defines make-circle, copy-circle, circle-x, circle-y, circle-radius, circle-p
(cl-defstruct circle
  x y radius)

(defvar unit-circle (make-circle :x 0.0 :y 0.0 :radius 1.0))

unit-circle
;; => [cl-struct-circle 0.0 0.0 1.0]

(circle-radius unit-circle)
;; => 1.0

This constructor isn’t namespace clean, so package authors should avoid defstruct’s default. If the package is named circle then all of the accessors are perfectly fine, though.

To fix this, I now use another, more recent Emacs Lisp idiom: name the constructor create. That is, for the package circle, we desire circle-create. To get this behavior from cl-defstruct, use the :constructor option.

;; Clean!
(cl-defstruct (circle (:constructor circle-create))
  x y radius)

(circle-create :x 0 :y 0 :radius 1)
;; => [cl-struct-circle 0 0 1]

(provide 'circle)

This affords a new opportunity to craft a better constructor. Have cl-defstruct define a private constructor, then manually write a constructor with a nicer interface. It may also do additional work, like enforce invariants or initialize dependent slots.

(cl-defstruct (circle (:constructor circle--create))
  x y radius)

(defun circle-create (x y radius)
  (let ((circle (circle--create :x x :y y :radius radius)))
    (if (< radius 0)
        (error "must have non-negative radius")
      circle)))

(circle-create 0 0 1)
;; => [cl-struct-circle 0 0 1]

(circle-create 0 0 -1)
;; error: "must have non-negative radius"

This is now how I always use cl-defstruct in Emacs Lisp. It’s a tidy convention that will probably become more common in the future.
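The constructor isn’t the only unprefixed name, though. By default cl-defstruct also defines a copier named copy-circle, and the :copier option renames it the same way :constructor renames the constructor. A minimal sketch, starting again from the keyword-constructor version and assuming circle-copy as the desired name:

```elisp
(require 'cl-lib)

;; Move both the constructor and the copier into the circle- namespace.
(cl-defstruct (circle (:constructor circle-create)
                      (:copier circle-copy))
  x y radius)

(let* ((a (circle-create :x 0 :y 0 :radius 1))
       (b (circle-copy a)))
  ;; The copy is a distinct object with equal slot values.
  (list (equal a b) (eq a b)))
;; => (t nil)
```

Passing (:copier nil) instead suppresses the copier entirely.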

Emacs Byte-code Internals

2014-01-04T05:07:26Z

Byte-code compilation is an underdocumented — and in the case of the recent lexical binding updates, undocumented — part of Emacs. Most users know that Elisp is usually compiled into a byte-code saved to .elc files, and that byte-code loads and runs faster than uncompiled Elisp. That’s all users really need to know, and the GNU Emacs Lisp Reference Manual specifically discourages poking around too much.

People do not write byte-code; that job is left to the byte compiler. But we provide a disassembler to satisfy a cat-like curiosity.

Screw that! What if I want to handcraft some byte-code myself? :-) The purpose of this article is to introduce the internals of the Elisp byte-code interpreter. I will explain how it works, why lexically scoped code is faster, and demonstrate writing some byte-code by hand.

The Humble Stack Machine

The byte-code interpreter is a simple stack machine. The stack holds arbitrary lisp objects. The interpreter is backwards compatible but not forwards compatible (old versions can’t run new byte-code). Each instruction is between 1 and 3 bytes. The first byte is the opcode and the second and third bytes are either a single operand or a single immediate value. Some operands are packed into the opcode byte.

As of this writing (Emacs 24.3) there are 142 opcodes, 6 of which have been declared obsolete. Most opcodes refer to commonly used built-in functions for fast access. (Looking at the selection, Elisp really is geared towards text!) Considering packed operands, there are up to 27 potential opcodes unused, reserved for the future.

opcodes 48 - 55
opcode 97
opcode 128
opcodes 169 - 174
opcodes 180 - 181
opcodes 183 - 191

The easiest place to access the opcode listing is in bytecomp.el. Beware that some of the opcode comments are currently out of date.

Segmentation Fault Warning

Byte-code does not offer the same safety as normal Elisp. Bad byte-code can, and will, cause Emacs to crash. You can try it out for yourself right now:

emacs -batch -Q --eval '(print (#[0 "\300\207" [] 0]))'

Or evaluate the code manually in a buffer (save everything first!),

(#[0 "\300\207" [] 0])

This segfault, caused by referencing beyond the end of the constants vector, is not an Emacs bug. Doing a boundary test would slow down the byte-code interpreter. Not performing this test at run-time is a practical engineering decision. The Emacs developers have instead chosen to rely on valid byte-code output from the compiler, making a disclaimer to anyone wanting to write their own byte-code,

You should not try to come up with the elements for a byte-code function yourself, because if they are inconsistent, Emacs may crash when you call the function. Always leave it to the byte compiler to create these objects; it makes the elements consistent (we hope).

You’ve been warned. Now it’s time to start playing with firecrackers.

The Byte-code Object

A byte-code object is functionally equivalent to a normal Elisp vector except that it can be evaluated as a function. Elements are accessed in constant time, the syntax is similar to vector syntax ([...] vs. #[...]), and it can be of any length, though valid functions must have at least 4 elements.

There are two ways to create a byte-code object: using a byte-code object literal or with make-byte-code. Like vector literals, byte-code literals don’t need to be quoted.

(make-byte-code 0 "" [] 0)
;; => #[0 "" [] 0]

#[1 2 3 4]
;; => #[1 2 3 4]

(#[0 "" [] 0])
;; error: Invalid byte opcode

The elements of an object literal are:

Function parameter (lambda) list
Unibyte string of byte-code
Constants vector
Maximum stack usage
Docstring (optional, nil for none)
Interactive specification (optional)

Parameter List

The parameter list takes on two different forms depending on whether the function is lexically or dynamically scoped. If the function is dynamically scoped, the argument list is exactly what appears in lisp code.

(byte-compile (lambda (a b &optional c)))
;; => #[(a b &optional c) "\300\207" [nil] 1]

There’s really no shorter way to represent the parameter list because preserving the argument names is critical. Remember that, in dynamic scope, while the function body is being evaluated these variables are globally bound (eww!) to the function’s arguments.

When the function is lexically scoped, the parameter list is packed into an Elisp integer, indicating the counts of the different kinds of parameters: required, &optional, and &rest.

The least significant 7 bits indicate the number of required arguments. Notice that this limits compiled, lexically-scoped functions to 127 required arguments. The 8th bit is the number of &rest arguments (up to 1). The remaining bits indicate the total number of optional and required arguments (not counting &rest). It’s really easy to parse these in your head when viewed as hexadecimal because each portion almost always fits inside its own “digit.”

(byte-compile-make-args-desc '())
;; => #x000  (0 args, 0 rest, 0 required)

(byte-compile-make-args-desc '(a b))
;; => #x202  (2 args, 0 rest, 2 required)

(byte-compile-make-args-desc '(a b &optional c))
;; => #x302  (3 args, 0 rest, 2 required)

(byte-compile-make-args-desc '(a b &optional c &rest d))
;; => #x382  (3 args, 1 rest, 2 required)
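Going the other way, a descriptor can be unpacked with a few bit operations. my-decode-args-desc is a hypothetical helper, not part of Emacs; it simply reverses the layout just described.

```elisp
(defun my-decode-args-desc (desc)
  "Decode the lexical-binding argument descriptor DESC into a plist."
  (list :required (logand desc #x7f)        ; low 7 bits
        :rest (/= 0 (logand desc #x80))     ; 8th bit
        :max-nonrest (ash desc -8)))        ; required + optional
```

For instance, (my-decode-args-desc #x382) returns (:required 2 :rest t :max-nonrest 3), matching the last example above.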

The names of the arguments don’t matter in lexical scope: they’re purely positional. This tighter argument specification is one of the reasons lexical scope is faster: the byte-code interpreter doesn’t need to parse the entire lambda list and assign all of the variables on each function invocation.

Unibyte String Byte-code

The second element is a unibyte string — it strictly holds octets and is not to be interpreted as any sort of Unicode encoding. These strings should be created with unibyte-string because string may return a multibyte string. To disambiguate the string type to the lisp reader when higher values are present (> 127), the strings are printed in an escaped octal notation, keeping the string literal inside the ASCII character set.

(unibyte-string 100 200 250)
;; => "d\310\372"
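The difference between unibyte-string and string is easy to check with multibyte-string-p. With character codes above 127, string builds a multibyte string of Unicode characters rather than raw octets. This sketch uses only built-in functions.

```elisp
(let ((bytes (unibyte-string 100 200 250))
      (chars (string 100 200 250)))
  (list (multibyte-string-p bytes)   ; raw octets
        (multibyte-string-p chars)   ; Unicode characters
        (append bytes nil)))         ; the individual byte values
;; => (nil t (100 200 250))
```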

It’s unusual to see a byte-code string that doesn’t end with 135 (#o207, byte-return). Perhaps this should have been implicit? I’ll talk more about the byte-code below.

Constants Vector

The byte-code has very limited operands. Most operands are only a few bits, some fill an entire byte, and occasionally two bytes. The meat of the function that holds all the constants, function symbols, and variable symbols is the constants vector. It’s a normal Elisp vector and can be created with vector or a vector literal. Operands reference either this vector or they index into the stack itself.

(byte-compile (lambda (a b) (my-func b a)))
;; => #[(a b) "\302\010\011\042\207" [b a my-func] 3]

Note that the constants vector lists the variable symbols as well as the external function symbol. If this was a lexically scoped function the constants vector wouldn’t have the variables listed, being only [my-func].

Maximum Stack Usage

This is the maximum stack space used by this byte-code. This value can be derived from the byte-code itself, but it’s pre-computed so that the byte-code interpreter can quickly check for stack overflow. Under-reporting this value is probably another way to crash Emacs.

Docstring

The simplest component and completely optional. It’s either the docstring itself, or if the docstring is especially large it’s a cons cell indicating a compiled .elc and a position for lazy access. Only one position, the start, is needed because the lisp reader is used to load it and it knows how to recognize the end.

Interactive Specification

If this element is present and non-nil then the function is an interactive function. It holds the exact contents of interactive in the uncompiled function definition.

(byte-compile (lambda (n) (interactive "nNumber: ") n))
;; => #[(n) "\010\207" [n] 1 nil "nNumber: "]

(byte-compile (lambda (n) (interactive (list (read))) n))
;; => #[(n) "\010\207" [n] 1 nil (list (read))]

The interactive expression is always interpreted, never byte-compiled. This is usually fine because, by definition, this code is going to be waiting on user input. However, it slows down keyboard macro playback.

Opcodes

The bulk of the established opcode bytes is for variable, stack, and constant access opcodes, most of which use packed operands.

0 - 7 : (stack-ref) stack reference
8 - 15 : (varref) variable reference (from constants vector)
16 - 23 : (varset) variable set (from constants vector)
24 - 31 : (varbind) variable binding (from constants vector)
32 - 39 : (call) function call (immediate = number of arguments)
40 - 47 : (unbind) variable unbinding (from constants vector)
129, 192-255 : (constant) direct constants vector access

Except for the last item, each kind of instruction comes in sets of 8. The nth such instruction means access the nth thing. For example, the instruction “2” copies the third stack item to the top of the stack. An instruction of “9” pushes onto the stack the value of the variable named by the second element listed in the constants vector.

However, the 7th and 8th such instructions in each set take an operand byte or two. The 7th instruction takes a 1-byte operand and the 8th takes a 2-byte operand. A 2-byte operand is written in little-endian byte-order regardless of the host platform.
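The packing rule can be captured in a tiny helper. my-encode-packed is hypothetical (the byte compiler does this work itself when it assembles lapcode). It assumes one of the sets-of-8 opcodes such as byte-varref, and emits the packed form for operands 0 through 5, the 1-byte form for operands below 256, and otherwise the 2-byte little-endian form (operands are assumed to be below 65536).

```elisp
(require 'bytecomp)  ; named opcodes such as byte-varref

(defun my-encode-packed (base operand)
  "Return the list of bytes encoding BASE (a sets-of-8 opcode) with OPERAND."
  (cond ((< operand 6) (list (+ base operand))) ; operand packed in opcode
        ((< operand 256) (list (+ base 6) operand)) ; 7th form: 1-byte operand
        (t (list (+ base 7)                     ; 8th form: 2-byte operand
                 (logand operand #xff)          ; low byte first
                 (ash operand -8)))))           ; then the high byte
```

With byte-varref being 8, operand 0 encodes as (8), operand 9 as (14 9), and operand 300 as (15 44 1).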

For example, let’s manually craft an instruction that returns the value of the global variable foo. Each opcode has a named constant of byte-X so we don’t have to worry about their actual byte-code number.

(require 'bytecomp)  ; named opcodes

(defvar foo "hello")

(defalias 'get-foo
  (make-byte-code
    #x000                 ; no arguments
    (unibyte-string
      (+ 0 byte-varref)   ; ref variable under first constant
      byte-return)        ; pop and return
    [foo]                 ; constants
    1))                   ; only using 1 stack space

(get-foo)
;; => "hello"

Ta-da! That’s a handcrafted byte-code function. I left a “+ 0” in there so that I can change the offset. This function has the exact same behavior; it’s just less optimal:

(defalias 'get-foo
  (make-byte-code
    #x000
    (unibyte-string
      (+ 3 byte-varref)     ; 4th form of varref
      byte-return)
    [nil nil nil foo]
    1))

If foo was the 10th constant, we would need to use the 1-byte operand version. Again, the same behavior, just less optimal.

(defalias 'get-foo
  (make-byte-code
    #x000
    (unibyte-string
      (+ 6 byte-varref)     ; 7th form of varref
      9                     ; operand, (constant index 9)
      byte-return)
    [nil nil nil nil nil nil nil nil nil foo]
    1))

Dynamically-scoped code makes heavy use of varref but lexically-scoped code rarely uses it (global variables only), instead relying heavily on stack-ref, which is faster. This is where the different calling conventions come into play.

Calling Convention

Each kind of scope gets its own calling convention. Here we finally get to glimpse some of the really great work by Stefan Monnier updating the compiler for lexical scope.

Dynamic Scope Calling Convention

Remembering back to the parameter list element of the byte-code object, dynamically scoped functions keep track of all their argument names. Before executing a function the interpreter examines the lambda list and binds (varbind) every variable globally to an argument.

If the caller was byte-compiled, each argument started on the stack, was popped and bound to a variable, and, to be accessed by the function, will be pushed back right onto the stack (varref). There’s a lot of argument indirection for each function call.

Lexical Scope Calling Convention

With lexical scope, the argument names are not actually bound while the byte-code is evaluated. The names are completely gone because the compiler has converted local variables into stack offsets.

When calling a lexically-scoped function, the byte-code interpreter examines the integer parameter descriptor. It checks to make sure the appropriate number of arguments have been provided, and for each unprovided &optional argument it pushes a nil onto the stack. If the function has a &rest parameter, any extra arguments are popped off into a list and that list is pushed onto the stack.

From here the function can access its arguments directly on the stack without any named variable indirection. It can even consume them directly.

;; -*- lexical-binding: t -*-
(defun foo (x) x)

(symbol-function #'foo)
;; => #[#x101 "\207" [] 2]

The byte-code for foo is a single instruction: return. The function’s argument is already on the stack so it doesn’t have to do anything. Strangely the maximum stack usage element is wrong here (2), but it won’t cause a crash.

;; (As of this writing `byte-compile' always uses dynamic scope.)

(byte-compile 'foo)
;; => #[(x) "\010\207" [x] 1]

It takes longer to set up (x is implicitly bound), it has to make an explicit variable dereference (varref), then it has to clean up by unbinding x (implicit unbind). It’s no wonder lexical scope is faster!

Note that there’s also a disassemble function for examining byte-code, but it only reveals part of the story.

(disassemble #'foo)
;; byte code:
;;   args: (x)
;; 0       varref    x
;; 1       return

Compiler Intermediate “lapcode”

The Elisp byte-compiler has an intermediate language called lapcode (“Lisp Assembly Program”), which is much easier to optimize than byte-code. It’s basically an assembly language built out of s-expressions. Opcodes are referenced by name and operands, including packed operands, are handled whole. Each instruction is a cons cell, (opcode . operand), and a program is a list of these.

Let’s rewrite our last get-foo using lapcode.

(defalias 'get-foo
  (make-byte-code
    #x000
    (byte-compile-lapcode
      '((byte-varref . 9)
        (byte-return)))
    [nil nil nil nil nil nil nil nil nil foo]
    1))

We didn’t have to worry about which form of varref we were using or even how to encode a 2-byte operand. The lapcode “assembler” took care of that detail.

Project Ideas?

The Emacs byte-code compiler and interpreter are fascinating. Having spent time studying them I’m really tempted to build a project on top of it all. Perhaps implementing a programming language that targets the byte-code interpreter, improving compiler optimization, or, for a really big project, JIT compiling Emacs byte-code.

People can write byte-code!

Emacs Lisp Readable Closures

2013-12-30T23:52:38Z

I’ve stated before that one of the unique features of Emacs Lisp is that its closures are readable. Closures can be serialized by the printer and read back in with the reader. I am unaware of any other programming language that has this feature. In fact it’s essential for Elisp byte-code compilation because byte-compiled Elisp files are merely s-expressions of byte-code dumped out as source.

Lisp Printing

The Lisp family of languages are homoiconic. Lisp source code is written in the syntax of its own data structures, s-expressions. Since a compiler/interpreter is usually provided at run-time, a consequence of this is that reading and printing are a fundamental feature of Lisps. A value can be handed to the printer, which will serialize the value into an s-expression as a sequence of characters. Later on the reader can parse the s-expression back into an equal value.
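A minimal demonstration using only built-ins: print a nested value with prin1-to-string, read the text back with read, and compare the result with equal. The value itself is arbitrary.

```elisp
(let* ((value '(1 "two" [3 4.5] (sym . ?a)))
       (printed (prin1-to-string value)))
  (list printed
        (equal value (read printed))))
;; => ("(1 \"two\" [3 4.5] (sym . 97))" t)
```

Note that ?a is just reader syntax for the integer 97, so that is what prints.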

To compare, JavaScript originally had half of this in place. JavaScript has convenient object syntax for defining an associative array, known today as JSON. The eval function could (dangerously) be used as a reader for parsing a string containing JSON-encoded data into a value. But until JSON.stringify() became standard, developers had to write their own printer. Lisp s-expression syntax is much more powerful (and complicated) than JSON, maintaining both identity and cycles (e.g. *print-circle*).

Not all values can be read. They’ll still print (when *print-readably* is nil) but will do so using special syntax that will signal an error in the reader: #<. For example, in Emacs Lisp buffers cannot be serialized so they print using this syntax.

(prin1-to-string (current-buffer))
;; => "#<buffer *scratch*>"  (when evaluated in *scratch*)

It doesn’t matter what’s between the angle brackets, or even that there’s a closing angle bracket. The reader will signal an error as soon as it hits a #<.
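That behavior is easy to confirm, since read signals invalid-read-syntax on the #< prefix. my-readable-p is a hypothetical helper; it deliberately catches only invalid-read-syntax, so truncated input such as "(a" still signals end-of-file.

```elisp
(defun my-readable-p (string)
  "Return non-nil if the first Lisp object in STRING can be read."
  (condition-case nil
      (progn (read string) t)
    (invalid-read-syntax nil)))
```

For example, (my-readable-p "[1 2 3]") returns t, while (my-readable-p "(a #<marker in no buffer>)") returns nil: the reader gives up as soon as it reaches the #<, even inside a list.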

Almost Everything Prints Readably

Elisp has a small set of primitive data types. All of these primitive types print readably:

integer (1024, ?a)
float (1.7)
cons/list ((...))
vector (one-dimensional, [...])
bool-vector (#&n"...")
string ("...")
char-table (#^[...])
hash-table (readable as of Emacs 23.3, #s(hash-table ...))
byte-code function object (#[...])
symbol

Here are all the non-readable types. Each one has a good reason for not being serializable.

buffer
process (external state)
frame (user interface element)
marker (live, automatically updates)
overlay (belongs to a buffer)
built-in functions (native code)
user-ptr (opaque pointers from Emacs 25 dynamic modules)

And that’s it. Every other value in Elisp is constructed from one or more of these primitives, including keymaps, functions, macros, syntax tables, defstruct structs, and EIEIO objects. This means that as long as these values don’t refer to an unreadable value, they themselves can be printed.

An interesting note here is that, unlike the Common Lisp Object System (CLOS), EIEIO objects are readable by default. To Elisp they’re just vectors, so of course they print. CLOS objects are unreadable without manually defining a print method per class.

Elisp Closures

Elisp got lexical scoping in Emacs 24, released in June 2012. It’s now one of the relatively few languages to have both dynamic and lexical scope. Like Common Lisp, variables declared with defvar (and family) continue to have dynamic scope. For backwards compatibility with old Lisp code, lexical scope is disabled by default. It’s enabled for a specific file or buffer by setting lexical-binding to non-nil.

With lexical scope, anonymous functions become closures, a powerful functional programming primitive: a function plus a captured lexical environment. It also provides some performance benefits. In my own tests, compiled Elisp with lexical scope enabled is about 10% to 15% faster than with the default dynamic scope.

What do closures look like in Emacs Lisp? They take one of two forms depending on whether the closure is compiled or not. For example, consider this function, foo, that takes two arguments and returns a closure that returns the first argument.

;; -*- lexical-binding: t; -*-
(defun foo (x y)
  (lambda () x))

(foo :bar :ignored)
;; => (closure ((y . :ignored) (x . :bar) t) () x)

An uncompiled closure is a list beginning with the symbol closure. The second element is the lexical environment, the third is the argument list (lambda list), and the rest is the body of the function. Here we can see that both x and y have been “closed over.” This is a little bit sloppy because the function never makes use of y. Capturing it has a few problems.

The closure has a larger footprint than necessary.
Values are held longer than necessary, delaying collection.
It affects the readability of the closure, which I’ll get to later.

Fortunately the compiler is smart enough to see this and will avoid capturing unused variables. To prove this, I’ve now compiled foo so that it returns a compiled closure.

(foo :bar :ignored)
;; => #[0 "\300\207" [:bar] 1]

What’s returned here is a byte-code function object, with the #[...] syntax. It has these elements:

The function’s lambda list (zero arguments)
Byte-codes stored in a unibyte string
Constants vector
Maximum stack space needed by this function

Notice that the lexical environment has been captured in the constants vector, and that :ignored is absent from it. The compiler didn’t capture it.

For those curious about the byte-code here’s an explanation. The string syntax shown is in octal, representing a string containing two bytes: 192 and 135. The Elisp byte-code interpreter is stack-based. The 192 (constant 0) says to push the first constant onto the stack. The 135 (return) says to pop the top element from the stack and return it.

(coerce "\300\207" 'list)
;; => (192 135)
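For the curious, here’s a toy model of that stack machine in JavaScript. It’s a hypothetical illustration, not how Emacs implements its interpreter, and it understands only the two opcodes above. It assumes, per Emacs’ encoding as I understand it, that opcodes 192 through 255 push constants 0 through 63, and that 135 returns the top of the stack.

```javascript
// Toy interpreter for a two-opcode subset of Emacs byte-code.
// Hypothetical illustration only; real Emacs supports many more opcodes.
const OP_RETURN = 135;    // octal \207: pop the top of the stack and return it
const OP_CONSTANT = 192;  // octal \300: 192 + n pushes constants[n] (n < 64)

function execute(code, constants) {
    const stack = [];
    for (let pc = 0; pc < code.length; pc++) {
        const op = code[pc];
        if (op >= OP_CONSTANT) {
            stack.push(constants[op - OP_CONSTANT]);
        } else if (op === OP_RETURN) {
            return stack.pop();
        } else {
            throw new Error('unsupported opcode ' + op);
        }
    }
    throw new Error('fell off the end without returning');
}

// The compiled closure #[0 "\300\207" [:bar] 1] from above,
// with the keyword :bar represented as a plain string.
console.log(execute([0o300, 0o207], [':bar']));  // prints :bar
```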

The Readable Closures Catch

Since closures, compiled or not, are just lists or byte-code function objects, they print readably. You can capture an environment in a closure, serialize it, read it back in, and evaluate it. That’s pretty cool! This means closures can be transmitted to other Emacs instances in a multi-processing setup (e.g. Elnode, Async).

The catch is that it’s easy to accidentally capture an unreadable value, especially buffers. Consider this function bar which uses a temporary buffer as an efficient string builder. It returns a closure that returns the result. (Weird, but stick with me here!)

(defun bar (n)
  (with-temp-buffer
    (let ((standard-output (current-buffer)))
      (loop for i from 0 to n do (princ i))
      (let ((string (buffer-string)))
        (lambda () string)))))

The compiled form looks fine,

(bar 3)
;; => #[0 "\300\207" ["0123"] 1]

But the interpreted form of the closure has a problem. The with-temp-buffer macro silently introduced a new binding — an abstraction leak.

(bar 3)
;; => (closure ((string . "0123")
;;              (temp-buffer . #<killed buffer>)
;;              (n . 3) t)
;;      () string)

The temporary buffer is mistakenly captured in the closure, making it unreadable, but only in its uncompiled form. This creates the awkward situation where compiled and uncompiled code behave differently.

An HTML5 Canvas Design Pattern

2013-06-16T00:00:00Z

I’ve been falling into a particular design pattern when using the HTML5 Canvas element. By “design pattern” I don’t mean some pretty arrangement but rather a software design pattern. This one’s a very Lisp-like pattern, and I wonder if I would have come up with it if I hadn’t first seen it in Lisp. It can also be applied to the Java 2D API, though less elegantly.

First, a review.

Drawing Basics

A canvas is just another element in the page.

<canvas id="display" width="200" height="200"></canvas>

To draw onto it, get a context and call drawing methods on it.

var ctx = document.getElementById('display').getContext('2d');
ctx.fillStyle = 'blue';
ctx.beginPath();
ctx.arc(100, 100, 75, 0, Math.PI * 3 / 2);
ctx.fill();

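This fills three quarters of a blue circle of radius 75 centered on the canvas, with fill() closing the path by a straight line across the missing upper-right quarter.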

Here’s how to do the same thing with Java 2D. It’s very similar, except that the canvas is called a JComponent and the context is called a Graphics2D. As you can imagine from this example, the Java 2D API is much richer and more object-oriented than the Canvas API. The cast from Graphics to Graphics2D is required for legacy reasons.

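import java.awt.*;
import java.awt.geom.Arc2D;
import javax.swing.JComponent;

public class Example extends JComponent {
    @Override
    public void paintComponent(Graphics graphics) {
        Graphics2D g = (Graphics2D) graphics;
        g.setColor(Color.BLUE);
        // Same shape as the canvas version: start at the top, sweep 270 degrees.
        g.fill(new Arc2D.Float(25, 25, 150, 150, 90, 270, Arc2D.CHORD));
    }
}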

An important feature of both is the ability to globally apply transforms — translate, scale, shear, and rotate — to all drawing commands. For example, drawings on the canvas can be vertically scaled using the scale() method. Graphics2D also has a scale() method.

// ...
ctx.scale(1, 0.5);
// ...

For both JavaScript and Java the rendered image isn’t being stretched. Instead, the input vertices are being transformed before rendering to pixels. This is what makes it possible to decouple the screen coordinate system from the program’s internal coordinate system. Outside of rare performance concerns, the program’s internal logic shouldn’t be written in terms of pixels. It should rely on these transforms to convert between coordinate systems at rendering time, allowing for a moving camera.

The Transform Stack

Both cases also allow the current transform to be captured and restored. Not only does this make it easier for a function to clean up after itself and properly share the canvas with other functions, but also multiple different coordinate transforms can be stacked on top of each other. For example, the bottom transform might convert between internal coordinates and screen coordinates. When it comes time to draw a minimap, another transform can be pushed on top and the same exact drawing methods applied to the canvas.

This is where Canvas and Java 2D start to differ. Each gets one aspect right and another wrong, and I wish I could easily have the best of both.

In Canvas, this is literally a stack: a pair of methods, save() and restore(), push and pop the transform matrix (along with the rest of the drawing state) on an internal stack. The above JavaScript example may be in a function that is called more than once, so it should restore the transform matrix before returning.

ctx.save();
ctx.scale(1, 0.5);
// ... draw ...
ctx.restore();

In Java this stack is managed manually, and it (typically) sits inside the call stack itself as a variable.

AffineTransform tf = g.getTransform();
g.scale(1, 0.5);
// ... draw ...
g.setTransform(tf);

I think Canvas’s built-in stack is more elegant than managing an extraneous variable and object. However, what’s significant about Java 2D is that we actually have access to the transform matrix. It’s that AffineTransform object. The Canvas transform matrix is an internal, inaccessible data structure. It has an established external representation, SVGMatrix, but the context won’t provide a copy. If one is needed, a separate matrix must be maintained in parallel. What a pain!

Why would we need the transform matrix? So that we can transform coordinates in reverse! When a user interacts with the display, the program receives screen coordinates. To be useful, these need to be converted into internal coordinates so that the program can determine where in the world the user clicked. The Java AffineTransform class has a createInverse() method for computing this inverse transform. This is something I really miss having when using Canvas. It’s such an odd omission.
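Maintaining that parallel matrix isn’t much code. Here’s a minimal sketch, assuming a hypothetical Matrix class (not part of the Canvas API) that stores the same six numbers Canvas’ transform(a, b, c, d, e, f) method takes, composes them the same way, and can invert itself to map screen coordinates back to internal ones, in the spirit of createInverse().

```javascript
// Hypothetical 2D affine matrix kept in parallel with a canvas context.
// Layout matches ctx.transform(a, b, c, d, e, f):
//   | a c e |
//   | b d f |
//   | 0 0 1 |
class Matrix {
    constructor(a = 1, b = 0, c = 0, d = 1, e = 0, f = 0) {
        Object.assign(this, { a, b, c, d, e, f });
    }

    // Right-multiply by m, the same order Canvas uses to compose transforms.
    multiply(m) {
        return new Matrix(
            this.a * m.a + this.c * m.b,
            this.b * m.a + this.d * m.b,
            this.a * m.c + this.c * m.d,
            this.b * m.c + this.d * m.d,
            this.a * m.e + this.c * m.f + this.e,
            this.b * m.e + this.d * m.f + this.f);
    }

    scale(sx, sy) { return this.multiply(new Matrix(sx, 0, 0, sy, 0, 0)); }
    translate(tx, ty) { return this.multiply(new Matrix(1, 0, 0, 1, tx, ty)); }

    // Map an internal point to screen coordinates.
    apply(x, y) {
        return [this.a * x + this.c * y + this.e,
                this.b * x + this.d * y + this.f];
    }

    // Inverse transform: maps screen coordinates back to internal ones.
    inverse() {
        const det = this.a * this.d - this.b * this.c;
        if (det === 0) throw new Error('matrix is not invertible');
        return new Matrix(
            this.d / det, -this.b / det,
            -this.c / det, this.a / det,
            (this.c * this.f - this.d * this.e) / det,
            (this.b * this.e - this.a * this.f) / det);
    }

    // Push this matrix onto a real canvas context.
    applyTo(ctx) {
        ctx.transform(this.a, this.b, this.c, this.d, this.e, this.f);
    }
}

// Usage: internal coordinates are scaled by 2, then shifted by (100, 50).
const view = new Matrix().translate(100, 50).scale(2, 2);
console.log(view.apply(10, 20));            // [ 120, 90 ]
console.log(view.inverse().apply(120, 90)); // [ 10, 20 ]
```

Mouse handlers would run event coordinates through view.inverse() to find where in the world the user clicked, while drawing code calls view.applyTo(ctx) after ctx.save().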

The Design Pattern

So, back to the design pattern. When it comes time to draw something, a transform is established on the context, something is drawn to the context, then finally the transform is removed. The word “finally” should stand out here. If we’re being careful, we should put the teardown step inside a finally block, so that even if something goes wrong, the context is left in a clean state. This has personally helped me in debugging.

ctx.save();
ctx.scale(1, 0.5);
try {
    // ... draw ...
} finally {
    ctx.restore();
}

In Lisp, this pattern is typically captured as a with- macro.

Perform setup
Run body
Teardown
Return the body’s return value

Instead of finally, the special form unwind-protect is used to clean up regardless of any error condition. Here’s a simplified version of Emacs’ with-temp-buffer macro, which itself is built on another with- macro, with-current-buffer.

(defmacro with-temp-buffer (&rest body)
  `(let ((temp-buffer (generate-new-buffer " *temp*")))
     (with-current-buffer temp-buffer
       (unwind-protect
           (progn ,@body)
         (kill-buffer temp-buffer)))))

The setup is to create a new buffer and switch to it. The teardown destroys the buffer, regardless of what happens in the body. An example from Common Lisp would be with-open-file.

(with-open-file (stream "/etc/passwd")
  (loop for line = (read-line stream nil)
     while line
     collect line))

This macro ensures that the stream is closed when the body exits, no matter what. (Side note: this can be very surprising when combined with Clojure’s laziness!)

There are no macros in JavaScript, let alone Lisp’s powerful macro system, but the pattern can still be captured using closures. Replace the body with a callback.

function Transform() {
    // ...
}

// ...

Transform.prototype.withDraw = function(ctx, callback) {
    ctx.save();
    this.applyTransform(ctx);
    try {
        callback();
    } finally {
        ctx.restore();
    }
};

The callback is called once the context is in the proper state. Here’s how it would be used.

var transform = new Transform().scale(1, 0.5);  // (fluent API)

function render(step) {
    transform.withDraw(ctx, function() {
        // ... draw ...
    });
}

Since JavaScript has proper closures, that step variable is completely available to the callback. This function-as-body pattern comes up a lot (e.g. AMD), and seeing it work so well makes me think of JavaScript as a “suitable Lisp.”
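Note that the Lisp with- macros also return the body’s value (step 4 above), which withDraw drops. Here’s a minimal sketch of a variant that keeps it. withState is a hypothetical name, and the context only needs save() and restore() (plus whatever the setup calls), so a real CanvasRenderingContext2D or a stand-in object works. Unlike withDraw, the setup runs inside the try, so a failing setup still restores the context.

```javascript
// Hypothetical helper: save the context, run setup and body,
// always restore, and hand back whatever the body returned.
function withState(ctx, setup, body) {
    ctx.save();
    try {
        setup(ctx);
        return body(ctx);
    } finally {
        ctx.restore();  // teardown runs whether body returns or throws
    }
}

// Usage with a stand-in context that just records its calls.
const calls = [];
const fakeCtx = {
    save() { calls.push('save'); },
    restore() { calls.push('restore'); },
    scale(x, y) { calls.push('scale ' + x + ' ' + y); }
};

const area = withState(fakeCtx, ctx => ctx.scale(1, 0.5), () => 75 * 75);
console.log(area);              // 5625
console.log(calls.join(', '));  // save, scale 1 0.5, restore
```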

Java can just barely pull off the pattern using anonymous classes, but it’s very clunky.

class Transform {
    // ...

    AffineTransform transform;

    public void withDraw(Graphics2D g, Runnable callback) {
        AffineTransform original = g.getTransform();
        g.transform(transform);
        try {
            callback.run();
        } finally {
            g.setTransform(original);
        }
    }
}

class Foo {
    // ...

    Transform transform;

    public void render(Graphics2D g, double step) {
        transform.withDraw(g, new Runnable() {
            public void run() {
                // ... draw ...
            }
        });
    }
}

Java’s anonymous classes are closures, but, unlike Lisp and JavaScript, they close over values rather than bindings. Purely in an attempt to hide this complexity, Java requires that variables accessed from the anonymous class be declared final. It’s awkward and confusing enough that I probably wouldn’t try to apply this pattern in Java.

I think this pattern works very well with JavaScript, and if you dig around in some of my graphical JavaScript you’ll see that I’ve already put it to use. JavaScript functions work pretty well as a stand-in for some kinds of Lisp macros.

Userspace Threading in JavaScript

2013-04-28T00:00:00Z

There was an interesting Daily Programmer problem posted a couple of weeks ago: write a userspace threading library. I decided to do it in JavaScript, building it on top of setTimeout. Remember that JavaScript is single-threaded by specification, so this will be a nonpreemptive, cooperative system.

Start by creating the Thread prototype. Like most thread constructors, it accepts the function to be run in that thread.

function Thread(f) {
    this.alive = true;
    this.schedule(f);
}

The schedule method schedules a function to be run in that thread. It’s not really meant for users to use directly. I’ll define it in a moment.

Only one thread actually runs at a time, so globally keep track of which one is running at the moment.

Thread.current = null;

Now here’s the core method that makes everything work, runner. It accepts a function of arbitrary arity and returns a function that runs the provided function in this thread.

Thread.prototype.runner = function(f) {
    var _this = this;
    return function() {
        if (_this.alive) {
            try {
                Thread.current = _this;
                f.apply(this, arguments);
            } finally {
                Thread.current = null;
            }
        }
    };
};

The runner sets the current thread to the proper value, calls the function, then clears the current thread. If the thread is no longer active, nothing happens.

With that in place, schedule is defined like this,

Thread.prototype.schedule = function(f) {
    setTimeout(this.runner(f), 0);
};

It creates a runner function for f and schedules it to run as soon as possible on JavaScript’s event loop using setTimeout. Queuing up on the event loop is the cooperative part of all this. Other threads and events may already be queued with a timeout of 0, so they run first.

Technically this is all that’s needed. To yield, schedule a function and return.

function() {
    // ... do some work ...
    Thread.current.schedule(function() {
        // ... do more work ...
    });
}

I don’t want the user to need to think about Thread.current, so here’s a convenience yield function.

Thread.yield = function(f) {
    Thread.current.schedule(f);
};

Now to use it,

function() {
    // ... do some work ...
    Thread.yield(function() {
        // ... do more work ...
    });
}

Halting a thread is easy. Any scheduled functions for this thread will not be invoked, as specified in the runner method.

Thread.prototype.destroy = function() {
    this.alive = false;
};

There’s one more situation to worry about: callbacks. Imagine an asynchronous storage API.

// ... in thread context ...
storage.getValue(function(value) {
    // doesn't run in thread context
});

In order to run in the thread, the library user would need to create a runner function for the current thread. To avoid making them worry about Thread.current and runner, provide another convenience function, wrap. There may be a better name for it, but I couldn’t think of one.

Thread.wrap = function(f) {
    return Thread.current.runner(f);
};

Fixing the callback,

// ... in thread context ...
storage.getValue(Thread.wrap(function(value) {
    // ... also in thread context ...
}));
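To see the scheduler interleave threads without a server, the pieces above can be gathered into one script. The counter demo at the end is hypothetical (counter, log, and the thread names are made up for illustration), and it runs anywhere setTimeout exists, including Node. Because each yield goes to the back of the event queue, a and b should take turns, while c is destroyed before its first scheduled step gets a chance to run.

```javascript
// The Thread library from above, gathered into one script.
function Thread(f) {
    this.alive = true;
    this.schedule(f);
}

Thread.current = null;

Thread.prototype.runner = function(f) {
    var _this = this;
    return function() {
        if (_this.alive) {
            try {
                Thread.current = _this;
                f.apply(this, arguments);
            } finally {
                Thread.current = null;
            }
        }
    };
};

Thread.prototype.schedule = function(f) {
    setTimeout(this.runner(f), 0);
};

Thread.prototype.destroy = function() {
    this.alive = false;
};

Thread.yield = function(f) {
    Thread.current.schedule(f);
};

Thread.wrap = function(f) {
    return Thread.current.runner(f);
};

// Hypothetical demo: each thread records n steps, yielding between them.
var log = [];

function counter(name, n) {
    var i = 0;
    return function step() {
        log.push(name + i);
        i++;
        if (i < n) {
            Thread.yield(step);
        }
    };
}

new Thread(counter('a', 3));
new Thread(counter('b', 3));
var c = new Thread(counter('c', 3));
c.destroy();  // c's first step is already scheduled, but runner skips it

setTimeout(function() {
    console.log(log.join(' '));  // six entries from a and b, none from c
}, 100);
```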

Threading Demo

To demonstrate threading I’ll make a thread that continuously fetches random numbers from a server and displays them.

Here’s a simple-httpd servlet for generating numbers. The route for this servlet will be /random.

(defservlet random text/plain ()
  (princ (random* 1.0)))

Since I’m doing this interactively with Skewer on the blank demo page, make a tag for displaying the number.

var h1 = document.createElement('h1');
document.body.appendChild(h1);

Here’s the function that will run in the thread. It fetches a number asynchronously, displays it, then recurses. Notice that Thread.yield() acts like a trampoline, providing free tail-call optimization! This is because the stack is cleared before the provided function is invoked.

function random() {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/random', true);
    xhr.send();
    xhr.onload = Thread.wrap(function() {
        h1.innerHTML = xhr.responseText;
        Thread.yield(random);
    });
};

I set onload after calling send just for code organization purposes. This is safe: the request is asynchronous, so its load event can’t fire until the currently running code has finished.
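The trampoline remark deserves a closer look. Each Thread.yield returns to the event loop before the next step starts, so the stack never grows. The same effect can be had synchronously: have each step return the next step instead of calling it, and let a driver loop do the calling. Here’s a minimal sketch; trampoline and countdown are hypothetical names. A directly recursive countdown of a million frames would typically exceed the stack, while this version stays one frame deep.

```javascript
// Keep calling until the result is no longer a function.
// Each step returns the next step (a thunk) instead of calling it,
// so the call stack stays shallow no matter how many steps run.
function trampoline(step) {
    let result = step;
    while (typeof result === 'function') {
        result = result();
    }
    return result;
}

// A "recursive" countdown written in trampolined style.
function countdown(n) {
    return function() {
        return n === 0 ? 'done' : countdown(n - 1);
    };
}

console.log(trampoline(countdown(1000000)));  // prints done
```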

Now to create a thread!

var foo = new Thread(random);

The heading flashes with random numbers as soon as the thread is created. Even though this thread is continuously running, it’s frequently yielding. Everything remains responsive, including the ability to stop the thread.

foo.destroy();

As soon as this is evaluated, the heading stops being updated. I think that’s pretty neat!

Performance

I haven’t tested performance, but I imagine it’s awful, especially because of the frequent use of the apply method. You wouldn’t want CPU-intensive operations to cooperate like this. Fortunately, in my demo above I’m manipulating the DOM and waiting on a server response, so the performance penalties of threading should be negligible.

Fast Monte Carlo Method with JavaScript

2013-02-25T00:00:00Z

On average, how many random numbers drawn from [0, 1] must be summed before the total exceeds 1?

If you want to figure it out for yourself, stop reading now and come back when you’re done.

The answer is e. When I came across this question I took the lazy programmer route and, rather than work out the math, I estimated the answer using the Monte Carlo method. I used the language I always use for these scratchpad computations: Emacs Lisp. All I need to do is switch to the *scratch* buffer and start hacking. No external program needed.
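For those who skipped the exercise, here’s a short sketch of why the answer is e. Let N be the number of draws needed and U_1, U_2, and so on be the draws themselves. N exceeds n exactly when the first n draws sum to at most 1, and that probability is the volume of the corner of the unit n-cube below the plane where the coordinates sum to 1, which is 1/n!. Summing these tail probabilities gives the expected value:

```latex
P(N > n) = P(U_1 + U_2 + \cdots + U_n \le 1) = \frac{1}{n!}
\qquad\Longrightarrow\qquad
E[N] = \sum_{n=0}^{\infty} P(N > n) = \sum_{n=0}^{\infty} \frac{1}{n!} = e
```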

The downside is that Elisp is incredibly slow. Fortunately, Elisp is so similar to Common Lisp that porting to it is almost trivial. My preferred Common Lisp implementation, SBCL, is very, very fast so it’s a huge speed upgrade with little cost, should I need it. As far as I know, SBCL is the fastest Common Lisp implementation.

Even though Elisp was fast enough to determine that the answer is probably e, I wanted to play around with it. This little test program doubles as a way to estimate the value of e, similar to estimating pi. The more trial runs I give it the more accurate my answer will get — to a point.

Here’s the Common Lisp version. (I love the loop macro, obviously.)

(defun trial ()
  (loop for count upfrom 1
     sum (random 1.0) into total
     until (> total 1)
     finally (return count)))

(defun monte-carlo (n)
  (loop repeat n
     sum (trial) into total
     finally (return (/ total 1.0 n))))

Using SBCL 1.0.57.0.debian on an Intel Core i7-2600 CPU, once everything’s warmed up this takes about 9.4 seconds with 100 million trials.

(time (monte-carlo 100000000))
Evaluation took:
  9.423 seconds of real time
  9.388587 seconds of total run time (9.380586 user, 0.008001 system)
  99.64% CPU
  31,965,834,356 processor cycles
  99,008 bytes consed
2.7185063

Since this makes for an interesting benchmark I gave it a whirl in JavaScript,

function trial() {
    var count = 0, sum = 0;
    while (sum <= 1) {
        sum += Math.random();
        count++;
    }
    return count;
}

function monteCarlo(n) {
    var total = 0;
    for (var i = 0; i < n; i++) {
        total += trial();
    }
    return total / n;
}

I ran this on Chromium 24.0.1312.68 Debian 7.0 (180326) which uses V8, currently the fastest JavaScript engine. With 100 million trials, this only took about 2.7 seconds!

monteCarlo(100000000); // ~2.7 seconds, according to Skewer
// => 2.71850356

Whoa! It beat SBCL! I was shocked. Let’s try using C as a baseline. Surely C will be the fastest.

#include <stdio.h>
#include <stdlib.h>

int trial() {
    int count = 0;
    double sum = 0;
    while (sum <= 1.0) {
        sum += rand() / (double) RAND_MAX;
        count++;
    }
    return count;
}

double monteCarlo(int n) {
    int i, total = 0;
    for (i = 0; i < n; i++) {
        total += trial();
    }
    return total / (double) n;
}

int main() {
    printf("%f\n", monteCarlo(100000000));
    return 0;
}

I used the highest optimization setting on the compiler.

$ gcc -ansi -W -Wall -Wextra -O3 temp.c
$ time ./a.out
2.718359

real	0m3.782s
user	0m3.760s
sys	0m0.000s

Incredible! JavaScript was faster than C! That was completely unexpected.

The Circumstances

Both the Common Lisp and C code could probably be carefully tweaked to improve performance. In Common Lisp’s case I could attach type information and turn down safety. For C I could use more compiler flags to squeeze out a bit more performance. Then maybe they could beat JavaScript.

In contrast, as far as I can tell the JavaScript code is already as optimized as it can get. There just aren’t many knobs to tweak. Note that minifying the code will make no difference, especially since I’m not measuring the parsing time. Except for the functions themselves, the variables are all local, so they are never “looked up” at run-time. Their name length doesn’t matter. Remember, in JavaScript global variables are expensive, because they’re (generally) hash table lookups on the global object at run-time. For any decent compiler, local variables are basically precomputed memory offsets — very fast.

The function names themselves are global variables, but the V8 compiler appears to eliminate this cost (inlining?). Wrapping the entire thing in another function, turning the two original functions into local variables, makes no difference in performance.
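For reference, the wrapped variant looks something like this: an immediately invoked function makes trial() and monteCarlo() local bindings rather than properties of the global object, and only the returned function escapes. It’s a sketch of that experiment, not a benchmark, and estimateE is a name made up for illustration.

```javascript
// Wrapping both functions in an immediately invoked function expression
// makes trial() and monteCarlo() local variables instead of globals.
var estimateE = (function() {
    function trial() {
        var count = 0, sum = 0;
        while (sum <= 1) {
            sum += Math.random();
            count++;
        }
        return count;
    }

    function monteCarlo(n) {
        var total = 0;
        for (var i = 0; i < n; i++) {
            total += trial();
        }
        return total / n;
    }

    return monteCarlo;
}());

console.log(estimateE(1000000));  // close to 2.718; varies from run to run
```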

While Common Lisp and C may be able to beat JavaScript if time is invested in optimizing them — something to be done rarely — in a casual implementation of this algorithm, JavaScript beats them both. I find this really exciting.

Web Distributed Computing Revisited

2013-01-26T00:00:00Z

Four years ago I investigated the idea of using browsers as nodes for distributed computing. I concluded that due to the platform’s constraints there were few problems that it was suited to solve. However, the situation has since changed quite a bit! In fact, this weekend I made practical use of web browsers across a number of geographically separated computers to solve a computational problem.

What changed?

Web workers came into existence, not just as a specification but as an implementation across all the major browsers. They allow JavaScript to run in an isolated, dedicated background thread. This eliminates the setTimeout() requirement from before, which not only caused a performance penalty but really hampered running any sort of lively interface alongside the computation. The interface and computation were competing for time on the same thread.

The worker isn’t entirely isolated; otherwise it would be useless for anything but wasting resources. Through message events (postMessage() and onmessage), it can pass structured clones of data to and from the main thread running in the page. Other than this, it has no access to the DOM or other data on the page.
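Here’s what a structured clone means in practice: the receiver gets a deep copy, and anything that can’t be copied, such as a function with its closed-over bindings, is rejected. This sketch uses the global structuredClone() function, which exposes the same algorithm directly; it’s a much later addition (modern browsers and Node 17+), since at the time the algorithm was only reachable through postMessage(). The message shape is made up for illustration.

```javascript
// A message a page might post to a worker (hypothetical shape).
const message = {
    keys: ['abcdefghijklmnopqrstuvwxyz'],
    scores: new Map([['best', 3]])
};

// Deep copy, as postMessage() would deliver it to the other side.
const copy = structuredClone(message);
copy.keys.push('zyxwvutsrqponmlkjihgfedcba');
copy.scores.set('best', 5);

console.log(message.keys.length);         // 1: the original is untouched
console.log(message.scores.get('best'));  // 3

// Functions can't be cloned, so a closure can't be shared with a worker.
try {
    structuredClone({ run: function() {} });
} catch (e) {
    console.log(e.name);                  // DataCloneError
}
```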

The interface is a bit unfriendly to live development, but it’s manageable. It’s invoked by passing the URL of a script to the constructor. This script is the code that runs in the dedicated thread.

var worker = new Worker('script/worker.js');

The sort of interface that would have been more convenient for live interaction would be something like what is found on most multi-threaded platforms: a thread constructor that accepts a function as an argument.

/* This doesn't work! */
var worker = new Worker(function() {
    // ...
});

I completely understand why this isn’t the case. The worker thread needs to be totally isolated and the above example is insufficient. I’m passing a closure to the constructor, which means I would be sharing bindings, and therefore data, with the worker thread. This interface could be faked using a data URI and taking advantage of the fact that most browsers return function source code from toString().
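Here’s a rough sketch of that workaround. workerURL is a hypothetical helper: it serializes a function with toString() and wraps it in a data: URI. Only the source text travels, so the function’s closed-over bindings are lost, which is exactly the isolation the real API enforces. Browser treatment of data: URI workers varies (they get an opaque origin), so treat this as an experiment; a Blob URL from URL.createObjectURL() is a common alternative.

```javascript
// Hypothetical helper: turn a function into a URL a Worker can load.
// Only the function's source text is kept; closures don't survive.
function workerURL(f) {
    const source = '(' + f.toString() + ')();';
    return 'data:text/javascript,' + encodeURIComponent(source);
}

const url = workerURL(function() {
    // Worker body: reply to each message with the number doubled.
    self.onmessage = function(event) {
        self.postMessage(event.data * 2);
    };
});

// In a browser that accepts data: URI workers:
//   const worker = new Worker(url);
//   worker.onmessage = function(event) { console.log(event.data); };
//   worker.postMessage(21);
```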

Another apparent difficulty is libraries. Ignoring the stupid idea of passing code through the message API and evaling it, it first seems that the single URL given to the constructor must contain all the source code the worker will use, so any libraries would have to be concatenated with your script. That would complicate things slightly, though I imagine many people will be minifying their worker JavaScript anyway.

Fortunately, that isn’t necessary: libraries can be loaded by the worker with the importScripts() function, so not everything needs to be packed into one script. Furthermore, workers can make HTTP requests with XMLHttpRequest, so data doesn’t need to be embedded either. Note that it’s probably worth making these requests synchronously (third argument false), because blocking isn’t an issue in workers.

The other big change was the effect Google Chrome, especially its V8 JavaScript engine, had on the browser market. Browser JavaScript is probably about two orders of magnitude faster than it was when I wrote my previous post. It’s incredible what the V8 team has accomplished. If written carefully, V8 JavaScript performance can beat out most other languages.

Finally, I also now have much, much better knowledge of JavaScript than I did four years ago. I’m not fumbling around like I was before.

Applying these Changes

This weekend’s Daily Programmer challenge was to find a “key” — a permutation of the alphabet — that when applied to a small dictionary results in the maximum number of words with their letters in alphabetical order. That’s a keyspace of 26!, or 403,291,461,126,605,635,584,000,000.
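Two quick sketches make the problem concrete. The keyspace size can be checked exactly with BigInt, a later addition to JavaScript. For scoring, the code assumes a key is applied as a substitution cipher (letter i of the alphabet becomes letter i of the key) and that a word counts when its substituted letters are in non-decreasing order. The challenge’s exact rules may differ, and all of these names are hypothetical.

```javascript
// n! as an exact integer.
function factorial(n) {
    let result = 1n;
    for (let i = 2n; i <= BigInt(n); i++) {
        result *= i;
    }
    return result;
}
console.log(factorial(26).toString());  // 403291461126605635584000000

// Apply a key as a substitution cipher: 'a' -> key[0], 'b' -> key[1], ...
// Assumes lowercase a-z input.
function applyKey(key, word) {
    let out = '';
    for (const ch of word) {
        out += key[ch.charCodeAt(0) - 97];
    }
    return out;
}

// A word is "in order" if no letter is smaller than the one before it.
function isOrdered(word) {
    for (let i = 1; i < word.length; i++) {
        if (word[i] < word[i - 1]) return false;
    }
    return true;
}

// Score a key: how many dictionary words end up in order.
function score(key, words) {
    let count = 0;
    for (const w of words) {
        if (isOrdered(applyKey(key, w))) count++;
    }
    return count;
}

const identity = 'abcdefghijklmnopqrstuvwxyz';
console.log(score(identity, ['almost', 'begin', 'zebra']));  // 2
```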

When I’m developing, I use both a laptop and a desktop simultaneously, and I really wanted to put them both to work searching that huge space for good solutions. Initially I was going to accomplish this by writing my program in Clojure and running it on each machine. But what about involving my wife’s computer, too? I wasn’t going to bother her with setting up an environment to run my stuff. Writing it in JavaScript as a web application would be the way to go. To coordinate this work I’d use simple-httpd. And so it was born,

https://github.com/skeeto/key-collab

Each open tab consumes one CPU core, allowing users to control their commitment by choosing how many tabs to keep open. All of the numbers on the page update about twice per second, so users can get a concrete idea of what’s going on. I think it’s fun to watch.

(I’m obviously a fan of blues and greens on my web pages. I don’t know why.)

I posted the server’s URL on reddit in the challenge thread, so various reddit users from around the world joined in on the computation.

Strict Mode

I had an accidental discovery with strict mode and Chrome. I’ve always figured using strict mode had an effect on the performance of code, but had no idea how much. From the beginning, I had intended to use it in my worker script. Since the worker is already isolated, there are absolutely no downsides.

However, while I was developing and experimenting I accidentally turned it off and left it off. It was left turned off for a short time in the version I distributed to the clients, so I got to see how things were going without it. When I noticed the mistake and uncommented the "use strict" line, I saw a 6-fold speed boost in Chrome. Wow! Just making those few promises to Chrome allowed it to make some massive performance optimizations.

With Chrome moving at full speed, it was able to inspect 560 keys per second on Brian’s laptop. I was getting about 300 keys per second on my own (less-capable) computers. I haven’t been able to get anything close to these speeds in any other language/platform (but I didn’t try in C yet).

Furthermore, I got a noticeable speed boost in Chrome by using proper object oriented programming, versus a loose collection of functions and ad-hoc structures. I think it’s because it made me construct my data structures consistently, allowing V8’s hidden classes to work their magic. It also probably helped the compiler predict type information. I’ll need to investigate this further.

Use strict mode whenever possible, folks!

What made this problem work?

Having web workers available was a big help. Beyond that, this problem fit the original constraints fairly well:

It was low bandwidth. No special per-client instructions were required. The client only needed to report back a 26-character string.
There was no state to worry about. The original version of my script tried keys at random. The later version used a hill-climbing algorithm, so there was some state but it was only needed for a few seconds at a time. It wasn’t worth holding onto.
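The hill-climbing idea fits in a few lines. This isn’t the key-collab code, just a generic sketch under stated assumptions: a key is applied as a substitution cipher, a word scores when its letters end up in non-decreasing order, and each round swaps two letters of the key, keeping the swap if the score doesn’t drop and undoing it otherwise. It also shows how little state is involved: just the current key and its score.

```javascript
// Score: how many words have their letters in order after applying key
// as a substitution cipher ('a' -> key[0], 'b' -> key[1], ...).
function score(key, words) {
    let count = 0;
    for (const w of words) {
        let prev = '';
        let ordered = true;
        for (const ch of w) {
            const c = key[ch.charCodeAt(0) - 97];
            if (c < prev) { ordered = false; break; }
            prev = c;
        }
        if (ordered) count++;
    }
    return count;
}

// Hill climbing: try random swaps, keeping those that don't lower the score.
function hillClimb(words, iterations) {
    const key = 'abcdefghijklmnopqrstuvwxyz'.split('');
    // Start from a random permutation (Fisher-Yates shuffle).
    for (let i = key.length - 1; i > 0; i--) {
        const j = Math.floor(Math.random() * (i + 1));
        [key[i], key[j]] = [key[j], key[i]];
    }
    let best = score(key, words);
    for (let n = 0; n < iterations; n++) {
        const i = Math.floor(Math.random() * 26);
        const j = Math.floor(Math.random() * 26);
        [key[i], key[j]] = [key[j], key[i]];
        const s = score(key, words);
        if (s >= best) {
            best = s;                              // keep the swap
        } else {
            [key[i], key[j]] = [key[j], key[i]];   // undo it
        }
    }
    return { key: key.join(''), score: best };
}

const result = hillClimb(['almost', 'begin', 'chintz', 'zebra', 'ghost'], 5000);
console.log(result.key, result.score);
```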

This project was a lot of fun, so I hope to do it again in the future, ideally with a lot more nodes participating.

Parameter Lists in Common Lisp and Clojure

2013-01-20T00:00:00Z

Parameter lists in Common Lisp, called lambda lists, are written in their own mini-language, making it convenient to write functions with a flexible call interface. A lambda list can specify optional parameters, optionally with default arguments, named (keyword) parameters, and whether or not the function is variadic. It’s something I miss a lot when using other languages, especially JavaScript.

Common Lisp Parameters

Here are some examples. This function, foo, has three required parameters, a, b, and c.

(defun foo (a b c)
  ...)

To make b and c optional, place them after the symbol &optional,

(defun foo (a &optional b c)
  ...)

If the second and third arguments are not provided, b and c will be bound to nil. To provide a default argument, put that parameter inside a list. Below, when a third argument is not provided, c will be bound to "bar".

(defun foo (a &optional b (c "bar"))
  ...)

To write a function that accepts any number of arguments, use &rest followed by the parameter to hold the list of the remaining arguments. Below, args will be a list of all arguments after the third. Note how this can be combined with &optional.

(defun foo (a &optional b c &rest args)
  ...)

Often, the position of a parameter may be hard to remember or read, especially if there are many parameters. It may be more convenient to name them with &key. Below, the function has three named parameters, specified at the call site using keywords — special symbols from the keyword package that always evaluate to themselves.

(defun foo (&key a b c)
  ...)

(foo :b "world" :a "hello")

Like optional parameters, when a parameter is not provided it is bound to nil. In the same way, it can be given a default argument.

(defun foo (&key (a "hello") (b "world") c)
  ...)

&key can be combined with &optional and &rest. However, the &rest argument will be filled with all of key-value pairs, so it’s generally not useful to use them together.

Lambda lists are not exclusive to defun and can be used in any place that needs to receive values in parameters, such as flet (function let), defmethod, and so on.
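For comparison with JavaScript: since ES2015, released after this was written, JavaScript covers the positional half of a lambda list with default and rest parameters, though it has nothing built in like &key. Here’s a rough equivalent of (defun foo (a &optional b (c "bar") &rest args) ...), with null standing in for nil. One difference: a JavaScript default also kicks in when undefined is passed explicitly.

```javascript
// Roughly (defun foo (a &optional b (c "bar") &rest args) ...)
function foo(a, b = null, c = 'bar', ...args) {
    return [a, b, c, args];
}

console.log(JSON.stringify(foo(1)));              // [1,null,"bar",[]]
console.log(JSON.stringify(foo(1, 2)));           // [1,2,"bar",[]]
console.log(JSON.stringify(foo(1, 2, 3, 4, 5)));  // [1,2,3,[4,5]]
```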

Clojure Parameters

Clojure forgoes these complex lambda lists in favor of overloading by arity. When a function is being defined, bodies for several different arities can be defined at once, which makes for optional parameters. Note that, unlike Common Lisp, there’s no implicit default of nil for an omitted optional argument; every default must be supplied explicitly.

Here, b is an optional parameter for foo, defaulting to "bar" when not provided by the caller. The first definition has an arity of one and it calls the second definition with the optional argument filled in.

(defn foo
  ([a] (foo a "bar"))
  ([a b] ...))

Variadic functions are specified with &, similar to &rest in Common Lisp. Below, xs is a sequence of all of the arguments provided after the first.

(defn foo [x & xs]
  ...)

As far as parameters are concerned, this is all Clojure has. However, Clojure’s parameter specification is actually more flexible than Common Lisp’s lambda lists in two important ways. One is that parameter position can vary with the number of provided arguments. The Clojure core functions use this a lot (e.g. reduce).

The following in Common Lisp would require manually parsing the parameters on some level. The last parameter can be either second or third depending on whether a middle name was provided.

(defn make-name
  ([first last]
     (make-name first "Q" last))
  ([first middle last]
     {:first first, :middle middle, :last last}))

(make-name "John" "Public")
;; => {:first "John", :middle "Q", :last "Public"}
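JavaScript has no arity overloading, so the nearest equivalent dispatches on the number of arguments by hand, which is exactly the manual parsing Clojure avoids. Here’s a hypothetical makeName for comparison:

```javascript
// Manual arity dispatch: the last name moves position when a middle name is given.
function makeName(...parts) {
    switch (parts.length) {
    case 2:
        return makeName(parts[0], 'Q', parts[1]);
    case 3:
        return { first: parts[0], middle: parts[1], last: parts[2] };
    default:
        throw new TypeError('makeName expects 2 or 3 arguments');
    }
}

console.log(JSON.stringify(makeName('John', 'Public')));
// {"first":"John","middle":"Q","last":"Public"}
```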

That covers optional parameters with default arguments and variadic functions. What about keyword parameters? Well, to cover that we need to talk about destructuring, which is another way that Clojure parameters are more powerful than lambda lists.

Destructuring

A powerful Lisp idiom is destructuring bindings. Variables can be bound to values in a structure by position in the structure. In Common Lisp there are three macros for making destructuring bindings: destructuring-bind, loop, and with-slots (CLOS).

Below, in the body of the form, a, b, and c are bound to 1, 2, and 3 respectively. The form (a (b c)) is mapped into the quoted structure of the same shape to the right.

(destructuring-bind (a (b c)) '(1 (2 3))
  (+ a (* b c)))
;; => 7

Because of Common Lisp’s concept of cons cells, the cdr of a cell can be bound to a variable if that variable appears in the cdr position. This is similar to the &rest parameter (and is how Scheme does variadic functions). I like using this to match the head and tail of a list,

(destructuring-bind (x . xs) '(1 2 3 4 5)
  (list x xs))
;; => (1 (2 3 4 5))

Perhaps the neatest use of destructuring is in the loop macro. This loop walks over a list two at a time, binding a variable to each side of the pair,

(loop for (keyword value) on '(:a 1 :b 2 :c 3) by #'cddr
   collect keyword into keywords
   collect value into values
   finally (return (values keywords values)))
;; => (:A :B :C), (1 2 3)

Unfortunately, destructuring in Common Lisp is limited to these few cases, or wherever else you write your own destructuring macros.

Clojure takes destructuring to its logical conclusion: destructuring can be used any place bindings are established! This includes parameter lists. It works on any core data structure, not just lists.

Below, I’m doing destructuring inside of a standard let form.

(defn greet-dr [fullname]
  (let [[first last] (clojure.string/split fullname #" +")]
    (str "Hello, Dr. " last ". "
         "It's good to see you again, " first ".")))

(greet-dr "John Doe")
;; => "Hello, Dr. Doe. It's good to see you again, John."

Similarly, I could destructure an argument into my parameters. (Note the double square brackets.)

(defn greet-dr-2 [[first last]]
  ...)

(greet-dr-2 ["John" "Doe"])

Because hashmaps are a core language feature in Clojure, they can also be destructured. The syntax is a bit like flipping the hashmap inside out. The variable is specified, then the key it’s mapped to.

(let [{a :a, b :b} {:a 1 :b 2}]
  (list a b))
;; => (1 2)

When variables and keys have the same name, there’s a shorthand with :keys.

(let [{:keys [a b]} {:a 1 :b 2}]
  ...)

Variables default to nil when the corresponding key is not in the map. They can be given default values with :or.

(let [{a :a, b :b :or {a 0 b 0}} {}]
  (list a b))
;; => (0 0)
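JavaScript picked up a close analogue of this in ES2015, after this was written. The sketch below is only for comparison, and pairWithDefaults is a hypothetical name. The defaults play the role of :or, though they apply whenever the property is undefined rather than strictly when the key is absent.

```javascript
// JavaScript (ES2015+) analogue of
//   (let [{a :a, b :b :or {a 0 b 0}} {}] (list a b))
// A default after `=` is used only when the property is undefined,
// roughly like :or applying when the key is absent from the map.
function pairWithDefaults(map) {
  const { a = 0, b = 0 } = map;
  return [a, b];
}

console.log(pairWithDefaults({}));             // [ 0, 0 ]
console.log(pairWithDefaults({ a: 1, b: 2 })); // [ 1, 2 ]

// Binding a differently named variable, like {x :a} in Clojure:
const { a: x, b: y = 10 } = { a: 1 };
console.log(x, y); // 1 10
```

JavaScript puts the key on the left and the variable on the right (a: x). That is the reverse of Clojure’s {x :a}.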

Now, here’s where it gets really neat. In Common Lisp, the &key part of a lambda list is a special case. In Clojure it comes for free as part of destructuring. Just destructure the rest argument!

(defn height-opinion [name & {height :height}]
  (if-not height
    (str "I have no opinion on " name ".")
    (if (< height 6)
      (str name " is short.")
      (str name " is tall."))))

(height-opinion "Chris" :height 6.25)
;; => "Chris is tall."

We can still access the entire rest argument at the same time, using :as, so it covers everything Common Lisp covers.

(defn foo [& {a :a, b :b :as args}]
  args)

(foo :b 10)
;; => {:b 10}
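The key step is that Clojure collects the rest arguments and treats them as a map before destructuring them. Here is a minimal JavaScript sketch of the same idea. It assumes a hypothetical helper, plistToObject, that pairs up the trailing key/value arguments, and it uses strings to stand in for Clojure keywords.

```javascript
// Hypothetical helper: turn flat key/value rest arguments into an object,
// the step Clojure performs implicitly for `& {height :height}`.
function plistToObject(args) {
  if (args.length % 2 !== 0) {
    throw new TypeError("expected an even number of key/value arguments");
  }
  const obj = {};
  for (let i = 0; i < args.length; i += 2) {
    obj[args[i]] = args[i + 1];
  }
  return obj;
}

// Analogue of height-opinion: the keyword-style option comes from
// destructuring the object built out of the rest arguments.
function heightOpinion(name, ...rest) {
  const { height } = plistToObject(rest);
  if (height === undefined) return `I have no opinion on ${name}.`;
  return height < 6 ? `${name} is short.` : `${name} is tall.`;
}

console.log(heightOpinion("Chris", "height", 6.25)); // Chris is tall.
console.log(heightOpinion("Chris"));                 // I have no opinion on Chris.
```

Returning the whole object from plistToObject plays the same role as :as in the Clojure version.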

(A side note while we’re making comparisons: keywords in Clojure are not symbols, but rather a whole type of their own.)

Conclusion

Clojure parameter lists are simpler than Common Lisp’s lambda lists and, thanks to destructuring anywhere, they end up being more powerful at the same time. It’s a full superset of lambda lists, so there’s no practical trade-off.

Clojure and Emacs for Lispers

2013-01-07T00:00:00Z

According to my e-mail archives I’ve been interested in Clojure for about three and a half years now. During that period I would occasionally spend an evening trying to pick it up, only to give up after getting stuck on some installation or configuration issue. With a little bit of pushing from Brian, and the fact that this installation and configuration is now trivial, I finally broke that losing streak last week.

I’m Damn Picky

I have a high barrier to learning new programming languages, and it’s entirely my own fault. I’m really picky about my development environment. If I’m going to write code in a language I need Emacs to support a comfortable workflow around it. Otherwise progress feels agonizingly sluggish. If at all possible this means live interaction with the runtime (Lisp, JavaScript). If not, then I need to be able to invoke builds and run tests from within Emacs (C, Java). Basically, I want to leave the Emacs window as infrequently as possible.

I also need a major mode with decent indentation support. This tends to be the hardest part to create. Automatic indentation in Emacs is considered black magic. Fortunately, it’s unusual to come across a language that doesn’t already have a major mode written for it. It’s only happened once for me and that’s because it was a custom language for a computer languages course. To remedy this, I ended up writing my own major mode, including in-Emacs evaluation.

Unsatisfied with JDEE, I did the same for Java, growing my own extensions to support my development for the couple of years when Java was my primary programming language. The dread of having to switch back and forth between Emacs and my browser kept me away from web development for years. That changed this past October when I wrote skewer-mode to support interactive JavaScript development. JavaScript is now one of my favorite programming languages.

I’ve wasted enough time in my life configuring and installing software. I hate sinking time into doing so without capturing that work in source control, so that I never need to spend time on that particular thing again. I don’t mean the installation itself but the configuration — the difference from the defaults. (And the better the defaults, the smaller my configuration needs to be.) With my dotfiles repository and Debian, I can go from a computer with no operating system to a fully productive development environment inside of about one hour. Almost all of that time is just waiting on Debian to install all its packages. Any new language development workflow needs to be compatible with this.

Clojure Installation

Until sometime last year, the standard way to interact with Clojure from Emacs was through swank-clojure with SLIME. Well, installing SLIME itself can be a pain. Quicklisp now makes this part trivial, but it’s specific to Common Lisp. A swank-clojure setup would also conflict with my Common Lisp SLIME setup, so I’d basically need to choose one language or the other.

SLIME doesn’t have any official stable releases. On top of this, the SWANK protocol is undocumented and subject to change at any time. As a result, SWANK backends are generally tied to a very specific version of SLIME and it’s not unusual for something to break when upgrading one or the other. I know because I wrote my own SWANK backend for BrianScheme. Thanks to Quicklisp, today this isn’t an issue for Common Lisp users, but it’s not as much help for Clojure.

The good news is that swank-clojure is now deprecated. The replacement is a similar, but entirely independent, library called nREPL. (I’d link to it but there doesn’t seem to be a website.) Additionally, there’s an excellent Emacs interface to it: nrepl.el. It’s available on MELPA, so installation is trivial.

There’s also a clojure-mode package on MELPA, so install that, too.

That covers the Emacs side of things, so what about Clojure itself? The Clojure community is a fast-moving target and the Debian packages can’t quite keep up. At the time of this writing they’re too old to use nREPL. The good news is that there’s an alternative that’s just as good, if not better: Leiningen.

Leiningen is the standard Clojure build tool and dependency manager. Here, “dependencies” includes Clojure itself. If you have Leiningen you have Clojure. Installing Leiningen is as simple as placing a single shell script in your $PATH. Since I always have ~/bin in my $PATH, all I need to do is wget/curl the script there and chmod +x it. The first time it runs it pulls down all of its own dependencies automatically. Right now the biggest downside seems to be that it’s really slow to start. I think the JVM warmup time is to blame.

Let’s review. To install a working Emacs live-interaction Clojure development environment,

Install the nrepl.el package in Emacs. For me this happens automatically by the configuration in my .emacs.d repository. I only had to do this step once.
Install the clojure-mode package. Same deal.
Install a JDK. OpenJDK is probably in your system’s package manager, so this is trivial.
Put the lein shell script in the $PATH. This takes about five seconds. If even this was too much for my precious sensibilities I could put this script in my dotfiles repository.

With this all in place, do M-x nrepl-jack-in in Emacs and any clojure-mode buffer will be ready to evaluate code as expected. It’s wonderful.

Further Extending Emacs

I made some tweaks to further increase my comfort. Perhaps nREPL’s biggest annoyance is that, unlike all the other interactive modes, it doesn’t focus the error buffer. Once I’m done glancing at it I’ll dismiss it with q. This advice fixes that.

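(defadvice nrepl-default-err-handler (after nrepl-focus-errors activate)
  "Focus the error buffer after errors, like Emacs normally does."
  (let ((window (get-buffer-window "*nrepl-error*")))
    ;; get-buffer-window returns nil when the buffer isn't displayed.
    (when window
      (select-window window))))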

I also like having expressions flash when I evaluate them. Both SLIME and Skewer do this. This uses slime-flash-region to do so when available.

(defadvice nrepl-eval-last-expression (after nrepl-flash-last activate)
  (if (fboundp 'slime-flash-region)
      (slime-flash-region (save-excursion (backward-sexp) (point)) (point))))

(defadvice nrepl-eval-expression-at-point (after nrepl-flash-at activate)
  (if (fboundp 'slime-flash-region)
      (apply #'slime-flash-region (nrepl-region-for-expression-at-point))))

For Lisp modes I use parenface to de-emphasize parentheses. Reading Lisp is more about indentation than parentheses. Clojure uses square brackets ([]) and curly braces ({}) heavily, so these now also get special highlighting. See my .emacs.d for that.

Learning Clojure

The next step is actually learning Clojure. I already know Common Lisp very well. It has a lot in common with Clojure so I didn’t want to start from a pure introductory text. More importantly, I needed to know upfront which of my pre-conceptions were wrong. This was an issue I had, and still have, with JavaScript. Nearly all the introductory texts for JavaScript are aimed at beginner programmers. It’s a lot of text for very little new information.

More good news! There’s a very thorough Clojure introductory guide that starts at a reasonable level of knowledge.

Clojure - Functional Programming for the JVM

A few hours going through that while experimenting in a *clojure* scratch buffer and I was already feeling pretty comfortable. With a few months of studying the API, learning the idioms, and practicing, I expect to be a fluent speaker.

I think it’s ultimately a good thing I didn’t get into Clojure a couple of years ago. That gave me time to build up, as a sort of rite of passage, the knowledge and experience with Java that Clojure deliberately leans on through its interop.

A Use For Macrolet

2012-12-06T00:00:00Z

I recently had a good use for Common Lisp’s macrolet special operator. Just as let establishes new variable bindings and flet establishes new function bindings, macrolet establishes new local macro definitions.

For example, here’s a locally-defined anaphoric lambda macro called fn.

(macrolet ((fn (&body body) `(lambda (_) ,@body)))
  (map 'string (fn (if (standard-char-p _) _ #\*)) "naïve"))
;; => "na*ve"

My particular use case was making my code cleaner for a brainfuck interpreter. The state of the machine was tracked by this struct. (Interesting side note: SBCL warns about using p as a slot name. The slot’s accessor is named bf-p, which is the same name defstruct gives the structure’s predicate by default, so the accessor looks like a predicate.)

(defstruct bf
  (p 0)
  (mem (make-array 30000 :initial-element 0)))

The BF instructions + and - increment and decrement the byte at the data pointer. The Common Lisp incf and decf macros can do this. Similarly, the , instruction sets the byte at the data pointer, which can be done with setf. All three of these macros are place-modifying.

(defun interp (program state)
  ;; ...
  (incf (aref (bf-mem state) (bf-p state)))
  ;; ...
  (decf (aref (bf-mem state) (bf-p state)))
  ;; ...
  (setf (aref (bf-mem state) (bf-p state)) (char-code (read-char))))

That’s a whole lot of redundancy for a Lisp program. Under similar circumstances elsewhere I might use flet to reduce it.

;; This won't work.
(defun interp (program state)
  (flet ((ref () (aref (bf-mem state) (bf-p state))))
    ;; ...
    (incf (ref))
    ;; ...
    (decf (ref))))

The problem is that ref isn’t a generalized reference, which incf, decf, and setf all require. Common Lisp’s place-modifying utilities are implemented as macros. It’s known at compile-time what kind of place they are modifying: a variable, array index, object/struct slot, car, cdr, or many other things (Emacs cl package allows all sorts of things to be setfed, like (point)). The macro expands into the proper form for setting that kind of place.

The specific expansion is implementation-dependent, but, for example, setf could expand into a setq when the first argument is a symbol. New generalized references can be defined with defsetf.

In my case, a simple macro expansion can fill the role. Below, when the place-modifying macro sees (ref), which isn’t a kind of place it knows, it macroexpands it into an aref form and then expands into the proper code for setting an array element.

(defun interp (program state)
  (macrolet ((ref () '(aref (bf-mem state) (bf-p state))))
    ;; ...
    (incf (ref))
    ;; ...
    (decf (ref))
    ;; ...
    (setf (ref) (char-code (read-char)))))

Because the macro has no parameters I could have even more easily used symbol-macrolet. I just didn’t think of it at the time.

JavaScript Strings as Arrays

2012-11-15T00:00:00Z

Lisp

One thing I enjoy about Common Lisp is its general treatment of sequences. (In fact, I wish it went further with it!) Functions that don’t depend on list-specific features generally work with any kind of sequence. For example, remove-duplicates doesn’t just work with lists, it works on any sequence.

(remove-duplicates '(a b b c))  ; list
=> (A B C)

(remove-duplicates #(a b b c))  ; array
=> #(A B C)

Functions like member and mapcar require lists because their behavior explicitly depends on lists. The general sequence versions of these are find and map. Writing a new sequence function means sticking to these generic sequence functions, particularly elt and subseq, rather than the more specialized accessors.

A string is just a one-dimensional array — a vector — with elements of the type character. This means all sequence functions also work on strings.

(make-array 10 :element-type 'character :initial-element #\a)
=> "aaaaaaaaaa"

(remove-duplicates "abbc")
=> "abc"

(map 'string #'char-upcase "foo")
=> "FOO"

(reverse "foo")
=> "oof"

There is no special set of functions just for operating on strings (except those for string-specific operations). Strings are as powerful and flexible as any other sequence. This is very convenient.

JavaScript

Unfortunately, JavaScript strings aren’t quite arrays. They look and act a little bit like arrays, but they’re missing a few of the useful methods.

var foo = "abcdef";

foo[1]
=> "b"

foo.length
=> 6

foo.reverse()  // error, no method 'reverse'

Notice that, when indexing, it returns a one-character string, not a single character. This is because there’s no character type in JavaScript. It would have been interesting if JavaScript had gone the Elisp route, where there’s no character type but instead characters are represented by integers, with some sort of character literal for using characters in code. This sort of thing can be emulated with the charCodeAt() method.

To work around the strings-are-not-arrays thing, strings can be converted to arrays with split(), manipulated as an array, and restored with join().

foo.split('').reverse().join('')
=> "fedcba"

The string method replace can act as a stand-in for map and filter. The replacement argument can be a function, which will be called on each match. If the pattern matches a single character at a time, replace effectively becomes a map over the characters.

// Map over each character
foo.replace(/./g, function(c) {
    return String.fromCharCode(c.charCodeAt(0) + 10);
});
=> "klmnop"

For filter, an empty string would be returned in the case of the predicate returning false and the original match in the case of true.

foo.replace(/./g, function(c) {
    if ("xyeczd".indexOf(c) >= 0)
        return c;
    else
        return '';
});
=> "cde"

In most cases, typical use of regular expressions would serve the need for the filter() method, so this is mostly unnecessary. For example, the above could also be done like so,

foo.replace(/[^xyeczd]/g, '');
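The replace() trick can be wrapped into small helpers. This sketch uses the hypothetical names mapChars and filterChars. One detail is worth knowing: . does not match line terminators, so /./g silently skips newlines. The character class [\s\S] matches every UTF-16 code unit instead. Characters outside the Basic Multilingual Plane are still seen as two code units.

```javascript
// Hypothetical helpers wrapping the replace() trick from the text.
// /[\s\S]/g matches every UTF-16 code unit, including newlines,
// which /./g would skip.
function mapChars(str, fn) {
  return str.replace(/[\s\S]/g, fn);
}

function filterChars(str, pred) {
  return str.replace(/[\s\S]/g, (c) => (pred(c) ? c : ""));
}

const shift = (c) => String.fromCharCode(c.charCodeAt(0) + 10);

console.log(mapChars("abcdef", shift));                          // klmnop
console.log(filterChars("abcdef", (c) => "xyeczd".includes(c))); // cde
console.log(JSON.stringify("a\nb".replace(/./g, "*")));          // "*\n*"
console.log(JSON.stringify(mapChars("a\nb", () => "*")));        // "***"
```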

Another way to fix the missing methods would be to simply implement the Array methods for strings and add them to the String prototype, but that’s generally considered bad practice.
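A middle ground that leaves String.prototype alone is to borrow the generic Array methods with call(). This works because map and filter only need a length and indexed access. Here is a sketch; note that Array.from is ES2015 and newer than this post.

```javascript
// Borrow generic Array methods instead of extending String.prototype.
// map and filter only need `length` and indexed access, so they accept
// a string via call(); they return arrays, hence the join("").
const foo = "abcdef";

const upper = Array.prototype.map
  .call(foo, (c) => c.toUpperCase())
  .join("");

const kept = Array.prototype.filter
  .call(foo, (c) => "xyeczd".indexOf(c) >= 0)
  .join("");

// reverse() mutates in place and string indices are read-only, so
// borrowing it fails with a TypeError; copy into an array first.
const reversed = Array.from(foo).reverse().join("");

console.log(upper, kept, reversed); // ABCDEF cde fedcba
```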

Elisp Recursive Descent Parser (rdp)

2012-09-20T00:00:00Z

I recently developed a recursive descent parser, named rdp, for use in Emacs Lisp programs. I’ve already used it to write a compiler.

https://github.com/skeeto/rdp

It’s available as a package on MELPA.

The Long Story

Last month Brian invited me to take a free, online programming languages course with him. You may recall that we developed a programming language together so it was only natural we would take this class.

The first part of the class is oriented around a small programming language created just for this class called ParselTongue. It looks like this:

deffun evenp(x)
    if ==(x, 0) then
        true
    else if ==(x, 1) then
            false
        else evenp(-(x, 2))
in defvar x = 14 in {
    while (evenp(x)) { x--; };   # Make sure x odd
    print("This is an odd number: ");
    print(x);
    ""; # No output
}

I’ve gotten so used to having a solid Emacs major mode when coding that I can’t stand writing code without the support of a major mode. Since this language was invented recently just for this class there was no mode for it, nor would there be unless someone stepped up to make one. I ended up taking that role. It was an opportunity to learn how to create a major mode, something I had never done before.

It’s called psl-mode.

At first it was just some syntax highlighting (very easy) and some poor automatic indentation. The indentation function would get confused by anything non-trivial. It’s actually really hard to get it right. I’ve grown a much better appreciation for automatic indentation in other modes.

In an attempt to improve this I decided I would try to fully parse the language and use the resulting parse tree to determine indentation: something like the depth of the point in the tree. My experience with Perl’s Parse::RecDescent some years ago was very positive and I wanted to reproduce that effect. However, rather than write the grammar in a separate language with the host programming language mixed in, which I find extremely messy, I wanted to use pure s-expressions. A grammar looks very nice as an alist of symbols.

Arithmetic Parser

For example, here’s a grammar for simple arithmetic expressions, including operator precedence and grouping (e.g. “4 + 5 * 2.5”, “(4 + 5) * 2.5”, etc.).

(defvar arith-tokens
  '((sum       prod  [([+ -] sum)  no-sum])
    (prod      value [([* /] prod) no-prod])
    (num     . "-?[0-9]+\\(\\.[0-9]*\\)?")
    (+       . "\\+")
    (-       . "-")
    (*       . "\\*")
    (/       . "/")
    (pexpr     "(" [sum prod num pexpr] ")")
    (value   . [pexpr num])
    (no-prod . "")
    (no-sum  . "")))

Strings are regular expressions, and they are the only elements that actually match input text (terminals). Lists are sequences, where each element in the list must match in order. Vectors (in brackets) are choices where one of the elements must match. Symbols name an expression so that other expressions can refer to it, including recursively.

Give this alist to the parser and it will return an s-expression of the parse tree of the current buffer. Due to the way the grammar must be written, this parse tree isn’t really pleasant to handle directly. For example, a series of multiplications (“1 * 2 * 3 * 4”) wouldn’t parse to a nice flat list. Instead, each additional operand adds another level of nesting.

To help squash these, the parser will accept an alist of symbols and functions which process the parse tree at parse time. For example, these corresponding functions will make sure "4 * 5 * 6" gets parsed into (* 4 (* 5 (* 6 1))).

(defun arith-op (expr)
  (destructuring-bind (a (op b)) expr
    (list op a b)))

(defvar arith-funcs
  `((sum     . ,#'arith-op)
    (prod    . ,#'arith-op)
    (num     . ,#'string-to-number)
    (+       . ,#'intern)
    (-       . ,#'intern)
    (*       . ,#'intern)
    (/       . ,#'intern)
    (pexpr   . ,#'cadr)
    (value   . ,#'identity)
    (no-prod . ,(lambda (e) '(* 1)))
    (no-sum  . ,(lambda (e) '(+ 0)))))

Notice how normal Emacs functions could be supplied directly in most cases! That makes this approach so elegant in my opinion.

Also, in arith-op note the use of destructuring-bind. I’ve found that macro to be invaluable when writing these syntax tree functions.

In this case, we can be even more clever. Rather than build a nice parse tree, the expression can be evaluated directly. All it takes is one small change,

(defun arith-op (expr)
  (destructuring-bind (a (op b)) expr
    (funcall op a b)))

With this, the parser returns the computed value directly. So this evaluates to 120.

(rdp-parse-string "4 * 5 * 6" arith-tokens arith-funcs)
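For comparison, here is a hand-written recursive descent evaluator for the same arithmetic language in JavaScript. It is a sketch, not rdp. The name evalArith is hypothetical, tokenizing is done with one regular expression, and unary minus stands in for the grammar’s optional leading "-". Like the funcall version of arith-op, it computes values while parsing instead of building a tree.

```javascript
// Hypothetical recursive descent evaluator (not rdp) for +, -, *, /,
// parentheses, and decimal numbers. Values are computed during parsing.
function evalArith(text) {
  if (!/^[0-9.\s+\-*\/()]*$/.test(text)) {
    throw new SyntaxError(`unexpected character in ${JSON.stringify(text)}`);
  }
  const tokens = text.match(/\d+(?:\.\d*)?|[-+*\/()]/g) || [];
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  // sum := prod (("+" | "-") prod)*
  function sum() {
    let value = prod();
    while (peek() === "+" || peek() === "-") {
      const op = next();
      const rhs = prod();
      value = op === "+" ? value + rhs : value - rhs;
    }
    return value;
  }

  // prod := atom (("*" | "/") atom)*
  function prod() {
    let value = atom();
    while (peek() === "*" || peek() === "/") {
      const op = next();
      const rhs = atom();
      value = op === "*" ? value * rhs : value / rhs;
    }
    return value;
  }

  // atom := "(" sum ")" | "-" atom | number
  function atom() {
    const tok = next();
    if (tok === "(") {
      const inner = sum();
      if (next() !== ")") throw new SyntaxError("expected )");
      return inner;
    }
    if (tok === "-") return -atom(); // unary minus
    if (tok === undefined || Number.isNaN(Number(tok))) {
      throw new SyntaxError(`unexpected token ${tok}`);
    }
    return Number(tok);
  }

  const result = sum();
  if (pos !== tokens.length) throw new SyntaxError("trailing input");
  return result;
}

console.log(evalArith("4 * 5 * 6"));     // 120
console.log(evalArith("(4 + 5) * 2.5")); // 22.5
console.log(evalArith("10 - 4 - 3"));    // 3
```

The while loops make the operators left-associative, so 10 - 4 - 3 is 3. A right-recursive rule of the form sum := prod (op sum) is different: folding each level with something like arith-op groups the same input as 10 - (4 - 3). That matters for - and /.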

ParselTongue Compiler

I discovered this useful side effect while making my ParselTongue parser. The original intention was that I’d parse the buffer for use in indentation, then maybe I’d create an interpreter to evaluate the parser output. However, the resulting parse tree was looking a lot like Elisp. In an epiphany I realized I could simply emit valid Elisp directly and forgo writing the interpreter altogether. And so I accidentally created a ParselTongue compiler! This was incredibly exciting for me to realize.

This ParselTongue program,

defvar obj = {x: 1} in { obj.x }

Compiles to this Elisp,

(let ((obj (list (cons 'x 1))))
  (progn (cdr (assq 'x obj))))

Because it compiles to such a high level language, and because ParselTongue is very Lisp-like semantically, it’s a bit unconventional: the compiler emits code during parsing. In fact, when the parser backtracks, some emitted code is thrown away.

By the end of the first evening I had implemented the majority of the compiler, which quickly took precedence over indentation. The compiler is now integrated as part of psl-mode. The current buffer can be evaluated at any time with psl-eval-buffer. This function compiles the buffer and has Emacs eval the result, printing the output in the minibuffer. Compiler output can be viewed with psl-show-elisp-compilation (mostly for my own debugging).

After a few days I integrated indentation with parsing, which required modifying the parser (changes included in rdp itself). The parser needed to keep track of where the point is in the parse tree. For indentation it basically counts the depth into the parse tree, plus a few more checks for special cases.

The parser was intentionally isolated from the rest of psl-mode so that it could be separated for general use, which I have now done. It’s been a really handy general purpose tool since then. That arithmetic parser is only 35 lines of code and took about half-an-hour to create.

Future Directions

I also wrote a bencode parser — only the bencode-tokens and bencode-funcs alists are needed to parse bencode, about 30 LOC. Careful observation will reveal that I cheated and the result is a little hackish. Due to the way strings work, bencode is not context-free so it can’t be parsed purely by the grammar. I can work around it by having the parse tree function for strings consume input, since it’s called during parsing.
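To illustrate why bencode strings resist a pure grammar, here is a minimal JavaScript bencode decoder sketch. The name decodeBencode is hypothetical, and the code is unrelated to the rdp version. The length prefix of a string such as 4:spam decides how many characters to consume, which is exactly the context-sensitive step. The sketch assumes one character per byte and doesn’t enforce bencode’s canonical-form rules, such as sorted dictionary keys.

```javascript
// Minimal bencode decoder sketch (hypothetical, not the rdp-based parser).
// Assumes the input is a JavaScript string with one character per byte.
function decodeBencode(input) {
  let pos = 0;

  function parseValue() {
    const c = input[pos];
    if (c === "i") {                          // integer: i<digits>e
      const end = input.indexOf("e", pos);
      const digits = end < 0 ? "" : input.slice(pos + 1, end);
      if (!/^-?[0-9]+$/.test(digits)) throw new SyntaxError(`bad integer at ${pos}`);
      pos = end + 1;
      return Number(digits);
    }
    if (c === "l") {                          // list: l<value>...e
      pos++;
      const list = [];
      while (input[pos] !== "e") list.push(parseValue());
      pos++;
      return list;
    }
    if (c === "d") {                          // dictionary: d<key><value>...e
      pos++;
      const dict = {};
      while (input[pos] !== "e") {
        const key = parseString();
        dict[key] = parseValue();
      }
      pos++;
      return dict;
    }
    return parseString();
  }

  // The context-sensitive step: the length prefix, not a grammar rule,
  // decides how many characters the string consumes.
  function parseString() {
    const colon = input.indexOf(":", pos);
    const digits = colon < 0 ? "" : input.slice(pos, colon);
    if (!/^[0-9]+$/.test(digits)) throw new SyntaxError(`bad string length at ${pos}`);
    const start = colon + 1;
    const end = start + Number(digits);
    if (end > input.length) throw new SyntaxError("string runs past end of input");
    pos = end;
    return input.slice(start, end);
  }

  const value = parseValue();
  if (pos !== input.length) throw new SyntaxError("trailing data");
  return value;
}

console.log(decodeBencode("d3:bar4:spam3:fooi42ee")); // { bar: 'spam', foo: 42 }
console.log(decodeBencode("l4:spami-3ee"));            // [ 'spam', -3 ]
```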

I’ll be using rdp to parse many more things in the future, I’m sure. It’s much more powerful than I expected.

Implemented Is Simple Data Compression

2012-09-04T00:00:00Z

Update: This post shouldn’t make sense to anyone (hopefully). Read the follow-up for an explanation.

When a branch of my posts remains simple.

This is necessary when one will assume Alan is more important than number 12. By using numbers to repeat them, but this won’t work with any sort of thing you want to load what’s needed. This includes reimplementing the reader as it seems you still need to specify any video-specific parameters, ppmtoy4m is the whole thing is just that, decorated with some tips on how the current space as visited, then recurse from the client to read a great story, I recommend you use to launch a daemon process and prints the variable information to stdout. As an added bonus, when a second variable for accumulation and a second argument is relevant.

Suppose you want to read a great story, I recommend it.

This servlet uses the Term::ProgressBar, if it’s any good, but it’s funny. As anyone with cats knows, it’s not too stupid to call fsync() to force the write to the snapshot and uninterns any new symbols. These symbols will be added to the the second experiment.

At this line, you can perform a number from a couple of these and give them back any other language that can turn out even from a large header comment in the logs, so getting someone into my honeypot wouldn’t take long at all. The only proof I could then cherry-pick/pull the issues from that repository and see the polynomial interpolation at that time, presented in order. This makes so much of web development (I think that’s his name). I am an Emacs person myself, which I use branches all the time, now that they can be written.

We will run your build system in a web front-end to it, and made a couple of seconds.

You should also be a good head start, though. The SPARC is big-endian and the results to seed their program accordingly. You could do this is by mounting the compromised filesystem in a list. In the decentralized model, everyone has their own solutions in parallel when it comes across 10 it emits 0.

Here’s an example of some of the fire gem activated and exploded, causing no blindness to me. They take a look at the same level as the printed string. You can grab my source code in response to abuse by spammers who hide fraudulent URLs behind shortened ones. If these services ever went down all at once, these shortened URLs would rot, destroying many of the image, with the FFI.

Because I wrote a shell script that will also remove the execs and live with nested shells because the zeros cancel out everything else? Here is the protocol.

Generate a 10-byte random IV. This need not implement this.

Note that the shell script, and the arcfour key scheduler at least n days.

However, generating a series of commits to all other encounters nothing changes.

Your program should simulate this by having the user to reseed somewhere. There’s no direct way to install it to dominate for awhile. It is strange that Matlab itself doesn’t have any sort of syntax highlighting. Boring! I finally ran into this image. After each paste, make a saving throw to prevent an explosion.

Because Gnohkk would also suffer from the bottom are arranged around the cats in the logs, so getting someone into my honeypot wouldn’t take long at the link in the block. Another was going to used a stationary magnet.

Our team went with this array (and replaced the current layer 5). Now, duplicate the work was done just once by freeing the entire number, it can perform both compression and decompression on both sides don’t pay attention to the development loop is just an ordered list of 50 H’s and T’s. If you implement this in the same time. This is along the way, clone my repository right into the official website so I had to do this for any long-blocking function that I use ppmtoy4m to pipe the new frames to keep, such as n^p mod M, which this will handle efficiently. For example, to add a new compression algorithm in terms of brute-force attacks it requires using numbers long enough to fit three Emacs’ windows side-by-side at 78 columns each. The leftmost one contains my active work buffer where I do most useful things, a fresh array every time it sees a free musical. Unfortunately, my writing skills are even worse. I have gotten good mileage out of a file based on their website demonstrating how to increment the iterator. I have to type a negative comment about zip archives and moved on. I am using a constant amount of memory.

It turns out that everyone is free to share his source code samples, particularly more recent entries, was that producing the relief surface was an e-mail address, I get home from work I don’t recommend doing this with secret Java applets.

There are a few weeks since I last used KOffice, so I could easily plug it into Emacs and run the test above, I would rather not do damage, but rather a patient human being. Getting tired of manually synchronizing them. It was finally time to document the effort as a single mine is destroyed, the neighboring mines will replicate a replacement. The minefield itself could therefore hold no secrets whatsoever. This leaves out any possibility of a rumor among a group of people. At any given time, each person in the background. My shell habits looked like the ones you’re seeing after end-package.

It’s really simple way to detect edges all over the weekend I came up with some rough edges. So I got it right while IE, Opera, Safari, and Chrome all do it again.

Numbers can be found inside the fake closure provided by lexical-let. In a previous post about Lua, another about a third of my name generation code.

S-expressions are handy anywhere.

Two months ago I was so happy when I run the program with the proper Perl regular expression contains quotes and these will not be worth it.

I can’t help but think that a knight moving according to the current symbol table to the existing mountain of elisp code out there, requiring a massive increase in speed when using OpenCL. In fact, there is virtually no computation involved. So what I want to look like SBCL. Fortunately, that’s not all!!! There is a fake service or computer on a chess board such that it’s somewhat easier to tell when the handler can present any contents it wants. In this case, rather than just one, even though I don’t know what it looks good, except you want to italicize a few bits smaller than a minute. All the other day I will probably be ordered by their own directory. Modern applications have moved into a directory under ~/.config/. Your script needs to be broken into small computation units, because Emacs lacked network functionality until recently was the package manager, package, and the Emacs Lisp Package Archive.

One of the info field in the list, which sounds like a .emacs file in your program. If the slot is already taken, the symbol was in an external system.

After all this, I thought I’d give it a YouTube URL and a single password if the required artifacts, digitally signs them, and bundles them up.

The demo at the same length as the variable declarations are exactly the right magical string of, say, 31 fractions.

The story is really happening. Optimizing away variables that point to it.

Oh, and I was just a tiny subset of the memory at once became a lot of memory. For example, here’s my laptop’s /bin/ls, very roughly labeled.

The different segments of the game area was a mistake on my rolls and had some wires, connected to some sort of bad things this may happen subconsciously, which is given in ImageMagick’s montage tool, which made the final montage out of the image functions described below.

You can write a lexer or tokenizer without one. Because of this tool, Samuel Stoddard, gives some in-game context to the light of day. I just use your own program, the script in your load-path somewhere.

I’ve frequently thought that a Lisp-based shell would be produced by first individually gzipping each file in first.

For a long ways away from a simple double-click shortcut. If you just want to duplicate the remaining canines. Her reward for victory was a very similar process, but without any sort of thing is transparent. I’ve already used it with a degree in, say, a few months. I’ve used POSIX threads, Pthreads, before, so it suits my needs for the first two arguments from filter2, as well as some more to see my changes, but I don’t know much about it, user AJR spoiled it with ssh-add and it queries for your passphrase, storing it in two obarrays at once, these shortened URLs would rot, destroying many of its input. For example, this is what registration looks like,

Unfortunately, the HTML output is a Harsh Mistress. If you know that the opposite way that the adventures and characters are riddled with mistakes and very unbalanced. For an easier way to set up properly in your configuration.

I strongly recommend that you generally want to have a master pad, K, that you often generate very improbable series of commits.

To all other encounters nothing changes.

And that’s it! I put this line in your program. If you are subscribed to the rescue!

Literal Arrays and Vectors in Lisp

2012-07-17T00:00:00Z

Despite being a Lisper, unlike Brian I haven’t gotten into Clojure yet. I’ve been following along at a safe distance. Thanks to a recent post of his, I learned about a significant difference between Clojure and other Lisps when it comes to arrays/vectors.

In this recent post, Brian wrote a ClojureScript let-like macro to hide JavaScript asynchronous function chains so that they can be used just like regular synchronous functions. Following Clojure’s style, the asynchronous functions are written inside a vector rather than a list to indicate to the macro that they’re special.

(doasync
  [text [fetch "/foo/json"]
   url (str text ".html")
   result [fetch url]
   _ (.show view result)
   _ [timeout 1000]
   _ (.makeEditable view)])

That sounded completely reasonable to me, since array literals are rarely used inside Common Lisp code. When they are used, it’s usually as a global constant.

A few days later when I was talking to Brian at the metaphorical water cooler he mentioned that the macro was actually conflicting with what he would normally write. Sometimes he really did want to use a vector literal in a let binding. Why would he do that? In Common Lisp, that’s just asking for trouble — same for Elisp and Scheme.

(let ((v #(1 2 3)))
  (foo v))

The reason why this is a bad idea is that the same exact array will always be passed to foo. The array is created once at read time by the reader and re-used for the life of that code. If anyone makes a modification to the array it will damage the array for everyone using it.

(defun foo ()
  #(1 2 3))
(eq (foo) (foo))
=> T

The safer method is to create a fresh array every time by not using a literal but instead calling vector.

(let ((v (vector 1 2 3)))
  (foo v))
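
The sharing problem has a loose analogue in C, shown here only as an illustration (the function names are made up for this sketch): returning a pointer to a static array hands every caller the same storage, much like a literal vector, while allocating on each call behaves like vector.

```c
#include <stdlib.h>

/* Like #(1 2 3): one array, created once, shared by every caller. */
int *foo_literal(void)
{
    static int shared[3] = {1, 2, 3};
    return shared;
}

/* Like (vector 1 2 3): a fresh array on every call (caller frees it). */
int *foo_fresh(void)
{
    int *v = malloc(3 * sizeof *v);
    if (v) {
        v[0] = 1;
        v[1] = 2;
        v[2] = 3;
    }
    return v;
}
```

Writing through the pointer from foo_literal() changes what every later caller sees, which is the same hazard as mutating a literal vector.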

Clojure data structures are immutable, including vectors, so using the same exact vector in multiple places is safe. That makes using literal vectors in code much less awkward. But that still left a question hanging: why was Brian using literal vectors so often that he needed one so soon after writing this macro?

In Common Lisp, they’re not very useful because their elements are not evaluated. When this vector is evaluated, the result is the same vector, whose second element is still a list containing three atoms.

#(1 (+ 2 3) 4)
=> #(1 (+ 2 3) 4)

Arrays evaluate to themselves, unchanged. To do most useful things, a fresh vector needs to be constructed piecemeal. Even if the shared identity of a literal array weren’t an issue, literals still couldn’t be used for much.

(defun foo (x)
  #(x x x))
(foo 10)
=> #(X X X)

To achieve the desired effect, the vector function needs to be used again. Because it’s a normal function call, the arguments are evaluated.

(defun foo (x)
  (vector x x x))
(foo 10)
=> #(10 10 10)

However, to my surprise, Clojure doesn’t work like this! Literal vectors have their elements evaluated and, if necessary, are created fresh on every use — exactly like a call to vector.

(defn foo [x]
  [x x x])
(foo 10)
=> [10 10 10]
(identical? (foo 10) (foo 10))
=> false

If the exact form of the vector is needed unevaluated, it needs to be quoted just like lists.

(defn foo [x]
  '[x x x])
(foo 10)
=> [x x x]
(identical? (foo 10) (foo 10))
=> true

After further reflection, I now feel like this is the right way to go about implementing vectors. When I was first learning Lisp the non-evaluating nature of arrays really caught me by surprise. Vectors should evaluate their elements by default; if the Common Lisp behavior is needed it can always be quoted. It’s impossible to “fix” any established Lisp of course, so I’m merely wishing this was the behavior defined decades ago.

To recap: normally in Lisp, vectors evaluate to themselves, like numbers and strings. Instead, evaluating a vector should return a new vector containing the result of evaluating each of its elements. Since Clojure’s data structures are immutable, the compiler can take a shortcut when it can guarantee that each of a vector’s elements always evaluates to itself, and have the vector evaluate to itself, purely as an optimization.
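
To make the proposed rule concrete, here is a toy evaluator in C. It is a sketch under simplifying assumptions (only integers, symbols bound in a small table, and immutable vectors; the names obj, env, make, and eval are hypothetical, not from any real Lisp). Vectors are evaluated element by element into a fresh vector, except that a vector whose elements all evaluate to themselves is returned as-is, which is the Clojure-style shortcut.

```c
#include <stdlib.h>

typedef enum { T_INT, T_SYM, T_VEC } tag;

typedef struct obj {
    tag type;
    long num;            /* T_INT: the value; T_SYM: index into env */
    size_t len;          /* T_VEC: number of elements */
    struct obj **elems;  /* T_VEC: the elements */
} obj;

static obj *env[16];     /* symbol i is bound to env[i] */

/* Allocate an object (error handling omitted for brevity). */
obj *make(tag type, long num, size_t len)
{
    obj *o = calloc(1, sizeof *o);
    o->type = type;
    o->num = num;
    o->len = len;
    if (len)
        o->elems = calloc(len, sizeof *o->elems);
    return o;
}

/* True when evaluating o can only ever produce o itself. */
static int self_evaluating(const obj *o)
{
    if (o->type == T_INT)
        return 1;
    if (o->type == T_SYM)
        return 0;
    for (size_t i = 0; i < o->len; i++)
        if (!self_evaluating(o->elems[i]))
            return 0;
    return 1;
}

obj *eval(obj *o)
{
    switch (o->type) {
    case T_INT:
        return o;
    case T_SYM:
        return env[o->num];
    case T_VEC:
        /* The optimization: safe only because vectors are immutable. */
        if (self_evaluating(o))
            return o;
        {
            /* General rule: a new vector of the evaluated elements. */
            obj *r = make(T_VEC, 0, o->len);
            for (size_t i = 0; i < o->len; i++)
                r->elems[i] = eval(o->elems[i]);
            return r;
        }
    }
    return NULL;
}
```

Evaluating the equivalent of [x x x] twice, with x bound to 10, yields two distinct vectors, while a constant vector like [1 2] evaluates to itself. Nothing is ever freed here; the point is only the evaluation rule.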

Lisp Let in GNU Octave

2012-02-08T00:00:00Z

In BrianScheme, the standard Lisp binding form let isn’t a special form; that is, it’s not a hard-coded language feature. It’s built on top of lambda. In any lexically-scoped Lisp, the expression,

(let ((x 10)
      (y 20))
  (* x y))

Can also be written as,

((lambda (x y)
   (* x y))
 10 20)

BrianScheme’s let is just a macro that transforms into a lambda expression. This is also what made it so important to implement lambda lifting, to optimize these otherwise-expensive forms.

It’s possible to achieve a similar effect in GNU Octave (but not Matlab, due to its flawed parser design). The language permits simple lambda expressions, much like Python.

> f = @(x) x + 10;
> f(4)
ans = 14

It can be used to create a scope in a language that’s mostly devoid of scope. For example, I can avoid assigning a value to a temporary variable just because I need to use it in two places. This one-liner generates a random 3D unit vector.

(@(v) v / norm(v))(randn(1, 3))

The anonymous function is called inside the same expression where it’s created. In practice, doing this is stupid. It’s confusing and there’s really nothing to gain by being clever, doing it in one line instead of two. Most importantly, there’s no macro system that can turn this into a new language feature. However, I enjoyed using this technique to create a one-liner that generates n random unit vectors.

n = 1000;
p = (@(v) v ./ repmat(sqrt(sum(abs(v) .^ 2, 2)), 1, 3))(randn(n, 3));

Why was I doing this? I was using the Monte Carlo method to double-check my solution to this math problem:

What is the average straight line distance between two points on a sphere of radius 1?

I was also demonstrating to Gavin that simply choosing two angles is insufficient, because the points the angles select are not evenly distributed over the surface of the sphere. I generated this video, where the poles are clearly visible due to the uneven selection by two angles.

This took hours to render with gnuplot! Here are stylized versions: Dark and Light.

BrianScheme Update: Bootstrapping and Images

2011-01-30T00:00:00Z

I previously talked about BrianScheme (BS), and we've had some exciting updates since then. I've since caught up with Brian, so we're committing code at about the same rate. It's really coming together into something bigger than either of us expected. We're adding what we feel is the best of Common Lisp and Scheme. The Git repository log reveals just how much time we've spent hacking at this.

The first big milestone was full bootstrapping. BS has both an interpreter, written in C, and a compiler written in BS. The compiler targets a VM, which is written in C. The interpreter is no longer used directly; it's there simply for bootstrapping the compiler. It sets up an initial environment, then loads and runs the compiler, which immediately compiles itself, then compiles its environment, and then compiles the main user environment. Afterward the whole thing is lifted up on top of the VM and the interpreter is abandoned.

Once everything was bootstrapped and running in the VM, continuations became a practical possibility, and Brian soon added them. So BrianScheme now has all the major components of a Scheme.

I began ramping up my contributions by adding a really solid random number generator, the Mersenne twister, and providing functions to generate numbers from all sorts of distributions: normal, Poisson, gamma, exponential, beta, and chi-squared. It's pretty reasonable at seeding itself, too.
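
For readers who want to see what the Mersenne twister involves, here is a compact C sketch of the standard 32-bit MT19937 algorithm as published by Matsumoto and Nishimura. It is not BrianScheme's code, the seeding from system entropy mentioned above is left out, and the type and function names are hypothetical.

```c
#include <stdint.h>

#define MT_N 624
#define MT_M 397

typedef struct {
    uint32_t state[MT_N];
    int index;
} mt19937;

/* Fill the state from a 32-bit seed (the reference initializer). */
void mt_seed(mt19937 *mt, uint32_t seed)
{
    mt->state[0] = seed;
    for (int i = 1; i < MT_N; i++) {
        uint32_t prev = mt->state[i - 1];
        mt->state[i] = 1812433253u * (prev ^ (prev >> 30)) + (uint32_t)i;
    }
    mt->index = MT_N;  /* force a twist on the first draw */
}

/* Regenerate all 624 words of state at once (the "twist"). */
static void mt_twist(mt19937 *mt)
{
    for (int i = 0; i < MT_N; i++) {
        uint32_t y = (mt->state[i] & 0x80000000u)
                   | (mt->state[(i + 1) % MT_N] & 0x7fffffffu);
        uint32_t next = mt->state[(i + MT_M) % MT_N] ^ (y >> 1);
        if (y & 1u)
            next ^= 0x9908b0dfu;
        mt->state[i] = next;
    }
    mt->index = 0;
}

/* Next 32-bit output, after the standard tempering transform. */
uint32_t mt_next(mt19937 *mt)
{
    if (mt->index >= MT_N)
        mt_twist(mt);
    uint32_t y = mt->state[mt->index++];
    y ^= y >> 11;
    y ^= (y << 7) & 0x9d2c5680u;
    y ^= (y << 15) & 0xefc60000u;
    y ^= y >> 18;
    return y;
}
```

Seeded with 5489, the reference default seed, the first output is 3499211612, which makes a handy sanity check for any implementation.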

In the meantime, this bootstrap process, while incredibly useful, was really slowing down development. It was taking BS about 10 seconds to boot itself every time it was started. That can really kill the usefulness of the system. I started to look into ways to mitigate this, perhaps through FASLs or some kind of image dump.

After discussing it with Brian I decided to try for a memory dump, SBCL-style. My old memory pool allocator, which I thought I'd never use again, really came in handy, and now has a home (modified, of course) in BS. It no longer uses malloc(), instead using mmap() to get more memory, and it now has the ability to free memory, completely replacing malloc(), realloc(), and free(). With BrianScheme no longer using the libc allocator, we had complete control over the program's memory. It was just a matter of dumping the handful of big mmap() chunks to disk and, in another process, loading them back into the same location, and finally hooking the environment back up.
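
The core trick can be sketched in a few dozen lines of POSIX C. This is not BrianScheme's allocator: it is a hypothetical bump allocator over a single mmap() region, saved to a file and mapped back at the same address with MAP_FIXED, so pointers between objects stay valid without any fixups. The pool size and function names are made up, and it assumes Linux or a similar system.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS and fileno() */
#endif
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

#define POOL_SIZE ((size_t)1 << 20)  /* hypothetical: one 1 MiB chunk */

static unsigned char *pool;  /* start of the mmap()ed region */
static size_t pool_used;     /* bump-pointer offset into the pool */

/* Get the pool's memory straight from the kernel, bypassing malloc(). */
int pool_init(void)
{
    void *p = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    pool = p;
    pool_used = 0;
    return 0;
}

/* Bump allocation only (no free), enough to show that every object and
   every pointer between objects lives inside the one region. */
void *pool_alloc(size_t n)
{
    n = (n + 15) & ~(size_t)15;  /* keep 16-byte alignment */
    if (pool_used + n > POOL_SIZE)
        return NULL;
    void *p = pool + pool_used;
    pool_used += n;
    return p;
}

/* "Save image": write the raw region to a file. */
int pool_save(FILE *f)
{
    if (fwrite(pool, 1, POOL_SIZE, f) != POOL_SIZE || fflush(f) != 0)
        return -1;
    return 0;
}

/* "Load image": map the file back at the same address. Here the old
   mapping is dropped first so MAP_FIXED can reuse its address; a fresh
   process would instead need that address range to be free. */
int pool_load(FILE *f)
{
    if (munmap(pool, POOL_SIZE) != 0)
        return -1;
    void *p = mmap(pool, POOL_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_FIXED, fileno(f), 0);
    return p == (void *)pool ? 0 : -1;
}
```

Because the image is mapped rather than read and parsed, loading costs little beyond the mmap() call itself. The hard part the sketch sidesteps is guaranteeing the same address range is available in a brand-new process.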

Just as the SBCL documentation warns about, there are complicated issues still to be resolved (and may never be). The two big ones are alien objects and open file handles, neither of which can make it into the image. Aliens could be in there potentially, if the foreign library let us select its allocator. Brian made a change that gives the FFI a hook to rebind its symbols after load, so some of the FFI can survive the jump. The three stdin, stdout, and stderr file handles are reconnected on load, but the old, dead handles can potentially linger in places, waiting to cause errors when someone tries to read them.

It was really, really exciting to see the images come back to life for the first time. With that success the lengthy bootstrap process was no longer a big problem because it could be bypassed much of the time. Saving images is really simple to do, too.

(save-image "brianscheme.img")

This will create the image "brianscheme.img", which can be loaded again later with the -l switch. Once loaded, it will either execute a script if given one on the command line, or it will provide a new REPL. Because the image is mmap()ed into place, loading is practically instantaneous, even if the image is dozens of megabytes (which can easily happen).

bsch -l brianscheme.img

The BS image in its booted state is about 15MB right now on 64-bit systems, and 7MB on 32-bit systems. They are low entropy and compress down to about 2MB, so it's not too bad. If you write a large program in BrianScheme and save it as an image it may be even larger.

Over the weekend I took this even further, continuing to follow in the footsteps of SBCL: I added a feature to wrap the image in the BS executable, so that the image itself is a standalone executable. To make this more useful, a toplevel function can be selected to run after the image loads, rather than a REPL. If you wrote a game in BS and wanted to compile to a standalone program,

(load "my-game.sch")
(save-image "my-game" 'executable #t 'toplevel play-my-game)

The user would execute the file my-game, which would load BS and run the function play-my-game. Because a 15MB binary is a little unwieldy to hand out, you could compress it beforehand with a tool like gzexe, which transparently compresses an executable.

The wrapper is actually very simple. It's a very slightly modified BS executable, padded out to the system's page size (generally 4kB) and ending with a special marker. The image is concatenated to this binary. When run, the program scans itself looking for the marker (0xdeadbeef), and then mmap()s the portion after it (mmap() only accepts page-aligned file offsets, which is why the padding was necessary).
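
The self-scanning step might look something like this in C. It is a sketch of the layout as described, not BrianScheme's code: the function name is made up, the marker is assumed to occupy the last four bytes before a page boundary (in native byte order), and the caller is assumed to already have the file's bytes in memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_MARKER 0xdeadbeefu

/* Return the offset where the appended image starts, or -1 if there is
   no marker. Only page boundaries are checked, since the executable is
   padded so that the image begins on one. Assumes page >= 4. */
long find_image_offset(const unsigned char *buf, size_t len, size_t page)
{
    uint32_t marker = IMAGE_MARKER;
    for (size_t off = page; off <= len; off += page) {
        if (memcmp(buf + off - sizeof marker, &marker, sizeof marker) == 0)
            return (long)off;
    }
    return -1;
}
```

The returned offset is what would be handed to mmap() as its file offset, which must be a multiple of the page size reported by sysconf(_SC_PAGESIZE). A real implementation would also have to cope with the marker value happening to sit at a page boundary inside the executable itself.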

BrianScheme should have an interesting future ahead of it.

BrianScheme

2011-01-11T00:00:00Z

Remember back a year ago I tried my hand at a Lisp implementation called Wisp? Well, currently a co-worker of mine, Brian Taylor, is similarly working on his own Scheme implementation — but he knows more about what he's doing than I did, so it's more interesting. However, that expertise doesn't extend to inventing a clever name (Zing!): it's unsubtly called BrianScheme.

git clone git://github.com/netguy204/brianscheme.git

I've been hacking at it a little myself, cheering from the sidelines.

git remote add wellons git://github.com/skeeto/brianscheme.git

Like Wisp, it's written from scratch in C, from the bottom up. Unlike Wisp, it has closures, lexical scoping, mark-and-sweep garbage collection, an object system, and compilation to bytecode (in memory). Continuations are still a ways off, but planned. One of the most powerful features so far is the foreign function interface (FFI). Now that he's implemented it with libffi, he's barely had to touch the C code base. In fact, thanks to the FFI, the C portion of BrianScheme will be shrinking.

For example, BrianScheme currently lacks floating point numbers, and its integers are currently just native fixnums. Sometime soon it will, like Wisp, use the GNU Multi-Precision Library (GMP) to provide bignums. Adding this will not require making any changes whatsoever to the C code. Using the object system (Tiny-CLOS), hooks in the reader and printer, and the FFI, this can be entirely implemented in the language itself.

Just-in-time compilation (JIT) has begun to be implemented without touching C, again by pulling in libjit with the FFI.

Because I wrote Wisp as an embeddable library, I was able to run Wisp in BrianScheme via the FFI and expose some bindings. For example, I can send it s-expressions to evaluate,

> (require 'wisp)
> (wisp:eval '(expt 6 56))
37711171281396032013366321198900157303750656

BrianScheme doesn't currently support threading, mainly because the garbage collector isn't ready for it. But remember how I mentioned GNU Pth last month? Again, I was able to load Pth with the FFI to add userspace threading, which is safe for the garbage collector because it's effectively an atomic operation. (Once continuations are implemented, this could actually be implemented without Pth, just by making good use of those continuations.) The current hangup is the REPL, which doesn't know about Pth and so it never yields. To take advantage of threading you have to suspend the REPL (with pth:join).

This REPL issue should be solved with the long term goal for BrianScheme. The C component of BrianScheme will merely exist for the purposes of bootstrapping the full system. During initialization, just about everything will be redefined in BrianScheme, with the original C definitions only living long enough to load what's needed. This includes reimplementing the reader itself in BrianScheme, which enables all sorts of possibilities, like the previously mentioned bignums implemented in the language itself, inline regular expressions, and proper yielding to the userspace thread scheduler.

So go ahead and clone Brian's repository (and add mine as a remote, too! :-D) and poke around at it. To compare to Wisp again, it's not quite as stable at the moment. It exits very easily from runtime errors, due to lacking error handling, so an instance generally doesn't live very long at the moment. This will probably be resolved sometime soon. Except for that, it does play well with Emacs as an inferior-lisp.

Emacs Set Window to 80 Columns

2010-10-06T00:00:00Z

When I'm coding, I maximize Emacs and enable winner-mode, turning my display into something much like a tiling window manager. Then I try not to leave Emacs until it's necessary. It's a really nice way to work: no mouse touching needed.

At work they gave me a nice 24" monitor, 1920 pixels across. That's just about enough to fit three Emacs windows side-by-side at 78 columns each. The leftmost one contains my active work buffer where I do most of my typing. The center one is usually split horizontally: the top half is the *compilation* buffer and the bottom half is either Emacs calculator or an *ansi-term* buffer. The rightmost window contains something more static, like some sort of reference material.

However, I like my main editing window to be 80 columns wide; 78 columns is just too short. For a while I was inserting 80 dashes (C-u 80 -) and adjusting the window width manually to fit. After doing it a few times I decided to extend Emacs to do it instead. First define a function to set the current window width.

(defun set-window-width (n)
  "Set the selected window's width."
  (adjust-window-trailing-edge (selected-window) (- n (window-width)) t))

Wrap it with an interactive function and bind it.

(defun set-80-columns ()
  "Set the selected window to 80 columns."
  (interactive)
  (set-window-width 80))

(global-set-key "\C-x~" 'set-80-columns)

For those paying extra attention: instead of writing the extra function, you could use my expose function from the other day.

(global-set-key "\C-x~" (expose (apply-partially 'set-window-width 80)))

The problem with this, though, is the dynamically generated function doesn't have a name or a docstring. Someone using describe-key would have little information to go on.

Emacs Byte Compilation

2010-07-01T00:00:00Z

A feature unique to some Lisps is the ability to compile functions individually at any time. This could be to bytecode or native code, depending on the dialect and implementation. In Lisp implementations where compilation matters (such as CLISP), there are typically two forms in which code can be evaluated: a slower, unoptimized uncompiled form and a fast, efficient compiled form. The uncompiled form has some sort of advantage, even if it's merely not having to spend time on compilation.

In Emacs Lisp, the uncompiled form of a function is just a lambda s-expression. The only thing that gives it a name is the symbol it's stored in. The compiled form is a (special) vector, with the actual byte codes stored in a string as the second element. Constants, the docstring, and other things are stored in this function vector as well. The Elisp function to compile functions is byte-compile. It can be given a lambda function or a symbol. In the case of a symbol, the compiled function is installed over top of the s-expression form.

(byte-compile (lambda (x) (* 2 x)))
  => #[(x) "^H\301_\207" [x 2] 2]

The compiler will not only convert the function to bytecode and expand macros, but also perform optimizations such as removing dead code, evaluating safe constant forms, and inlining functions. This provides a nice performance boost (testing using my measure-time macro),

(defun fib (n)
  "Fibonacci sequence."
  (if (<= n 2) 1
    (+ (fib (- n 1)) (fib (- n 2)))))

(measure-time
 (fib 30))
  => 1.0508708953857422

(byte-compile 'fib)

(measure-time
 (fib 30))
  => 0.4302399158477783

Most of the installed functions in a typical Emacs instance are already compiled, since they are loaded from compiled files. But a number of them aren't. So, I thought, why not spend a few seconds compiling the rest?

In Common Lisp, there is a predicate for testing whether a function has been compiled or not: compiled-function-p. For whatever reason, there is no equivalent predefined in Elisp, so I wrote one,

(defun byte-compiled-p (func)
  "Return t if function is byte compiled."
  (cond
   ((symbolp   func) (byte-compiled-p (symbol-function func)))
   ((functionp func) (not (sequencep func)))
   (t nil)))

My idea was to iterate over every interned symbol and, if the function slot contains an uncompiled function, using the test above, I would call byte-compile on it. Well, it turns out that byte-compile is very flexible and will ignore symbols with no function and symbols with already compiled functions.

So next, how do we iterate over every interned symbol? There is a mapatoms function for this. Provide it a function and it calls it on every interned symbol. Well, that's simple and anticlimactic.

(mapatoms 'byte-compile)

That's it! It will take only a few seconds and spew a lot of warnings. I haven't found a way to disable those warnings, so this isn't something you'd want to have run automatically, unless you like having an extra window thrown in your face. I've only discovered this recently, so I'm not sure what sort of bad things this may do to your Emacs session. Not every function was written with compilation in mind. There are interactions with macros to consider.

I doubt there will be a noticeable performance difference. Like I said before, most everything is already compiled, and those are the functions that get used the most. There's just something nice about knowing all your functions are compiled and optimized.

The Problem with String Stored Regex

2010-04-23T00:00:00Z

While regular expressions have limited usefulness, especially in larger programs, they're still very handy to have from time to time. It's usually difficult to write a lexer or tokenizer without one. Because of this, several languages build them right into the language itself rather than tacking them on as a library. This allows regular expressions to be stored literally in the code, treated as their own type, rather than inside a string. The problem with storing a regular expression inside a string is that it can easily make an already complex regular expression much more complex. This is because there are two levels of parsing going on.

Consider this regular expression where we match an alphanumeric word inside of quotes. I'm going to use slashes to delimit the regular expression itself.

/"\w+"/

Notice there is no escaping going on. The backslash is there to indicate a special sequence, \w, which is equivalent to [a-zA-Z0-9_]. This will get parsed and compiled into some form in memory before it is run by a program. If the language doesn't directly support regular expressions then we usually can't put it in the code as is, since the language parser won't know how to deal with it. The solution is to store it inside a string.

However, our regular expression contains quotes and these will need to be escaped when in a quote delimited string. But I no longer need slashes to delimit my regular expression.

"\"\w+\""

Did you notice the error yet? If not, stop and think about it for a minute. Our special sequence \w will not make it intact to the regular expression compiler. That backslash will escape the w during the string parsing step, leaving only the w. The string we typed will get parsed into a series of characters in memory, performing escapes along the way, and then that sequence will be handed to the regular expression compiler. So we have to fix it,

"\"\\w+\""

That's getting hard to understand, compared to the original. Now let's throw a curve-ball into this: let's match a backslash at the beginning of the word. The normal regular expression looks like this now,

/"\\\w+"/

We have to escape our backslash to make it a literal backslash, so it takes two of them. Now, when we want to do this in a string-stored regular expression we have to escape both of those backslashes again. It looks like this,

"\"\\\\\\w+\""

Now to match a single backslash we have to insert four backslashes! Unfortunately, Emacs Lisp doesn't directly support regular expressions even though the language places a lot of emphasis on text parsing, so a lot of Elisp code is riddled with this sort of thing. Elisp is especially difficult because sometimes, such as during prompts, you can enter a regular expression directly and ignore the layer of string parsing. It takes a conscious effort to remember which situation you're in at different times.
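
C has the same two-level problem, since its regular expressions also live in string literals. This sketch uses the POSIX regcomp()/regexec() API to match a quoted word that starts with a backslash. POSIX regular expressions have no \w, so [[:alnum:]_] stands in for it; the constant and function names are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Regex, as the regex compiler sees it:  "\\[[:alnum:]_]+"
   C literal, as typed in the source:     "\"\\\\[[:alnum:]_]+\""
   String parsing turns \" into " and each \\ into \, and then the
   regex compiler turns the remaining \\ into one literal backslash. */
static const char QUOTED_BACKSLASH_WORD[] = "\"\\\\[[:alnum:]_]+\"";

/* Return 1 if text contains a match, 0 if not, -1 if the regex fails
   to compile. */
int has_quoted_backslash_word(const char *text)
{
    regex_t re;
    if (regcomp(&re, QUOTED_BACKSLASH_WORD, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int found = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

After string parsing, the 22-character literal in the source becomes a 17-character regex, and only the regex compiler's second pass reduces its two backslashes to the single one being matched.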

Perl, Ruby, and JavaScript have regular expressions as part of the language, and it makes a lot of sense for these languages; they tend to do a lot of text parsing. Python does it partially, with its raw string syntax, r'...'. A string preceded by an r loses its escape rules, but it also means you can't easily include both single and double quotes without falling back to a normal string with escaping. Common Lisp may be able to do it with a reader macro, but I've never seen it done.

Remember those two levels of parsing when writing string-stored regex. It helps avoid annoying, hair-pulling mistakes.

Scheme Live Coding

2010-04-13T00:00:00Z

Live coding (or livecoding) is software development as a performance art. A programmer's screen is viewed by the audience and a program is written and modified so that it produces sound and maybe even visual effects. The audience gets to see the code and its effects live. I'm not sure if "live" refers to the audience, the editing of live code, or maybe both. There are videos all over the web of this in action so if you haven't seen it yet do a quick search and watch one.

It's fairly easy to obtain livecoding software. For example, there's Fluxus, which extends PLT Scheme to support livecoding.

I've never done livecoding myself, but something I've noticed is that Scheme seems to be a popular choice of language for livecoding. I think I know how and why this is. Scheme, being a Lisp dialect, is naturally a living system: it can be modified and extended while it's actively running. Scheme in particular is very well suited for the task thanks to its simplicity and optimized tail recursion.

I'll do a little text-based livecoding example in PLT Scheme to show how it works. This will be easier to do yourself if your text editor can interact directly with a REPL (like Emacs or DrScheme).

Let's define a function that prints a line of text to the screen and recurses, so that it continues printing forever. The recursion is important here and I'll get back to it. To keep things manageable I'm also going to insert a 1 second pause.

(define (print-str)
  (display "Hello!\n")
  (sleep 1)
  (print-str))

If we call this function with (print-str) it will sit there printing "Hello!" over and over. It will also lock up our REPL preventing us from doing anything else. Not very useful. So let's put it in a thread instead!

(thread print-str)

Now our program is running and we get to keep our REPL. Why do we need to keep our REPL alive? Well, so we can redefine print-str on the fly! In my buffer I'll go back and change "Hello!" to "Goodbye!". While I'm doing this the function is still spitting out "Hello!".

(define (print-str)
  (display "Goodbye!\n")
  (sleep 1)
  (print-str))

As soon as I tell my editor to pass this to the REPL the print-str function gets redefined and starts printing "Goodbye!" instead. Why did the running function change? Because of recursion. When it called itself, it actually called the new definition.

Since I didn't keep a handle on the thread the easiest way to stop print-str from running is to redefine it without recursion.

(define (print-str)
  (display "Done.\n"))

And it's done. If I'm quick about it, the output looks something like this.

Hello!
Hello!
Hello!
Goodbye!
Goodbye!
Done.

That's the fundamental workings of livecoding in Scheme: I changed a program while it was running. To turn the above into the more interesting livecoding you see in the videos all we need are some audio and visual bindings (which is the hard part of it all).
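
The key mechanism, a running loop that looks up its own name on every iteration, can be imitated in C by calling through a global function pointer. This is a hypothetical, single-threaded sketch: instead of a REPL in another thread, a helper swaps in new definitions at fixed points, and the output is collected in a buffer. Unlike Scheme, C does not guarantee tail-call elimination, so an endless recursion like print-str would eventually overflow the stack; this demo stops after a few steps.

```c
#include <string.h>

static char out[256];  /* captured output, for inspection */

/* The global "name": each recursive step calls through this pointer,
   so it always runs whatever definition is current at call time. */
static void (*print_str)(int tick);

static void editor_sends_definitions(int tick);

static void print_hello(int tick)
{
    strcat(out, "Hello!\n");
    editor_sends_definitions(tick + 1);
    print_str(tick + 1);  /* late-bound "recursive" call */
}

static void print_goodbye(int tick)
{
    strcat(out, "Goodbye!\n");
    editor_sends_definitions(tick + 1);
    print_str(tick + 1);
}

static void print_done(int tick)
{
    (void)tick;
    strcat(out, "Done.\n");  /* no call to print_str: the loop stops */
}

/* Stands in for the programmer evaluating new definitions at the REPL;
   the schedule (ticks 3 and 5) is arbitrary. */
static void editor_sends_definitions(int tick)
{
    if (tick == 3)
        print_str = print_goodbye;
    if (tick == 5)
        print_str = print_done;
}

const char *run_livecoding_demo(void)
{
    out[0] = '\0';
    print_str = print_hello;
    print_str(0);
    return out;
}
```

The demo prints the same sequence as the session output shown above: three Hellos, two Goodbyes, then Done.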

Emacs cat-safe

2010-03-31T00:00:00Z

I was inspired by an item in Luke's Tumblr blog last night. It was a screenshot of a program called PawSense, which monitors a computer's keyboard for cat activity. (I don't know if it's any good, but it's funny.) As anyone with cats knows, it's not unusual to leave a computer only to come back later to see garbage typed in by a wandering cat. I wrote a version for Emacs today.

git clone git://github.com/skeeto/cat-safe.git

Put it (cat-safe.el) somewhere in your load-path (like ~/.emacs.d/) and put this line in your .emacs file,

(require 'cat-safe)

This only monitors Emacs itself; it should help protect your buffers but not your web browser. When cat interference is detected Emacs switches focus to a junk buffer and lets the cat make a mess there instead. In case your cat happens to type out some Shakespeare you will be able to read it in the junk buffer. Just kill the junk buffer to return to work.

It could still use some improvement. Right now it looks for a single key being held down, excepting keys humans tend to hold down like backspace, delete, and space. If you play around with it you'll notice that if you press several keys at once Emacs will sometimes create a pattern with them. I need to figure out a good way to detect this.
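
The held-key heuristic can be sketched in a few lines of C. This is not cat-safe's Emacs Lisp code: it scans a string of key codes, ignores backspace, delete, and space, and flags any other key repeated in a row at least a made-up threshold of 8 times.

```c
#include <stddef.h>

/* Hypothetical threshold: identical keys in a row that count as a cat. */
#define CAT_REPEAT_THRESHOLD 8

/* Keys people legitimately hold down; runs of these are ignored. */
static int exempt_key(unsigned char key)
{
    return key == '\b' || key == 127 /* DEL */ || key == ' ';
}

/* Return 1 if any non-exempt key repeats CAT_REPEAT_THRESHOLD or more
   times in a row in the NUL-terminated key sequence, else 0. */
int looks_like_cat(const char *keys)
{
    size_t run = 0;
    for (size_t i = 0; keys[i]; i++) {
        if (exempt_key((unsigned char)keys[i])) {
            run = 0;
            continue;
        }
        run = (i > 0 && keys[i] == keys[i - 1]) ? run + 1 : 1;
        if (run >= CAT_REPEAT_THRESHOLD)
            return 1;
    }
    return 0;
}
```

Detecting several keys pressed at once would need something smarter, such as looking for a short repeating pattern rather than a single repeated key.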

I'm going to run it at home for a while to make sure it remains transparent but still does its job. It will probably incur a performance penalty on frequently repeated keyboard macros.

Common Lisp Quick Reference

2010-02-06T00:00:00Z

I found this Common Lisp Quick Reference the other day from r/lisp, and I think it's fantastic. It's a comprehensive, libre booklet of the symbols defined by the Common Lisp ANSI standard. Very slick!

The main version is meant to be printed out, folded vertically, and nested into a booklet, and it works quite well. If I ever get a chance to use Common Lisp at work (a man can dream), probably at a location without Internet access, this could come in handy. So I printed one out for myself.

Wisp Screencasts

2010-02-04T00:00:00Z

I've been chugging away on Wisp, announced in my last post, every day since I started it a few weeks ago, and it's becoming a pretty solid system. There's now an exception system, reference counting for dealing with garbage, and a reentrant parser. It's no replacement for any other lisps, but I've found it to be very fun to work on.

git clone git://github.com/skeeto/wisp.git

I wanted to show off some of the new features of Wisp, and, inspired by Full Disclojure (it's so damn slick), I decided to make some screencasts of Wisp in action. All of the screencast software for GNU/Linux is pretty poor, but after a few hours of head-banging I managed to hobble something together for you. Enjoy!

Video: wisp-memoize.ogv

That video demonstrated the memoization function, which can be pulled in from the memoize library. You give it a symbol, which should have a function definition stored in it, and it will install a wrapper around it. In the video I used the Fibonacci function from the examples library.

(require 'examples)
(fib 30) ; Slooooow ...
(memoize 'fib)
(fib 100) ; Fast!
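
Memoizing through the symbol presumably works so well because fib's recursive calls also go through that symbol, so they hit the wrapper's cache too. A hand-memoized C version shows the same effect (a hypothetical sketch, not Wisp's memoize): each value is computed once and cached. Without bignums, 64-bit integers top out at fib(93), so fib(100) from the video is out of reach here.

```c
#include <stdint.h>

/* fib(93) is the largest Fibonacci number that fits in 64 bits. */
#define FIB_MAX 93

static uint64_t fib_cache[FIB_MAX + 1];  /* 0 means "not computed yet" */

/* Same definition as the Lisp version: fib(1) = fib(2) = 1. Results
   for n > FIB_MAX overflow and are not cached. */
uint64_t fib(int n)
{
    if (n <= 2)
        return 1;
    if (n <= FIB_MAX && fib_cache[n])
        return fib_cache[n];
    uint64_t r = fib(n - 1) + fib(n - 2);  /* recursive calls use the cache */
    if (n <= FIB_MAX)
        fib_cache[n] = r;
    return r;
}
```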

Video: wisp-detach.ogv

This demonstrated the "detachment" feature of Wisp, which is similar to "futures" in Clojure. It forks off a new process, which executes the given function. The send function can be used in the detached process to send any lisp objects back to the parent, which can receive them with the receive function. The send function can be called any number of times to continually send data back. The receive function will block if there is no lisp object to receive yet.

(require 'examples)
(setq d (detach (lambda () (send (fib 30)))))
(receive d) ; Gets value from child process
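
A rough C analogue of detach and receive uses fork() and a pipe. This is a hypothetical POSIX sketch, not Wisp's implementation: the child sends a single long rather than arbitrary lisp objects, and sends it only once.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Handle for a detached computation: the child's pid and the read end
   of the pipe it answers on. */
typedef struct {
    pid_t pid;
    int fd;
} detached;

/* Deliberately slow example workload. */
static long slow_fib(long n)
{
    return n <= 2 ? 1 : slow_fib(n - 1) + slow_fib(n - 2);
}

/* Fork a child that computes f(arg) and writes the result into a pipe.
   Returns a handle whose fd is -1 on failure. */
detached detach(long (*f)(long), long arg)
{
    detached d = { -1, -1 };
    int p[2];
    if (pipe(p) != 0)
        return d;
    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return d;
    }
    if (pid == 0) {  /* child: compute, "send", exit */
        close(p[0]);
        long v = f(arg);
        ssize_t w = write(p[1], &v, sizeof v);
        _exit(w == (ssize_t)sizeof v ? 0 : 1);
    }
    close(p[1]);  /* parent keeps only the read end */
    d.pid = pid;
    d.fd = p[0];
    return d;
}

/* "Receive": block in read() until the child's value arrives, then reap
   the child. Returns 0 on success. */
int receive(detached d, long *out)
{
    ssize_t r = read(d.fd, out, sizeof *out);
    close(d.fd);
    waitpid(d.pid, NULL, 0);
    return r == (ssize_t)sizeof *out ? 0 : -1;
}
```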

Video: wisp-point-free.ogv

This video shows off the point-free functions that have been defined: function composition and partial application (I accidentally say "partial evaluation" in the video). These are actually just simple macros that any lisp could do.

Wisp Lisp

2010-01-24T00:00:00Z

Update 2010-2-04: A lot of the information below is out of date. There is an update here: Wisp Screencasts.

This is a project I've been wanting to do for some time, and I finally got around to doing it. I spent the last few days implementing my own lisp interpreter in C. Today, after sinking in about 48 hours of work, I believe I completed enough of it to consider it in a working state, with a code base stable enough that other interested people could contribute. It was really exciting to see everything come together today.

You can make a clone of the Wisp repository with Git. Go ahead; don't be shy,

git clone git://github.com/skeeto/wisp.git

To build it, all you need is a C99 compiler, make, yacc (e.g. Bison), and lex (e.g. Flex).

It doesn't use the readline library or one of its clones to provide a nice interactive command line, so it's a good idea to run it with rlwrap. That's what I've been doing. If you plan on writing Wisp code in Emacs, putting this in your .emacs will give you all the syntax niceties,

(add-to-list 'auto-mode-alist '("\\.wisp\\'" . lisp-mode))

You should also be able to run it as an inferior lisp and send code to it like a normal lisp. I haven't done this yet myself.

I think the name is apt, because it really is a wisp of a lisp. As of this writing, it weighs in at 1500 lines of code and is still very feature-light. I haven't actually read any material about writing lisp interpreters, so I've been winging it based on my experience with lisp. It's already taught me a lot of subtle things about lisp that I hadn't been aware of before.

Right now it's very simple. It doesn't yet support any syntax beyond parentheses (no ' quoting). No closures. No garbage collection, as I'm still working out how I'm going to do that. Dynamically scoped, since that's a lot easier to do. And, like Scheme, it's a lisp-1, meaning functions and variables share a common namespace. I hope to expand it to include some features from other lisps like Arc (particularly the anonymous function syntactical sugar), Common Lisp, and Scheme.

However, it already has anonymous functions, which many popular languages still don't have. :-) It's far enough along to let you define crazy stuff like this,

(defun example (n)
  (if (<= n 0)
      (lambda (x) (* x 10))
    (lambda (x) (* x 2.0))))

Which you can call, interactively in this case, like so,

wisp> ((example  1) 20)
40.000000
wisp> ((example -1) 20)
200

Because there's no garbage collection yet, it leaks memory like a sieve. For garbage collection, I think I'll do a mark-and-sweep, marking objects based on their reachability from the symbol table. That still leaves some corner cases — such as worrying about objects in limbo in the evaluator — that I'm not sure about. I need to be careful not to free objects still in use. Still working that one out.
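The mark-and-sweep plan above can be sketched in a few dozen lines of C. This is not Wisp's code: the object layout here is simplified to integers and cons cells, with an added mark flag and a link field so the sweep can visit every allocation, and the roots array stands in for the symbol table. It also dodges the "objects in limbo" problem by assuming the caller only runs gc() when every live object is reachable from the roots.

```c
#include <stdlib.h>

typedef enum { OBJ_INT, OBJ_CONS } obj_type;

typedef struct obj {
    obj_type type;
    int marked;            /* set during the mark phase */
    struct obj *next;      /* links every allocation together for the sweep */
    union {
        int i;
        struct { struct obj *car, *cdr; } cons;
    } u;
} obj_t;

static obj_t *all_objects;   /* head of the allocation list */
static size_t live_objects;  /* number of objects currently allocated */

static obj_t *alloc_obj(obj_type type)
{
    obj_t *o = calloc(1, sizeof *o);
    if (!o)
        abort();
    o->type = type;
    o->next = all_objects;
    all_objects = o;
    live_objects++;
    return o;
}

static obj_t *make_int(int i)
{
    obj_t *o = alloc_obj(OBJ_INT);
    o->u.i = i;
    return o;
}

static obj_t *make_cons(obj_t *car, obj_t *cdr)
{
    obj_t *o = alloc_obj(OBJ_CONS);
    o->u.cons.car = car;
    o->u.cons.cdr = cdr;
    return o;
}

/* Mark phase: flag everything reachable from o. Recurses on the car but
   loops on the cdr so long lists don't exhaust the C stack; an already
   marked object stops the walk, which also makes cycles safe. */
static void mark(obj_t *o)
{
    while (o && !o->marked) {
        o->marked = 1;
        if (o->type != OBJ_CONS)
            break;
        mark(o->u.cons.car);
        o = o->u.cons.cdr;
    }
}

/* Sweep phase: free unmarked objects and clear marks on survivors. */
static void sweep(void)
{
    obj_t **link = &all_objects;
    while (*link) {
        obj_t *o = *link;
        if (o->marked) {
            o->marked = 0;
            link = &o->next;
        } else {
            *link = o->next;
            free(o);
            live_objects--;
        }
    }
}

/* roots[] plays the part of the symbol table. */
static void gc(obj_t **roots, size_t nroots)
{
    for (size_t i = 0; i < nroots; i++)
        mark(roots[i]);
    sweep();
}
```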

It has multiplication, addition, subtraction, and division implemented, as well as the greater-than and less-than operators. Lisp macros are implemented. A number of special forms are defined, like let, if, set, defun, defmacro, car, cdr, not, progn, lambda, and while. All of the predicates are implemented. It has a C interface, which is how all the above got defined. I already have some functions and macros defined in terms of Wisp code, too. Most of the needed functions at this point are trivial to add, though a bit tedious, so I'm mostly trying to focus on the core parts of the interpreter right now.

It's amazing how much the internal code looks like lisp written in a C dialect. I have CAR and CDR macros defined, which get used all over the place, and code frequently uses them to walk lists like lisp code would.

The core struct that everything works with is the object struct. It's defined as such,

typedef struct object
{
  type_t type;
  void *val;
} object_t;

The type_t field comes from this enumeration,

typedef enum types
  { INT, FLOAT, STRING, SYMBOL, CONS, CFUNC, SPECIAL } type_t;

The type indicates what type of data the void pointer points to, making it sort-of polymorphic. Note the CONS type, the cons cell, used to create lists,

typedef struct cons
{
  object_t *car;
  object_t *cdr;
} cons_t;

Those are the familiar car and cdr pointers. There are a bunch of helper functions to manipulate and build these. For example, c_cons() creates a cons cell,

object_t *c_cons (object_t * car, object_t * cdr);

Look familiar? Yup, that's the lisp cons function. Since the nil symbol is available in C code as NIL you can chain these together in C to make a list,

object_t *lst = c_cons (c_int(10), c_cons (c_str ("hello"), NIL));

Which puts together the simple list,

(10 "hello")

Since that's so cumbersome to write out, there's a parser that can read nice lisp code and use all those same functions to make the lists. Hence, the lisp reader.
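To see how these pieces fit together, here is a small self-contained sketch that reuses the type_t, object_t, and cons_t definitions above and adds minimal versions of c_int, c_str, c_cons, NIL, and the CAR/CDR macros. Their bodies are my guesses, not Wisp's code: nil is assumed to be a single statically allocated SYMBOL object, integers are boxed with malloc, and print_obj is a hypothetical printer. Like early Wisp, nothing is ever freed.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The type_t, object_t, and cons_t definitions from above. */
typedef enum types
  { INT, FLOAT, STRING, SYMBOL, CONS, CFUNC, SPECIAL } type_t;
typedef struct object { type_t type; void *val; } object_t;
typedef struct cons { object_t *car; object_t *cdr; } cons_t;

/* Assumption: nil is one shared SYMBOL object whose val is its name. */
static object_t nil_object = { SYMBOL, "nil" };
#define NIL (&nil_object)
#define CAR(o) (((cons_t *) (o)->val)->car)
#define CDR(o) (((cons_t *) (o)->val)->cdr)

/* Allocate or abort; nothing here is ever freed. */
static void *xmalloc (size_t n)
{
  void *p = malloc (n);
  if (p == NULL)
    abort ();
  return p;
}

/* Pair a value pointer with its type tag. */
static object_t *c_obj (type_t type, void *val)
{
  object_t *o = xmalloc (sizeof *o);
  o->type = type;
  o->val = val;
  return o;
}

static object_t *c_int (int i)
{
  int *p = xmalloc (sizeof *p);
  *p = i;
  return c_obj (INT, p);
}
static object_t *c_str (const char *s)
{
  char *p = xmalloc (strlen (s) + 1);
  return c_obj (STRING, strcpy (p, s));
}
static object_t *c_cons (object_t * car, object_t * cdr)
{
  cons_t *c = xmalloc (sizeof *c);
  c->car = car;
  c->cdr = cdr;
  return c_obj (CONS, c);
}

/* Append formatted text to buf (capacity cap), truncating if full. */
static void append (char *buf, size_t cap, const char *fmt, ...)
{
  size_t len = strlen (buf);
  va_list ap;
  va_start (ap, fmt);
  vsnprintf (buf + len, cap - len, fmt, ap);
  va_end (ap);
}

/* Print o into buf, walking lists with CAR and CDR. */
static void print_obj (char *buf, size_t cap, object_t *o)
{
  if (o->type == INT)
    append (buf, cap, "%d", *(int *) o->val);
  else if (o->type == STRING)
    append (buf, cap, "\"%s\"", (char *) o->val);
  else if (o->type == SYMBOL)
    append (buf, cap, "%s", (char *) o->val);
  else if (o->type == CONS)
    {
      append (buf, cap, "(");
      for (;;)
        {
          print_obj (buf, cap, CAR (o));
          o = CDR (o);
          if (o == NIL)
            break;
          if (o->type != CONS)  /* improper list, printed as a dotted pair */
            {
              append (buf, cap, " . ");
              print_obj (buf, cap, o);
              break;
            }
          append (buf, cap, " ");
        }
      append (buf, cap, ")");
    }
  else
    append (buf, cap, "#<object>");
}
```

Building the list from the text with c_cons (c_int (10), c_cons (c_str ("hello"), NIL)) and printing it into an empty buffer gives (10 "hello").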

If you want to help, it's pretty easy to add more CFUNCs, C functions that are exposed to Wisp lisp code. Right now, I'd like to expose the whole C math.h library, provide a nice I/O interface, and expose a bunch of string functions. The TODO file in the repository contains more things to be done.

Wisp will probably be getting its own "project page" here at some point in the future. When it does, I'll update this post to point to it.

Oh, and I decided to make this available under a 2-clause BSD license, so someone could easily plug it into another program as an extension language (once Wisp has matured first, of course). That would be cool.

Setting up a Common Lisp Environment

2010-01-15T00:00:00Z

Update August 2011: Things have changed again, which has always been the problem with Slime, and the reason I originally wrote this. Currently, I think the best way to install Slime is with Quicklisp using quicklisp-slime-helper.

Common Lisp is possibly the most advanced programming language. Think of pretty much any programming language feature and Common Lisp probably has it. Since lisp is the programmable programming language, when someone invents a new language feature it can probably be added to Common Lisp without even touching the language core.

However, if you're interested in digging into Common Lisp to try it out, you may find yourself quickly running into walls just getting started. It's a lot different than other programming environments you may be used to. The Common Lisp tutorials generally skip this step, assuming the user has an environment, or leaving that setup for the "vendor" to handle. So, here's a guide to setting up a great Common Lisp environment with Emacs and SLIME. It should work with any Common Lisp implementation and any operating system that can run Emacs (i.e. most of them). Even a much less capable one like Windows.

First, you need to pick a Common Lisp implementation and install it. Ideally, it should end up in your PATH. Like C, the language is defined solely by its standardized specification, rather than some canonical implementation. Steel Bank Common Lisp (SBCL) is currently the highest performing implementation, it's Free Software, and it runs on a wide variety of platforms, so take a look at that one if you're not sure.

Next, install Emacs. We're using Emacs not just because it's the best text editor ever created. :-D It's because that's what SLIME is written for, and Emacs is a lisp-aware editor. Really, Emacs is a lisp interpreter that happens to be geared towards text-editing. It's accused of breaking the rules of unix by being a single, monolithic program, but it's really a whole bunch of small lisp programs. You can even have a lisp REPL in Emacs (ielm), similar to what we will have once we're done here. It plays very well with Common Lisp.

If you're unfamiliar with Emacs, you should stop here and familiarize yourself with it a bit. Really, you could spend a decade learning Emacs and still have more to learn. The tutorial should be good enough for now. Fire up Emacs and run the tutorial by pressing control+h then t. In Emacs notation, that's C-h t. C-h is the help/documentation prefix, which can be used to look up variables/symbols (v), functions (f), key bindings (k), info manuals (i), the current mode (m), and apropos (searching) (a). In the info manuals, you should be able to find the full Emacs manual, Elisp reference, and Elisp tutorial, since they are generally installed alongside Emacs these days. Nearly anything you might need to know can be found inside the included documentation.

Next, install SLIME. I'll be a bit more specific for this one. Make a .emacs.d directory in your home directory (whatever your HOME environment variable is set to). This is a common place to put user-installed Emacs extensions. You will be putting your slime directory in here. There are two basic ways to obtain SLIME, as indicated right on their main page. You can do a CVS checkout of the SLIME repository, which allows you to follow it and run the latest version. Or you can grab a snapshot of the repository, which is provided, and dump it in there. Since I like you so much, I'll give you a third option. Here's a Git repository, maintained by someone very kind, that follows SLIME's CVS repository,

git clone git://git.boinkor.net/slime.git

Ultimately, you should have a directory ~/.emacs.d/slime/ that contains a bunch of SLIME source files directly inside.

Now, we tell Emacs where SLIME is and how to use it. Make a .emacs file in your home directory, if you haven't already, and put this in it,

(add-to-list 'load-path "~/.emacs.d/slime/")
(require 'slime)
(slime-setup '(slime-repl))

Once it's saved, either restart Emacs, or simply evaluate those lines by putting the cursor after each of them in turn and typing C-x C-e. If you did everything right so far, you shouldn't get any errors. (If you do, go back up and see what you did wrong.) If your Common Lisp installation didn't end up in your PATH as "lisp" (not uncommon) for some reason, you may need to tell Emacs where it is. For example, I can point directly to my SBCL installation with this line,

(setq inferior-lisp-program "/usr/bin/sbcl")

If everything is set up right, fire up SLIME with "M-x slime". It should compile the back-end, called swank, and run a Common Lisp REPL as an inferior process to Emacs. You should end up with a nice prompt like this,

CL-USER>

At this prompt, you can start evaluating lisp expressions as you please. But this isn't where the true power of SLIME comes in yet. I'll give you an example: make a new file with a .lisp extension and open it. Throw some lisp in there,

(defun adder (x)
  (lambda (y) (+ x y)))

Type C-c C-k and it will send the current buffer over to be compiled and loaded. This code here uses a closure, so you know you aren't accidentally using Emacs lisp, as it doesn't have closures. At the REPL you can call it,

CL-USER> (funcall (adder 5) 6)

Which will print the return value, 11. That's all there is to it. You write code in the buffer, then with a simple keystroke send it to the Common Lisp system to be evaluated and loaded. Because the SLIME key bindings eclipse the Emacs lisp key bindings, you can type this same line in the lisp source buffer, place the cursor at the end, and type C-x C-e, which will send it out to Common Lisp to be evaluated. Look at the mode help (C-h m) to see all the key bindings made available.

This is a great programming environment that makes Common Lisp all the more fun to use. You run a single, continuous instance of your program, growing it gradually. (This is exactly how I built my Emacs web server with elisp.) You can test your code as soon as it's written.

The setup can get even more advanced. The Common Lisp REPL need not be running on the same computer. It can be running on another computer, as long as SLIME is able to connect to it over the network. Several developers could even share a single Common Lisp process running on a common machine. Lots of possibilities.

If you don't have a Common Lisp book yet, there's Practical Common Lisp, which you can read at no cost online or download for reading offline. It's based on an Emacs and SLIME setup, so you'll be right on track.

Tweaking Emacs for Ant and Java

2009-12-06T00:00:00Z

Update: This is now part of my java-mode-plus Emacs extension.

Developing C in Emacs is a real joy, and it's mostly thanks to the compile command. Once you have your Makefile (or SConstruct, or whatever build system you like) set up and you want to compile your latest changes, just run M-x compile, which will run your build system in a buffer. You can then step through the errors and warnings with C-x `, and Emacs will take you to them. It's a very nice way to write code.

I use the compile command so much that I bound it to C-x C-k (C-k tends to be part of compile key bindings),

(global-set-key "\C-x\C-k" 'compile)

Until recently, I didn't have as nice of a setup for Java. Since they generally force offensive IDEs onto me at work this wasn't something I needed yet anyway, but I get to choose my environment on a new project this time. If you're using Makefiles for some reason when building your Java project, it still works out fairly well because they're usually called recursively. It gets more complicated with Ant, where there is only one top-level build file. Emacs' compile command only runs the build command in the buffer's current directory.

I know three solutions to this problem. One is to provide the build file's absolute path when compile asks for the command with the -buildfile (-f) option. You only need to type it once per Emacs session, so that's not too bad.

ant -emacs -buildfile /path/to/build.xml

It's not well documented, but there is a -find option that can be given to Ant that will cause it to search for the build file itself. This is even nicer than the previous solution. Just remember to place it last, unless you give it the build filename too. For example, if you wanted to run the clean target,

ant -emacs clean -find

To keep the actual call as simple as possible, I wrote a wrapper for compile, and put a hook in java-mode to change the local binding. The wrapper, ant-compile, searches for the build file the same way -find would do.

(defun ant-compile ()
  "Traveling up the path, find build.xml file and run compile."
  (interactive)
  (with-temp-buffer
    (while (and (not (file-exists-p "build.xml"))
                (not (equal "/" default-directory)))
      (cd ".."))
    (call-interactively 'compile)))

So I can transparently keep using my muscle memory compile binding, I set up the key binding in a hook,

(add-hook 'java-mode-hook
          (lambda () (local-set-key "\C-x\C-k" 'ant-compile)))

Voila! Java now works a little bit more like C.
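The search that ant-compile performs is just a walk up the directory tree. Purely to illustrate that loop outside Emacs, here is a rough POSIX C sketch. find_upward is a hypothetical helper, not part of Ant or Emacs, and unlike ant-compile (which ends up running compile from / when nothing is found) it reports failure instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Look for `name` in the absolute directory `dir`, then in each parent,
   stopping after "/" has been checked. On success, copy the directory
   that contains it into out and return 0; otherwise return -1. */
static int find_upward(const char *dir, const char *name,
                       char *out, size_t outsz)
{
    char cur[4096], candidate[8192];
    if (dir[0] != '/' || strlen(dir) >= sizeof cur)
        return -1;
    strcpy(cur, dir);
    for (;;) {
        size_t len = strlen(cur);
        while (len > 1 && cur[len - 1] == '/')   /* drop trailing slashes */
            cur[--len] = '\0';
        int is_root = strcmp(cur, "/") == 0;
        snprintf(candidate, sizeof candidate, "%s/%s",
                 is_root ? "" : cur, name);
        if (access(candidate, F_OK) == 0) {
            if (strlen(cur) >= outsz)
                return -1;
            strcpy(out, cur);
            return 0;
        }
        if (is_root)
            return -1;                  /* checked "/" too; give up */
        char *slash = strrchr(cur, '/');
        if (slash == cur)
            cur[1] = '\0';              /* parent is the root */
        else
            *slash = '\0';              /* strip the last component */
    }
}
```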

Lisp Fantasy Name Generator

2009-07-03T00:00:00Z

Earlier this year I implemented the RinkWorks fantasy name generator in Perl. I think lisp lends itself even better to that, and so I have a partial elisp implementation for you.

What stands out for me is that the patterns can easily be represented as an S-expression. We represent substitutions with symbols, literals with strings, and groups with lists. For example, this pattern,

s(ith|<'C>)V

can be represented in code as,

(s ("ith" ("'" C)) V)

I want a function I can apply to this to generate a name. First, I set up an association list with symbols and their replacements,

(defvar namegen-subs
  '((s ach ack ad age ald ale an ang ar ard as ash at ath augh
       aw ban bel bur cer cha che dan dar del den dra dyn
       ech eld elm em en end eng enth er ess est et gar gha
       hat hin hon ia ight ild im ina ine ing ir is iss it
       kal kel kim kin ler lor lye mor mos nal ny nys old om
       on or orm os ough per pol qua que rad rak ran ray ril
       ris rod roth ryn sam say ser shy skel sul tai tan tas
       ther tia tin ton tor tur um und unt urn usk ust ver
       ves vor war wor yer)
    (v a e i o u y)
    ...
    (d elch idiot ob og ok olph olt omph ong onk oo oob oof oog
       ook ooz org ork orm oron ub uck ug ulf ult um umb ump umph
       un unb ung unk unph unt uzz))
  "Substitutions for the name generator.")

Since we will need this in a couple places, make a function to randomly select an element from a list,

(defun randth (lst)
  "Select random element from the given list."
  (nth (random (length lst)) lst))

A function for replacing a symbol,

(defun namegen-select (sym)
  "Select a replacement for the given symbol."
  (if (null (assoc sym namegen-subs))
      (throw 'bad-symbol
             (concat "Invalid substitution symbol: " (format "%s" sym)))
    (symbol-name (randth (cdr (assoc sym namegen-subs))))))

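And finally, the generator. Given a string, pass it through; given a symbol, substitute it; given a list, generate each element in turn and concatenate the results. An element that is itself a list is a group, so pick one of its members at random and recurse on that.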

(defun namegen (sexp)
  "Generate a name from the given sexp generator."
  (cond
   ((null sexp) "")
   ((stringp sexp) sexp)
   ((symbolp sexp) (namegen-select sexp))
   ((listp sexp)
    (concat (if (listp (car sexp)) (namegen (randth (car sexp)))
              (namegen (car sexp)))
            (namegen (cdr sexp))))))

That's it! We can apply it to the expression above,

(namegen '(s ("ith" ("'" C)) V))
-> "rynithi"

But that's really the easy part. The hard part would be converting the original pattern into the S-expression, which I don't plan on doing right now.

Something else to note: this is thousands of times faster than the Perl version I wrote earlier.

I threw the code in with the rest of my name generation code (namegen.el),

git clone git://github.com/skeeto/fantasyname.git

S-expressions are handy anywhere.
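The same recursive walk carries over to other languages; what changes is how the pattern tree is stored. In this rough C sketch (not part of namegen.el), the alternation between "sequence" and "group" that namegen infers from list nesting is made explicit with SEQ and ALT node kinds, and the substitution tables are tiny made-up excerpts rather than the full namegen-subs alist.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { LIT, SYM, SEQ, ALT } kind_t;

typedef struct node {
    kind_t kind;
    const char *text;          /* LIT: literal text; SYM: substitution key */
    const struct node *kids;   /* SEQ/ALT: child nodes */
    size_t nkids;
} node_t;

typedef struct {
    const char *key;
    const char *const *choices;
    size_t n;
} sub_t;

/* Made-up excerpts standing in for namegen-subs. */
static const char *const s_choices[] = { "ryn", "ach" };
static const char *const v_choices[] = { "i", "a" };
static const char *const c_choices[] = { "b", "d" };
static const sub_t subs[] = {
    { "s", s_choices, 2 }, { "V", v_choices, 2 }, { "C", c_choices, 2 },
};

/* (s ("ith" ("'" C)) V) with the sequence/group structure spelled out. */
static const node_t quote_c[] = { { LIT, "'", NULL, 0 }, { SYM, "C", NULL, 0 } };
static const node_t group[] = { { LIT, "ith", NULL, 0 }, { SEQ, NULL, quote_c, 2 } };
static const node_t top[] = {
    { SYM, "s", NULL, 0 }, { ALT, NULL, group, 2 }, { SYM, "V", NULL, 0 },
};
static const node_t example = { SEQ, NULL, top, 3 };

/* Append text to out (capacity cap); -1 if it would not fit. */
static int emit(char *out, size_t cap, const char *text)
{
    if (strlen(out) + strlen(text) >= cap)
        return -1;
    strcat(out, text);
    return 0;
}

/* Literals pass through, symbols are substituted, sequences
   concatenate, and groups pick one child at random. */
static int generate(const node_t *n, char *out, size_t cap)
{
    switch (n->kind) {
    case LIT:
        return emit(out, cap, n->text);
    case SYM:
        for (size_t i = 0; i < sizeof subs / sizeof subs[0]; i++)
            if (strcmp(subs[i].key, n->text) == 0)
                return emit(out, cap, subs[i].choices[rand() % subs[i].n]);
        return -1;   /* unknown symbol, like namegen-select's bad-symbol */
    case SEQ:
        for (size_t i = 0; i < n->nkids; i++)
            if (generate(&n->kids[i], out, cap) != 0)
                return -1;
        return 0;
    case ALT:
        return n->nkids ? generate(&n->kids[rand() % n->nkids], out, cap) : 0;
    }
    return -1;
}
```

Each call to generate(&example, name, sizeof name) with an empty buffer produces one name, such as "rynithi"; call srand first if you want different names on each run.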

United States Hamiltonian Paths

2009-06-21T00:00:00Z

A while ago I wanted to find every Hamiltonian path in the contiguous 48 states. That is, trips that visit each state exactly once. Writing a program to search for Hamiltonian paths is easy (I did this already). The most time consuming part was actually putting together the data that specified the graph to be searched. I hope someone somewhere finds it useful. Here is a map for reference,

It took me several passes before I stopped finding errors. I think I have it all right now, but there could still be some mistakes. If you see one, leave a comment and I'll fix it here. Here is the graph as an S-expression alist; the car (first) element in each list is a state, and the cdr (rest) is the unordered list of states that can be reached from it.

((me nh)
 (nh vt ma me)
 (vt ny ma nh)
 (ma ri ct ny nh vt)
 (ny pa nj ma ct vt)
 (ri ma ct)
 (ct ri ma ny)
 (nj pa ny de)
 (de md pa nj)
 (pa nj ny de md wv oh)
 (md pa de va wv)
 (va md wv ky tn nc)
 (nc va tn ga sc)
 (sc nc ga)
 (ga fl sc al nc tn)
 (al ms fl ga tn)
 (ms la ar tn al)
 (tn ms al ga nc va ky mo ar)
 (ky wv va tn mo il in oh)
 (wv md pa oh ky va)
 (oh pa wv ky in mi)
 (fl al ga)
 (mi wi oh in)
 (wi mn ia il mi)
 (il in ky mo ia wi)
 (in oh ky il mi)
 (mo il ky tn ar ok ks ne ia)
 (ar mo tn ms la tx ok)
 (la ms ar tx)
 (tx ok nm ar la)
 (ok ks mo ar tx nm co)
 (ks ok co ne mo)
 (ne sd ia mo ks co wy)
 (sd nd mn ia ne wy mt)
 (nd mt sd mn)
 (ia ne mo il wi mn sd)
 (mn wi ia sd nd)
 (mt id wy sd nd)
 (wy id ut co ne sd mt)
 (co ne ks ok nm ut wy)
 (nm co ok tx az)
 (az nm ut ca nv)
 (ut nv id wy co az)
 (id mt wy ut nv or wa)
 (wa or id)
 (or wa id nv ca)
 (nv or id ut az ca)
 (ca az nv or))

Note that all paths must start or end in Maine because it connects to only one other state.
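For anyone who wants to try the search, here is a rough backtracking sketch in C. It is a plain depth-first search, not the program mentioned above. Each state's neighbours are stored as a 64-bit mask, which is enough room for all 48 states, and it counts the Hamiltonian paths that start at a given vertex. The names graph_t, add_border, and hamiltonian_paths_from are made up for this sketch.

```c
#include <stdint.h>

#define MAX_STATES 64

typedef struct {
    int n;                       /* number of vertices in use */
    uint64_t adj[MAX_STATES];    /* bit u of adj[v] set when v borders u */
} graph_t;

static void add_border(graph_t *g, int a, int b)
{
    g->adj[a] |= UINT64_C(1) << b;
    g->adj[b] |= UINT64_C(1) << a;
}

/* Depth-first search: count the ways to extend a path that currently
   ends at v (with `visited` already used) until every vertex is visited. */
static unsigned long count_from(const graph_t *g, int v, uint64_t visited)
{
    uint64_t all = g->n == 64 ? ~UINT64_C(0) : (UINT64_C(1) << g->n) - 1;
    if (visited == all)
        return 1;
    unsigned long total = 0;
    uint64_t next = g->adj[v] & ~visited;     /* unvisited neighbours */
    for (int u = 0; u < g->n; u++)
        if (next & (UINT64_C(1) << u))
            total += count_from(g, u, visited | (UINT64_C(1) << u));
    return total;
}

static unsigned long hamiltonian_paths_from(const graph_t *g, int start)
{
    return count_from(g, start, UINT64_C(1) << start);
}
```

As a small test, loading only the six New England states (leaving out New York) should give exactly two paths starting in Maine: me nh vt ma ri ct and me nh vt ma ct ri. Filling in all 48 states from the alist works the same way, but an unpruned search over the whole map may run for a very long time.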

Elisp Wishlist

2009-05-29T00:00:00Z

Update: It looks like all these wishes, except the last one, may actually be coming true! Guile can run Elisp better than Emacs! The idea is that the Elisp engine is replaced with Guile (the GNU project's Scheme implementation designed to be used as an extension language), along with an Elisp compiler, written in Scheme, that targets Guile's VM. The extension language of Emacs then becomes Scheme, but Emacs is still able to run all the old Elisp code. At the same time Elisp itself, which I'm sure many people will continue to use, gets an upgrade of arbitrary precision, closures, and better performance.

I've been using elisp a lot lately, but unfortunately it's missing a lot of features that one would find in a more standard lisp. The following are some features I wish elisp had. Many of these could be summed up as a generic "be more like Scheme or Common Lisp". Some of these features would break the existing mountain of elisp code out there, requiring a massive rewrite, which is likely the main reason they are being held back.

Closures, and maybe continuations. Closures are one of the features I miss the most when writing elisp. They would allow the implementation of Scheme-style lazy evaluation with delay and force, among other neat tools. Continuations would just be a neat thing to have, though they come with a performance penalty.

Closures would also pretty much require Emacs switch to lexical scoping.

Arbitrary precision. Really, any high-level language's numbers should be bignums. Emacs 22 does come with the Calc package, which provides arbitrary precision via defmath. Perl does something like this with the bignum module.

Packages/namespaces. Without namespaces, all of the Emacs packages prefix their functions and variables with the package name (e.g. dired-). Some real namespaces would be useful for large projects.

C interface. This is something GNU Emacs will never have because Richard Stallman considers shared library support in Emacs to be a GPL threat. If Emacs could be dynamically extended, some useful libraries could be linked in and exposed to elisp.

Concurrency. If some elisp is being executed Emacs will lock up. This is a particular problem for Gnus. Again, Emacs would really need to switch to lexical scoping before this could happen. Threading would be nice.

Speed. Emacs lisp is pretty slow, even when compiled. Lexical scoping would help with performance (compile time vs. run time binding).

Regex type. I mention this last because I think this would be really cool, and I am not aware of any other lisps that do it. Emacs does regular expressions with strings, which is silly and cumbersome. Backslashes need extra escaping, for example. Instead, I would rather have a regex type like Perl and Javascript have. So instead of,

(string-match "\\w[0-9]+" "foo525")

we have,

(string-match /\w[0-9]+/ "foo525")

Naturally there would be a regexpp predicate for checking its type. There could also be a function for compiling a regexp from a string into a regexp object. As a bonus, I would also like to use it directly as a function,

(/\w[0-9]+/ "foo525")

I think a regexp type would really give elisp an edge, and would be entirely appropriate for a text editor. It could also be done without breaking anything (keep string-style regexp support).
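C's POSIX regex API is an existing example of the "compile a string into a regexp object" half of this wish: regcomp turns a pattern into a regex_t value that can be stored and reused, and regexec matches it. Here is a small sketch of a string-match-like helper on top of it. string_match is a made-up name, and since \w is not part of POSIX extended regular expressions, the usage below approximates Emacs' word-constituent class with the bracket expression [[:alnum:]].

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>

/* Rough analogue of Emacs' string-match: returns the index where the
   compiled pattern first matches str, or -1 if it does not match. */
static long string_match(const regex_t *re, const char *str)
{
    regmatch_t m;
    if (regexec(re, str, 1, &m, 0) != 0)
        return -1;
    return (long)m.rm_so;
}
```

Compile once with regcomp(&re, "[[:alnum:]][0-9]+", REG_EXTENDED), keep the regex_t around, and string_match(&re, "foo525") should return 2, the same index Emacs' string-match gives for the example above; regfree releases the object when you're done.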

There is more commentary over at EmacsWiki: Why Does Elisp Suck.

The Lazy Fibonacci List

2009-04-10T00:00:00Z

In a project I am working on, I want to implement a large list using lazy evaluation in Scheme. The list is large enough to be too unwieldy to store entirely in memory, but I still want to represent it in my program as if it was. The solution is lazy evaluation.

One use of lazy evaluation is allowing a program to have infinitely sized data structures without going into the impossible task of actually creating them. Instead, the structure is created on the fly as needed. As a prototype for getting it right, I made an infinitely long list in Scheme that contains the entire Fibonacci series.

This function, given two numbers from the series, returns the lazy list. It uses delay to delay evaluation of the list.

(define (fib f)
  (cons (cadr f)
        (delay (fib (list (cadr f)
                          (apply + f))))))

Notice the recursion here has no base case, so without lazy evaluation it would continue along forever without halting. Now run it,

> (fib '(0 1))
(1 . #<promise>)

The rest of the list is stored as a promise, which will later be teased out using force. This forces evaluation of the promise. Here is a function to traverse the list to the nth element and return it. Notice, this does have a base case.

(define (nth-fib f n)
  (if (= n 1) (car f)
      (nth-fib (force (cdr f)) (- n 1))))

Here it is in action. It is retrieving the 30th element.

> (define f (fib '(0 1)))
> f
(1 . #<promise>)
> (nth-fib f 30)
832040

If you examine f, it contains the first 30 numbers until running into an unevaluated promise. This behavior is very similar to memoization, as calculated values are stored instead of being recalculated later.

These two functions are also behaving as coroutines. When nth-fib reaches a promise, it yields to fib, which continues its non-halting definition. After producing a new value in f, it yields back to nth-fib.

The way I called these functions above, however, can lead to problems. We are storing all the calculated values in f, which can take up a lot of memory. For example, this probably won't work,

> (nth-fib f 1000000)

We will run out of memory before it halts. Instead, we can do this,

> (nth-fib (fib '(0 1)) 1000000)

Because nth-fib uses tail recursion as it traverses the list, unneeded calculated values are tossed (which the garbage collector will handle) and no additional stack space is used. The Scheme standard requires implementations to optimize tail calls in this way. This will continue along until it hits the millionth Fibonacci number, all while using a constant amount of memory.

It turns out that Scheme calls this type of data structure a stream, and some implementations have functions and macros defined so that they are ready to use.

So there you go: memoization, lazy evaluation, and coroutines all packed into one example.
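Scheme's delay and force hide the machinery, so it may help to spell it out. Here is a rough C sketch, with names made up for illustration, in which a promise is a small struct holding the captured arguments plus a cache: force builds the next cell the first time and returns the cached cell afterwards, which is the memoization described above. One difference: C has no garbage collector, so this sketch never frees cells and behaves like the (nth-fib f ...) case that keeps everything alive.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct cell cell_t;

/* A delayed (fib (list a b)): the captured arguments plus a cache. */
typedef struct {
    int forced;         /* has the promise been evaluated yet? */
    cell_t *value;      /* cached result once forced */
    uint64_t a, b;      /* captured arguments */
} promise_t;

/* (cons head promise) */
struct cell {
    uint64_t head;
    promise_t tail;
};

/* Like (fib (list a b)): the head is b and the rest is delayed.
   Values wrap around after the 93rd Fibonacci number. */
static cell_t *fib_stream(uint64_t a, uint64_t b)
{
    cell_t *c = malloc(sizeof *c);
    if (!c)
        abort();
    c->head = b;
    c->tail.forced = 0;
    c->tail.value = NULL;
    c->tail.a = b;        /* (cadr f) */
    c->tail.b = a + b;    /* (apply + f) */
    return c;
}

/* Evaluate the promise once, then keep returning the same cell. */
static cell_t *force(promise_t *p)
{
    if (!p->forced) {
        p->value = fib_stream(p->a, p->b);
        p->forced = 1;
    }
    return p->value;
}

/* Like nth-fib, written as a loop since C doesn't guarantee tail calls. */
static uint64_t nth_fib(cell_t *f, unsigned n)
{
    while (n > 1) {
        f = force(&f->tail);
        n--;
    }
    return f->head;
}
```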

Lisp Number Representations

2008-03-15T00:00:00Z

This exercise partly comes from a couple of different chapters in the book The Little Schemer. The book is an introduction to the Scheme programming language, a dialect of Lisp. The purpose is to teach basic programming concepts in a way that anyone can follow along just as well as someone with a degree in, say, computer science. It is still very useful for us programmer types because there is some good practice to be had from reading and playing along.

First of all, Lisp is famous (infamous?) for lacking syntax. Any Lisp program is made of S-expressions, which are, put simply, lists of lists. There is no operator precedence because operators are treated just like functions. This leads to prefix notation for mathematical expressions,

(+ 4 5)
=> 9

where the => indicates the result of evaluating the expression. We can supply as many operands as we want,

(+ 2 3 4 5 10)
=> 24

We can put another list right in there as an operand,

(+ 3 (* 2 5) 4)
=> 17

You get the idea. In a function, the value of the last expression is the return value. For example, here is the square function in Scheme, which squares its input,

(define (square x)
  (* x x))

Then we can use it,

(+ (square 2) (square 5))
=> 29

There are three important list operators to understand as well: car, cdr, and cons. car returns the first element in a list. In the example below, the ', a single quote, tells the interpreter or compiler that the list is to be treated as data and not to be executed. This is shorthand, or syntactic sugar, for the quote operator: (quote (stallman moglen)) is the same as '(stallman moglen).

(car '(stallman moglen lessig))
=> stallman

cdr returns the "rest" of a list (everything but the car of the list). When passing a list with only one element cdr returns the empty list: ().

(cdr '(stallman moglen lessig))
=> (moglen lessig)
(cdr '(stallman))
=> ()

We can ask if a list is empty or not with null?. #t and #f are true and false.

(null? '(stallman moglen lessig))
=> #f
(null? '())
=> #t

And finally, for lists, we have cons. This function allows us to build a list. It glues the first argument to the front of the list in the second argument,

(cons 'stallman '(moglen lessig))
=> (stallman moglen lessig)
(cons 'stallman '())
=> (stallman)

And one last function you need to know: eq?. It determines whether two atoms are the same atom,

(eq? 'stallman 'moglen)
=> #f
(eq? 'stallman 'stallman)
=> #t

Now, for this exercise we will pretend that the basic arithmetic functions have not been defined for us. Instead all we have is add1 and sub1, each of which adds or subtracts 1 from its argument respectively.

(add1 5)
=> 6
(sub1 5)
=> 4

Oh, I almost forgot. We also have the zero? function defined for us, which tells us if its argument is 0 or not. Notice that functions that return true or false, called predicates, have a ? on the end.

(zero? 2)
=> #f
(zero? 0)
=> #t

To make things simple, these definitions will only consider non-negative integers. We can define the + function (for only two arguments) in terms of the three basic functions shown above. It might be interesting to try to write this yourself before you look any further. (Hint: define it recursively!)

;; Adds together n and m
(define (+ n m)
  (if (zero? m) n
      (add1 (+ n (sub1 m)))))

If the second argument is 0 we are done and simply return the first argument. If not, we add 1 to n + (m - 1). The - function is defined similarly.

;; Subtracts m from n
(define (- n m)
  (if (zero? m) n
      (sub1 (- n (sub1 m)))))

Multiplication is the act of performing addition many times. We can go on defining it in terms of addition,

(define (* n m)
  (if (zero? m) 0
      (+ n (* n (sub1 m)))))

(We'll leave division as an exercise for the reader as it gets a little more complicated than I need to go in order to get my overall point across.)

We will leave math behind for a moment and take a look at The Roots of Lisp. In that link is an excellent paper written by Paul Graham about John McCarthy, the inventor (or perhaps discoverer?) of Lisp, and how Lisp came to be. It turns out that in order to have a fully functional Lisp engine we only need seven primitive operators: operators defined outside of the language itself as building blocks for the language. For Lisp these seven operators are (Scheme-ized for our purposes): eq?, atom?, car, cdr, cons, quote, and if.

Notice how none of these are math operators. You may wonder how we can possibly perform mathematical operations when we lack these facilities. The answer: we have to define our own representation for numbers! Let's try this, define a number as a list of empty lists. So, the number 3 is,

'(() () ())

And here is 0, 2, and 4,

'()
'(() ())
'(() () () ())

See how that works? Before, when we wanted to define addition and subtraction, we needed three other functions: zero?, add1, and sub1. With our number representation, how could we define add1 with our seven primitive operators? Our numbers are defined as lists, so we can use our list operators. To add 1 to a number, we append another empty list. Hey, that sounds a lot like cons!

(define (add1 n)
  (cons '() n))

Subtraction is removing an element from the list, which sounds a lot like cdr,

(define (sub1 n)
  (cdr n))

And to define zero? we need to check for an empty list. Notice this will also be the definition for null?.

(define (zero? n)
  (eq? '() n))

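And now we are back where we started. In fact, you can reuse the exact definitions above for + and -. The * definition needs one tweak: its base case returns the literal 0, which is no longer our zero, so it should return '() instead. Our entire number representation depends only on how we define add1, sub1, and zero?. Let's try it out,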

;; 3 + 4
(+ '(() () ()) '(() () () ()))
=> (() () () () () () ())

;; 5 - 2
(- '(() () () () ()) '(() ()))
=> (() () ())

;; 2 * 2
(* '(() ()) '(() ()))
=> (() () () ())

;; 3 + 4 * 2
(+ (* '(() () () ()) '(() ())) '(() () ()))
=> (() () () () () () () () () () ())

Pretty cool, huh? We just added arithmetic (albeit extremely simple) to our basic Lisp engine. With some modifications we should be able to define and operate on negative integers and even define any rational number (limited by how much memory your computer's hardware can provide).

Now, thank goodness this isn't how real Lisp implementations actually handle numbers. It would be incredibly slow and impractical, not to mention annoying to read. Normally, numbers and math operators are primitive so that they are fast.
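Still, the list-of-empty-lists trick translates directly to C if a number is a linked list of empty cells and NULL plays the role of '(). Here is a rough sketch: add1, sub1, and zero mirror the three Scheme definitions, and plus, minus, and times are the same recursive definitions, renamed because + - * are operators in C. Note that times returns NULL, the empty list, in its base case rather than a literal 0. The helpers from_int and to_int are made up just to make the example readable.

```c
#include <stdlib.h>

/* A number is a chain of n empty cells; NULL stands in for '(), i.e. 0. */
typedef struct unary {
    struct unary *cdr;
} unary_t;

/* (cons '() n) */
static unary_t *add1(unary_t *n)
{
    unary_t *cell = malloc(sizeof *cell);
    if (!cell)
        abort();
    cell->cdr = n;
    return cell;
}

/* (cdr n); n must not be zero */
static unary_t *sub1(unary_t *n) { return n->cdr; }

/* (eq? '() n) */
static int zero(const unary_t *n) { return n == NULL; }

static unary_t *plus(unary_t *n, unary_t *m)
{
    return zero(m) ? n : add1(plus(n, sub1(m)));
}

/* Only valid when n >= m, matching the non-negative-only assumption. */
static unary_t *minus(unary_t *n, unary_t *m)
{
    return zero(m) ? n : sub1(minus(n, sub1(m)));
}

static unary_t *times(unary_t *n, unary_t *m)
{
    return zero(m) ? NULL : plus(n, times(n, sub1(m)));
}

/* Conversion helpers for the demo only. */
static unary_t *from_int(int k)
{
    unary_t *n = NULL;
    while (k-- > 0)
        n = add1(n);
    return n;
}

static int to_int(const unary_t *n)
{
    int k = 0;
    for (; n; n = n->cdr)
        k++;
    return k;
}
```

With these, to_int(plus(times(from_int(4), from_int(2)), from_int(3))) comes out to 11, matching the last Scheme example.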

Iterated Prisoner's Dilemma

2007-11-06T00:00:00Z

I was reading about the prisoner’s dilemma game the other day and was inspired to simulate it myself. It would also be a good project to start learning Common Lisp. All of the source code is available in its original source file here:

/download/prison/prison.lisp

I have only tried this code in my favorite Common Lisp implementation, CLISP, as well as CMUCL.

In prisoner’s dilemma, two players acting as prisoners are given the option of cooperating with or betraying (defecting) the other player. Each player’s decision along with his opponent’s decision determines the length of his prison sentence. It is bad news for the cooperating player when the other player is defecting.

Prisoner’s dilemma becomes more interesting in the iterated version of the game, where the same two players play repeatedly. This allows players to “punish” each other for uncooperative play. Scoring generally works like this (higher is better),

                   Player A
                   coop      defect
Player B  coop     (3,3)     (0,5)
          defect   (5,0)     (1,1)

Each pair gives (Player B’s score, Player A’s score).
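The table can be captured as a small function. This is a minimal sketch; PAYOFF is a hypothetical name, not taken from prison.lisp. Note that it returns the two scores in the same order as its arguments, whereas each pair in the table above is listed as (Player B, Player A).

```lisp
;; Hypothetical PAYOFF: takes both players' moves (:COOP or :DEFECT) and
;; returns their scores as two values, in argument order. ECASE signals an
;; error for any other move.
(defun payoff (move-a move-b)
  (ecase move-a
    (:coop (ecase move-b
             (:coop   (values 3 3))     ; mutual cooperation
             (:defect (values 0 5))))   ; A is exploited
    (:defect (ecase move-b
               (:coop   (values 5 0))   ; A exploits B
               (:defect (values 1 1)))))) ; mutual defection
```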

The most famous, and strongest, individual strategy is tit-for-tat. This player begins by playing cooperatively, then does whatever its opponent did last. Here is the Common Lisp code for a tit-for-tat strategy,

(defun tit-for-tat ()
  (lambda (x)
    (if (null x) :coop x)))

If you are unfamiliar with Common Lisp, the lambda part is returning an anonymous function that actually plays the tit-for-tat strategy. The tit-for-tat function generates a tit-for-tat player along with its own closure. The argument to the anonymous function supplies the opponent’s last move, which is one of the symbols :coop or :defect. In the case of the first move, nil is passed. These are some really simple strategies that ignore their arguments,

(defun rand-play ()
  (lambda (x)
    (declare (ignore x))
    (if (> (random 2) 0) :coop :defect)))

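(defun switcher-coop ()
  ;; LAST holds the previous move. Each call flips it and returns the new
  ;; value, so this player alternates :defect, :coop, :defect, ...
  (let ((last :coop))
    (lambda (x)
      (declare (ignore x))
      (if (eq last :coop)
          (setf last :defect)
          (setf last :coop)))))

(defun switcher-defect ()
  ;; Same alternation starting from :defect, so the first move is :coop.
  (let ((last :defect))
    (lambda (x)
      (declare (ignore x))
      (if (eq last :coop)
          (setf last :defect)
          (setf last :coop)))))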

(defun always-coop ()
  (lambda (x)
    (declare (ignore x))
    :coop))

(defun always-defect ()
  (lambda (x)
    (declare (ignore x))
    :defect))
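To see these closures in action, here is a minimal sketch of an iterated match, assuming the protocol described above: each player is called with its opponent's previous move (nil on the first round). PLAY-MATCH and its SCORE-FN parameter are hypothetical and are not the engine from prison.lisp. SCORE-FN is any function that takes both moves and returns both round scores as multiple values.

```lisp
;; Hypothetical PLAY-MATCH: run ROUNDS rounds between two strategy closures
;; and return both total scores as multiple values.
(defun play-match (player-a player-b rounds score-fn)
  "SCORE-FN takes (move-a move-b) and returns the two round scores as values."
  (let ((last-a nil) (last-b nil)
        (total-a 0) (total-b 0))
    (loop repeat rounds
          do (let ((move-a (funcall player-a last-b))   ; A sees B's last move
                   (move-b (funcall player-b last-a)))  ; B sees A's last move
               (multiple-value-bind (score-a score-b)
                   (funcall score-fn move-a move-b)
                 (incf total-a score-a)
                 (incf total-b score-b))
               ;; Remember this round's moves for the next round.
               (setf last-a move-a
                     last-b move-b)))
    (values total-a total-b)))
```

For example, with a scoring function built from the table, (play-match (tit-for-tat) (always-defect) 10 score-fn) leaves tit-for-tat with 9 points and always-defect with 14: tit-for-tat is exploited once, then defects for the remaining nine rounds.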

Patrick Grim did an interesting study about ten years ago on iterated prisoner’s dilemma with competing strategies arranged in two dimensions: Undecidability in the Spatialized Prisoner’s Dilemma: Some Philosophical Implications. It is very interesting, but I really wanted to play around with some different configurations myself. So I extended my iterated prisoner’s dilemma engine above to run over a two-dimensional grid.

Grim’s idea was this: place different strategies in a two-dimensional grid. Each strategy competes against its immediate neighbors. (The paper doesn’t specify which kind of neighbor, 4-connected or 8-connected, so I went with 4-connected.) A cell’s scores from these competitions are summed to give its final score. After scoring, each cell adopts the strategy of its highest-scoring neighbor, provided that neighbor scored higher than the cell itself. Repeat.
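One generation of this rule can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's code: the grid does not wrap at the edges, and ties between equally good neighbors go to the first one found in the order up, down, left, right. NEIGHBORS, STEP-GRID, *OFFSETS*, and MATCH-SCORE are hypothetical names; MATCH-SCORE takes two strategies and returns the first one's score from a match against the second.

```lisp
;; Sketch of one generation of the spatial game (hypothetical names).
;; Assumptions: no wrap-around at the edges; ties go to the first
;; neighbor found in the order up, down, left, right.

(defparameter *offsets* '((-1 . 0) (1 . 0) (0 . -1) (0 . 1))
  "Row/column offsets of the 4-connected neighbors: up, down, left, right.")

(defun neighbors (grid row col)
  "List the (row . col) positions of the in-bounds neighbors of ROW, COL."
  (destructuring-bind (rows cols) (array-dimensions grid)
    (loop for (dr . dc) in *offsets*
          for r = (+ row dr)
          for c = (+ col dc)
          when (and (< -1 r rows) (< -1 c cols))
            collect (cons r c))))

(defun step-grid (grid match-score)
  "Return a new grid in which each cell has imitated its best neighbor."
  (let* ((dims (array-dimensions grid))
         (scores (make-array dims))
         (next (make-array dims)))
    ;; 1. A cell's score is the sum of its matches against its neighbors.
    (dotimes (row (first dims))
      (dotimes (col (second dims))
        (setf (aref scores row col)
              (loop for (r . c) in (neighbors grid row col)
                    sum (funcall match-score
                                 (aref grid row col) (aref grid r c))))))
    ;; 2. Adopt the strategy of the highest-scoring neighbor, but only if
    ;;    it strictly beats the cell's own score. Using > means an equal
    ;;    later neighbor never replaces an earlier one.
    (dotimes (row (first dims))
      (dotimes (col (second dims))
        (let ((best-score (aref scores row col))
              (best-strategy (aref grid row col)))
          (loop for (r . c) in (neighbors grid row col)
                when (> (aref scores r c) best-score)
                  do (setf best-score (aref scores r c)
                           best-strategy (aref grid r c)))
          (setf (aref next row col) best-strategy))))
    next))
```

All scores are computed before any cell changes strategy; updating in place would let early cells switch before their neighbors are scored. Passing the per-pair score in as MATCH-SCORE keeps the grid logic separate from the game, so a full iterated match can be plugged in.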

The paper showed some interesting results: depending on starting conditions, the tit-for-tat strategy would sometimes dominate and in other cases be quickly wiped out. Here is my first real test of the simulation. Three strategies were placed randomly in a 50x50 grid: tit-for-tat, always-cooperate, and always-defect. These are the first twenty iterations; the grid stabilizes after 16.

(run-random-matrix 50 100 20 '(tit-for-tat always-coop always-defect))

White is always-cooperate, black is always-defect, and cyan is tit-for-tat. Notice how the always-defect quickly exploits the always-cooperate and dominates the first few iterations. However, as the always-cooperate resource becomes exhausted, the tit-for-tat cooperative strategy works together with itself, as well as the remaining always-cooperate, to eliminate the always-defect invaders, who have no one left to exploit. In the end, a few always-defect cells are left in equilibrium, feeding off of always-cooperate neighbors, who themselves have enough cooperating neighbors to hold their ground.

The effect can be seen more easily here. Tit-for-tat fills the outside, always-cooperate fills the middle, and a single always-defect cell is placed at the center.

(run-matrix (create-three-box) 100 30)

The asymmetric pattern is due to the way that ties are broken.

The Lisp code only prints text, which makes it hard to follow what’s going on. To generate these GIFs, I first used this Octave script to convert the text into images. Just dump the Lisp output to a text file and remove the hash table dump at the end. Then run this script on that file:

/download/prison/pd_plot.m

The text file input should look like this:

/download/prison/example.txt

~~You will need Octave-Forge.~~

The script will make PNGs. You can either change the script to make GIFs (I didn’t try this myself) or use something like ImageMagick to convert the images afterward. Then compile the frames into an animated GIF using Gifsicle.
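As an alternative to the Octave step, a frame could be written straight from Lisp as a plain-text PPM (P3) image, which ImageMagick can convert to PNG or GIF. This is a minimal sketch, not part of the original tooling: WRITE-PPM and *STRATEGY-COLORS* are hypothetical names, the grid is assumed to be a 2D array of strategy symbols, the colors follow the key described above (white = always-cooperate, black = always-defect, cyan = tit-for-tat), and any other strategy is drawn gray.

```lisp
;; Hypothetical color key: strategy symbol -> red, green, blue (0-255).
(defparameter *strategy-colors*
  '((always-coop   255 255 255)
    (always-defect   0   0   0)
    (tit-for-tat     0 255 255)))

(defun write-ppm (grid stream)
  "Write GRID, a 2D array of strategy symbols, to STREAM as a P3 PPM image
with one pixel per cell. Unknown strategies are drawn gray."
  (destructuring-bind (rows cols) (array-dimensions grid)
    ;; Header: magic number, width, height, maximum color value.
    (format stream "P3~%~D ~D~%255~%" cols rows)
    (dotimes (row rows)
      (dotimes (col cols)
        (let ((rgb (or (rest (assoc (aref grid row col) *strategy-colors*))
                       '(128 128 128))))
          ;; One pixel per line: "R G B".
          (format stream "~{~D~^ ~}~%" rgb))))))
```

Each generation can then go to its own file, for example with (with-open-file (out "frame-000.ppm" :direction :output :if-exists :supersede) (write-ppm grid out)), before converting and assembling the frames as above.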

See if you can come up with some different strategies and make some special patterns for them. You may be able to observe some interesting interactions. The image at the beginning of the article uses all of the listed strategies in a random matrix.

I will continue to try out some more to see if I can find something particularly interesting.