null program

Middleman Parallelization

I recently discovered a very clever tool called Middleman. It's a quick way to set up and manage multiple-process workload queue. The process output and display is done inside of a screen session, so if it's going to take awhile you can just detach and check on it again later. In the past I used make's -j option to do this, but that's always a pain to set up.

It is composed of three programs: mdm.screen, mdm-run, and mdm-sync. The first is the top level supervisor that you use to launch the enhanced shell script. The second prefixes every command to be run in parallel. The third is prefixes the final command that depends on all of the individual processes.

The linked Middleman page has a good example, but I'll share my own anyway. I used it over the weekend to download a long series of videos with youtube-dl. Because the transfer rate for a single video is throttled I wanted to grab several at a time, but I also didn't want to grab them all at the same time. Here's the dumb version of the script, download.sh, that does them all at once.

#!/bin.sh
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX0 &
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX1 &
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX2 &
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX3 &
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX4 &
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX5 &
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX6 &
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX7 &
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX8 &

With Middleman all I had to do was this,

#!/bin/sh
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX0
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX1
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX2
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX3
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX4
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX5
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX6
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX7
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX8

Then just launch the script with mdm.screen. It defaults to 6 processes at a time, but you can adjust it to whatever you want with the -n switch. I used 4.

$ mdm.screen -n 4 ./download.sh

There is a screen window that lists the process queue and highlights the currently active jobs. I could switch between screen windows to see the output from individual processes and see how they were doing.

From the perspective of the shell script, the first four commands finish instantly but fifth command will block. As soon as Middleman sees one of the first four processes complete the fifth one will begin work, returning control to the shell script, and the sixth command will block, since the queue is full again.

I'm sure I'll be using this more in the future, especially for tasks like batch audio and video encoding. I bet this could be useful on a cluster.


Java is Death By A Thousand Paper Cuts

At least it is for me. This past week at work I've been furiously rushing work on a project written in Java. It's completely my fault that I'm using Java since I'm the one who picked the language. I wanted to use Java the platform, but I don't know any other JVM languages well enough to use them in place of Java the language for a project at work.

It's all sorts of little things that make writing Java so exhausting for me. At the end of the day it I just feel irritated. I hate the absolute lack of functional programming and that I have to specify everything at such a low level. The whole reason we program in high-level languages is so we can express algorithms more concisely, but Java fails at this.

Here's an example of what I'm talking about, something I basically did a few times today. Let's say you have an array of floats, nums and you want to sum them and return the result (or maybe use it in another expression). In Lisp it's very straightforward.

(reduce '+ nums)

"Reduce the sequence nums by addition." Notice that it's more about saying what I want to do rather than how to do it. I don't have to introduce any temporary storage or iterators. To do the same thing in Java it will look something like this.

int sum = 0;
for (double num : nums) {
    sum += num;
}
return sum;

If you're using an older Java without the enhanced looping construct it gets even uglier. I had to introduce a variable for accumulation and a second variable for iteration. This sort of thing has to be done all over the place in Java, and it greatly increases the cognitive overload when reading Java code.

This instruction is more about telling the computer the how rather than my overall intention. One problem with telling it the how is that I've unnecessarily locked in a specific algorithm and ordering. The literal instruction says that the numbers must be added in order, sequentially. My Lisp instruction doesn't do that.

It gets even worse when you complicate it slightly by adding a second array and, say, multiplying it pairwise with the first.

for (int i = 0; i < nums1.length; i++) {
    sum += nums1[i] * nums2[i];
}
return sum;

Now the loop gets more complex. I have to tell it how to increment the iterator. I have to tell it to check the bounds of the array. The iterator is a misdirection because the actual number stored in it isn't what's important. Again, the Lisp method is much more concise.

(reduce '+ (map 'list '* nums1 nums2))

"Map the two sequences by multiplication into a list, then reduce it by addition." Unfortunately we start to leak a little bit into the how here. I am telling it that the intermediate structure should be a list, because map forces me to pick a representation. Besides that, I am only describing my overall intention and not the obvious details.

So with Java my days become filled with the tedious low-level algorithm descriptions that I have to hammer out over and over and over. Death by a thousand paper cuts.

Lisp isn't the only language that has a (generally) much better approach; it's just my favorite. :-) Most languages with at least some decent functional facilities will also do the above concisely.


Distributed Computing with Emacs

I got an Elisp idea today and even went as far as implementing a proof of concept for it: distributed computing with Emacs Lisp. As usual for me the idea takes advantage of Lisp features to make the task pretty simple, very specifically Elisp's implementation. In this case it's the Lisp reader, printer, and the fact that Elisp functions have a printed representation, both byte-compiled and not.

Here's the proof of concept code: dist-emacs.el

A central server listens for TCP connections. Clients offering their CPU for use connect to the server and await instructions. The server sends a single, no-argument, anonymous function to the client. The client calls the function, returning the resulting form back to the server. In order to transmit the function it's encoded into a string using the Lisp printer, and the client turns it back into an executable function with the Lisp reader.

For some simple security there is a shared password between the client and server. When the server sends a function it includes a signature, and the client only runs code that matches the signature. To create a signature the string encoded version of the function is appended with the password (both strings) and hashed with a secure hashing algorithm. Only someone who knows the password -- including other clients -- can create the signature.

(defun sign-sexp (password sexp)
  "Return signature of the given s-exp."
  (sha1 (format "%s%s" password sexp))

To make it easy for the client to read in both the signature and the function we just cons them together before encoding them as text.

(defun encode (password sexp)
  "Encode a s-exp for transmission to client."
  (prin1-to-string (cons (sign-sexp password sexp) sexp)))

The client calls the Lisp reader on the string, then checks the signature in the car cell against the s-expression in the cdr cell. This will return the function if it's legitimate, otherwise nil.

(defun decode (password str)
  "Decode string into s-exp, checking the signature in the process."
  (let* ((cons (read str))
         (sig  (car cons))
         (sexp (cdr cons)))
    (if (equal sig (sign-sexp password sexp))
        sexp
      nil)))

And that's the core of it. It just needs some network code to move the string between computers. That part can be found in the linked source above.

To demo this, I'll use the whiten function from my previous post. I'll run it with three different strings on three different computers. Assume we started the dist-emacs server (dist-start) and connected three clients (dist-connect) from three computers to it. The clients were fired up from scratch so there's no whiten function on them yet, but there is one defined on the server. First we'll send the function definition to the clients. The dist-dist function takes a list of functions and passes each one to a client. Ideally I'd want this function to be more intelligent, managing a work queue so that an arbitrary length list of functions will be fed one at a time to each client. That's not the case here.

(dist-dist (mapcar (lambda (p)
                          `(lambda ()
                             (fset 'whiten ,(symbol-function 'whiten))))
                      dist-clients))

Also like in the previous post, this is an abstraction leak with the Emacs implementation. But I like this trick so I'm going to use it anyway. :-) Next we call it on each client with a different string.

(dist-dist (list (lambda () (whiten "good"))
                 (lambda () (whiten "news"))
                 (lambda () (whiten "everyone"))))

The way I have it set up for my proof of concept the results are just spit back into the server's *Messages* buffer. If we watch that buffer we can see each results come back in one at a time as each machine finishes. I can watch Emacs saturate the CPU on every client machine simultaneously as it works.

"2577343027adf7817185db876032d8ed"
"46a65dac2c0040afde175adf1e9a81fd"
"f39baf9e74475dd5be7d5495a025fe84"

This isn't the same order as the clients, but the order in which the jobs were completed.

As for the practicality, I doubt there really is one. It's really only a neat concept (or maybe not even that). For almost the exact same reasons as my distributed JavaScript idea, this is a solution looking for a problem. The problem needs to be able to be broken into small computation units, because Emacs has no threading, and it has to be low bandwidth, because it has to be parsed all at once from a string. If you want to pass large data sets it needs to be done out-of-band, which probably defeats the purpose. There seem to be few to no problems that fit these limitations.


Elisp Memoize

Memoization is something I think should be packaged as a standard function for just about every language. That's not generally the case, but luckily this is easy to fix in Lisps. I needed memoization recently for an Elisp project I'm working on. I could have hand-written one but a generic memoization function would have worked just fine. Since I didn't find any generic Elisp memoization on-line I wrote my own.

Download: memoize.el

Just put it in your path and (require 'memoize) it. Here's the core function.

;; ID: 83bae208-da65-3e26-2ecb-4941fb310848
(defun memoize-wrap (func)
  "Return the memoized version of the given function."
  (let ((table-sym (gensym))
        (val-sym (gensym))
        (args-sym (gensym)))
    (set table-sym (make-hash-table :test 'equal))
    `(lambda (&rest ,args-sym)
       ,(concat (documentation func) "\n(memoized function)")
       (let ((,val-sym (gethash ,args-sym ,table-sym)))
          (if ,val-sym
              ,val-sym
            (puthash ,args-sym (apply ,func ,args-sym) ,table-sym))))))

Normally the hash table would be stored inside of a closure, but Elisp doesn't have closures. Instead, the hash table is stored in an uninterned symbol, which will only (generally) be accessible to the memoization wrapper (in order for external code to access the table it would need to directly inspect the memoized function object). It's not quite a scope but it's close enough.

Note that it keeps the original function documentation intact. I want the memoization wrapper to be an unobtrusive as possible.

This is a bit of an abstraction leak. It's creating new code at run time taking advantage of the specific low-level way Emacs executes s-expressions. If Emacs ever stopped directly evaluating s-expressions then this code would break. A more proper way to do this would be at compile time with a macro, where this exact sort of code generation is expected. Rather than defun we would use a macro called something like defmemoized.

Here's a demo of it in action. This whiten function is computationally expensive: it performs key whitening. It repeats a hash function thousands of times to produce an expensive value. This isn't something you generally want to memoize, but stick with me.

(defun whiten (key)
  "Perform key whitening with the md5 hash function."
  (dotimes (i 100000 key)
    (setq key (md5 key))))

(whiten "password")   ; takes a couple of seconds

On my laptop that takes a couple of seconds to run. Increase that counter if it's quick on your computer. My memoize package provides a memoize function which will create a new function that wraps the original, then installs the new function in place of the old one if we give it the function symbol.

(memoize 'whiten)

The first time you run it after memoization it will be slow, but after that the memoization kicks in for a quick return.

There are two Elisp specific issues at hand. First is that memoizing an interactive function will produce a non-interactive function. It would be easy to fix this problem when it comes to non-byte-compiled functions, but recovering the interactive definition from a byte-compiled function is more complex than I care to deal with. Besides, interactive functions are always used for their side effects so there's no reason to memoize them.

Second is a limitation of Elisp hash tables. There's no way to distinguish a nil value and no value. The hash table returns nil for both. This means you cannot memoize nil returns. But a computationally expensive function shouldn't be returning nil anyway.

Hopefully someone else will find good use for this as well.


GIMP Painting

I drew the magic space elevator a few days back after some practice. Here's my very first attempt at this art style,


Full Size Video

And the image itself,


Pen and Paper RPG Wishlist

As I get more involved with tabletop RPGs, specifically Dungeons and Dragons, I find there are some related attributes that I wish these game systems had. While I'm sure there are systems do have some of these, I wish whatever I happen to be using had all of them.

Print friendly. The source material tends to be very colorful and graphical. While this can be a good thing, especially when illustrating monsters (Show, not tell!), it's bad if you want to print out your own materials. I want the crucial information available in a crisp, clean monochrome form of some sort. Not only could I reproduce material for use in notes and handouts, but I could create my own condensed sets of information by composing these crisp forms.

For example, in the D&D monster manual each monster has a nice concise block containing all the information -- defenses, health, abilities, etc -- needed to use that monster. This is great, but it's on a brownish background, in a red-ish box. So close to being what I want. But even then, do I have legal permission to reproduce this information? And so ...

Licensing. The closest thing tabletop gaming has to a Free Software license would be the Open Game License (OGL), which is still pretty restrictive. I would love for the source materials to be licensed at least loosely enough that I could print out my own copies for cheap (assuming they are print friendly, per above). Have some new players sitting down at the table? To get them started, give them that stapled-together player handbook you printed out. There's RPG evangelism for you.

The Fudge role-playing game system has both these attributes down pretty well. The Fudge manual is very print friendly PDF with explicit permission to share it with your friends. However, just as yacc is a compiler compiler, Fudge is really a game system system, a system for creating game systems, so it's only part of what is needed to play a game.

Useful software tools. One specific example is character creation software. Creating a new character can be burdensome, especially for a new player. Software that allows a player to select some basic options from a menu and produce a printable, error-free character sheet can save a lot of time.

Fourth edition Dungeons and Dragons has a character builder, but it is a humongous piece of junk. It's proprietary, Windows-only, bulky, and slow. For a program that merely generates printouts based on a few user selections from some simple menus, it has some extremely excessive system requirements (much higher than the ones they claim). And it requires a reboot to install too. A human can produce the same results by hand inside of a half hour, so for a computer there is virtually no computation involved. So what is it doing? Worse of all, the fourth edition license expressly forbids competing character creation software, so no one can legally produce a reasonable one. All this thing should be is a database of available character abilities, some character sheet logic, and a postscript printer.

Fortunately there are some decent, generic world generation tools for GMs out there, such as random inn generators, random dungeon generators, and so on. And another one. I've mentioned this before.

If you know any systems that fit the above descriptions well, go ahead and link them in the comments!


GIMP Space Elevator Drawing

I've been looking for a nearby tabletop gaming group for awhile now. I asked people I knew. I asked around at work. I just couldn't find anyone. Luckily, a new thing that Wizards of the Coast has been doing is D&D Encounters where prepared adventures are run every week openly at local gaming stores. The purpose is for casual players or beginners to be able to freely to play some D&D without needing any commitment, preparation, or equipment. Characters are pre-generated, so no spending a half hour creating some new player characters every week.

I hopped into it for season two, which just started 6 weeks ago. Each weekly session is a 1.5 to 2 hour combat encounter. I've been having fun, but honestly it's not all that exciting compared to what a real campaign can bring. There is no role-playing, practically no NPC interaction, no puzzles, and no exploration. It also doesn't help that the adventures and characters are riddled with mistakes and very unbalanced. For an example of unbalanced, the character I've been playing, Barcan (or Barqan depending on where you are in the character sheet), could be killed -- and I'm not talking about unconscious dying but negative bloodied value dead -- by a monster critical strike in just about every encounter so far. Every time a monster attacked me there was a 1 in 20 chance, even at full health, that I might be done playing for the week.

But the great part is that it got me connected to other players in my area, which I think is the most valuable part of Encounters. One of my fellow players was just starting a regular gaming group and invited me to come along, so we've been playing on weekends now, with the intention of taking turns as the DM among those who are interested. And for a little irony, everyone except one person in the group also works at the lab. I guess I didn't ask around enough!

So I'm going to be DMing a 4e Dungeons and Dragons campaign sometime in the near future, and I'm quite excited about it!

Anyway, what does that have to do with drawing a space elevator in the GIMP? First of all, if you're one of the players in my group who found their way here stop reading now. You'll find out all this in the first session, so come back after that. So, I've had this campaign idea in my head for a couple of years now, and it involves a skyhook of sorts constructed by a combination of careful engineering and powerful arcane magic. It leads somewhere, not space, but somewhere. Since that somewhere is part of the mystery I won't reveal where that is, but if the campaign goes well I'll write about it more in the future.

When I run the first session I want to illustrate the skyhook to the players. Show, not tell they say. I like some of the work people do imitating Bob Ross in the GIMP. I've done a few of these to try it out, and it's surprising how well it can turn out even from a beginner. I employed this new art education to draw my skyhook for my game. The GIMP undo history reveals how I did it,


Full Size Video

And the result,

In retrospect I should have drawn the skyhook right after I finished those first clouds, since it's behind everything else in the scene. Oh, and that thing on the bottom left is a twisted scar left behind from a previous attempt at building the skyhook, but it collapsed. It's a dangerous place to be.


Emacs Byte Compilation

A feature unique to some Lisps is the ability to compile functions individually at any time. This could be to a bytecode or native code, depending on the dialect and implementation. In a Lisp implementations where compilation matters (such as CLISP), there are typically two forms in which code can be evaluated: a slower, unoptimized uncompiled form and a fast, efficient compiled form. The uncompiled form would have some sort of advantage, even if it's merely not having to spend time on compilation.

In Emacs Lisp, the uncompiled form of a function is just a lambda s-expression. The only thing that gives it a name is the symbol it's stored in. The compiled form is a (special) vector, with the actual byte codes stored in a string as the second element. Constants, the docstring, and other things are stored in this function vector as well. The Elisp function to compile functions is byte-compile. It can be given a lambda function or a symbol. In the case of a symbol, the compiled function is installed over top of the s-expression form.

(byte-compile (lambda (x) (* 2 x)))
  => #[(x) "^H\301_\207" [x 2] 2]

The compiler will not only convert the function to bytecode and expand macros, but also perform optimizations such as removing dead code, evaluating safe constant forms, and inline functions. This provides a nice performance boost (testing using my measure-time macro),

(defun fib (n)
  "Fibonacci sequence."
  (if (<= n 2) 1
    (+ (fib (- n 1)) (fib (- n 2)))))

(measure-time
 (fib 30))
  => 1.0508708953857422

(byte-compile 'fib)

(measure-time
 (fib 30))
  => 0.4302399158477783

Most of the installed functions in a typical Emacs instance are already compiled, since they are loaded already compiled. But a number of them aren't compiled. So, I thought, why not spend a few seconds to do this?

In Common Lisp, there is a predicate for testing whether a function has been compiled or not: compiled-function-p. For whatever reason, there is no equivalent predefined in Elisp, so I wrote one,

(defun byte-compiled-p (func)
  "Return t if function is byte compiled."
  (cond
   ((symbolp   func) (byte-compiled-p (symbol-function func)))
   ((functionp func) (not (sequencep func)))
   (t nil)))

My idea was to iterate over every interned symbol and, if the function slot contains an uncompiled function, using the test above, I would call byte-compile on it. Well, it turns out that byte-compile is very flexible and will ignore symbols with no function and symbols with already compiled functions.

So next, how do we iterate over every interned symbol? There is a mapatoms function for this. Provide it a function and it calls it on every interned symbol. Well, that's simple and anticlimactic.

(mapatoms 'byte-compile)

That's it! It will take only a few seconds and spew a lot of warnings. I haven't found a way to disable those warnings, so this isn't something you'd want to have run automatically, unless you like having an extra window thrown in your face. I've only discovered this recently, so I'm not sure what sort of bad things this may do to your Emacs session. Not every function was written with compilation in mind. There are interactions with macros to consider.

I doubt there will be a noticeable performance difference. Like I said before, most everything is already compiled, and those are the functions that get used the most. There's just something nice about knowing all your functions are compiled and optimized.


Emacs ParEdit and IELM

ParEdit is a powerful extension to Emacs that I've just begun using recently. It's a minor mode that forces all parenthesis, square brackets, and quotes to be balanced at all times. While it's useful for any programming language it's especially suited for Lisps, because it's designed for manipulating nested parenthesis -- i.e. s-expressions. It's not currently part of Emacs so you have to drop the script in your load-path somewhere.

I've frequently thought that a Lisp-based shell would be an interesting and powerful tool, much like a normal Lisp REPL. Programs would be treated like Lisp functions. For example,

wellons@luna:~$ (ls -l .emacs)
-rw------- 1 wellons wellons 4859 2010-06-10 23:20 .emacs
wellons@luna:~$

But typing all those parenthesis all the time would be quite the nuisance. I know this from experience typing at Lisp REPLs. I imagined something that works exactly like ParEdit would be needed to make all that work go away. To save even more time each prompt would begin with a nested pair, with the cursor placed between them. Then typing a quick command is no different than a normal shell.

wellons@luna:~$ ()

Well, in Emacs we have both ParEdit and REPLs, so we can compose these features together with just a little advice. Here's how to do it with the Interactive Emacs-Lisp Mode (IELM) REPL. First tell IELM to use ParEdit,

(add-hook 'ielm-mode-hook (lambda () (paredit-mode 1)))

The function in IELM that spits out the next prompt is ielm-eval-input, so we give it the advice to call the ParEdit function afterwards to insert a parenthesis pair.

(defadvice ielm-eval-input (after ielm-paredit activate)
  "Begin each IELM prompt with a ParEdit parenthesis pair."
  (paredit-open-round))

And that's it! Note that the first IELM prompt is not placed by this function so it won't appear until the second prompt.

*** Welcome to IELM ***  Type (describe-mode) for help.
ELISP> 
ELISP> ()

If you want to enter a single atom and don't need parenthesis, just hit backspace once. This is much less common so it gets the extra keystroke.

This can be done for inferior-lisp and SLIME to enhance those REPLs as well. You just have to figure out which defun to advise.


Elisp Printed Hash Tables

A printed hash table representation is pretty new to Elisp, and a bit late. As far as I know Elisp didn't come with a way to print, and read back in, a hash table without rolling your own (like Jared Dilettante was doing with a Data::Dumper style output), until 23.1 in July 2009. This is when json.el was first included with Emacs, for dumping to and reading from JSON.

(require 'json)

(setq hash (make-hash-table))
(puthash "key1" "data1" hash)
(puthash "key2" "data2" hash)

(insert "\n;; " (json-encode hash))
;; {"key2":"data2", "key1":"data1"}

Just a month ago Emacs 23.2 came out, very silently including a new printed representation for hash tables with a #s hash notation.

#s(hash-table data ("key1" "data1" "key2" "data2"))

With this hash tables can be printed and read as part of normal s-expressions with the standard lisp reader and printer functions. It seems heavy, having to write out "hash-table" in there, but I think it's because the #s notation will be used to create printed forms of other lisp objects that currently do not have one.


Identifying Files

At work I spend about a third of my time doing data reduction, and it's one of my favorite things to do at the laboratory (I've done it on my own too). Data come in from various organizations and sponsors in all sorts of strange formats. We have a bunch of fancy analysis tools to work on the data, but they aren't any good if they can't read the format. So I'm tasked with writing tools to convert incoming data into a more useful format.

If the source file is a text-based file it's usually just a matter of writing a parser -- possibly including a grammar -- after carefully studying the textual structure. Binary files are trickier. Fortunately, there are a few tools that come in handy for identifying the format of a strange binary file.

The first is the standard utility found on any unix-like system: file . I have no idea if it has an official website because it's a term that's impossible to search for. It tries to identify a file based on the magic numbers and other tests, none based on the actual file name. I've never been to lucky to have file recognize a strange format at work. But silence speaks volumes: it means the data are not packed into something common, like a simple zip archive.

Next, I take a look at the file with ent, a pseudo-random number sequence test program. This will reveal how compressed (or even encrypted) data are. If ent says the data are very dense, say 7 bits per byte or more, the format is employing a good compression algorithm. The next step would be tackling that so I can start over on the uncompressed contents. If it's something like 4 bits per byte there's no compression. If it's in between then it might be employing a weak, custom compression algorithm. I've always seen the latter two.

Next I dive in with a hex editor. I use a combination of Emacs' hexl-mode and the standard BSD tool hexdump (for something more static). One of the first things I like to identify is byte order, and in a hex dump it's often obvious.

In general, better designed formats use big endian, also known as network order. That's the standard ordering used in communication, regardless of the native byte ordering of the network clients. The amateur, home-brew formats are generally less thoughtful and dump out whatever the native format is, usually little endian because that's what x86 is. Worse, they'll also generate data on architectures that are big endian, so you can get it both ways without any warning. In that case your conversion tool has to be sensitive to byte order and find some way to identify which ordering a file is using. A time-stamp field is very useful here, because a 64-bit time-stamp read with the wrong byte order will give a very unreasonable date.

For example, here's something I see often.

eb 03 00 00 35 00 00 00 66 1e 00 00

That's most likely 3 4-byte values, in little endian byte order. The zeros make the integers stand out.

eb 03 00 00 35 00 00 00 66 1e 00 00

We can tell it's little endian because the non-zero digits are on the left. This information will be useful in identifying more bytes in the file.

Next I'd look for headers, common strings of bytes, so that I can identify larger structures in the data. I've never had to reverse engineer a format ... yet. I'm not sure if I could. Once I got this far I've always been able to research the format further and find either source code or documentation, revealing everything to me.

If the file contains strings I'll dump them out with strings. I haven't found this too useful at work, but it's been useful at home.

And there's something still useful beyond these. Something I made myself at home for a completely different purpose, but I've exploited its side effects: my PNG Archiver. The original purpose of the tool is to store a file in an image, as images are easier to share with others. The side effect is that by viewing the image I get to see the structure of the file. For example, here's my laptop's /bin/ls, very roughly labeled.

The different segments of the ELF binary are easily visible.

It's easy to spot the different segments of the ELF format. Higher entropy sections are more brightly colored. Strings, being composed of ASCII-like text, have their MSB's unset, which is why they're darker. Any non-compressed format will have an interesting profile like this. Here's a Word doc, an infamously horrible format,

And here's some Emacs bytecode. You can tell the code vectors apart from the constants section below it.

If you find yourself having to inspect strange files, keep these tools around to make the job easier.


Neat Random Inn Generator

Joe Wetzel over at Inkwell Ideas put together a neat Random Inn/Tavern Generator for use in tabletop RPGs. He's still working on it so it's gradually building up more features. It's just JavaScript and HTML, so no special plugins are needed for it to work. Right now it appears to be the only generator that makes floorplans.

It creates a random floorplan using image segments, so you'll have to screenshot the website if you plan on saving a particular layout as a single image. It also generates a menu, including prices, which I think adds a lot of personality to the otherwise bare floorplan.

There are controls to fix certain properties of the generator in place, making it less random, so you can control things such as secret doors, inn size, and second floors. In the future there are plans for adding generated NPC staff and patrons, and even plot hooks.

Ameron over at Dungeon's Master just wrote a post Tavern Trappings as a guide for how to fill a tavern with everything else. It compliments the random inn generator well.

If I ever GM a game someday I think this would come in handy.


Emacs UUIDs

Yesterday Luke pointed out an interesting idea by Jeff Atwood: mark code snippets you post online with a UUID. That way it's easy to find other places where code snippets have been used, and it's easier to find updates. It's a neat idea that I might start using.

So, the next problem is where to get a UUID. One place to get one is createguid.com, which will serve you up a fresh version 4 UUID each visit. But I don't like relying on the Internet for something when I don't need to. This is why I sometimes make wget backups of my favorite sites, in case they ever disappear. By design it's easy to generate your own UUIDs, and you have 5 algorithms to choose from, so I'll make my own.

Since I'll always be in Emacs when I need one, I'll just have Emacs do it. However, it's a bit tricky to do in Emacs. The pseudo-random number generator isn't very good. It's always initialized to the same seed when it starts, so it's up to the user to reseed somewhere. There's no direct way to read from /dev/urandom, instead relying on an external process to feed it in. Relying on /dev/urandom is also not portable. There's also no simple, portable way to find out hardware information about the machine, such as MAC address, required by some versions of the UUID algorithms. Elisp integers are a few bits smaller than native integers (29-bits on 32-bit systems), so the UUID has to be handled in very small pieces.

I was beginning to work on my own when I found this great solution by Martin Blais (mirror). Specifically the uuid-simple function, which takes an MD5 hash of unique data in Emacs and uses it for the UUID. I believe this may count as a version 3 UUID (or at least close enough that I don't care). I modified it slightly, adding more unique data and a version number.

;; 90aebf38-b33a-314b-1198-c9bffea2f2a2
(defun uuid-create ()
  "Return a newly generated UUID. This uses a simple hashing of variable data."
  (let ((s (md5 (format "%s%s%s%s%s%s%s%s%s%s"
                        (user-uid)
                        (emacs-pid)
                        (system-name)
                        (user-full-name)
                        user-mail-address
                        (current-time)
                        (emacs-uptime)
                        (garbage-collect)
                        (random)
                        (recent-keys)))))
    (format "%s-%s-3%s-%s-%s"
            (substring s 0 8)
            (substring s 8 12)
            (substring s 13 16)
            (substring s 16 20)
            (substring s 20 32))))

This returns a string containing the UUID. Next, make an interactive function to wrap it,

(defun uuid-insert ()
  "Inserts a new UUID at the point."
  (interactive)
  (insert (uuid-create)))

I bound this to "C-x !", which wasn't yet bound to anything.

(global-set-key "\C-x!" 'uuid-insert)

Instant UUIDs in Emacs.


The Problem with String Stored Regex

While regular expressions have limited usefulness, especially in larger programs, they're still very handy to have from time to time. It's usually difficult to write a lexer or tokenizer without one. Because of this several languages build them right into the language itself, rather than tacked on as a library. It allows the regular expressions to be stored literally in the code, treated as its own type, rather than inside a string. The problem with storing a regular expression inside a string is that it can easily make an already complex regular expression much more complex. This is because there are two levels of parsing going on.

Consider this regular expression where we match an alphanumeric word inside of quotes. I'm going to use slashes to delimit the regular expression itself.

/"\w+"/

Notice there is no escaping going on. The backslash is there is indicate a special sequence \w, which is equal to [a-zA-Z0-9_]. This will get parsed and compiled into some form in memory before it is run by a program. If the language doesn't directly support regular expressions then we usually can't put it in the code as is, since the language parser won't know how to deal with it. The solution is to store it inside of a string.

However, our regular expression contains quotes and these will need to be escaped when in a quote delimited string. But I no longer need slashes to delimit my regular expression.

"\"\w+\""

Did you notice the error yet? If not, stop and think about it for a minute. Our special sequence \w will not make it intact to the regular expression compiler. That backslash will escape the w during the string parsing step, leaving only the w. The string we typed will get parsed into a series of characters in memory, performing escapes along the way, and then that sequence will be handed to the regular expression compiler. So we have to fix it,

"\"\\w+\""

That's getting hard to understand, compared to the original. Now let's throw a curve-ball into this: let's match a backslash at the beginning of the word. The normal regular expression looks like this now,

/"\\\w+"/

We have to escape our backslash to make it a literal backslash, so it takes two of them. Now, when we want to do this in a string-stored regular expression we have to escape both of those backslashes again. It looks like this,

"\"\\\\\\w+\""

Now to match a single backslash we have to insert four backslashes! Quite unfortunately, Emacs Lisp doesn't directly support regular expressions even though the language has a lot of emphasis on text parsing, so a lot of Elisp code is riddled with this sort of thing. Elisp is especially difficult because sometimes, such as during prompts, you can enter a regular expression directly and can ignore the layer of string parsing. It's a very conscious effort to remember which situation you're in at different times.

Perl, Ruby, and JavaScript have regular expressions as part of the language and it makes a lot of sense for these languages; they tend to do a lot of text parsing. Python does it partially, with its r' syntax. Any string preceded with an r loses its escape rules, but it also means you can't match both single or double quotes without falling back to a normal string with escaping. Common Lisp may be able to do it with a reader macro, but I've never seen it done.

Remember those two levels of parsing when writing string stored regex. It helps avoid hair-pulling annoying mistakes.


LZMA and XZ Binaries

Stripped Windows binaries download: these can handle both the XZ and the older LZMA formats.

version 4.999.9beta
43f37ea5ab4881d7aec5e61c3189a6a2  xz.exe (158 kB) (signature)
0e6360cebcfad01de4fefdc19302185f  xzdec.exe (70 kB) (signature)
cc4044fcc073b8bcf3164d1d0df82161  xz-4.999.9beta.tar.bz2 source (830 kB)

Previously I had predicted that LZMA tarballs were the future. I'd like to make a slight correction to this: XZ tarballs are the future! LZMA has actually already been supplanted by the newer LZMA2 algorithm, which is most directly accessed with the xz tool from XZ Utils. So look out for .tar.xz in the future.

Right now I believe that XZ will get you the smallest compression of any general purpose compressor (at a large memory cost). When combined with tar it even beats out 7z.

Like all productive computer activity on the Windows platform, dealing with LZMA or XZ tarballs is needlessly difficult in Windows. Archive software, such as 7-zip, doesn't seem to handle them directly yet. They can read the underlying tar layer but the compression layer seems to be opaque. 7-zip actually uses the LZMA and LZMA2 algorithms internally, so it's 99% of the way there. They just never did the last step.

So here's where the above executables, provided for your convenience, come in: xz.exe was cross-compiled in Debian GNU/Linux using the MinGW compiler suite, and it can perform both compression and decompression on both XZ and LZMA. xzdec.exe can only do decompression, but it's smaller. You can use these to pull out the tar archive so that it can be accessed using the regular tools. Or you can use it to make your own highly compact archives.


Scheme Live Coding

Live coding (or livecoding) is software development as a performance art. A programmer's screen is viewed by the audience and a program is written and modified so that it produces sound and maybe even visual effects. The audience gets to see the code and its effects live. I'm not sure if "live" refers to the audience, the editing of live code, or maybe both. There are videos all over the web of this in action so if you haven't seen it yet do a quick search and watch one.

It's fairly easy to obtain livecoding software. For example, there's Fluxus, which extends PLT Scheme to support livecoding.

I've never done livecoding myself, but something I've noticed is that Scheme seems to be a popular choice of language for livecoding. I think I know how and why this is. Scheme, being a Lisp dialect, is naturally a living system: it can be modified and extended while it's actively running. Scheme in particular is very well suited for the task thanks to its simplicity and optimized tail recursion.

I'll do a little text-based livecoding example in PLT Scheme to show how it works. This will be easier to do yourself if your text editor can interact directly with a REPL (like Emacs or DrScheme).

Let's define a function that prints a line of text to the screen and recurses, so that it continues printing forever. The recursion is important here and I'll get back to it. To keep things manageable I'm also going to insert a 1 second pause.

(define (print-str)
  (display "Hello!\n")
  (sleep 1)
  (print-str))

If we call this function with (print-str) it will sit there printing "Hello!" over and over. It will also lock up our REPL preventing us from doing anything else. Not very useful. So let's put it in a thread instead!

(thread print-str)

Now our program is running and we get to keep our REPL. Why do we need to keep our REPL alive? Well, so we can redefine print-str on the fly! In my buffer I'll go back and change "Hello!" to "Goodbye!". While I'm doing this the function is still spitting out "Hello!".

(define (print-str)
  (display "Goodbye!\n")
  (sleep 1)
  (print-str))

As soon as I tell my editor to pass this to the REPL the print-str function gets redefined and starts printing "Goodbye!" instead. Why did the running function change? Because of recursion. When it called itself, it actually called the new definition.

Since I didn't keep a handle on the thread the easiest way to stop print-str from running is to redefine it without recursion.

(define (print-str)
  (display "Done.\n"))

And it's done. If I was really fast about this my output looks something like this.

Hello!
Hello!
Hello!
Goodbye!
Goodbye!
Done.

That's the fundamental workings of livecoding in Scheme: I changed a program while it was running. To turn the above into the more interesting livecoding you see in the videos all we need are some audio and visual bindings (which is the hard part of it all).


Scheme Neural Network

Two months ago I wanted to experiment with artificial neural networks in Scheme. I've written one in C++ before, but I wanted a pure Scheme implementation. So I wrote one. Grab it here,

git clone http://git.nullprogram.com/scheme-ann.git

It fakes OOP with closures and a dumb message passing scheme so that I could treat individual neurons like objects. The neurons are wired up to push orders along backwards, so most of the work from the implementation point of view is on the output neurons.

If I found an application for it I thought it would be neat to save off the network weights, thus storing the "brain", as an s-expression that could be dropped directly in the source code (something Lisp does very nicely). Even better, if it was a programming competition I could obfuscate the neural network implementation a bit -- so that no one knew it was a neural network -- and have a mysterious, opaque lump of mixed code and data.

Using it is really simple: once the neural network functions are loaded creating a new arbitrary network is as easy as the following. This creates a network with 4 input neurons, a hidden layer of 5 neurons, a hidden layer with 6 neurons, and an output layer with 3 neurons.

(new-ann '(4 5 6 3))  ; input 4 bits, output 3 bits

Then the functions train-ann and run-ann are used, with a list of bits as arguments, to train and run the neural network. Included in the repository are two examples showing this in action, using the provided helper functions int->blist (integer to bit list) and blist->int.

Like most of my neural network experiments this is really disappointing. It doesn't seem to work well outside some trivial or contrived situations, and it quickly slows down after adding only a couple more layers. It might just have to do with the type of network I'm using, and perhaps I should look into more advanced and complex neurons.


Emacs cat-safe

I was inspired by an item in Luke's Tumblr blog last night. It was a screenshot of a program called PawSense, which monitors a computer's keyboard for cat activity. (I don't know if it's any good, but it's funny.) As anyone with cats knows, it's not unusual to leave a computer only to come back later to see garbage typed in by a wandering cat. I wrote a version for Emacs today.

git clone http://git.nullprogram.com/cat-safe.git

Or you can download a snapshot: cat-safe-master.tar.gz.

Put it (cat-safe.el) somewhere in your load-path (like ~/.emacs.d/) and put this line in your .emacs file,

(require 'cat-safe)

Emacs switches focus to a new buffer to stop cat damage.

This only monitors Emacs itself; it should help protect your buffers but not your web browser. When cat interference is detected Emacs switches focus to a junk buffer and lets the cat make a mess there instead. In case your cat happens to type out some Shakespeare you will be able to read it in the junk buffer. Just kill the junk buffer to return to work.

It could still use some improvement. Right now it looks for a single key being help down, excepting keys humans tend to hold down like backspace, delete, and space. If you play around with it you'll notice if you press several keys at once Emacs will sometimes create a pattern with them. I need to figure out a good way to detect this.

I'm going to run it at home for awhile to make sure it remains transparent, but still does its job. It will probably incur a performance penalty on frequently repeated keyboard macros.


A Fractran Short Story

Fractran is a Turing-complete esoteric programming language. A Fractran program is just an ordered list of positive, irreducible fractions. The program's output for an input n is the output of the program run on n multiplied by the first fraction in the list that results in an integer. If no such multiplication results in an integer, the output is the input n. Variables are encoded in the exponents of the prime factorization of the input and output.

Some time ago I thought up an idea for a short story involving Fractran. A mathematician accidentally creates a Fractran program that can trivially factor large composites. Think something like O(log n). It's just the right magical string of, say, 31 fractions.

The story would be a first-person narrative of the mathematician's thoughts during a short time after the discovery, considering many of the consequences of the program. For example, it would render much of cryptography, which plays an essential role in the modern world, useless. He would also wonder if mankind should deserve such a discovery, considering how accidental it was.

This whole idea vanished once I realized that this Fractran program is actually completely trivial. It even runs in O(1) time. It's so trivial as to be worthless. Remember that Fractran stores its data in the number's prime factorization? The Fractran program that can factor any number in constant time is the identity function. To decode the output, which matches the input, all you need to do is factor it!

Interestingly, it doesn't seem to actually be possible to implement the identity function in Fractran (But somehow it's Turing-complete? Hmmm... more investigation needed.), unless you can define your program in terms of its input. For example, the program 1/(n+1) is the identity function for input n.


Function Pointers are Special

Here's something I only learned recently, since it came up when working on Wisp. In C function pointers are incompatible with normal pointers. For example, this is unportable C,

int func (int x);

int main ()
{
  void *p = (void *) &func;
}

This is because function pointers are a different size than other pointers on some architectures, or even within the same architecture with different models (x86's compact and medium models). If the compiler in such a scenario allowed this, the pointer may be truncated and would likely point to the wrong place. It wasn't until I added the -pedantic flag to gcc that it started warning me about situations like the above. The -W -Wall flags are silent here.

The relevant part of the ANSI C standard lists the following as a common, but unportable, extension to the language,

A pointer to an object or to void may be cast to a pointer to a function, allowing data to be invoked as a function. A pointer to a function may be cast to a pointer to an object or to void, allowing a function to be inspected or modified (for example, by a debugger).

There is a discussion, including an example, on Stack Overflow: Can the Size of Pointers Vary Depending on what's Pointed To?. It also links to this comp.lang.c FAQ question Question 4.13 suggesting the use of a union, which is exactly what I did in Wisp.

I bet this issue only comes up very rarely. How often do you have to store a function pointer in a void pointer? It subverts the type system and is generally a bad idea. I had to do it in Wisp as part of its value polymorphism, which is why it bit me. This is probably why gcc doesn't get very picky over it.

This also means function pointers have less support than normal pointers. For example, printing pointers with printf()'s %p won't work, since it expects a void pointer, so there's no printing them. You can't sort them with qsort(). You can even treat the function pointer as a blob of data to manipulate manually since there's no safe way to make a regular pointer to it. Really, almost any C library function that accepts pointers won't work with function pointers.

So if you want a tricky, unfair, interview question this could be one!


Common Lisp Quick Reference

I found this Common Lisp Quick Reference the other day from r/lisp, and I think it's fantastic. It's a comprehensive, libre booklet of the symbols defined by the Common Lisp ANSI standard. Very slick!

The main version is meant to be printed out and nested with a vertical fold, and it works quite well. If I ever get a chance to use Common Lisp at work (a man can dream), probably at a location without Internet access, this could come in handy. So I printed out one for myself,

In case the website ever disappears in the future, here's all the PDFs and LaTeX source for it. These could be out of date, so you should check the main website first.

clqr-a4-booklet-all.pdf
clqr-a4-booklet-four.pdf
clqr-a4-consec.pdf
clqr-letter-booklet-all.pdf
clqr-letter-booklet-four.pdf
clqr-letter-consec.pdf
clqr.tar.gz (LaTeX source)

And a local clone,

git clone http://git.nullprogram.com/clqr.git

Wisp Screencasts

I've been chugging away on Wisp, announced in my last post, every day since I started it a few weeks ago, and it's becoming a pretty solid system. There's now an exception system, reference counting for dealing with garbage, and a reentrant parser. It's no replacement for any other lisps, but I've found it to be very fun to work on. I've put what exists so far of the Wisp User Guide up for viewing. The AsciiDoc source is in the Git repository, which is here,

git clone http://git.nullprogram.com/wisp.git

I wanted to show off some of the new features of Wisp, and since I was inspired by Full Disclojure, since it's so damn slick, I decided to make some screencasts of Wisp in action. All of the screencast software for GNU/Linux is pretty poor, but after a few hours of head-banging I managed to hobble something together for you. Enjoy!

That video demonstrated the memoization function. It can be pulled in from the memoize library. You give it a symbol, which should have a function definition stored in it, and it will installed a wrapper around it. In the video I used the Fibonacci function from the examples library.

(require 'examples)
(fib 30) ; Slooooow ...
(memoize 'fib)
(fib 100) ; Fast!

This demonstrated the "detachment" feature of Wisp, which is similar to "futures" in Clojure. It forks off a new process, which executes the given function. The send function can be used in the detached process to send any lisp objects back to the parents, which can receive them with the receive function. The send function can be called any number of times to continually send data back. The receive function will block if there is no lisp object to receive yet.

(require 'examples)
(setq d (detach (lambda () (send (fib)))))
(receive d) ; Gets value from child process

This video shows off the point-free functions that have been defined: function composition and partial application (I accidentally say "partial evaluation" in the video). These are actually just simple macros that any lisp could do.


Wisp Lisp

Update 2010-2-04: A lot of the information below is out of date. There is an update here: Wisp Screencasts.

This is a project I've been wanting to do for some time, and I finally got around to doing it. I spent the last few days implementing my own lisp interpreter in C. Today, after sinking in about 48 hours of work, I believe I completed enough of it to consider it in a working state, with a code base stable enough that other interested people could contribute. It was really exciting to see everything come together today.

You can make a clone of the Wisp repository with Git. Go ahead; don't be shy,

git clone http://git.nullprogram.com/wisp.git

To build it, all you need is a C99 compiler, make, yacc (i.e. Bison), and lex (i.e. Flex).

It doesn't use the readline library or one of it's clones to make a nice interaction command line, so it's a good idea to run it with rlwrap. That's what I've been doing. If you plan on writing Wisp code in Emacs, putting this in your .emacs will give you all the syntax niceties,

(add-to-list 'auto-mode-alist '(".wisp\\'" . lisp-mode))

You should also be able to run it as an inferior lisp and send code to it like a normal lisp. I haven't done this yet myself.

I think the name is apt, because it really is a wisp of a lisp. As of this writing, it weighs in at 1500 lines of code and is still very feature-light. I haven't actually read any material about writing lisp interpreters, so I've been winging it based on my experience with it. It's already taught me a lot of subtle things about lisp that I hadn't been aware of before.

Right know it's very simple. It doesn't yet support any syntax beyond parenthesis (no ' quoting). No closures. No garbage collection, as I'm still working out how I'm going to do that. Dynamically scoped, since that's a lot easier to do. And, like Scheme, it's a lisp-1, meaning functions and variables share a common namespace. I hope to expand it to include some features from other lisps like Arc (particularly the anonymous function syntactical sugar), Common Lisp, and Scheme.

However, it does have already anonymous functions, which many popular languages still don't have. :-) It's far enough along to let you define crazy stuff like this,

(defun example (n)
  (if (<= n 0)
      (lambda (x) (* x 10))
    (lambda (x) (* x 2.0))))

Which you can call, interactively in this case, like so,

wisp> ((example  1) 20)
40.000000
wisp> ((example -1) 20)
200

Because there's no garbage collection yet, it leaks memory like a sieve. For garbage collection, I think I'll do a mark-and-sweep, marking objects based on their reachability from the symbol table. That still leaves some corner cases -- such as worrying about objects in limbo in the evaluator -- that I'm not sure about. I need to be careful not to free objects still in use. Still working that one out.

It has the multiplication, addition, subtraction, and division implemented, as well as the greater-than and less-than operators. Lisp macros are implemented. A number of special forms are defined, like let, if, set, defun, defmacro, car, cdr, not, progn, lambda, and while. All of the predicates are implemented. It has a C interface, which is how all the above got defined. I already have some functions and macros defined in terms of Wisp code, too. Most of the needed functions at this point are trivial to add, though a bit tedious, so I'm mostly trying to focus on the core parts of the interpreter right now.

It's amazing how much the internal code looks like lisp written in a C dialect. I have CAR and CDR macros defined, which get used all over the place, and code frequently uses them to walk lists like lisp code would.

The core struct that everything works with the the object struct. It's defined as such,

typedef struct object
{
  type_t type;
  void *val;
} object_t;

The type_t field comes from this enumeration,

typedef enum types
  { INT, FLOAT, STRING, SYMBOL, CONS, CFUNC, SPECIAL } type_t;

The type indicates what type of data the void pointer points to, making it sort-of polymorphic. Note the CONS type, the cons cell, used to create lists,

typedef struct cons
{
  object_t *car;
  object_t *cdr;
} cons_t;

There's the familiar car and cdr pointers. There are a bunch of helper functions to manipulate and build these. For example, c_cons() creates a cons cell,

object_t *c_cons (object_t * car, object_t * cdr);

Look familiar? Yup, that's the lisp cons function. Since the nil symbol is available in C code as NIL you can chain these together in C to make a list,

object_t *lst = c_cons (c_int(10), c_cons (c_str ("hello"), NIL));

Which puts together the simple list,

(10 "hello")

Since that's so cumbersome to write out, there's a parser that can read nice lisp code and use all those same functions to make the lists. Hence, the lisp reader.

If you want to help, it's pretty easy to add more CFUNCs, C functions that are exposed to Wisp lisp code. Right now, I'd like to expose the whole C math.h library, provide a nice I/O interface, and expose a bunch of string functions. The TODO file in the repository contains more things to be done.

Wisp will probably be getting it's own "project page" here at some point in the future. When it does, I'll update this post to point to it.

Oh, and I decided to make this available under a 2-clause BSD license, so someone could easily plug it into another program as an extension language (once Wisp has matured first, of course). That would be cool.


Setting up a Common Lisp Environment

Common Lisp is possibly the most advanced programming language. Think of pretty much any programming language feature and Common Lisp probably has it. Since lisp is the programmable programming language, when someone invents a new language feature it can probably be added to Common Lisp without even touching the language core.

However, if you're interested in digging into Common Lisp to try it out, you may find yourself quickly running into walls just getting started. It's a lot different than other programming environments you may be used to. The Common Lisp tutorials generally skip this step, assuming the user has an environment, or leaving that setup for the "vendor" to handle. So, here's a guide to setting up a great Common Lisp environment with Emacs and SLIME. It should work with any Common Lisp implementation and any operating system that can run Emacs (i.e. most of them). Even a much less capable one like Windows.

First, you need to pick a Common Lisp implementation and install it. Ideally, it should end up in your PATH. Like C, the language is defined solely by its standardized specification, rather than some canonical implementation. Steel Bank Common Lisp (SBCL) is currently the highest performing implementation, it's Free Software, and it runs on a wide variety of platforms, so take a look at that one if you're not sure.

Next, install Emacs. We're using Emacs not just because it's the best text editor ever created. :-D It's because that's what SLIME is written for, and Emacs is a lisp-aware editor. Really, Emacs is a lisp interpreter that happens to be geared towards text-editing. It's accused of breaking the rules of unix by being a single, monolithic program, but it's really a whole bunch of small lisp programs. You can even have a lisp REPL in Emacs (ielm), similar to what we will have once we're done here. It's plays very well with Common Lisp.

If you're unfamiliar with Emacs, you should stop here and familiarize yourself with it a bit. Really, you could spend a decade learning Emacs and still have more to learn. The tutorial should be good enough for now. Fire up Emacs and run the tutorial by pressing control+h then t. In Emacs notation, that's C-h t. C-h is the help/documentation prefix, which can be used to look up variables/symbols (v), functions (f), key bindings (k), info manuals (i), the current mode (m), and apropos (searching) (a). In the info manuals, you should be able to find the full Emacs manual, Elisp reference, and Elisp tutorial, since they are generally installed alongside Emacs these days. Nearly anything you might need to know can be found inside the included documentation.

Next, install SLIME. I'll be a bit more specific for this one. Make a .emacs.d directory in your home directory (whatever your HOME environmental variable is set to). This is a common place to put user-installed Emacs extensions. You will be putting your slime directory in here. There are two basic ways to obtain SLIME, as indicated right on their main page. You can do a CVS checkout of the SLIME repository, which allows you to follow it and run the latest version. Or you can grab a snapshot of the repository, which is provided, and dump it in there. Since I like you so much, I'll give you a third option. Here's a Git repository, maintained by someone very kind, that follows SLIME's CVS repository,

git clone git://git.boinkor.net/slime.git

Ultimately, you should have a directory ~/.emacs.d/slime/ that contains a bunch of SLIME source files directly inside.

Now, we tell Emacs where SLIME is and how to use it. Make a .emacs file in your home directory, if you haven't already, and put this in it,

(add-to-list 'load-path "~/.emacs.d/slime/")
(require 'slime)
(slime-setup '(slime-repl))

Once it's saved, either restart Emacs, or simply evaluate those lines by putting the cursor after each them in turn and typing C-x C-e. If you did everything right so far, you shouldn't have any errors. (If you did, go back up and see what you did wrong.) If your Common Lisp installation didn't end up in your PATH as "lisp" (not uncommon) for some reason, you may need to tell Emacs where it is. For example, I can point directly to my SBCL installation with this line,

(setq inferior-lisp-program "/usr/bin/sbcl")

If everything is set up right, fire up SLIME with "M-x slime". It should compile the back-end, called swank, and run a Common Lisp REPL as an inferior process to Emacs. You should end up with a nice prompt like this,

CL-USER>

At this line, you can start evaluating lisp expressions as you please. But this isn't where the true power of SLIME comes in yet. I'll give you an example: make a new file with a .lisp extension and open it. Throw some lisp in there,

(defun adder (x)
  (lambda (y) (+ x y)))

Type C-x C-k and it will send the current buffer over to be compiled and loaded. This code here uses a closure, so you know you aren't accidentally using Emacs lisp, as it doesn't have closures. At the REPL you can call it,

CL-USER> (funcall (adder 5) 6)

Which will print the return value, 11. That's all there is to it. You write code in the buffer, then with a simple keystroke send it to the Common Lisp system to be evaluated and loaded. Because the SLIME key bindings eclipse the Emacs lisp key bindings, you can type this same line in the lisp source buffer place the cursor at the end, and type C-x C-e, which will send it out to Common Lisp to be evaluated. Look at the mode help (C-h m) to see all the key bindings made available.

This is a great programming environment that makes Common Lisp all the more fun to use. You run a single, continuous instance if your program growing it gradually. (This is exactly how I built my Emacs web server with elisp.) You can test your code as soon as soon as it's written.

The setup can get even more advanced. The Common Lisp REPL need not be running on the same computer. It can be running on another computer, as long as SLIME is able to connect to it over the network. Several developers could even share a single Common Lisp process running on a common machine. Lots of possibilities.

If you don't have a Common Lisp book yet, there's Practical Common Lisp, which you can read at no cost online or download for reading offline. It's based on an Emacs and SLIME setup, so you'll be right on track.


Don't stop here! This isn't everything. Check out the archives (on the left) for more posts. Or just have a look at the index.