At least it is for me. This past week at work I've been furiously
rushing work on a project written in Java. It's completely my fault
that I'm using Java since I'm the one who picked the language. I
wanted to use Java the platform, but I don't know any other JVM
languages well enough to use them in place of Java the language for a
project at work.
It's all sorts of little things that make writing Java so exhausting
for me. At the end of the day it I just feel irritated. I hate the
absolute lack of functional programming and that I have to specify
everything at such a low level. The whole reason we program in
high-level languages is so we can express algorithms more concisely,
but Java fails at this.
Here's an example of what I'm talking about, something I basically did
a few times today. Let's say you have an array of
floats, nums and you want to sum them and return the
result (or maybe use it in another expression). In Lisp it's very
straightforward.
(reduce '+ nums)
"Reduce the sequence nums by addition." Notice that it's
more about saying what I want to do rather than how to
do it. I don't have to introduce any temporary storage or
iterators. To do the same thing in Java it will look something like
this.
intsum = 0;
for (doublenum : nums) {
sum += num;
}
return sum;
If you're using an older Java without the enhanced looping construct
it gets even uglier. I had to introduce a variable for accumulation
and a second variable for iteration. This sort of thing has to be
done all over the place in Java, and it greatly increases the
cognitive overload when reading Java code.
This instruction is more about telling the computer the how
rather than my overall intention. One problem with telling it
the how is that I've unnecessarily locked in a specific
algorithm and ordering. The literal instruction says that the
numbers must be added in order, sequentially. My Lisp
instruction doesn't do that.
It gets even worse when you complicate it slightly by adding a second
array and, say, multiplying it pairwise with the first.
for (inti = 0; i < nums1.length; i++) {
sum += nums1[i] * nums2[i];
}
return sum;
Now the loop gets more complex. I have to tell it how to
increment the iterator. I have to tell it to check the bounds of the
array. The iterator is a misdirection because the actual number stored
in it isn't what's important. Again, the Lisp method is much more
concise.
(reduce '+ (map 'list '* nums1 nums2))
"Map the two sequences by multiplication into a list, then reduce it
by addition." Unfortunately we start to leak a little bit into
the how here. I am telling it that the intermediate structure
should be a list, because map forces me to pick a
representation. Besides that, I am only describing my overall
intention and not the obvious details.
So with Java my days become filled with the tedious low-level
algorithm descriptions that I have to hammer out over and over and
over. Death by a thousand paper cuts.
Lisp isn't the only language that has a (generally) much better
approach; it's just my favorite. :-) Most languages with at least some
decent functional facilities will also do the above concisely.
I got an Elisp idea today and even went as far as implementing a proof
of concept for it: distributed computing with Emacs Lisp. As usual for
me the idea takes advantage of Lisp features to make the task pretty
simple, very specifically Elisp's implementation. In this case it's
the Lisp reader, printer, and the fact that Elisp functions have a
printed representation, both byte-compiled and not.
A central server listens for TCP connections. Clients offering their
CPU for use connect to the server and await instructions. The server
sends a single, no-argument, anonymous function to the client. The
client calls the function, returning the resulting form back to the
server. In order to transmit the function it's encoded into a string
using the Lisp printer, and the client turns it back into an
executable function with the Lisp reader.
For some simple security there is a shared password between the client
and server. When the server sends a function it includes a signature,
and the client only runs code that matches the signature. To create a
signature the string encoded version of the function is appended with
the password (both strings) and hashed with a secure hashing
algorithm. Only someone who knows the password -- including other
clients -- can create the signature.
(defunsign-sexp (password sexp)
"Return signature of the given s-exp."
(sha1 (format "%s%s" password sexp))
To make it easy for the client to read in both the signature and the
function we just cons them together before encoding them as text.
(defunencode (password sexp)
"Encode a s-exp for transmission to client."
(prin1-to-string (cons (sign-sexp password sexp) sexp)))
The client calls the Lisp reader on the string, then checks the
signature in the car cell against the s-expression in
the cdr cell. This will return the function if it's
legitimate, otherwise nil.
(defundecode (password str)
"Decode string into s-exp, checking the signature in the process."
(let* ((cons (read str))
(sig (car cons))
(sexp (cdr cons)))
(if (equal sig (sign-sexp password sexp))
sexp
nil)))
And that's the core of it. It just needs some network code to move the
string between computers. That part can be found in the linked source
above.
To demo this, I'll use the whiten function from
my previous post. I'll run it with
three different strings on three different computers. Assume we
started the dist-emacs server (dist-start) and connected
three clients (dist-connect) from three computers to
it. The clients were fired up from scratch so there's
no whiten function on them yet, but there is one
defined on the server. First we'll send the function definition to the
clients. The dist-dist function takes a list of functions
and passes each one to a client. Ideally I'd want this function to be
more intelligent, managing a work queue so that an arbitrary length
list of functions will be fed one at a time to each client. That's not
the case here.
Also like in the previous post, this is an abstraction leak with the
Emacs implementation. But I like this trick so I'm going to use it
anyway. :-) Next we call it on each client with a different string.
The way I have it set up for my proof of concept the results are just
spit back into the server's *Messages* buffer. If we
watch that buffer we can see each results come back in one at a time
as each machine finishes. I can watch Emacs saturate the CPU on every
client machine simultaneously as it works.
This isn't the same order as the clients, but the order in which the
jobs were completed.
As for the practicality, I doubt there really is one. It's really only
a neat concept (or maybe not even that). For almost the exact same
reasons as my distributed JavaScript
idea, this is a solution looking for a problem. The problem needs to
be able to be broken into small computation units, because Emacs has
no threading, and it has to be low bandwidth, because it has to be
parsed all at once from a string. If you want to pass large data sets
it needs to be done out-of-band, which probably defeats the
purpose. There seem to be few to no problems that fit these
limitations.
Memoization is something I think
should be packaged as a standard function for just about every
language. That's not generally the case, but luckily this is easy to
fix in Lisps. I needed memoization recently for an Elisp project I'm
working on. I could have hand-written one but a generic memoization
function would have worked just fine. Since I didn't find any generic
Elisp memoization on-line I wrote my own.
Just put it in your path
and (require
'memoize) it. Here's the core
function.
;; ID: 83bae208-da65-3e26-2ecb-4941fb310848
(defunmemoize-wrap (func)
"Return the memoized version of the given function."
(let ((table-sym (gensym))
(val-sym (gensym))
(args-sym (gensym)))
(set table-sym (make-hash-table :test 'equal))
`(lambda (&rest ,args-sym)
,(concat (documentation func) "\n(memoized function)")
(let ((,val-sym (gethash ,args-sym ,table-sym)))
(if ,val-sym
,val-sym
(puthash ,args-sym (apply ,func ,args-sym) ,table-sym))))))
Normally the hash table would be stored inside of a closure, but Elisp
doesn't have closures. Instead, the hash table is stored in an
uninterned symbol, which will only (generally) be accessible to the
memoization wrapper (in order for external code to access the table it
would need to directly inspect the memoized function object). It's not
quite a scope but it's close enough.
Note that it keeps the original function documentation intact. I want
the memoization wrapper to be an unobtrusive as possible.
This is a bit of an abstraction leak. It's creating new code at run
time taking advantage of the specific low-level way Emacs executes
s-expressions. If Emacs ever stopped directly evaluating s-expressions
then this code would break. A more proper way to do this would
be at compile time with a macro, where this exact sort of code
generation is expected. Rather than defun we would use a
macro called something like defmemoized.
Here's a demo of it in action. This whiten function is
computationally expensive: it performs key whitening. It repeats a
hash function thousands of times to produce an expensive value. This
isn't something you generally want to memoize, but stick with me.
(defunwhiten (key)
"Perform key whitening with the md5 hash function."
(dotimes (i 100000 key)
(setq key (md5 key))))
(whiten "password") ; takes a couple of seconds
On my laptop that takes a couple of seconds to run. Increase that
counter if it's quick on your computer. My memoize package provides
a memoize function which will create a new function that
wraps the original, then installs the new function in place of the old
one if we give it the function symbol.
(memoize 'whiten)
The first time you run it after memoization it will be slow, but after
that the memoization kicks in for a quick return.
There are two Elisp specific issues at hand. First is that memoizing
an interactive function will produce a non-interactive function. It
would be easy to fix this problem when it comes to non-byte-compiled
functions, but recovering the interactive definition from a
byte-compiled function is more complex than I care to deal
with. Besides, interactive functions are always used for their side
effects so there's no reason to memoize them.
Second is a limitation of Elisp hash tables. There's no way to
distinguish a nil value and no value. The hash table returns nil for
both. This means you cannot memoize nil returns. But a computationally
expensive function shouldn't be returning nil anyway.
Hopefully someone else will find good use for this as well.
As I get more involved with tabletop RPGs, specifically Dungeons and
Dragons, I find there are some related attributes that I wish these
game systems had. While I'm sure there are systems do have some of
these, I wish whatever I happen to be using had all of them.
Print friendly. The source material tends to be very colorful
and graphical. While this can be a good thing, especially when
illustrating monsters (Show, not tell!), it's bad if you want to print
out your own materials. I want the crucial information available in a
crisp, clean monochrome form of some sort. Not only could I reproduce
material for use in notes and handouts, but I could create my own
condensed sets of information by composing these crisp forms.
For example, in the D&D monster manual each monster has a nice
concise block containing all the information -- defenses, health,
abilities, etc -- needed to use that monster. This is great, but it's
on a brownish background, in a red-ish box. So close to being what I
want. But even then, do I have legal permission to reproduce this
information? And so ...
Licensing. The closest thing tabletop gaming has to a Free
Software license would be
the Open
Game License (OGL), which is still pretty restrictive. I would
love for the source materials to be licensed at least loosely enough
that I could print out my own copies for cheap (assuming they are
print friendly, per above). Have some new players sitting down at the
table? To get them started, give them that stapled-together player
handbook you printed out. There's RPG evangelism for you.
The Fudge role-playing
game system has both these attributes down pretty well. The Fudge
manual is very print friendly PDF with explicit permission to share it
with your friends. However, just
as yacc is a compiler
compiler, Fudge is really a game system system, a system for
creating game systems, so it's only part of what is needed to play a
game.
Useful software tools. One specific example is character
creation software. Creating a new character can be burdensome,
especially for a new player. Software that allows a player to select
some basic options from a menu and produce a printable, error-free
character sheet can save a lot of time.
Fourth edition Dungeons and Dragons has a character builder, but it is
a humongous piece of junk. It's proprietary, Windows-only, bulky, and
slow. For a program that merely generates printouts based on a few
user selections from some simple menus, it has some extremely
excessive system requirements (much higher than the ones they
claim). And it requires a reboot to install too. A human can produce
the same results by hand inside of a half hour, so for a computer
there is virtually no computation involved. So what is it doing? Worse
of all, the fourth edition license expressly forbids competing
character creation software, so no one can legally produce a
reasonable one. All this thing should be is a database of available
character abilities, some character sheet logic, and a postscript
printer.
I've been looking for a nearby tabletop gaming group for awhile now. I
asked people I knew. I asked around at work. I just couldn't find
anyone. Luckily, a new thing that Wizards of the Coast has been doing
is
D&D Encounters where prepared adventures are run every week
openly at local gaming stores. The purpose is for casual players or
beginners to be able to freely to play some D&D without needing
any commitment, preparation, or equipment. Characters are
pre-generated, so no spending a half hour creating some new player
characters every week.
I hopped into it
for
season two, which just started 6 weeks ago. Each weekly session is
a 1.5 to 2 hour combat encounter. I've been having fun, but honestly
it's not all that exciting compared to what a real campaign can
bring. There is no role-playing, practically no NPC interaction, no
puzzles, and no exploration. It also doesn't help that the adventures
and
characters are riddled with mistakes and very unbalanced. For an
example of unbalanced, the character I've been playing, Barcan (or
Barqan depending on where you are in the character sheet), could
be killed -- and I'm not talking about unconscious dying but
negative bloodied value dead -- by a monster critical strike in just
about every encounter so far. Every time a monster attacked me there
was a 1 in 20 chance, even at full health, that I might be done
playing for the week.
But the great part is that it got me connected to other players in my
area, which I think is the most valuable part of Encounters. One of my
fellow players was just starting a regular gaming group and invited me
to come along, so we've been playing on weekends now, with the
intention of taking turns as the DM among those who are
interested. And for a little irony, everyone except one person in the
group also works at the lab. I guess I didn't ask around enough!
So I'm going to be DMing a 4e Dungeons and Dragons campaign sometime
in the near future, and I'm quite excited about it!
Anyway, what does that have to do with drawing
a space
elevator in the GIMP? First of all, if you're one of the
players in my group who found their way here stop reading
now. You'll find out all this in the first session, so come back
after that. So, I've had this campaign idea in my head for a couple of
years now, and it involves a skyhook of sorts constructed by a
combination of careful engineering and powerful arcane magic. It
leads somewhere, not space, but somewhere. Since that somewhere
is part of the mystery I won't reveal where that is, but if the
campaign goes well I'll write about it more in the future.
When I run the first session I want to illustrate the skyhook to the
players. Show, not tell they say. I like some of the work
people
do
imitating Bob Ross in the GIMP. I've done a few of these to try
it out, and it's surprising how well it can turn out even from a
beginner. I employed this new art education to draw my skyhook for my
game. The GIMP undo history reveals how I did it,
In retrospect I should have drawn the skyhook right after I finished
those first clouds, since it's behind everything else in the
scene. Oh, and that thing on the bottom left is a twisted scar left
behind from a previous attempt at building the skyhook, but it
collapsed. It's a dangerous place to be.
A feature unique to some Lisps is the ability to compile functions
individually at any time. This could be to a bytecode or native code,
depending on the dialect and implementation. In a Lisp implementations
where compilation matters (such as
CLISP), there are typically two forms in which code can be
evaluated: a slower, unoptimized uncompiled form and a fast, efficient
compiled form. The uncompiled form would have some sort of advantage,
even if it's merely not having to spend time on compilation.
In Emacs Lisp, the uncompiled form of a function is just a lambda
s-expression. The only thing that gives it a name is the symbol it's
stored in. The compiled form is a (special) vector, with the actual
byte codes stored in a string as the second element. Constants, the
docstring, and other things are stored in this function vector as
well. The Elisp function to compile functions
is byte-compile. It can be given a lambda function or a
symbol. In the case of a symbol, the compiled function is installed
over top of the s-expression form.
The compiler will not only convert the function to bytecode and expand
macros, but also perform optimizations such as removing dead code,
evaluating safe constant forms, and inline functions. This provides a
nice performance boost (testing using my
measure-time macro),
(defunfib (n)
"Fibonacci sequence."
(if (<= n 2) 1
(+ (fib (- n 1)) (fib (- n 2)))))
(measure-time
(fib 30))
=> 1.0508708953857422
(byte-compile 'fib)
(measure-time
(fib 30))
=> 0.4302399158477783
Most of the installed functions in a typical Emacs instance are
already compiled, since they are loaded already compiled. But a number
of them aren't compiled. So, I thought, why not spend a few
seconds to do this?
In Common Lisp, there is a predicate for testing whether a function
has been compiled or not: compiled-function-p. For
whatever reason, there is no equivalent predefined in Elisp, so I
wrote one,
(defunbyte-compiled-p (func)
"Return t if function is byte compiled."
(cond
((symbolp func) (byte-compiled-p (symbol-function func)))
((functionp func) (not (sequencep func)))
(t nil)))
My idea was to iterate over every interned symbol and, if the function
slot contains an uncompiled function, using the test above, I would
call byte-compile on it. Well, it turns out
that byte-compile is very flexible and will ignore
symbols with no function and symbols with already compiled functions.
So next, how do we iterate over every interned symbol? There is
a mapatoms function for this. Provide it a function and
it calls it on every interned symbol. Well, that's simple and
anticlimactic.
(mapatoms 'byte-compile)
That's it! It will take only a few seconds and spew a lot of
warnings. I haven't found a way to disable those warnings, so this
isn't something you'd want to have run automatically, unless you like
having an extra window thrown in your face. I've only discovered this
recently, so I'm not sure what sort of bad things this may do to your
Emacs session. Not every function was written with compilation in
mind. There are interactions with macros to consider.
I doubt there will be a noticeable performance difference. Like I said
before, most everything is already compiled, and those are the
functions that get used the most. There's just something nice about
knowing all your functions are compiled and optimized.
ParEdit is a
powerful extension to Emacs that I've just begun using recently. It's
a minor mode that forces all parenthesis, square brackets, and quotes
to be balanced at all times. While it's useful for any programming
language it's especially suited for Lisps, because it's designed for
manipulating nested parenthesis -- i.e. s-expressions. It's not
currently part of Emacs so you have to drop the script in
your load-path somewhere.
I've frequently thought that a Lisp-based shell would be an
interesting and powerful tool, much like a normal Lisp REPL. Programs
would be treated like Lisp functions. For example,
But typing all those parenthesis all the time would be quite the
nuisance. I know this from experience typing at Lisp REPLs. I imagined
something that works exactly like ParEdit would be needed to make all
that work go away. To save even more time each prompt would begin with
a nested pair, with the cursor placed between them. Then typing a
quick command is no different than a normal shell.
wellons@luna:~$ ()
Well, in Emacs we have both ParEdit and REPLs, so we can compose these
features together with just a little advice. Here's how to do it with
the Interactive Emacs-Lisp Mode (IELM) REPL. First tell IELM to use
ParEdit,
The function in IELM that spits out the next prompt
is ielm-eval-input, so we give it the advice to call the
ParEdit function afterwards to insert a parenthesis pair.
(defadviceielm-eval-input (after ielm-paredit activate)
"Begin each IELM prompt with a ParEdit parenthesis pair."
(paredit-open-round))
And that's it! Note that the first IELM prompt is not placed by this
function so it won't appear until the second prompt.
*** Welcome to IELM *** Type (describe-mode) for help.ELISP> ELISP> ()
If you want to enter a single atom and don't need parenthesis, just
hit backspace once. This is much less common so it gets the extra
keystroke.
This can be done for inferior-lisp
and SLIME to
enhance those REPLs as well. You just have to figure out which defun
to advise.
With this hash tables can be printed and read as part of normal
s-expressions with the standard lisp reader and printer functions. It
seems heavy, having to write out "hash-table" in there,
but I think it's because the #s notation will be used to
create printed forms of other lisp objects that currently do not have
one.
At work I spend about a third of my time
doing data
reduction, and it's one of my favorite things to do at the
laboratory (I've done
it on my own too). Data come in from various organizations and
sponsors in all sorts of strange formats. We have a bunch of fancy
analysis tools to work on the data, but they aren't any good if they
can't read the format. So I'm tasked with writing tools to convert
incoming data into a more useful format.
If the source file is a text-based file it's usually just a matter of
writing a parser -- possibly including a grammar -- after carefully
studying the textual structure. Binary files are
trickier. Fortunately, there are a few tools that come in handy for
identifying the format of a strange binary file.
The first is the standard utility found on any unix-like
system: file. I have no idea if it has an official website because it's a term
that's impossible to search for. It tries to identify a file based on
the magic numbers and other tests, none based on the actual
file name. I've never been to lucky to have file recognize
a strange format at work. But silence speaks volumes: it means the
data are not packed into something common, like a simple zip archive.
Next, I take a look at the file
with ent, a pseudo-random
number sequence test program. This will reveal how compressed (or even
encrypted) data are. If ent says the data are very dense, say 7 bits
per byte or more, the format is employing a good compression
algorithm. The next step would be tackling that so I can start over on
the uncompressed contents. If it's something like 4 bits per byte
there's no compression. If it's in between then it might be employing
a weak, custom compression algorithm. I've always seen the latter two.
Next I dive in with a hex editor. I use a combination of
Emacs' hexl-mode and the standard BSD
tool hexdump (for
something more static). One of the first things I like to identify is
byte order, and in a hex dump it's often obvious.
In general, better designed formats use big endian, also known as
network order. That's the standard ordering used in communication,
regardless of the native byte ordering of the network clients. The
amateur, home-brew formats are generally less thoughtful and dump out
whatever the native format is, usually little endian because that's
what x86 is. Worse, they'll also generate data on architectures that
are big endian, so you can get it both ways without any warning. In
that case your conversion tool has to be sensitive to byte order and
find some way to identify which ordering a file is using. A time-stamp
field is very useful here, because a 64-bit time-stamp read with the
wrong byte order will give a very unreasonable date.
For example, here's something I see often.
eb 03 00 00 35 00 00 00 66 1e 00 00
That's most likely 3 4-byte values, in little endian byte order. The
zeros make the integers stand out.
eb 03 00 00 35 00 00 00 66 1e 00 00
We can tell it's little endian because the non-zero digits are on the
left. This information will be useful in identifying more bytes in the
file.
Next I'd look for headers, common strings of bytes, so that I can
identify larger structures in the data. I've never had to reverse
engineer a format ... yet. I'm not sure if I could. Once I got this
far I've always been able to research the format further and find
either source code or documentation, revealing everything to me.
If the file contains strings I'll dump them out
with strings. I haven't found this too useful at work, but
it's been useful at
home.
And there's something still useful beyond these. Something I made
myself at home for a completely different purpose, but I've exploited
its side effects:
my PNG
Archiver. The original purpose of the tool is to store a file in
an image, as images are easier to share with others. The side effect
is that by viewing the image I get to see the structure of the
file. For example, here's my laptop's /bin/ls, very
roughly labeled.
It's easy to spot the different segments of the ELF format. Higher
entropy sections are more brightly colored. Strings, being composed of
ASCII-like text, have their MSB's unset, which is why they're
darker. Any non-compressed format will have an interesting profile
like this. Here's a Word doc, an infamously horrible format,
And here's some Emacs bytecode. You can tell the code vectors apart
from the constants section below it.
If you find yourself having to inspect strange files, keep these tools
around to make the job easier.
Don't stop here! This isn't everything. Check out the archives
(on the left) for more posts. Or just have a look at
the index.