I recently discovered a very clever tool
called Middleman. It's
a quick way to set up and manage multiple-process workload queue. The
process output and display is done inside of a screen session, so if
it's going to take awhile you can just detach and check on it again
later. In the past I used make's -j option
to do this, but that's always a pain to set up.
It is composed of three
programs: mdm.screen, mdm-run,
and mdm-sync. The first is the top level supervisor that
you use to launch the enhanced shell script. The second prefixes every
command to be run in parallel. The third is prefixes the final command
that depends on all of the individual processes.
The linked Middleman page has a good example, but I'll share my own
anyway. I used it over the weekend to download a long series of videos
with
youtube-dl. Because the transfer rate for a single video is
throttled I wanted to grab several at a time, but I also didn't want
to grab them all at the same time. Here's the dumb version of
the script, download.sh, that does them all at once.
Then just launch the script with mdm.screen. It defaults
to 6 processes at a time, but you can adjust it to whatever you want
with the -n switch. I used 4.
$ mdm.screen -n 4 ./download.sh
There is a screen window that lists the process queue and highlights
the currently active jobs. I could switch between screen windows to
see the output from individual processes and see how they were doing.
From the perspective of the shell script, the first four commands
finish instantly but fifth command will block. As soon as Middleman
sees one of the first four processes complete the fifth one will begin
work, returning control to the shell script, and the sixth command
will block, since the queue is full again.
I'm sure I'll be using this more in the future, especially for tasks
like batch audio and video encoding. I bet this could be useful on
a cluster.
At least it is for me. This past week at work I've been furiously
rushing work on a project written in Java. It's completely my fault
that I'm using Java since I'm the one who picked the language. I
wanted to use Java the platform, but I don't know any other JVM
languages well enough to use them in place of Java the language for a
project at work.
It's all sorts of little things that make writing Java so exhausting
for me. At the end of the day it I just feel irritated. I hate the
absolute lack of functional programming and that I have to specify
everything at such a low level. The whole reason we program in
high-level languages is so we can express algorithms more concisely,
but Java fails at this.
Here's an example of what I'm talking about, something I basically did
a few times today. Let's say you have an array of
floats, nums and you want to sum them and return the
result (or maybe use it in another expression). In Lisp it's very
straightforward.
(reduce '+ nums)
"Reduce the sequence nums by addition." Notice that it's
more about saying what I want to do rather than how to
do it. I don't have to introduce any temporary storage or
iterators. To do the same thing in Java it will look something like
this.
intsum = 0;
for (doublenum : nums) {
sum += num;
}
return sum;
If you're using an older Java without the enhanced looping construct
it gets even uglier. I had to introduce a variable for accumulation
and a second variable for iteration. This sort of thing has to be
done all over the place in Java, and it greatly increases the
cognitive overload when reading Java code.
This instruction is more about telling the computer the how
rather than my overall intention. One problem with telling it
the how is that I've unnecessarily locked in a specific
algorithm and ordering. The literal instruction says that the
numbers must be added in order, sequentially. My Lisp
instruction doesn't do that.
It gets even worse when you complicate it slightly by adding a second
array and, say, multiplying it pairwise with the first.
for (inti = 0; i < nums1.length; i++) {
sum += nums1[i] * nums2[i];
}
return sum;
Now the loop gets more complex. I have to tell it how to
increment the iterator. I have to tell it to check the bounds of the
array. The iterator is a misdirection because the actual number stored
in it isn't what's important. Again, the Lisp method is much more
concise.
(reduce '+ (map 'list '* nums1 nums2))
"Map the two sequences by multiplication into a list, then reduce it
by addition." Unfortunately we start to leak a little bit into
the how here. I am telling it that the intermediate structure
should be a list, because map forces me to pick a
representation. Besides that, I am only describing my overall
intention and not the obvious details.
So with Java my days become filled with the tedious low-level
algorithm descriptions that I have to hammer out over and over and
over. Death by a thousand paper cuts.
Lisp isn't the only language that has a (generally) much better
approach; it's just my favorite. :-) Most languages with at least some
decent functional facilities will also do the above concisely.
I got an Elisp idea today and even went as far as implementing a proof
of concept for it: distributed computing with Emacs Lisp. As usual for
me the idea takes advantage of Lisp features to make the task pretty
simple, very specifically Elisp's implementation. In this case it's
the Lisp reader, printer, and the fact that Elisp functions have a
printed representation, both byte-compiled and not.
A central server listens for TCP connections. Clients offering their
CPU for use connect to the server and await instructions. The server
sends a single, no-argument, anonymous function to the client. The
client calls the function, returning the resulting form back to the
server. In order to transmit the function it's encoded into a string
using the Lisp printer, and the client turns it back into an
executable function with the Lisp reader.
For some simple security there is a shared password between the client
and server. When the server sends a function it includes a signature,
and the client only runs code that matches the signature. To create a
signature the string encoded version of the function is appended with
the password (both strings) and hashed with a secure hashing
algorithm. Only someone who knows the password -- including other
clients -- can create the signature.
(defunsign-sexp (password sexp)
"Return signature of the given s-exp."
(sha1 (format "%s%s" password sexp))
To make it easy for the client to read in both the signature and the
function we just cons them together before encoding them as text.
(defunencode (password sexp)
"Encode a s-exp for transmission to client."
(prin1-to-string (cons (sign-sexp password sexp) sexp)))
The client calls the Lisp reader on the string, then checks the
signature in the car cell against the s-expression in
the cdr cell. This will return the function if it's
legitimate, otherwise nil.
(defundecode (password str)
"Decode string into s-exp, checking the signature in the process."
(let* ((cons (read str))
(sig (car cons))
(sexp (cdr cons)))
(if (equal sig (sign-sexp password sexp))
sexp
nil)))
And that's the core of it. It just needs some network code to move the
string between computers. That part can be found in the linked source
above.
To demo this, I'll use the whiten function from
my previous post. I'll run it with
three different strings on three different computers. Assume we
started the dist-emacs server (dist-start) and connected
three clients (dist-connect) from three computers to
it. The clients were fired up from scratch so there's
no whiten function on them yet, but there is one
defined on the server. First we'll send the function definition to the
clients. The dist-dist function takes a list of functions
and passes each one to a client. Ideally I'd want this function to be
more intelligent, managing a work queue so that an arbitrary length
list of functions will be fed one at a time to each client. That's not
the case here.
Also like in the previous post, this is an abstraction leak with the
Emacs implementation. But I like this trick so I'm going to use it
anyway. :-) Next we call it on each client with a different string.
The way I have it set up for my proof of concept the results are just
spit back into the server's *Messages* buffer. If we
watch that buffer we can see each results come back in one at a time
as each machine finishes. I can watch Emacs saturate the CPU on every
client machine simultaneously as it works.
This isn't the same order as the clients, but the order in which the
jobs were completed.
As for the practicality, I doubt there really is one. It's really only
a neat concept (or maybe not even that). For almost the exact same
reasons as my distributed JavaScript
idea, this is a solution looking for a problem. The problem needs to
be able to be broken into small computation units, because Emacs has
no threading, and it has to be low bandwidth, because it has to be
parsed all at once from a string. If you want to pass large data sets
it needs to be done out-of-band, which probably defeats the
purpose. There seem to be few to no problems that fit these
limitations.
Memoization is something I think
should be packaged as a standard function for just about every
language. That's not generally the case, but luckily this is easy to
fix in Lisps. I needed memoization recently for an Elisp project I'm
working on. I could have hand-written one but a generic memoization
function would have worked just fine. Since I didn't find any generic
Elisp memoization on-line I wrote my own.
Just put it in your path
and (require
'memoize) it. Here's the core
function.
;; ID: 83bae208-da65-3e26-2ecb-4941fb310848
(defunmemoize-wrap (func)
"Return the memoized version of the given function."
(let ((table-sym (gensym))
(val-sym (gensym))
(args-sym (gensym)))
(set table-sym (make-hash-table :test 'equal))
`(lambda (&rest ,args-sym)
,(concat (documentation func) "\n(memoized function)")
(let ((,val-sym (gethash ,args-sym ,table-sym)))
(if ,val-sym
,val-sym
(puthash ,args-sym (apply ,func ,args-sym) ,table-sym))))))
Normally the hash table would be stored inside of a closure, but Elisp
doesn't have closures. Instead, the hash table is stored in an
uninterned symbol, which will only (generally) be accessible to the
memoization wrapper (in order for external code to access the table it
would need to directly inspect the memoized function object). It's not
quite a scope but it's close enough.
Note that it keeps the original function documentation intact. I want
the memoization wrapper to be an unobtrusive as possible.
This is a bit of an abstraction leak. It's creating new code at run
time taking advantage of the specific low-level way Emacs executes
s-expressions. If Emacs ever stopped directly evaluating s-expressions
then this code would break. A more proper way to do this would
be at compile time with a macro, where this exact sort of code
generation is expected. Rather than defun we would use a
macro called something like defmemoized.
Here's a demo of it in action. This whiten function is
computationally expensive: it performs key whitening. It repeats a
hash function thousands of times to produce an expensive value. This
isn't something you generally want to memoize, but stick with me.
(defunwhiten (key)
"Perform key whitening with the md5 hash function."
(dotimes (i 100000 key)
(setq key (md5 key))))
(whiten "password") ; takes a couple of seconds
On my laptop that takes a couple of seconds to run. Increase that
counter if it's quick on your computer. My memoize package provides
a memoize function which will create a new function that
wraps the original, then installs the new function in place of the old
one if we give it the function symbol.
(memoize 'whiten)
The first time you run it after memoization it will be slow, but after
that the memoization kicks in for a quick return.
There are two Elisp specific issues at hand. First is that memoizing
an interactive function will produce a non-interactive function. It
would be easy to fix this problem when it comes to non-byte-compiled
functions, but recovering the interactive definition from a
byte-compiled function is more complex than I care to deal
with. Besides, interactive functions are always used for their side
effects so there's no reason to memoize them.
Second is a limitation of Elisp hash tables. There's no way to
distinguish a nil value and no value. The hash table returns nil for
both. This means you cannot memoize nil returns. But a computationally
expensive function shouldn't be returning nil anyway.
Hopefully someone else will find good use for this as well.
As I get more involved with tabletop RPGs, specifically Dungeons and
Dragons, I find there are some related attributes that I wish these
game systems had. While I'm sure there are systems do have some of
these, I wish whatever I happen to be using had all of them.
Print friendly. The source material tends to be very colorful
and graphical. While this can be a good thing, especially when
illustrating monsters (Show, not tell!), it's bad if you want to print
out your own materials. I want the crucial information available in a
crisp, clean monochrome form of some sort. Not only could I reproduce
material for use in notes and handouts, but I could create my own
condensed sets of information by composing these crisp forms.
For example, in the D&D monster manual each monster has a nice
concise block containing all the information -- defenses, health,
abilities, etc -- needed to use that monster. This is great, but it's
on a brownish background, in a red-ish box. So close to being what I
want. But even then, do I have legal permission to reproduce this
information? And so ...
Licensing. The closest thing tabletop gaming has to a Free
Software license would be
the Open
Game License (OGL), which is still pretty restrictive. I would
love for the source materials to be licensed at least loosely enough
that I could print out my own copies for cheap (assuming they are
print friendly, per above). Have some new players sitting down at the
table? To get them started, give them that stapled-together player
handbook you printed out. There's RPG evangelism for you.
The Fudge role-playing
game system has both these attributes down pretty well. The Fudge
manual is very print friendly PDF with explicit permission to share it
with your friends. However, just
as yacc is a compiler
compiler, Fudge is really a game system system, a system for
creating game systems, so it's only part of what is needed to play a
game.
Useful software tools. One specific example is character
creation software. Creating a new character can be burdensome,
especially for a new player. Software that allows a player to select
some basic options from a menu and produce a printable, error-free
character sheet can save a lot of time.
Fourth edition Dungeons and Dragons has a character builder, but it is
a humongous piece of junk. It's proprietary, Windows-only, bulky, and
slow. For a program that merely generates printouts based on a few
user selections from some simple menus, it has some extremely
excessive system requirements (much higher than the ones they
claim). And it requires a reboot to install too. A human can produce
the same results by hand inside of a half hour, so for a computer
there is virtually no computation involved. So what is it doing? Worse
of all, the fourth edition license expressly forbids competing
character creation software, so no one can legally produce a
reasonable one. All this thing should be is a database of available
character abilities, some character sheet logic, and a postscript
printer.
I've been looking for a nearby tabletop gaming group for awhile now. I
asked people I knew. I asked around at work. I just couldn't find
anyone. Luckily, a new thing that Wizards of the Coast has been doing
is
D&D Encounters where prepared adventures are run every week
openly at local gaming stores. The purpose is for casual players or
beginners to be able to freely to play some D&D without needing
any commitment, preparation, or equipment. Characters are
pre-generated, so no spending a half hour creating some new player
characters every week.
I hopped into it
for
season two, which just started 6 weeks ago. Each weekly session is
a 1.5 to 2 hour combat encounter. I've been having fun, but honestly
it's not all that exciting compared to what a real campaign can
bring. There is no role-playing, practically no NPC interaction, no
puzzles, and no exploration. It also doesn't help that the adventures
and
characters are riddled with mistakes and very unbalanced. For an
example of unbalanced, the character I've been playing, Barcan (or
Barqan depending on where you are in the character sheet), could
be killed -- and I'm not talking about unconscious dying but
negative bloodied value dead -- by a monster critical strike in just
about every encounter so far. Every time a monster attacked me there
was a 1 in 20 chance, even at full health, that I might be done
playing for the week.
But the great part is that it got me connected to other players in my
area, which I think is the most valuable part of Encounters. One of my
fellow players was just starting a regular gaming group and invited me
to come along, so we've been playing on weekends now, with the
intention of taking turns as the DM among those who are
interested. And for a little irony, everyone except one person in the
group also works at the lab. I guess I didn't ask around enough!
So I'm going to be DMing a 4e Dungeons and Dragons campaign sometime
in the near future, and I'm quite excited about it!
Anyway, what does that have to do with drawing
a space
elevator in the GIMP? First of all, if you're one of the
players in my group who found their way here stop reading
now. You'll find out all this in the first session, so come back
after that. So, I've had this campaign idea in my head for a couple of
years now, and it involves a skyhook of sorts constructed by a
combination of careful engineering and powerful arcane magic. It
leads somewhere, not space, but somewhere. Since that somewhere
is part of the mystery I won't reveal where that is, but if the
campaign goes well I'll write about it more in the future.
When I run the first session I want to illustrate the skyhook to the
players. Show, not tell they say. I like some of the work
people
do
imitating Bob Ross in the GIMP. I've done a few of these to try
it out, and it's surprising how well it can turn out even from a
beginner. I employed this new art education to draw my skyhook for my
game. The GIMP undo history reveals how I did it,
In retrospect I should have drawn the skyhook right after I finished
those first clouds, since it's behind everything else in the
scene. Oh, and that thing on the bottom left is a twisted scar left
behind from a previous attempt at building the skyhook, but it
collapsed. It's a dangerous place to be.
A feature unique to some Lisps is the ability to compile functions
individually at any time. This could be to a bytecode or native code,
depending on the dialect and implementation. In a Lisp implementations
where compilation matters (such as
CLISP), there are typically two forms in which code can be
evaluated: a slower, unoptimized uncompiled form and a fast, efficient
compiled form. The uncompiled form would have some sort of advantage,
even if it's merely not having to spend time on compilation.
In Emacs Lisp, the uncompiled form of a function is just a lambda
s-expression. The only thing that gives it a name is the symbol it's
stored in. The compiled form is a (special) vector, with the actual
byte codes stored in a string as the second element. Constants, the
docstring, and other things are stored in this function vector as
well. The Elisp function to compile functions
is byte-compile. It can be given a lambda function or a
symbol. In the case of a symbol, the compiled function is installed
over top of the s-expression form.
The compiler will not only convert the function to bytecode and expand
macros, but also perform optimizations such as removing dead code,
evaluating safe constant forms, and inline functions. This provides a
nice performance boost (testing using my
measure-time macro),
(defunfib (n)
"Fibonacci sequence."
(if (<= n 2) 1
(+ (fib (- n 1)) (fib (- n 2)))))
(measure-time
(fib 30))
=> 1.0508708953857422
(byte-compile 'fib)
(measure-time
(fib 30))
=> 0.4302399158477783
Most of the installed functions in a typical Emacs instance are
already compiled, since they are loaded already compiled. But a number
of them aren't compiled. So, I thought, why not spend a few
seconds to do this?
In Common Lisp, there is a predicate for testing whether a function
has been compiled or not: compiled-function-p. For
whatever reason, there is no equivalent predefined in Elisp, so I
wrote one,
(defunbyte-compiled-p (func)
"Return t if function is byte compiled."
(cond
((symbolp func) (byte-compiled-p (symbol-function func)))
((functionp func) (not (sequencep func)))
(t nil)))
My idea was to iterate over every interned symbol and, if the function
slot contains an uncompiled function, using the test above, I would
call byte-compile on it. Well, it turns out
that byte-compile is very flexible and will ignore
symbols with no function and symbols with already compiled functions.
So next, how do we iterate over every interned symbol? There is
a mapatoms function for this. Provide it a function and
it calls it on every interned symbol. Well, that's simple and
anticlimactic.
(mapatoms 'byte-compile)
That's it! It will take only a few seconds and spew a lot of
warnings. I haven't found a way to disable those warnings, so this
isn't something you'd want to have run automatically, unless you like
having an extra window thrown in your face. I've only discovered this
recently, so I'm not sure what sort of bad things this may do to your
Emacs session. Not every function was written with compilation in
mind. There are interactions with macros to consider.
I doubt there will be a noticeable performance difference. Like I said
before, most everything is already compiled, and those are the
functions that get used the most. There's just something nice about
knowing all your functions are compiled and optimized.
ParEdit is a
powerful extension to Emacs that I've just begun using recently. It's
a minor mode that forces all parenthesis, square brackets, and quotes
to be balanced at all times. While it's useful for any programming
language it's especially suited for Lisps, because it's designed for
manipulating nested parenthesis -- i.e. s-expressions. It's not
currently part of Emacs so you have to drop the script in
your load-path somewhere.
I've frequently thought that a Lisp-based shell would be an
interesting and powerful tool, much like a normal Lisp REPL. Programs
would be treated like Lisp functions. For example,
But typing all those parenthesis all the time would be quite the
nuisance. I know this from experience typing at Lisp REPLs. I imagined
something that works exactly like ParEdit would be needed to make all
that work go away. To save even more time each prompt would begin with
a nested pair, with the cursor placed between them. Then typing a
quick command is no different than a normal shell.
wellons@luna:~$ ()
Well, in Emacs we have both ParEdit and REPLs, so we can compose these
features together with just a little advice. Here's how to do it with
the Interactive Emacs-Lisp Mode (IELM) REPL. First tell IELM to use
ParEdit,
The function in IELM that spits out the next prompt
is ielm-eval-input, so we give it the advice to call the
ParEdit function afterwards to insert a parenthesis pair.
(defadviceielm-eval-input (after ielm-paredit activate)
"Begin each IELM prompt with a ParEdit parenthesis pair."
(paredit-open-round))
And that's it! Note that the first IELM prompt is not placed by this
function so it won't appear until the second prompt.
*** Welcome to IELM *** Type (describe-mode) for help.ELISP> ELISP> ()
If you want to enter a single atom and don't need parenthesis, just
hit backspace once. This is much less common so it gets the extra
keystroke.
This can be done for inferior-lisp
and SLIME to
enhance those REPLs as well. You just have to figure out which defun
to advise.
With this hash tables can be printed and read as part of normal
s-expressions with the standard lisp reader and printer functions. It
seems heavy, having to write out "hash-table" in there,
but I think it's because the #s notation will be used to
create printed forms of other lisp objects that currently do not have
one.
At work I spend about a third of my time
doing data
reduction, and it's one of my favorite things to do at the
laboratory (I've done
it on my own too). Data come in from various organizations and
sponsors in all sorts of strange formats. We have a bunch of fancy
analysis tools to work on the data, but they aren't any good if they
can't read the format. So I'm tasked with writing tools to convert
incoming data into a more useful format.
If the source file is a text-based file it's usually just a matter of
writing a parser -- possibly including a grammar -- after carefully
studying the textual structure. Binary files are
trickier. Fortunately, there are a few tools that come in handy for
identifying the format of a strange binary file.
The first is the standard utility found on any unix-like
system: file. I have no idea if it has an official website because it's a term
that's impossible to search for. It tries to identify a file based on
the magic numbers and other tests, none based on the actual
file name. I've never been to lucky to have file recognize
a strange format at work. But silence speaks volumes: it means the
data are not packed into something common, like a simple zip archive.
Next, I take a look at the file
with ent, a pseudo-random
number sequence test program. This will reveal how compressed (or even
encrypted) data are. If ent says the data are very dense, say 7 bits
per byte or more, the format is employing a good compression
algorithm. The next step would be tackling that so I can start over on
the uncompressed contents. If it's something like 4 bits per byte
there's no compression. If it's in between then it might be employing
a weak, custom compression algorithm. I've always seen the latter two.
Next I dive in with a hex editor. I use a combination of
Emacs' hexl-mode and the standard BSD
tool hexdump (for
something more static). One of the first things I like to identify is
byte order, and in a hex dump it's often obvious.
In general, better designed formats use big endian, also known as
network order. That's the standard ordering used in communication,
regardless of the native byte ordering of the network clients. The
amateur, home-brew formats are generally less thoughtful and dump out
whatever the native format is, usually little endian because that's
what x86 is. Worse, they'll also generate data on architectures that
are big endian, so you can get it both ways without any warning. In
that case your conversion tool has to be sensitive to byte order and
find some way to identify which ordering a file is using. A time-stamp
field is very useful here, because a 64-bit time-stamp read with the
wrong byte order will give a very unreasonable date.
For example, here's something I see often.
eb 03 00 00 35 00 00 00 66 1e 00 00
That's most likely 3 4-byte values, in little endian byte order. The
zeros make the integers stand out.
eb 03 00 00 35 00 00 00 66 1e 00 00
We can tell it's little endian because the non-zero digits are on the
left. This information will be useful in identifying more bytes in the
file.
Next I'd look for headers, common strings of bytes, so that I can
identify larger structures in the data. I've never had to reverse
engineer a format ... yet. I'm not sure if I could. Once I got this
far I've always been able to research the format further and find
either source code or documentation, revealing everything to me.
If the file contains strings I'll dump them out
with strings. I haven't found this too useful at work, but
it's been useful at
home.
And there's something still useful beyond these. Something I made
myself at home for a completely different purpose, but I've exploited
its side effects:
my PNG
Archiver. The original purpose of the tool is to store a file in
an image, as images are easier to share with others. The side effect
is that by viewing the image I get to see the structure of the
file. For example, here's my laptop's /bin/ls, very
roughly labeled.
It's easy to spot the different segments of the ELF format. Higher
entropy sections are more brightly colored. Strings, being composed of
ASCII-like text, have their MSB's unset, which is why they're
darker. Any non-compressed format will have an interesting profile
like this. Here's a Word doc, an infamously horrible format,
And here's some Emacs bytecode. You can tell the code vectors apart
from the constants section below it.
If you find yourself having to inspect strange files, keep these tools
around to make the job easier.
Joe Wetzel over at Inkwell Ideas put together a
neat
Random Inn/Tavern Generator for use in tabletop RPGs. He's still
working on it so it's gradually building up more features. It's just
JavaScript and HTML, so no special plugins are needed for it to
work. Right now it appears to be the only generator that makes
floorplans.
It creates a random floorplan using image segments, so you'll have to
screenshot the website if you plan on saving a particular layout as a
single image. It also generates a menu, including prices, which I
think adds a lot of personality to the otherwise bare floorplan.
There are controls to fix certain properties of the generator in
place, making it less random, so you can control things such as secret
doors, inn size, and second floors. In the future there are plans for
adding generated NPC staff and patrons, and even plot hooks.
Ameron over at Dungeon's Master just wrote a
post
Tavern Trappings as a guide for how to fill a tavern with
everything else. It compliments the random inn generator well.
Yesterday
Luke pointed out an interesting idea by Jeff
Atwood:
mark code snippets you post online with a UUID. That way it's easy
to find other places where code snippets have been used, and it's
easier to find updates. It's a neat idea that I might start using.
So, the next problem is where to get a UUID. One place to get one
is createguid.com, which will
serve you up a
fresh
version 4 UUID each visit. But I don't like relying on the
Internet for something when I don't need to. This is why I sometimes
make wget backups of my favorite sites, in case they ever
disappear. By design it's easy to generate your own UUIDs, and you
have 5
algorithms to choose from, so I'll make my own.
Since I'll always be in Emacs when I need one, I'll just have Emacs do
it. However, it's a bit tricky to do in Emacs. The pseudo-random
number generator isn't very good. It's always initialized to the same
seed when it starts, so it's up to the user to reseed
somewhere. There's no direct way to read
from /dev/urandom, instead relying on an external process
to feed it in. Relying on /dev/urandom is also not
portable. There's also no simple, portable way to find out hardware
information about the machine, such as MAC address, required by some
versions of the UUID algorithms. Elisp integers are a few bits smaller
than native integers (29-bits on 32-bit systems), so the UUID has to
be handled in very small pieces.
I was beginning to work on my own when I found
this
great solution by Martin Blais
(mirror). Specifically
the uuid-simple function, which takes an MD5 hash of
unique data in Emacs and uses it for the UUID. I believe this may
count as a version 3 UUID (or at least close enough that I don't
care). I modified it slightly, adding more unique data and a version
number.
;; 90aebf38-b33a-314b-1198-c9bffea2f2a2
(defunuuid-create ()
"Return a newly generated UUID. This uses a simple hashing of variable data."
(let ((s (md5 (format "%s%s%s%s%s%s%s%s%s%s"
(user-uid)
(emacs-pid)
(system-name)
(user-full-name)
user-mail-address
(current-time)
(emacs-uptime)
(garbage-collect)
(random)
(recent-keys)))))
(format "%s-%s-3%s-%s-%s"
(substring s 0 8)
(substring s 8 12)
(substring s 13 16)
(substring s 16 20)
(substring s 20 32))))
This returns a string containing the UUID. Next, make an interactive
function to wrap it,
(defunuuid-insert ()
"Inserts a new UUID at the point."
(interactive)
(insert (uuid-create)))
I bound this to "C-x !", which wasn't yet bound to
anything.
While regular expressions
have
limited usefulness, especially in larger programs, they're still
very handy to have from time to time. It's usually difficult to write
a lexer or tokenizer without one. Because of this several languages
build them right into the language itself, rather than tacked on as a
library. It allows the regular expressions to be stored literally in
the code, treated as its own type, rather than inside a string. The
problem with storing a regular expression inside a string is that it
can easily make an already complex regular expression much more
complex. This is because there are two levels of parsing going
on.
Consider this regular expression where we match an alphanumeric word
inside of quotes. I'm going to use slashes to delimit the regular
expression itself.
/"\w+"/
Notice there is no escaping going on. The backslash is there is
indicate a special sequence \w, which is equal
to [a-zA-Z0-9_]. This will
get parsed and
compiled into some form in memory before it is run by a
program. If the language doesn't directly support regular expressions
then we usually can't put it in the code as is, since the language
parser won't know how to deal with it. The solution is to store it
inside of a string.
However, our regular expression contains quotes and these will need to
be escaped when in a quote delimited string. But I no longer need
slashes to delimit my regular expression.
"\"\w+\""
Did you notice the error yet? If not, stop and think about it for a
minute. Our special sequence \w will not make it intact
to the regular expression compiler. That backslash will escape
the w during the string parsing step, leaving only
the w. The string we typed will get parsed into a series
of characters in memory, performing escapes along the way, and
then that sequence will be handed to the regular expression
compiler. So we have to fix it,
"\"\\w+\""
That's getting hard to understand, compared to the original. Now let's
throw a curve-ball into this: let's match a backslash at the beginning
of the word. The normal regular expression looks like this now,
/"\\\w+"/
We have to escape our backslash to make it a literal backslash, so it
takes two of them. Now, when we want to do this in a string-stored
regular expression we have to escape both of those backslashes
again. It looks like this,
"\"\\\\\\w+\""
Now to match a single backslash we have to insert four backslashes!
Quite
unfortunately,
Emacs Lisp doesn't directly support regular expressions even
though the language has a lot of emphasis on text parsing, so a lot of
Elisp code is riddled with this sort of thing. Elisp is especially
difficult because sometimes, such as during prompts, you can enter a
regular expression directly and can ignore the layer of string
parsing. It's a very conscious effort to remember which situation
you're in at different times.
Perl, Ruby, and JavaScript have regular expressions as part of the
language and it makes a lot of sense for these languages; they tend to
do a lot of text parsing. Python does it partially, with
its r' syntax. Any string preceded with an r
loses its escape rules, but it also means you can't match both single
or double quotes without falling back to a normal string with
escaping. Common Lisp may be able to do it with a
reader macro, but I've never seen it done.
Remember those two levels of parsing when writing string stored
regex. It helps avoid hair-pulling annoying mistakes.
Previously I had
predicted that LZMA tarballs were the future. I'd like to make a
slight correction to this: XZ
tarballs are the future! LZMA has actually already been supplanted
by the newer LZMA2 algorithm, which is most directly accessed with
the xz tool from XZ Utils. So look out
for .tar.xz in the future.
Right now I believe that XZ will get you the smallest compression of
any general purpose compressor (at a large memory cost). When combined
with tar it even beats
out 7z.
Like all productive computer activity on the Windows platform, dealing
with LZMA or XZ tarballs is needlessly difficult in Windows. Archive
software, such as 7-zip, doesn't
seem to handle them directly yet. They can read
the underlying tar
layer but the compression layer seems to be opaque. 7-zip actually
uses the LZMA and LZMA2 algorithms internally, so it's 99% of the way
there. They just never did the last step.
So here's where the above executables, provided for your convenience,
come in: xz.exe was cross-compiled in Debian GNU/Linux
using the MinGW compiler suite,
and it can perform both compression and decompression on both XZ and
LZMA. xzdec.exe can only do decompression, but it's
smaller. You can use these to pull out the tar archive so
that it can be accessed using the regular tools. Or you can use it to
make your own highly compact archives.
Live coding (or
livecoding) is software development as a performance art. A
programmer's screen is viewed by the audience and a program is written
and modified so that it produces sound and maybe even visual
effects. The audience gets to see the code and its effects live. I'm
not sure if "live" refers to the audience, the editing of live code,
or maybe both. There are videos all over the web of this in action so
if you haven't seen it yet do a quick search and watch one.
It's fairly easy to obtain livecoding software. For example,
there's Fluxus, which
extends PLT Scheme to support livecoding.
I've never done livecoding myself, but something I've noticed is that
Scheme seems to be a popular choice of language for livecoding. I
think I know how and why this is. Scheme, being a Lisp dialect, is
naturally
a
living system: it can be modified and extended while it's actively
running. Scheme in particular is very well suited for the task thanks
to its simplicity and optimized tail recursion.
I'll do a little text-based livecoding example in PLT Scheme to show
how it works. This will be easier to do yourself if your text editor
can interact directly with a REPL (like Emacs or DrScheme).
Let's define a function that prints a line of text to the screen and
recurses, so that it continues printing forever. The recursion is
important here and I'll get back to it. To keep things manageable I'm
also going to insert a 1 second pause.
If we call this function with (print-str) it will sit
there printing "Hello!" over and over. It will also lock up our REPL
preventing us from doing anything else. Not very useful. So let's put
it in a thread instead!
(thread print-str)
Now our program is running and we get to keep our REPL. Why do we need
to keep our REPL alive? Well, so we can
redefine print-str on the fly! In my buffer I'll go back
and change "Hello!" to "Goodbye!". While I'm doing this the function
is still spitting out "Hello!".
As soon as I tell my editor to pass this to the REPL
the print-str function gets redefined and starts printing
"Goodbye!" instead. Why did the running function change? Because of
recursion. When it called itself, it actually called the new
definition.
Since I didn't keep a handle on the thread the easiest way to
stop print-str from running is to redefine it without
recursion.
(define (print-str)
(display "Done.\n"))
And it's done. If I was really fast about this my output looks
something like this.
Hello!
Hello!
Hello!
Goodbye!
Goodbye!
Done.
That's the fundamental workings of livecoding in Scheme: I changed a
program while it was running. To turn the above into the more
interesting livecoding you see in the videos all we need are some
audio and visual bindings (which is the hard part of it all).
Two months ago I wanted to experiment with artificial neural networks
in Scheme. I've
written one in C++ before, but I wanted a pure Scheme
implementation. So I wrote one. Grab it here,
It fakes OOP with
closures and a dumb message passing scheme so that I could treat
individual neurons like objects. The neurons are wired up to push
orders along backwards, so most of the work from the implementation
point of view is on the output neurons.
If I found an application for it I thought it would be neat to save
off the network weights, thus storing the "brain", as an s-expression
that could be dropped directly in the source code (something Lisp does
very nicely). Even better, if it was a programming competition I could
obfuscate the neural network implementation a bit -- so that no one
knew it was a neural network -- and have a mysterious, opaque lump of
mixed code and data.
Using it is really simple: once the neural network functions are
loaded creating a new arbitrary network is as easy as the
following. This creates a network with 4 input neurons, a hidden layer
of 5 neurons, a hidden layer with 6 neurons, and an output layer with
3 neurons.
Then the functions train-ann and run-ann are
used, with a list of bits as arguments, to train and run the neural
network. Included in the repository are two examples showing this in
action, using the provided helper functions int->blist
(integer to bit list) and blist->int.
Like most of my neural network experiments this is really
disappointing. It doesn't seem to work well outside some trivial or
contrived situations, and it quickly slows down after adding only a
couple more layers. It might just have to do with the type of network
I'm using, and perhaps I should look into more advanced and complex
neurons.
I was inspired by an item
in
Luke's
Tumblr blog last night. It was a screenshot of a program
called PawSense, which
monitors a computer's keyboard for cat activity. (I don't know if it's
any good, but it's funny.) As anyone with cats knows, it's not unusual
to leave a computer only to come back later to see garbage typed in by
a wandering cat. I wrote a version for Emacs today.
Put it (cat-safe.el) somewhere in
your load-path (like ~/.emacs.d/) and put
this line in your .emacs file,
(require 'cat-safe)
This only monitors Emacs itself; it should help protect your buffers
but not your web browser. When cat interference is detected Emacs
switches focus to a junk buffer and lets the cat make a mess there
instead. In case your
cat happens
to type out some Shakespeare you will be able to read it in the
junk buffer. Just kill the junk buffer to return to work.
It could still use some improvement. Right now it looks for a single
key being help down, excepting keys humans tend to hold down like
backspace, delete, and space. If you play around with it you'll notice
if you press several keys at once Emacs will sometimes create a
pattern with them. I need to figure out a good way to detect this.
I'm going to run it at home for awhile to make sure it remains
transparent, but still does its job. It will probably incur a
performance penalty on frequently repeated keyboard macros.
Fractran is a
Turing-complete esoteric programming language. A Fractran program is
just an ordered list of positive, irreducible fractions. The program's
output for an input n is the output of the program run
on n multiplied by the first fraction in the list that results
in an integer. If no such multiplication results in an integer, the
output is the input n. Variables are encoded in the exponents
of the prime factorization of the input and output.
Some time ago I thought up an idea for a short story involving
Fractran. A mathematician accidentally creates a Fractran program that
can trivially factor large composites. Think something like
O(log n). It's just the right magical string of, say, 31
fractions.
The story would be a first-person narrative of the mathematician's
thoughts during a short time after the discovery, considering many of
the consequences of the program. For example, it would render much of
cryptography, which plays an essential role in the modern world,
useless. He would also wonder if mankind should deserve such a
discovery, considering how accidental it was.
This whole idea vanished once I realized that this Fractran program is
actually completely trivial. It even runs in O(1) time. It's so
trivial as to be worthless. Remember that Fractran stores its data in
the number's prime factorization? The Fractran program that can factor
any number in constant time is the identity function. To decode the
output, which matches the input, all you need to do is factor it!
Interestingly, it doesn't seem to actually be possible to implement
the identity function in Fractran (But somehow it's
Turing-complete? Hmmm... more investigation needed.), unless you
can define your program in terms of its input. For example, the
program 1/(n+1) is the identity function for
input n.
Here's something I only learned recently, since it came up when
working on Wisp. In C function
pointers are incompatible with normal pointers. For example, this is
unportable C,
int func (int x);
int main ()
{
void *p = (void *) &func;
}
This is because function pointers are a different size than other
pointers on some architectures, or even within the same architecture
with different models (x86's compact and medium models). If the
compiler in such a scenario allowed this, the pointer may be truncated
and would likely point to the wrong place. It wasn't until I added
the -pedantic flag to gcc that it started warning me
about situations like the above. The -W -Wall flags are
silent here.
The relevant part of
the ANSI C
standard lists the following as a common, but unportable,
extension to the language,
A pointer to an object or to void may be cast to a pointer to a
function, allowing data to be invoked as a function. A pointer to
a function may be cast to a pointer to an object or to void,
allowing a function to be inspected or modified (for example, by a
debugger).
I bet this issue only comes up very rarely. How often do you have to
store a function pointer in a void pointer? It subverts the type
system and is generally a bad idea. I had to do it in Wisp as part of
its value polymorphism, which is why it bit me. This is probably why
gcc doesn't get very picky over it.
This also means function pointers have less support than normal
pointers. For example, printing pointers
with printf()'s %p won't work, since it
expects a void pointer, so there's no printing them. You
can't sort them with qsort(). You can even treat the
function pointer as a blob of data to manipulate manually since
there's no safe way to make a regular pointer to it. Really, almost
any C library function that accepts pointers won't work with function
pointers.
So if you want a tricky, unfair, interview question this could be
one!
I found this Common Lisp Quick
Reference the other day
from r/lisp, and I think
it's fantastic. It's a comprehensive, libre booklet of the
symbols defined by the Common Lisp ANSI standard. Very slick!
The main version is meant to be printed out and nested with a vertical
fold, and it works quite well. If I ever get a chance to use Common
Lisp at work (a man can dream), probably at a location without
Internet access, this could come in handy. So I printed out one for
myself,
In case the website ever disappears in the future, here's all the PDFs
and LaTeX source for it. These could be out of date, so you should
check the main website first.
I've been chugging away on
Wisp, announced in my last post,
every day since I started it a few weeks ago, and it's becoming a
pretty solid system. There's now an exception system, reference
counting for dealing with garbage, and a reentrant parser. It's no
replacement for any other lisps, but I've found it to be very fun to
work on. I've put what exists so far of
the Wisp User
Guide up for viewing. The AsciiDoc source is in the Git
repository, which is here,
I wanted to show off some of the new features of Wisp, and since I was
inspired by Full
Disclojure, since it's so damn slick, I decided to make some
screencasts of Wisp in action. All of the screencast software for
GNU/Linux is pretty poor, but after a few hours of head-banging I
managed to hobble something together for you. Enjoy!
That video demonstrated the memoization function. It can be pulled in
from the memoize library. You give it a symbol, which
should have a function definition stored in it, and it will installed a
wrapper around it. In the video I used the Fibonacci function from
the examples library.
This demonstrated the "detachment" feature of Wisp, which is similar
to "futures" in Clojure. It forks off a new process, which executes
the given function. The send function can be used in the
detached process to send any lisp objects back to the parents, which
can receive them with the receive
function. The send function can be called any number of
times to continually send data back. The receive function
will block if there is no lisp object to receive yet.
(require 'examples)
(setq d (detach (lambda () (send (fib)))))
(receive d) ; Gets value from child process
This video shows off the point-free functions that have been defined:
function composition and partial application (I accidentally say
"partial evaluation" in the video). These are actually just simple
macros that any lisp could do.
Update 2010-2-04: A lot of the information below is out of date. There
is an update here: Wisp Screencasts.
This is a project I've been wanting to do for some time, and I finally
got around to doing it. I spent the last few days implementing my own
lisp interpreter in C. Today, after sinking in about 48 hours of work,
I believe I completed enough of it to consider it in a working state,
with a code base stable enough that other interested people could
contribute. It was really exciting to see everything come together
today.
You can make a clone of the Wisp repository with Git. Go ahead; don't
be shy,
To build it, all you need is a C99 compiler, make, yacc
(i.e. Bison), and
lex (i.e. Flex).
It doesn't use the readline library or one of it's clones to make a
nice interaction command line, so it's a good idea to run it
with rlwrap. That's
what I've been doing. If you plan on writing Wisp code in Emacs,
putting this in your .emacs will give you all the syntax
niceties,
You should also be able to run it as an inferior lisp and send code to
it like a normal lisp. I haven't done this yet myself.
I think the name is apt, because it really is a wisp of a lisp. As of
this writing, it weighs in at 1500 lines of code and is still very
feature-light. I haven't actually read any material about writing lisp
interpreters, so I've been winging it based on my experience with
it. It's already taught me a lot of subtle things about lisp that I
hadn't been aware of before.
Right know it's very simple. It doesn't yet support any syntax beyond
parenthesis (no ' quoting). No closures. No garbage
collection, as I'm still working out how I'm going to do
that. Dynamically scoped, since that's a lot easier to do. And, like
Scheme, it's a lisp-1, meaning functions and variables share a common
namespace. I hope to expand it to include some features from other
lisps like Arc (particularly the anonymous function syntactical
sugar), Common Lisp, and Scheme.
However, it does have already anonymous functions, which many
popular languages still don't have. :-) It's far enough along
to let you define crazy stuff like this,
(defunexample (n)
(if (<= n 0)
(lambda (x) (* x 10))
(lambda (x) (* x 2.0))))
Which you can call, interactively in this case, like so,
Because there's no garbage collection yet, it leaks memory like a
sieve. For garbage collection, I think I'll do a mark-and-sweep,
marking objects based on their reachability from the symbol
table. That still leaves some corner cases -- such as worrying about
objects in limbo in the evaluator -- that I'm not sure about. I need
to be careful not to free objects still in use. Still working that one
out.
It has the multiplication, addition, subtraction, and division
implemented, as well as the greater-than and less-than operators. Lisp
macros are implemented. A number of special forms are defined, like
let, if, set, defun, defmacro, car, cdr, not, progn, lambda, and
while. All of the predicates are implemented. It has a C interface,
which is how all the above got defined. I already have some functions
and macros defined in terms of Wisp code, too. Most of the needed
functions at this point are trivial to add, though a bit tedious, so
I'm mostly trying to focus on the core parts of the interpreter right
now.
It's amazing how much the internal code looks like lisp written in a C
dialect. I have CAR and CDR macros defined, which get used all over
the place, and code frequently uses them to walk lists like lisp code
would.
The core struct that everything works with the the object struct. It's
defined as such,
The type indicates what type of data the void pointer points to,
making it sort-of polymorphic. Note the CONS type, the cons cell, used
to create lists,
There's the familiar car and cdr pointers. There are a bunch of helper
functions to manipulate and build these. For
example, c_cons() creates a cons cell,
Look familiar? Yup, that's the lisp cons function. Since
the nil symbol is available in C code as NIL
you can chain these together in C to make a list,
Since that's so cumbersome to write out, there's a parser that can
read nice lisp code and use all those same functions to make the
lists. Hence, the lisp reader.
If you want to help, it's pretty easy to add more CFUNCs, C functions
that are exposed to Wisp lisp code. Right now, I'd like to expose the
whole C math.h library, provide a nice I/O interface, and expose a
bunch of string functions. The TODO file in the
repository contains more things to be done.
Wisp will probably be getting it's own "project page" here at some
point in the future. When it does, I'll update this post to point to
it.
Oh, and I decided to make this available under a 2-clause BSD license,
so someone could easily plug it into another program as an extension
language (once Wisp has matured first, of course). That would be
cool.
Common Lisp is possibly the most advanced programming language. Think
of pretty much any programming language feature and Common Lisp
probably has it. Since lisp is the programmable programming language,
when someone invents a new language feature it can probably be added
to Common Lisp without even touching the language core.
However, if you're interested in digging into Common Lisp to try it
out, you may find yourself quickly running into walls just getting
started. It's a lot different than other programming environments you
may be used to. The Common Lisp tutorials generally skip this step,
assuming the user has an environment, or leaving that setup for the
"vendor" to handle. So, here's a guide to setting up a great Common
Lisp environment with
Emacs and
SLIME. It should work with any Common Lisp implementation and any
operating system that can run Emacs (i.e. most of them). Even a much
less capable one like Windows.
First, you need to pick a Common Lisp implementation and install
it. Ideally, it should end up in your PATH. Like C, the language is
defined solely by its standardized specification, rather than some
canonical implementation. Steel Bank
Common Lisp (SBCL) is currently
the
highest
performing implementation, it's Free Software, and it runs on a
wide variety of platforms, so take a look at that one if you're not
sure.
Next, install Emacs. We're using Emacs not just because it's the best
text editor ever created. :-D It's because that's what
SLIME is written for, and Emacs is a lisp-aware editor. Really, Emacs
is a lisp interpreter that happens to be geared towards
text-editing. It's accused of breaking the rules of unix by being a
single, monolithic program, but it's really a whole bunch of small
lisp programs. You can even have a lisp REPL in Emacs
(ielm), similar to what we will have once we're done
here. It's plays very well with Common Lisp.
If you're unfamiliar with Emacs, you should stop here and familiarize
yourself with it a bit. Really, you could spend a decade learning
Emacs and still have more to learn. The tutorial should be good enough
for now. Fire up Emacs and run the tutorial by pressing
control+h then t. In Emacs notation,
that's C-h t. C-h is the help/documentation
prefix, which can be used to look up variables/symbols
(v), functions (f), key bindings
(k), info manuals (i), the current mode
(m), and apropos (searching) (a). In the
info manuals, you should be able to find the full Emacs manual, Elisp
reference, and Elisp tutorial, since they are generally installed
alongside Emacs these days. Nearly anything you might need to know can
be found inside the included documentation.
Next, install SLIME. I'll be a bit more specific for this one. Make
a .emacs.d directory in your home directory (whatever
your HOME environmental variable is set to). This is a common place to
put user-installed Emacs extensions. You will be putting
your slime directory in here. There are two basic ways to
obtain SLIME, as indicated right on their main page. You can do a CVS
checkout of the SLIME repository, which allows you to follow it and
run the latest version. Or you can grab a snapshot of the repository,
which is provided, and dump it in there. Since I like you so much,
I'll give you a third option. Here's a Git repository, maintained by
someone very kind, that follows SLIME's CVS repository,
git clone git://git.boinkor.net/slime.git
Ultimately, you should have a directory ~/.emacs.d/slime/
that contains a bunch of SLIME source files directly inside.
Now, we tell Emacs where SLIME is and how to use it. Make
a .emacs file in your home directory, if you haven't
already, and put this in it,
Once it's saved, either restart Emacs, or simply evaluate those lines
by putting the cursor after each them in turn and typing C-x
C-e. If you did everything right so far, you shouldn't have any
errors. (If you did, go back up and see what you did wrong.) If your
Common Lisp installation didn't end up in your PATH as
"lisp" (not uncommon) for some reason, you may need to
tell Emacs where it is. For example, I can point directly to my SBCL
installation with this line,
(setq inferior-lisp-program "/usr/bin/sbcl")
If everything is set up right, fire up SLIME with "M-x
slime". It should compile the back-end, called swank, and run a
Common Lisp REPL as an inferior process to Emacs. You should end up
with a nice prompt like this,
CL-USER>
At this line, you can start evaluating lisp expressions as you
please. But this isn't where the true power of SLIME comes in
yet. I'll give you an example: make a new file with
a .lisp extension and open it. Throw some lisp in there,
(defunadder (x)
(lambda (y) (+ x y)))
Type C-x C-k and it will send the current buffer over to
be compiled and loaded. This code here uses a closure, so you know you
aren't accidentally using Emacs lisp, as it doesn't have closures. At
the REPL you can call it,
CL-USER> (funcall (adder 5) 6)
Which will print the return value, 11. That's all there
is to it. You write code in the buffer, then with a simple keystroke
send it to the Common Lisp system to be evaluated and loaded. Because
the SLIME key bindings eclipse the Emacs lisp key bindings, you can
type this same line in the lisp source buffer place the cursor at the
end, and type C-x C-e, which will send it out to Common Lisp to be
evaluated. Look at the mode help (C-h m) to see all the
key bindings made available.
This is a great programming environment that makes Common Lisp all the
more fun to use. You run a single, continuous instance if your program
growing it gradually. (This is exactly how I
built my Emacs web server with elisp.)
You can test your code as soon as soon as it's written.
The setup can get even more advanced. The Common Lisp REPL need not be
running on the same computer. It can be running on another computer,
as long as SLIME is able to connect to it over the network. Several
developers could even share a single Common Lisp process running on a
common machine. Lots of possibilities.
If you don't have a Common Lisp book yet,
there's Practical Common
Lisp, which you can read at no cost online
or download
for reading offline. It's based on an Emacs and SLIME setup, so you'll
be right on track.
Don't stop here! This isn't everything. Check out the archives
(on the left) for more posts. Or just have a look at
the index.