nullprogram.com/blog/2011/01/30/
I previously talked about BrianScheme
(BS) and we've had some exciting updates since then. I've since caught
up with Brian so we're committing code at about the same rate. It's
really coming together to be something bigger than we both
expected. We're adding what we feel is the best of Common List and
Scheme. The Git repository log reveals just how much time we've spent
hacking at this.
The first big milestone was full bootstrapping. BS has both an
interpreter, written in C, and a compiler written in BS. The compiler
targets a VM, which is written in C. The interpreter is no longer used
directly; it's there simply for bootstrapping the compiler. It sets up
an initial environment, then loads and runs the compiler which
immediately compiles itself, and then compiles its environment, and
then compiles the main user environment. Afterward the whole thing is
lifted up on top of the VM and the interpreter is abandonded.
Once everything was bootstrapped and running in the VM, continuations
became a practical possibility, and Brian soon added them. So now
BrianScheme now has all the major components of a Scheme.
I began ramping up my contributions by adding a really solid random
number generator,
the Mersenne
twister, and providing functions to generate numbers on all sorts
of distributions: normal, Poisson, gamma, exponential, beta, and
Chi-squared. It's pretty reasonable at seeding itself, too.
In the meantime, this bootstrap process, while incredibly useful, was
really slowing down development. It was taking BS about 10
seconds to boot itself every time it was started. That can really kill
the usefulness of the system. I started to look into ways to mitigate
this, perhaps through FASLs or some kind of image dump.
After discussing it with Brian I decided to try for a memory dump,
SBCL-style. My old memory pool
allocator, which I thought I've never use again, really came in
handy, and now has a home — modified of course — in BS. It no longer
uses malloc(), instead using mmap() to get more memory, and now has
the ability to free() memory, completely replacing malloc(),
realloc(), and free(). So with BrianScheme no longer use the libc
allocator we had complete control over the program's memory. It was
just a matter of dumping the handful of big mmap() chunks to disk, and
in another process, loading them back in to the same location, and
finally hooking the environment back up.
Just as the SBCL documentation warns about, there are complicated
issues still to be resolved (and may never be). The two big ones are
alien objects and open file handles, neither of which can make it into
the image. Aliens could be in there potentially, if the foreign
library let us select its allocator. Brian made a change that gives
the FFI a hook to rebind its symbols after load, so some of the
FFI can survive the jump. The three stdin, stdout, and stderr file
handles are reconnected on load, but the old, dead handles can
potentially linger in places, waiting to cause errors when someone
tries to read them.
It was really, really exciting to see the images come back to life for
the first time. With that success the lengthy bootstrap process was no
longer a big problem because it could be bypassed much of the
time. Saving images is really simple to do, too.
(save-image "brianscheme.img")
This will create the image "brianscheme.img", which can be loaded
again later with the -l
switch. Once loaded, it will
either execute a script if given one on the command line, or it will
provide a new REPL. Because the image is mmapp()
ed into
place, loading is practically instantaneous, even if the image is
dozens of megabytes (which can happen easily).
bsch -l brianscheme.img
The BS image in its booted state is about 15MB right now on 64-bit
systems, and 7MB on 32-bit systems. They are low entropy and compress
down to about 2MB, so it's not too bad. If you write a large
program in BrianScheme and save it as an image it may be even larger.
Over the weekend I took this even further, continuing to follow in the
footsteps of SBCL: I added a feature to wrap the image in the BS
executable, so that the image itself is a standalone executable. To
make this more useful, a toplevel function can be selected to run
after the image loads, rather than a REPL. If you wrote a game in BS
and wanted to compile to a standalone program,
(load "my-game.sch")
(save-image "my-game" 'executable #t 'toplevel play-my-game)
The user would execute the file my-game
, which would load
BS and run the function play-my-game
. Because a 15MB
binary is a little unwieldy to hand out, you could compress it
beforehand with a tool like gzexe
, which transparently
compresses an executable.
The wrapper is actually very simple. It's a very slightly modified BS
executable, padded out to the system's page size (4kB, generally),
ending with a special marker. The image is concatenated to this
binary. When run, the program scans itself looking for the marker
(0xdeadbeef
), and then mmap()s the portion behind it (you
can only mmap() page-size offsets, which is why padding was
necessary).
BrianScheme should have an interesting future ahead of it.