nullprogram.com/blog/2014/04/26/
For many years Richard Stallman (RMS) prohibited a foreign function
interface (FFI) in GNU Emacs. An FFI is an API for dynamically calling
native libraries at run-time, like the Java Native Interface (JNI). He
was concerned that people might use it to make proprietary extensions
to the popular editor. This was the same (paranoid) justification for
rejecting a package manager in Emacs for many years, that someone
might use it to distribute proprietary packages.
Fortunately, times have changed. RMS reevaluated his
stances on FFI and on package managers. Today Emacs comes with
a package manager (package.el), and there are multiple package
repositories with no proprietary packages in sight. Outside of some
unaccepted patches, though, no significant progress has been made
toward adding an FFI.
A few weeks ago I did something about that by writing a package that
adds an FFI. It requires no patches or any other changes to Emacs
itself. Instead, it drives a subprocess running libffi,
passing arguments and return values back and forth through a pipe, in
the spirit of EmacSQL. It’s not as efficient as a built-in
API, but it could potentially be distributed through an ELPA
repository.
The API is modeled loosely after Julia’s elegant FFI. A call
interface (CIF) doesn’t need to be prepared ahead of time. Provide all
the necessary information at the call site and the library takes care
of building and caching CIFs and handles for you.
API Examples
The core function for the FFI is ffi-call. Here’s an example that
calls the system’s srand() and then rand().
;; seed with 0
(ffi-call nil "srand" [:void :uint32] 0)
;; => :void
(ffi-call nil "rand" [:sint32])
;; => 1102520059
The first two arguments are similar to the first two arguments of
dlsym(). For ffi-call, the first argument is the library shared
object name. The back-end automatically takes care of obtaining a
handle on the library with dlopen(). In this case we’re accessing a
function that’s already in the main program, so we pass nil. This is
identical to passing NULL to dlsym(). In this FFI, nil always
corresponds to NULL.
The second argument is the function name, just like dlsym()’s second
argument.
The third argument is the function signature. It’s a vector of
keywords declaring the return value type followed by the types of each
argument. In this example, srand() returns nothing (void) and
accepts a single 32-bit unsigned argument, so the signature is
[:void :uint32].
The remaining arguments are the native function arguments. I can keep
making the second FFI call (“rand”) to retrieve different numbers,
using the first FFI call (“srand”) to reset the sequence.
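To make the relationship between the signature vector and the native arguments concrete, here is a minimal sketch in plain Elisp. The helper my-ffi-split-signature is hypothetical and not part of the package; it only splits a signature into its return and argument types and checks that the call site supplied the right number of arguments.

```elisp
;; Hypothetical helper for illustration; not part of the FFI package.
(defun my-ffi-split-signature (signature args)
  "Split SIGNATURE into (RETURN-TYPE . ARG-TYPES), checking ARGS against it.
SIGNATURE is a vector of type keywords, return type first.  ARGS is
the list of native arguments supplied at the call site."
  (let ((return-type (aref signature 0))
        ;; Convert the vector to a list and drop the return type.
        (arg-types (cdr (append signature nil))))
    (unless (= (length arg-types) (length args))
      (error "Signature expects %d argument(s), got %d"
             (length arg-types) (length args)))
    (cons return-type arg-types)))

(my-ffi-split-signature [:void :uint32] '(0))
;; => (:void :uint32)
(my-ffi-split-signature [:sint32] '())
;; => (:sint32)
```

A real implementation would presumably also check each argument against its declared type, which this sketch leaves out.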
Using a Library
Here’s another example, loading libm and calling cos.
;; cos(1.2)
(ffi-call "libm.so" "cos" [:double :double] 1.2)
;; => 0.362357754476674
The first time a library is used, the back-end creates a handle for it
with dlopen(). Further calls reuse the handle, to be as efficient as
possible. Handles are never closed.
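The reuse of handles can be pictured as a small cache keyed by library name. The following is only a sketch under that assumption: the names my-ffi-handles, my-ffi-handle, and my-ffi-fake-open are hypothetical, and the opener function stands in for whatever actually asks the back-end to run dlopen().

```elisp
;; Hypothetical sketch of handle caching; all names are illustrative.
(defvar my-ffi-handles (make-hash-table :test 'equal)
  "Maps library names to handles that have already been opened.")

(defun my-ffi-handle (library open-fn)
  "Return the cached handle for LIBRARY, calling OPEN-FN on first use.
OPEN-FN stands in for the code that asks the back-end to dlopen()."
  (or (gethash library my-ffi-handles)
      (puthash library (funcall open-fn library) my-ffi-handles)))

;; A stand-in opener that counts how often it is called.
(defvar my-ffi-open-count 0)
(defun my-ffi-fake-open (library)
  (setq my-ffi-open-count (1+ my-ffi-open-count))
  (intern (concat "handle-for-" library)))

(my-ffi-handle "libm.so" #'my-ffi-fake-open)
(my-ffi-handle "libm.so" #'my-ffi-fake-open)
my-ffi-open-count
;; => 1
```

Because nothing is ever removed from the table, handles in this sketch are never closed either.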
Pointers
Here are a couple of examples that use pointers. As stated before, nil
is used to pass a NULL pointer. Like the underlying libffi, the FFI
doesn’t care what kind of pointer you’re passing, just that it’s a
pointer, so it’s declared with :pointer.
;; time(NULL);
(ffi-call nil "time" [:uint64 :pointer] nil)
;; => 1396496875
Strings are automatically copied to the subprocess, their lifetime
tied to the lifetime of the Elisp string (note: this detail is still
unimplemented). When used as arguments, they become pointers.
;; getenv("DISPLAY")
(ffi-call nil "getenv" [:pointer :pointer] "DISPLAY")
;; => 0x7fffc13ceb29
(ffi-get-string '0x7fffc13ceb29)
;; => ":0"
Pointers can be handled as values on the Elisp side. They’re
represented as symbols whose name is an address. In the above example,
0x7fffc13ceb29 is one of these symbols. I would have preferred to
use a plain integer to represent pointers, but, because Elisp integers
are tagged, they’re guaranteed not to be wide enough for this. I
plan to add pointer operators to do pointer arithmetic on these
special pointer values.
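As a rough idea of what such pointer operators might look like, here is a sketch that parses the address out of the symbol name, adds an offset, and interns a new symbol. The helper names are hypothetical, and the sketch assumes an Emacs whose integers are wide enough to hold the address (for example, a build with bignum support), which is exactly the assumption that doesn't hold in general. It also doesn't handle nil (NULL).

```elisp
;; Hypothetical helpers; assumes integers wide enough to hold an address.
(defun my-ffi-pointer-address (pointer)
  "Return the address named by POINTER, a symbol such as 0x7fffc13ceb29."
  ;; Skip the leading \"0x\" and parse the rest as hexadecimal.
  (string-to-number (substring (symbol-name pointer) 2) 16))

(defun my-ffi-pointer-add (pointer offset)
  "Return a new pointer symbol OFFSET bytes past POINTER."
  (intern (format "0x%x" (+ (my-ffi-pointer-address pointer) offset))))

(my-ffi-pointer-add '0x7fffc13ceb29 1)
;; => 0x7fffc13ceb2a
```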
The function ffi-get-string is used to retrieve the null-terminated
string referenced by a pointer. If the string returned by getenv()
needed to be freed (it doesn’t and shouldn’t), the FFI caller would
need to be careful to call free() as another FFI call.
How It Works: The Stack Machine
My goal is to keep the back-end as simple as possible. All resource
management is handled by Emacs, tied to garbage collection.
For example, the pointer returned by dlopen() isn’t stored anywhere
in the subprocess. It’s passed to Emacs and managed there. To call a
function using the handle, the pointer is transmitted back to the
subprocess.
To keep it simple, the back-end is just a stack machine with a simple
human-readable bytecode. You can see the instruction set by looking at
the big switch statement in ffi-glue.cc. For example, to push a
signed 2-byte integer 237 onto the stack, send a j followed by an
ASCII representation of the number (terminated by a space if needed):
j237.
As usual, my assumption is that the Elisp printer and reader are faster
than any possible serialization I could implement within Elisp itself.
This also nicely sidesteps the byte-order issue.
The function signature is declared by pushing zeros of the
return/argument types onto the stack, with a special void “value” used
to communicate void. Once it’s all set up, the C instruction is
called, collapsing the signature into a CIF handle: a pointer for the
Elisp side to manage.
Pointers to raw strings of bytes are pushed onto the stack with the
M instruction. It pops the top integer on the stack to get the byte
count, reads that number of bytes from input into a buffer,
null-terminates the buffer in case it’s used as a string, and finally
puts a pointer to that buffer on the stack.
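On the Elisp side, producing the bytecode for a string could look roughly like the sketch below. The function name is hypothetical, and the choice of UTF-8 is an assumption of this sketch. The byte count is pushed with w, the unsigned 32-bit integer instruction that appears in the stack machine example in this post.

```elisp
;; Hypothetical encoder; UTF-8 is an assumption of this sketch.
(defun my-ffi-encode-string (string)
  "Return bytecode that leaves a pointer to a copy of STRING on the stack."
  (let ((bytes (encode-coding-string string 'utf-8)))
    ;; Push the byte count, then M, then the raw bytes themselves.
    (concat "w" (number-to-string (length bytes)) "M" bytes)))

(my-ffi-encode-string "cos")
;; => "w3Mcos"
```

Counting bytes after encoding, rather than characters, matters as soon as the string contains anything outside ASCII.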
Calling functions is just a matter of pushing all the needed
information onto the stack, invoking libffi to magically call the
function, then popping the result off the stack. Popping a value
transmits it to Elisp.
Stack Machine Example
Here’s a concise example that calls cos(1.2) (assuming libm.so is
already linked). The actual Elisp-generated FFI bytecode doesn’t plan
things quite this way, particularly because it needs to keep track
of the various pointers involved, but this example keeps it simple.
You can run this example manually by executing the ffi-glue program
and pasting the instructions below, joined into a single line
(d1.2d0d0w1Cp0w3McosSco), as standard input. The result will be
printed.
d1.2: Push a double, 1.2, onto the stack. This will be the
function argument.
d0d0: Push a couple of zero doubles onto the stack. This is our
function signature. It takes a double and returns a double.
w1: Push an unsigned 32-bit 1 onto the stack. Instructions that
use integers accept unsigned 32-bit integers. This 1 indicates that
our function accepts one argument.
C: Create a CIF. The integer 1 and the two 0 doubles are
consumed and a pointer to a CIF is put on the stack. Elisp would
normally pop this off and save it for future use, but we’re going
to leave it there (and ultimately leak it in the example).
p0: Push a NULL onto the stack. p means push a pointer and 0
is a NULL pointer. This is our library handle. We’re assuming cos
will be in the main program.
w3Mcos: Put a pointer to the string “cos” into the stack. First
push on the number 3 (string length), then M to read from input,
then pass three bytes: “cos”. In our example, this buffer will be
leaked because we lose the buffer pointer.
S: Call dlsym() on the string and handle on top of the stack.
This consumes the top two values (NULL and “cos”), and pushes a
function handle on top of the stack. At this point the stack has
three values: 1.2, the CIF, and the function handle.
c: Call the function pointed to by the top of the stack. This
consumes the top pointer, the CIF below it, and the CIF indicates
how many more values to consume: just one in this case, since the
function takes one argument. The function’s return value is pushed
on the stack. If the function is void, the special void “value”
is pushed on the stack.
o: Pop the top stack value, sending it to Emacs. This is
what would be returned by ffi-call.
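To follow how the stack changes through these steps, here is a small symbolic model written in Elisp. It is not the real back-end: my-ffi-simulate is a hypothetical function that only tracks what is on the stack, using placeholder lists such as (cif 1) and (fn "cos") instead of real pointers, and it never calls any native code.

```elisp
;; Hypothetical symbolic model of the stack machine; no native calls.
(defun my-ffi-simulate (program)
  "Symbolically run PROGRAM and return (STACK OUTPUT), stack top first.
PROGRAM is a list of (push VALUE) forms and the symbols C, S, c, o."
  (let ((stack '())
        (output '()))
    (dolist (op program)
      (pcase op
        (`(push ,value) (push value stack))
        ;; C: pop the argument count, then the return/argument type zeros.
        ('C (let ((nargs (pop stack)))
              (dotimes (_ (1+ nargs)) (pop stack))
              (push (list 'cif nargs) stack)))
        ;; S: pop the symbol name and library handle, push a function handle.
        ('S (let ((name (pop stack)))
              (pop stack)
              (push (list 'fn name) stack)))
        ;; c: pop the function, the CIF, and as many arguments as the CIF says.
        ('c (let ((fn (pop stack))
                  (cif (pop stack))
                  (args '()))
              (dotimes (_ (cadr cif)) (push (pop stack) args))
              (push (list 'result (cadr fn) args) stack)))
        ;; o: pop the top value and send it to the output.
        ('o (push (pop stack) output))))
    (list stack (nreverse output))))

(defvar my-ffi-cos-program
  '((push 1.2) (push 0.0) (push 0.0) (push 1) C (push nil) (push "cos") S c o))

;; Just after S, the stack holds three values, top first.
(my-ffi-simulate (butlast my-ffi-cos-program 2))
;; => (((fn "cos") (cif 1) 1.2) nil)

;; After the whole program, the stack is empty and one result was sent.
(my-ffi-simulate my-ffi-cos-program)
;; => (nil ((result "cos" (1.2))))
```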
Before I got the Elisp side of things going, I was testing out the
back-end by writing lots of little programs like this by hand.
A Safe FFI
While using an FFI through a pipe is slow compared to a built-in FFI,
there is a distinct advantage. The FFI can never crash Emacs! Normally,
making calls to an FFI is unsafe. It allows the programmer to
violate normal language constraints. If the programmer misuses the
FFI, the whole process may crash or become corrupt. Here, that process
is only the subprocess: a crash will lose any state held behind the
foreign interface, but Emacs will be safe.
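The isolation a subprocess provides is easy to observe with Emacs’ ordinary process functions. This illustration doesn’t use the FFI package at all and assumes a POSIX sh is available: a child shell kills itself with SIGSEGV, standing in for a foreign call that crashes the back-end, and Emacs simply receives a description of the signal.

```elisp
;; Illustration only: assumes a POSIX sh is available on the PATH.
;; The child shell kills itself with SIGSEGV, standing in for a
;; foreign function that crashes the FFI subprocess.
(let ((status (call-process "sh" nil nil nil "-c" "kill -SEGV $$")))
  ;; A number is an exit code; a string describes a fatal signal.
  (if (stringp status)
      (message "Subprocess died (%s), but Emacs is still running" status)
    (message "Subprocess exited with status %s" status)))
```

The package presumably talks to a long-lived asynchronous process rather than using call-process, but the principle is the same: the crash is reported to Emacs as data.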
In my package, the handle for the FFI Emacs subprocess is called the
context. A context is automatically established and bound to the
ffi-context global variable as needed. This context keeps track of
CIFs, string buffers, handles, and any other resources held by the
subprocess. If the subprocess dies, the context becomes meaningless
since the pointers it holds are dead.
Limitations
This FFI package is about 80% complete. It occasionally leaks memory
in the subprocess, it’s overly-sensitive to mis-typing, it doesn’t
manage stdin/stdout, it can’t inspect/modify structs, and it can’t set
up closures.
The last point, closures, would require some changes to the
interprocess communication. The purpose here would be to allow foreign
functions to call Elisp functions. The subprocess would need to be
able to initiate activity with Elisp.
Manipulating structs is complex, and even libffi has limited support
for working with them. It allows structs to be declared, but leaves
alignment and access up to the user to sort out. That’s where the
previously-mentioned pointer arithmetic comes into play.
Currently stdin, stdout, and stderr are a problem, as I found when
trying to write a test GTK application with Elisp. Any command
line junkie knows that GTK (and Qt) applications are ridiculously
noisy. They spew hundreds of lines of warnings and notifications as
part of their normal operation. This noise interferes with FFI
communication with Emacs. I need to figure out how to separate this
and get standard input/output/error to/from Emacs through separate
channels.
As with libffi, there are no guarantees about variadic function calls.
They should generally Just Work, but you can’t rely on that.
The whole thing will not work as well in 32-bit Emacs, where integers
are limited to a tiny 29 bits. For example, those rand() return
values will simply not fit. In the long run, this is probably the
single largest barrier to making the FFI work smoothly. It’s too easy
to run into large integer values.
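The arithmetic is easy to check. The sketch below uses a hypothetical helper, my-ffi-fits-p, and has to be evaluated in an Emacs whose integers can hold the values involved (a 64-bit build, for instance). It tests whether a number’s magnitude fits in a given number of bits.

```elisp
;; Hypothetical check; evaluate in an Emacs with wide enough integers.
(defun my-ffi-fits-p (n bits)
  "Return non-nil if the magnitude of integer N fits in BITS bits."
  (< (abs n) (expt 2 bits)))

(my-ffi-fits-p 1102520059 29) ; the rand() result from earlier
;; => nil
(my-ffi-fits-p 237 29)
;; => t
```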
Right now I consider it a proof of concept: an FFI really can be
done this way. I don’t have any particular uses in mind, and, outside
of the “cool factor,” I can’t actually think of any useful
applications. If a solid FFI already existed, I might have tried to use
it for EmacSQL rather than use this subprocess trick. My FFI is
probably mature enough to drive SQLite, so maybe this is the future of
EmacSQL.
If you can think of a good use for an Emacs FFI, please share it. I
need good test ideas.