An Emacs Pastebin
Luke is doing an interesting
five-part tutorial on writing
a pastebin in PHP: PHP Like a Pro (2, 3,
4, 5). The tutorial is largely an introduction to
the set of tools a professional would use to accomplish a more
involved project, the most interesting of which, for me, is
Because I have no intention of ever using PHP, I decided to follow along in parallel with my own version. I used Emacs Lisp with my simple-httpd package for the server. I really like my servlet API, so it was a lot more fun than I expected it to be! Here’s the source code,
Here’s what it looked like once I was all done,
It has syntax highlighting, paste expiration, and light version control. The server side is as simple as possible, consisting of only three servlets,
/pastebin/: static files
/pastebin/get: serves (immutable) pastes in JSON
/pastebin/post: accepts new pastes in JSON, returns the ID
For you non-Emacs users, the repository has a
which can be used to launch a standalone instance of the pastebin
server, so long as you have Emacs on your computer. It will fetch any
needed dependencies automatically. See the header comment of this file
A paste ID is four or more randomly-generated numbers, letters, dashes
or underscores, with some minor restrictions (
It’s appended to the end of the servlet URL.
There are two cases. When the ID is appended to /pastebin/, the static file servlet, the servlet entirely ignores the ID; its job is only to serve static files. When the ID is appended to /pastebin/get/, the server looks up the ID in the database and returns the paste JSON.
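As a rough illustration of the ID scheme described above (not the code from the repository), an ID generator could be sketched as follows. The names `ID_ALPHABET`, `makePasteId`, and `isPasteId` are hypothetical, the "minor restrictions" are not modeled, and `Math.random` is only a stand-in for whatever random source the server actually uses.

```javascript
// Hypothetical helpers; none of these names come from the pastebin's source.
const ID_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Build an ID of `length` characters (four or more) from numbers, letters,
// dashes, and underscores. `random` must return a number in [0, 1).
function makePasteId(length = 4, random = Math.random) {
  if (length < 4) {
    throw new RangeError("paste IDs have at least four characters");
  }
  let id = "";
  for (let i = 0; i < length; i++) {
    id += ID_ALPHABET[Math.floor(random() * ID_ALPHABET.length)];
  }
  return id;
}

// Check that a string is shaped like a paste ID (length and alphabet only).
function isPasteId(text) {
  return /^[A-Za-z0-9_-]{4,}$/.test(text);
}
```

With 64 possible characters, a four-character ID allows 64^4 = 16,777,216 combinations before any restrictions are applied.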
The client side inspects the page’s URL to determine the ID currently
being viewed, if any. It performs an asynchronous request to
/pastebin/get/<id> to fetch the paste and insert the result, if
found, into the current page.
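The lookup described above might be sketched like this. It is an assumption-laden outline rather than the actual front-end: the function names are made up, `fetch` stands in for whatever asynchronous request mechanism the real client uses, and the paste page is assumed to live at `/pastebin/<id>`.

```javascript
// Return the paste ID in a page path such as "/pastebin/abcd", or null
// when no ID is present. Hypothetical helper, not from the repository.
function currentPasteId(pathname) {
  const prefix = "/pastebin/";
  if (!pathname.startsWith(prefix)) return null;
  const id = pathname.slice(prefix.length);
  return id.length > 0 ? id : null;
}

// Build the URL of the JSON servlet for the paste being viewed, if any.
function pasteRequestUrl(pathname) {
  const id = currentPasteId(pathname);
  return id === null ? null : "/pastebin/get/" + encodeURIComponent(id);
}

// Fetch the paste asynchronously; resolves to the parsed JSON, or to null
// when there is no ID or the server does not have the paste.
async function loadPaste(pathname, fetchFn = fetch) {
  const url = pasteRequestUrl(pathname);
  if (url === null) return null;
  const response = await fetchFn(url);
  return response.ok ? response.json() : null;
}
```

In a browser this would be called as `loadPaste(window.location.pathname)`, with the result inserted into the page by whatever means the front-end prefers.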
Form submission isn’t done the normal way. Instead, the submission is
intercepted by an event handler, which wraps the form data up in JSON
(much cleaner to parse!) and sends it asynchronously to
/pastebin/post via POST. This servlet inserts the paste in the
database and responds in
text/plain with the paste ID it
generated. The client side then redirects the browser to the paste URL
for that paste.
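A minimal sketch of that submission flow, under stated assumptions: `buildPostRequest` and `installSubmitHandler` are hypothetical names, the form's field names are whatever the form happens to contain, `fetch` stands in for the real request mechanism, and the paste URL is assumed to be `/pastebin/<id>`.

```javascript
// Wrap the form fields up as a JSON POST request. The field names are not
// specified in the post, so they are passed through untouched.
function buildPostRequest(fields) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(fields),
  };
}

// Intercept normal form submission, send the data asynchronously, then
// redirect to the new paste. `fetchFn` and `navigate` are parameters only
// so the sketch can be exercised outside a browser.
function installSubmitHandler(
  form,
  fetchFn = fetch,
  navigate = (url) => { window.location.href = url; }
) {
  form.addEventListener("submit", async (event) => {
    event.preventDefault(); // suppress the browser's own submission
    const fields = Object.fromEntries(new FormData(form).entries());
    const response = await fetchFn("/pastebin/post", buildPostRequest(fields));
    const id = (await response.text()).trim(); // text/plain paste ID
    navigate("/pastebin/" + id);
  });
}
```

Sending JSON rather than the usual URL-encoded form body means the server only needs a JSON parser to read the submission.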
As I said, the server performs no page generation, so syntax
highlighting is done in the client with
highlight.js. I could have used htmlize
and supported any language that Emacs supports. However, I wanted to
keep the server as simple as possible, and, more importantly, I
really don’t trust Emacs’ various modes to be secure in operating on
arbitrary data. That’s a huge attack surface and these modes were
written without security in mind (fairly reasonable). It’s actually a
deliberate feature for Emacs to automatically
eval Elisp in comments
under certain circumstances.
As part of my fun I made a generic database API for the servlets, then implemented three different database backends. I used eieio, Emacs Lisp’s CLOS-like object system, to implement this API. Creating a new database backend is just a matter of making a new class that implements two specific methods.
The first, and default, implementation uses an Elisp hash table for storage, which is lost when Emacs exits.
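To illustrate the shape of such an API (in JavaScript rather than eieio, and with invented names), a backend interface with two methods and an in-memory implementation might look like the sketch below. The post does not name the two methods, so `get` and `put` are assumptions.

```javascript
// Hypothetical base class: a backend only has to supply these two methods.
class PasteDb {
  get(id) {
    throw new Error("get is not implemented by this backend");
  }
  put(id, paste) {
    throw new Error("put is not implemented by this backend");
  }
}

// In-memory backend, analogous to the hash table version: everything is
// lost when the process exits.
class MemoryDb extends PasteDb {
  constructor() {
    super();
    this.table = new Map();
  }
  get(id) {
    return this.table.has(id) ? this.table.get(id) : null;
  }
  put(id, paste) {
    this.table.set(id, paste);
  }
}
```

Servlet code written against `PasteDb` would not need to change when a different backend is swapped in.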
The second is a flat-file database. I estimate it should be able to support at least 16 million different pastes gracefully. The on-disk format for pastes is an s-expression. To serve a paste, Emacs reads the s-expression, checks the expiration date, converts it to JSON, and sends it to the client.
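The check-then-convert step could be sketched as below. This is only an illustration: `record` stands in for the already-parsed on-disk data, the field names `content` and `expiration` are assumptions, and the expiration is assumed to be a Unix timestamp in seconds (or absent for pastes that never expire).

```javascript
// Return the JSON to serve for a stored paste, or null if it has expired.
// `nowSeconds` defaults to the current time and exists to make testing easy.
function pasteToJson(record, nowSeconds = Date.now() / 1000) {
  const expired =
    record.expiration != null && record.expiration <= nowSeconds;
  return expired ? null : JSON.stringify(record);
}
```

Checking expiration at read time means no background job is needed to hide expired pastes, though something would still have to delete the files eventually.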
To my great surprise there is practically no support for programmatic access to a SQL database from GNU Emacs Lisp (other Emacsen have it). The closest I found was pg.el, which is asynchronous by necessity. However, the specific target I had in mind was SQLite.
I did manage to implement a third backend that uses SQLite, but it’s
a big hack. It invokes the
sqlite3 command line program once for
every request, asking for a response in CSV — the only output format
that seems to escape unambiguously. This response then has to be
parsed, which works so long as it’s not long enough to blow the regex stack.
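One way to sidestep regex limits when parsing this kind of output is a character-by-character scanner. The sketch below is not the backend's actual parser; it assumes the usual CSV quoting rules (fields optionally wrapped in double quotes, embedded quotes doubled, newlines allowed inside quoted fields), and `parseCsv` is a made-up name.

```javascript
// Parse CSV text into an array of rows, each an array of field strings.
// Uses a small state machine instead of regular expressions, so the input
// length is limited only by memory.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  let rowHasContent = false;
  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (inQuotes) {
      if (c === '"' && text[i + 1] === '"') {
        field += '"'; // doubled quote is an escaped quote
        i++;
      } else if (c === '"') {
        inQuotes = false; // closing quote
      } else {
        field += c; // commas and newlines are literal inside quotes
      }
    } else if (c === '"') {
      inQuotes = true;
      rowHasContent = true;
    } else if (c === ",") {
      row.push(field);
      field = "";
      rowHasContent = true;
    } else if (c === "\n") {
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
      rowHasContent = false;
    } else if (c === "\r" && text[i + 1] === "\n") {
      // skip the CR of a CRLF line ending
    } else {
      field += c;
      rowHasContent = true;
    }
  }
  if (rowHasContent) {
    row.push(field); // final row without a trailing newline
    rows.push(row);
  }
  return rows;
}
```

For example, `parseCsv('abcd,"say ""hi"""\n')` yields a single row with the fields `abcd` and `say "hi"`.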
Update February 2014: I have found a solution to this problem!
This has been an educational project for me. As a tutorial and for practice I’ll probably write the server again from scratch using other languages and platforms (Node.js and Hunchentoot maybe?), keeping the same front-end.