Blowpipe: a Blowfish-encrypted, Authenticated Pipe

Blowpipe is a toy crypto tool that creates a Blowfish-encrypted pipe. It doesn’t open any files and instead encrypts and decrypts from standard input to standard output. This pipe can encrypt individual files or even encrypt a network connection (à la netcat).

Most importantly, since Blowpipe is intended to be used as a pipe (duh), it will never output decrypted plaintext that hasn’t been authenticated. That is, it will detect tampering of the encrypted stream and truncate its output, reporting an error, without producing the manipulated data. Some very similar tools that aren’t considered toys lack this important feature, such as aespipe.

Purpose

Blowpipe came about because I wanted to study Blowfish, a 64-bit block cipher designed by Bruce Schneier in 1993. It’s played an important role in the history of cryptography and has withstood cryptanalysis for 24 years. Its major weakness is its small block size, leaving it vulnerable to birthday attacks regardless of any other property of the cipher. Even in 1993 the 64-bit block size was a bit on the small side, but Blowfish was intended as a drop-in replacement for the Data Encryption Standard (DES) and the International Data Encryption Algorithm (IDEA), other 64-bit block ciphers.

The main reason I’m calling this program a toy is that, outside of legacy interfaces, it’s simply not appropriate to deploy a 64-bit block cipher in 2017. Blowpipe shouldn’t be used to encrypt more than a few tens of GBs of data at a time. Otherwise I’m fairly confident in both my message construction and my implementation. One detail is a little uncertain, and I’ll discuss it later when describing message format.

A tool that I am confident about is Enchive, though, since it’s intended for file encryption, it’s not appropriate for use as a pipe: it doesn’t authenticate until after it has produced most of its output. Enchive does try its best to delete files containing unauthenticated output when authentication fails, but this doesn’t prevent you from consuming that output before it can be deleted, particularly if you pipe the output into another program.

Usage

As you might expect, there are two modes of operation: encryption (-E) and decryption (-D). The simplest usage is encrypting and decrypting a file:

$ blowpipe -E < data.gz > data.gz.enc
$ blowpipe -D < data.gz.enc | gunzip > data.txt

In both cases you will be prompted for a passphrase which can be up to 72 bytes in length. The only verification for the key is the first Message Authentication Code (MAC) in the datastream, so Blowpipe cannot tell the difference between damaged ciphertext and an incorrect key.

In a script it would be smart to check Blowpipe’s exit code after decrypting. The output will be truncated should authentication fail somewhere in the middle. Since Blowpipe isn’t aware of files, it can’t clean up for you.

Another use case is securely transmitting files over a network with netcat. In this example I’ll use a pre-shared key file, keyfile. Rather than prompt for a key, Blowpipe will use the raw bytes of a given file. Here’s how I would create a key file:

$ head -c 32 /dev/urandom > keyfile

First the receiver listens on a socket (bind(2)):

$ nc -lp 2000 | blowpipe -D -k keyfile > data.zip

Then the sender connects (connect(2)) and pipes Blowpipe through:

$ blowpipe -E -k keyfile < data.zip | nc -N hostname 2000

If all went well, Blowpipe will exit with 0 on the receiver side.

Blowpipe doesn’t buffer its output (but see -w). It performs one read(2), encrypts whatever it got, prepends a MAC, and calls write(2) on the result. This means it can comfortably transmit live sensitive data across the network:

$ nc -lp 2000 | blowpipe -D

# dmesg -w | blowpipe -E | nc -N hostname 2000

Kernel messages will appear on the other end as they’re produced by dmesg. Though keep in mind that the size of each line will be known to eavesdroppers. Blowpipe doesn’t pad it with noise or otherwise try to disguise the length. Those lengths may leak useful information.

Blowfish

This whole project started when I wanted to play with Blowfish as a small drop-in library. I wasn’t satisfied with the selection, so I figured it would be a good exercise to write my own. Besides, the specification is both an enjoyable and easy read (and recommended). It justifies the need for a new cipher and explains the various design decisions.

I coded from the specification, including writing a script to generate the subkey initialization tables. Subkeys are initialized to the binary representation of pi (the first ~10,000 decimal digits). After a couple hours of work I hooked up the official test vectors to see how I did, and all the tests passed on the first run. This wasn’t reasonable, so I spent a while longer figuring out how I had screwed up my tests. Turns out I absolutely nailed it on my first shot. It’s a really great sign for Blowfish that it’s so easy to implement correctly.

Blowfish’s key schedule fills five subkey arrays requiring 4,168 bytes of storage. The key schedule is unusually complex: subkeys are repeatedly encrypted with themselves as they are being computed. This complexity inspired the bcrypt password hashing scheme, which essentially works by iterating the key schedule many times in a loop, then encrypting a constant 24-byte string. My bcrypt implementation wasn’t nearly as successful on my first attempt, and it took hours of debugging in order to match OpenBSD’s outputs.

The encryption and decryption algorithms are nearly identical, as is typical for, and a feature of, Feistel ciphers. There are no branches (preventing some side-channel attacks), and the only operations are 32-bit XOR and 32-bit addition. This makes it ideal for implementation on 32-bit computers.

One tricky point is that encryption and decryption operate on a pair of 32-bit integers (another giveaway that it’s a Feistel cipher). To put the cipher to practical use, these integers have to be serialized into a byte stream. The specification doesn’t choose a byte order, even for mixing the key into the subkeys. The official test vectors are also 32-bit integers, not byte arrays. An implementer could choose little endian, big endian, or even something else.

However, there’s one place in which this decision is formally made: the official test vectors mix the key into the first subkey in big endian byte order. By luck I happened to choose big endian as well, which is why my tests passed on the first try. OpenBSD’s version of bcrypt also uses big endian for all integer encoding steps, further cementing big endian as the standard way to encode Blowfish integers.

Blowfish library

The Blowpipe repository contains a ready-to-use, public domain Blowfish library written in strictly conforming C99. The interface is just three functions:

void blowfish_init(struct blowfish *, const void *key, int len);
void blowfish_encrypt(struct blowfish *, uint32_t *, uint32_t *);
void blowfish_decrypt(struct blowfish *, uint32_t *, uint32_t *);
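
For example, a minimal program using this interface to encrypt and then decrypt a single 64-bit block might look like the following sketch, assuming the library header is named blowfish.h:

#include <stdio.h>
#include <stdint.h>
#include "blowfish.h"  /* assumed header name for the declarations above */

int
main(void)
{
    struct blowfish ctx;
    blowfish_init(&ctx, "example key", 11);

    uint32_t xl = 0x01234567;
    uint32_t xr = 0x89abcdef;
    blowfish_encrypt(&ctx, &xl, &xr);
    printf("ciphertext: %08lx %08lx\n", (unsigned long)xl, (unsigned long)xr);

    blowfish_decrypt(&ctx, &xl, &xr);
    printf("plaintext:  %08lx %08lx\n", (unsigned long)xl, (unsigned long)xr);
    return 0;
}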

Technically the key can be up to 72 bytes long, but the last 16 bytes have an incomplete effect on the subkeys, so only the first 56 bytes should matter. Since bcrypt runs the key schedule multiple times, all 72 bytes have full effect.

The library also includes a bcrypt implementation, though it will only produce the raw password hash, not the base-64 encoded form. The main reason for including bcrypt is to support Blowpipe.

Message format

The main goal of Blowpipe was to build a robust, authenticated encryption tool using only Blowfish as a cryptographic primitive. Blowfish is put to work in three different ways:

  1. It uses bcrypt with a moderately-high cost as a key derivation function (KDF). Not terrible, but this is not a memory hard KDF, which is important for protecting against cheap hardware brute force attacks.

  2. Encryption is Blowfish in counter (CTR) mode. A 64-bit counter is incremented and encrypted, producing a keystream. The plaintext is XORed with this keystream like a stream cipher. This allows the last block to be truncated on output and eliminates some padding issues. Since CTR mode is trivially malleable, the MAC becomes even more important. In CTR mode, blowfish_decrypt() is never called. In fact, Blowpipe never uses it. (A rough sketch of the construction follows this list.)

  3. The authentication scheme is Blowfish-CBC-MAC with a unique key and encrypt-then-authenticate (something I harmlessly got wrong with Enchive). It essentially encrypts the ciphertext again with a different key, this time in Cipher Block Chaining (CBC) mode, keeping only the final block. That final block is prepended to the ciphertext as the MAC. On decryption the same block is computed again to ensure that it matches. Only someone who knows the MAC key can compute it.
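
As a rough illustration of the CTR construction (not Blowpipe’s exact code, and the big endian counter serialization here is my assumption), each 64-bit counter value is encrypted to produce eight keystream bytes that are XORed with the data:

#include <stddef.h>
#include <stdint.h>
#include "blowfish.h"  /* the library interface shown earlier */

/* Encrypt or decrypt buf in place with Blowfish-CTR. A sketch only:
 * Blowpipe's real counter encoding and chunk handling may differ. */
void
ctr_xor(struct blowfish *ctx, uint64_t counter, uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i += 8, counter++) {
        uint32_t xl = counter >> 32;         /* high half of the counter */
        uint32_t xr = counter & 0xffffffff;  /* low half of the counter */
        blowfish_encrypt(ctx, &xl, &xr);     /* keystream block */
        uint8_t pad[8] = {
            xl >> 24, xl >> 16, xl >> 8, xl,
            xr >> 24, xr >> 16, xr >> 8, xr
        };
        for (size_t j = 0; j < 8 && i + j < len; j++)
            buf[i + j] ^= pad[j];
    }
}

Since XOR is its own inverse, the same function decrypts, which is why blowfish_decrypt() never comes into play.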

Of all three Blowfish uses, I’m least confident about authentication. CBC-MAC is tricky to get right, though I am following the rules: fixed length messages using a different key than encryption.

Wait a minute. Blowpipe is pipe-oriented and can output data without buffering the entire pipe. How can there be fixed-length messages?

The pipe datastream is broken into 64kB chunks. Each chunk is authenticated with its own MAC. Both the MAC and chunk length are written in the chunk header, and the length is authenticated by the MAC. Furthermore, just like the keystream, the MAC is continued from the previous chunk, preventing chunks from being reordered. Blowpipe can output the content of a chunk and discard it once it’s been authenticated. If any chunk fails to authenticate, it aborts.

This also leads to another useful trick: The pipe is terminated with a zero length chunk, preventing an attacker from appending to the datastream. Everything after the zero-length chunk is discarded. Since the length is authenticated by the MAC, the attacker also cannot truncate the pipe since that would require knowledge of the MAC key.

The pipe itself has a 17 byte header: a 16 byte random bcrypt salt and 1 byte for the bcrypt cost. The salt is like an initialization vector (IV) that allows keys to be safely reused in different Blowpipe instances. The cost byte is the only distinguishing byte in the stream. Since even the chunk lengths are encrypted, everything else in the datastream should be indistinguishable from random data.
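
Here’s a sketch of the resulting stream layout, derived only from the description above (the width and placement of the encrypted length field are my assumptions; the Blowpipe source is the authoritative reference):

#include <stdint.h>

struct stream_header {
    uint8_t salt[16];  /* random bcrypt salt, reused like an IV */
    uint8_t cost;      /* bcrypt cost, the only distinguishing byte */
};

/* Each chunk that follows: an 8-byte Blowfish-CBC-MAC (chained from the
 * previous chunk), then the encrypted chunk length, then up to 64kB of
 * CTR-encrypted content. A zero-length chunk terminates the stream. */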

Portability

Blowpipe runs on POSIX systems and Windows (Mingw-w64 and MSVC). I initially wrote it for POSIX (on Linux) of course, but I took an unusual approach when it came time to port it to Windows. Normally I’d invent a generic OS interface that makes the appropriate host system calls. This time I kept the POSIX interface (read(2), write(2), open(2), etc.) and implemented the tiny subset of POSIX that I needed in terms of Win32. That implementation can be found under w32-compat/. I even dropped in a copy of my own getopt().

One really cool feature of this technique is that, on Windows, Blowpipe will still “open” /dev/urandom. It’s intercepted by my own open(2), which in response to that filename actually calls CryptAcquireContext() and pretends like it’s a file. It’s all hidden behind the file descriptor. That’s the unix way.

I’m considering giving Enchive the same treatment since it would simplify and reduce much of the interface code. In fact, this project has taught me a number of ways that Enchive could be improved. That’s the value of writing “toys” such as Blowpipe.

Gap Buffers Are Not Optimized for Multiple Cursors

Gap buffers are a common data structure for representing a text buffer in a text editor. Emacs famously uses gap buffers — long-standing proof that gap buffers are a perfectly sufficient way to represent a text buffer.

A gap buffer is really a pair of buffers where one buffer holds all of the content before the cursor (or point for Emacs), and the other buffer holds the content after the cursor. When the cursor is moved through the buffer, characters are copied from one buffer to the other. Inserts and deletes close to the gap are very efficient.

Typically it’s implemented as a single large buffer, with the pre-cursor content at the beginning, the post-cursor content at the end, and the gap spanning the middle. Here’s an illustration:

The top of the animation is the display of the text content and cursor as the user would see it. The bottom is the gap buffer state, where each character is represented as a gray block, and a literal gap for the cursor.

Ignoring for a moment more complicated concerns such as undo and Unicode, a gap buffer could be represented by something as simple as the following:

struct gapbuf {
    char *buf;
    size_t total;  /* total size of buf */
    size_t front;  /* size of content before cursor */
    size_t gap;    /* size of the gap */
};

This is close to how Emacs represents it. In the structure above, the size of the content after the cursor isn’t tracked directly, but can be computed on the fly from the other three quantities. That is to say, this data structure is normalized.

As an optimization, the cursor could be tracked separately from the gap such that non-destructive cursor movement is essentially free. The difference between cursor and gap would only need to be reconciled for a destructive change — an insert or delete.
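
As a concrete sketch of the copying involved (using the struct above, without the separate-cursor optimization), moving the cursor is just a memmove(3) across the gap. The content after the cursor is the implied quantity, total - front - gap:

#include <string.h>

/* Move the gap (cursor) to a new buffer position by copying characters
 * across it. Assumes 0 <= cursor <= total - gap. Every destructive edit
 * away from the gap pays this cost. */
void
gapbuf_move(struct gapbuf *b, size_t cursor)
{
    char *front = b->buf + b->front;  /* end of pre-cursor content */
    char *back  = front + b->gap;     /* start of post-cursor content */
    if (cursor < b->front) {
        size_t n = b->front - cursor;
        memmove(back - n, front - n, n);  /* shift content rightward over the gap */
    } else if (cursor > b->front) {
        size_t n = cursor - b->front;
        memmove(front, back, n);          /* shift content leftward over the gap */
    }
    b->front = cursor;
}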

A gap buffer certainly isn’t the only way to do it. For example, the original vi used an array of lines, which sort of explains some of its quirky line-oriented idioms. The BSD clone of vi, nvi, uses an entire database to represent buffers. Vim uses a fairly complex rope-like data structure with page-oriented blocks, which may be stored out-of-order in its swap file.

Multiple cursors

Multiple cursors is a fairly recent text editor invention that has gained a lot of popularity in recent years. It seems every major editor either has the feature built in or has a readily-available extension. I myself used Magnar Sveen’s well-polished package for several years. Though obviously the concept didn’t originate in Emacs or else it would have been called multiple points, which doesn’t roll off the tongue quite the same way.

The concept is simple: If the same operation needs to be done in many different places in a buffer, you place a cursor at each position, then drive them all in parallel using the same commands. It’s super flashy and great for impressing all your friends.

However, as a result of improving my typing skills, I’ve come to the conclusion that multiple cursors is all hat and no cattle. It doesn’t compose well with other editing commands, it doesn’t scale up to large operations, and it’s got all sorts of flaky edge cases (off-screen cursors). Nearly anything you can do with multiple cursors, you can do better with old, well-established editing paradigms.

Somewhere around 99% of my multiple cursors usage was adding a common prefix to a contiguous series of lines. As similar brute force options, Emacs already has rectangular editing, and Vim already has visual block mode.

The most sophisticated, flexible, and robust alternative is a good old macro. You can play it back anywhere it’s needed. You can zip it across a huge buffer. The only downside is that it’s less flashy and so you’ll get invited to a slightly smaller number of parties.

But if you don’t buy my arguments about multiple cursors being tasteless, there’s still a good technical argument: Gap buffers are not designed to work well in the face of multiple cursors!

For example, suppose we have a series of function calls and we’d like to add the same set of arguments to each. It’s a classic situation for a macro or for multiple cursors. Here’s the original code:

foo();
bar();
baz();

The example is tiny so that it will fit in the animations to come. Here’s the desired code:

foo(x, y);
bar(x, y);
baz(x, y);

With multiple cursors you would place a cursor inside each set of parentheses, then type x, y. Visually it looks something like this:

Text is magically inserted in parallel in multiple places at a time. However, if this is a text editor that uses a gap buffer, the situation underneath isn’t quite so magical. The entire edit doesn’t happen at once. First the x is inserted in each location, then the comma, and so on. The edits are not clustered so nicely.

From the gap buffer’s point of view, here’s what it looks like:

For every individual character insertion the buffer has to visit each cursor in turn, performing lots of copying back and forth. The more cursors there are, the worse it gets. For an edit of length n with m cursors, that’s O(n * m) calls to memmove(3). Multiple cursors scales badly.

Compare that to the old school hacker who can’t be bothered with something as tacky and modern (eww!) as multiple cursors, instead choosing to record a macro, then play it back:

The entire edit is done locally before moving on to the next location. It’s perfectly in tune with the gap buffer’s expectations, only needing O(m) calls to memmove(3). Most of the work flows neatly into the gap.

So, don’t waste your time with multiple cursors, especially if you’re using a gap buffer text editor. Instead get more comfortable with your editor’s macro feature. If your editor doesn’t have a good macro feature, get a new editor.

If you want to make your own gap buffer animations, here’s the source code. It includes a tiny gap buffer implementation.

Ten Years of Blogging

As of today, I’ve been blogging for 10 years. In this time I’ve written 302,000 words across 343 articles — a rate of one article every week and a half. These articles form a record of my professional progress, touching on both “hard” technical skills and “soft” communication skills. My older articles are a personal reminder of how far I’ve come. They are proof that I’m not as stagnant as I sometimes feel, and it helps me to sympathize with others who are currently in those earlier stages of their own career.

That index where you can find these 343 articles is sorted newest-first, because it correlates with best-first. It’s a trend I hope to continue.

History

Before blogging, I had a simple Penn State student page showcasing a few small — and, in retrospect, trivial — side projects (1, 2, 3), none of which went anywhere. Around the beginning of my final semester of college, I was inspired by Mark Dominus’ The Universe of Discourse and Andy Owen’s Friendly Robot Overlord (gone since 2011) to start my own blosxom blog. It would be an outlet to actively discuss my projects. Some time later GitHub was founded, and I switched to a static blog hosted by GitHub Pages, which is where it lives to this day.

It’s been more challenging to manage all this content than I ever anticipated. It’s like maintaining a large piece of software, except it’s naturally more fragile. Any time I make a non-trivial change to the CSS, I have to inspect the archives to check if I broke older articles. If I did, sometimes it’s a matter of further adjusting the CSS. Other times I’ll mass edit a couple hundred articles in order to normalize some particular aspect, such as heading consistency or image usage. (Running a macro over an Emacs Dired buffer is great for this.)

I decided in those early days to Capitalize Every Word of the Title, and I’ve stuck with this convention purely out of consistency even though it’s looked weird to me for years. I don’t want to edit the old titles, and any hard changeover date would be even weirder (in the index listing).

With more foresight and experience, I could have laid down better conventions for myself from the beginning. Besides the universal impossibility of already having experience before it’s needed, there’s also the issue that the internet’s conventions have changed, too. This blog is currently built with HTML5, but this wasn’t always the case — especially considering that the blog predates HTML5. When I switched to HTML5, I also adjusted some of my own conventions to match, since, at the time, I was still writing articles in raw HTML.

The mobile revolution also arrived since I started this blog. Today, about one third of visitors read the blog from a mobile device. I’ve also adjusted the CSS to work well on these devices. To that third of you: I hope you’re enjoying the experience!

Just in case you haven’t tried it, the blog also works really well with terminal-based browsers, such as Lynx and ELinks. Go ahead and give it a shot. The header that normally appears at the top of the page is actually at the bottom of the HTML document structure. It’s out of the way for browsers that ignore CSS.

If that’s not enough, last year I also spent effort making the printed style of my articles look nice. Take a look at the printed version of this article (i.e. print preview, print to PDF), and make sure to turn off the little headers added by the browser. A media selector provides a separate print stylesheet. Chrome / Chromium has consistently had the best results, followed by Firefox. Someday I’d like for browsers to be useful as typesetting engines for static documents — as an alternative to LaTeX and Groff — but they’ve still got a ways to go with paged media. (Trust me, I’ve tried.)

With more foresight, I could have done something better with my permanent URLs. Notice how they’re just dates and don’t include the title. URLs work better when they include human-meaningful context. Ideally I should be able to look at any one of my URLs and know what it’s about. Again, this decision goes all the way back to those early days when I first configured blosxom, not knowing any better.

URLs are forever, and I don’t want to break all my old links. Consistency is better than any sort of correction. I’m also practically limited to one article per day, though this has never been a real problem.

Motivation

For me, an important motivation for writing is to say something unique about a topic. For example, I’m not interested in writing a tutorial unless either no such tutorial already exists, or there’s some vital aspect all the existing tutorials miss. Each article should add new information to the internet, either raw information or through assembling existing information in a new way or with a unique perspective.

I also almost constantly feel like I’m behind the curve, like I don’t know enough or I don’t have enough skill. As many of you know, the internet is really effective at causing these feelings. Every day lots of talented people are sharing interesting and complicated things across all sorts of topics. For topics that overlap my own desired expertise, those projects have me thinking, “Man, I have no idea how to even start doing that. There’s so much I don’t know, and so much more I need to learn.” Writing articles as I learn is a great way to keep on top of new subjects.

This is tied to another problem: I have a tendency to assume the things I’ve known for awhile are common knowledge. This shows up in a few ways. First, if everyone knows what I know, plus they know a bunch of things that I don’t know, then I’ve got a lot of catching up to do. That’s another source of feeling behind the curve.

Second, when writing an article on a topic where I’ve got years of experience, I leave out way too many important details, assuming the reader already knows them. When an article I regard as valuable gets a weak response, it’s probably related to this issue.

Third, after three years of teaching, it seems like it’s becoming more difficult to put myself in the student’s shoes. I’m increasingly further away from my own days learning those early topics, and it’s harder to remember the limitations of my knowledge at that time. Having this blog really helps, and I’ve re-read some of my older articles to recall my mindset at the time.

Another way the blog helps is that it’s like having my own textbook. When teaching a topic to someone — and not necessarily a formal mentee — or even when just having a discussion, I will reference my articles when appropriate. Since they’re designed to say something unique, my article may be the only place to find certain information in a conveniently packaged form.

Finally, the last important motivating factor is that I want to spread my preferred memes. Obviously the way I do things is the Right Way, and the people who do things differently (the Wrong Way) are stupid and wrong. By writing about the sorts of topics and technologies I enjoy — C, low-level design, Emacs, Vim, programming on unix-like systems, my personal programming style — I’m encouraging others to follow my lead. Surely I’m responsible for at least one Emacs convert out there!

“Formal” professional writing

Despite having three or four novels worth of (mostly) technical writing here, my formal workplace writing leaves much to be desired. I’m no standout in this area. In the same period of time I’ve written just a handful of formal memos, each about the same length as a long blog post.

Why so few? These memos are painful to write. In order to be officially recognized, the formal memo process must be imposed upon me. What this means is that, compared to a blog post, these memos take at least an order of magnitude longer to write.

The process involves more people, dragged out over a long period of time. The writing is a lot less personal, which, in my opinion, makes it drier and less enjoyable. After the initial draft is complete, I have to switch to vastly inferior tools: emailing the same Microsoft Office documents back and forth between various people, without any proper source control. The official, mandated memo template was created by someone who didn’t know how to effectively operate a word processor, who had a poor sense of taste (ALL CAPS HEADINGS), and who obviously had no training in typesetting or style.

At the end of this long process, it’s filed into a system with no practical search capability, and where it will be quietly purged after five years, never to be seen again. Outside of the reviewers who were directly involved in the memo process, somewhere between zero and two people will have actually read it. Literally.

Arguably the memo might be more polished than a blog post. I’m skeptical of this, but suppose that’s true. I’d still much rather have written ten less-polished blog posts than one more-polished memo. That’s also ten shots to produce, by chance, a more valuable article than the single, precious memo.

Let’s broaden the scope to academic papers. Thanks to some great co-workers — all three of whom are smarter and handsomer than me — a year ago I finally got a published academic paper under my belt (and more to come): ROP Gadget Prevalence and Survival under Compiler-based Binary Diversification Schemes (and I said memos were dry!). A ton of work went into this paper, and it’s far more substantial than any memo or single blog post. The process was a lot more pleasant (LaTeX instead of Word), and the results are definitely much more polished than a typical blog post. It reads well and has interesting information to present.

This all sounds great until you consider the impact. According to ACM’s statistics, the paper has been accessed 130 times as of this writing. (Yes, providing an unofficial link to the paper like I just did above doesn’t help, but I ran out of those crappy “free” links. Sue me.) Sure, the PDF might have been passed around in untrackable ways, but I also bet a lot of those accesses were just downloads that were never read. So let’s split the difference and estimate around 130 people read it.

What kind of impact does a blog post have? Talking about these numbers feels a bit taboo, like discussing salaries, but it’s important for the point I’m about to make.

Note that all but the last have been published for less time than our paper. The average time on these pages is between 5 and 6 minutes, so these are actual readers, not just visitors that take one glance and leave. Thanks to the information age, a technical article on an established blog can reach an audience 100 times larger than a journal for a fraction of the effort and cost. There are other benefits, too:

  1. I get immediate feedback in the form of comments and email (open peer review).

  2. The content is available for free (open access). It’s trivial to link and share blog articles.

  3. Even more, this entire blog is in the public domain. If you don’t believe me, check out the public domain dedication in the footer of this page. It’s been there for years, and you can verify that yourself. Every single change to this blog in the past 6 years has been publicly documented (transparency).

  4. When I write about a topic, I make it a goal to provide the code and data to try it for yourself (open data and materials). This code and data is also either in the public domain, or as close to it as possible.

  5. Link aggregators and social media are great at finding the best stuff and making it more visible (censorship resistance). When I have a big hit, it’s often Reddit or Hacker News driving people to my article. Sometimes it’s other blogs.

In 2017, a blog is by far the most effective way to publish and share written information, and it can carry just as much scientific quality as a journal. More people should be publishing through blog articles than through traditional forms of publication, which are far less effective.

Since this has proven to be such an effective medium, I’m going to make a promise right here, right now. And as I explained above with transparency, there are no take-backs on this one. If you’re reading this, it’s on the public record forever. I promise to deliver another 10 years of free, interesting, useful content. This stuff is going to be even better, too. On this blog’s 20th anniversary, September 1st, 2027, we’ll see how I did.

Vim vs. Emacs: The Working Directory

Vim and Emacs have different internal models for the current working directory, and these models influence the overall workflow for each editor. They decide how files are opened, how shell commands are executed, and how the build system is operated. These effects even reach outside the editor to influence the overall structure of the project being edited.

In the traditional unix model, which was eventually adopted everywhere else, each process has a particular working directory tracked by the operating system. When a process makes a request to the operating system using a relative path — a path that doesn’t begin with a slash — the operating system uses the process’ working directory to convert the path into an absolute path. When a process forks, its child starts in the same directory. A process can change its working directory at any time using chdir(2), though most programs never need to do it. The most obvious way this system call is exposed to regular users is through the shell’s built-in cd command.

Vim’s spiritual heritage is obviously rooted in vi, one of the classic unix text editors, and the most elaborate text editor standardized by POSIX. Like vi, Vim closely follows the unix model for working directories. At any given time Vim has exactly one working directory. Shell commands that are run within Vim will start in Vim’s working directory. Like a shell, the cd ex command changes and queries Vim’s working directory.

Emacs eschews this model and instead each buffer has its own working directory tracked using a buffer-local variable, default-directory. Emacs internally simulates working directories for its buffers like an operating system, resolving absolute paths itself, giving credence to the idea that Emacs is an operating system (“lacking only a decent editor”). Perhaps this model comes from ye olde lisp machines?

In contrast, Emacs’ M-x cd command manipulates the local variable and has no effect on the Emacs process’ working directory. In fact, Emacs completely hides its operating system working directory from Emacs Lisp. This can cause some trouble if that hidden working directory happens to be sitting on a filesystem you’d like to unmount.

Vim can be configured to simulate Emacs’ model with its autochdir option. When set, Vim will literally chdir(2) each time the user changes buffers, switches windows, etc. To the user, this feels just like Emacs’ model, but this is just a convenience, and the core working directory model is still the same.

Single instance editors

For most of my Emacs career, I’ve stuck to running a single, long-lived Emacs instance no matter how many different tasks I’m touching simultaneously. I start the Emacs daemon shortly after logging in, and it continues running until I log out — typically only when the machine is shut down. It’s common to have multiple Emacs windows (frames) for different tasks, but they’re all bound to the same daemon process.

While with care it’s possible to have a complex, rich Emacs configuration that doesn’t significantly impact Emacs’ startup time, the general consensus is that Emacs is slow to start. But since it has a really solid daemon, this doesn’t matter: hardcore Emacs users only ever start Emacs occasionally. The rest of the time they’re launching emacsclient and connecting to the daemon. Outside of system administration, it’s the most natural way to use Emacs.

The case isn’t so clear for Vim. Vim is so fast that many users fire it up on demand and exit when they’ve finished the immediate task. At the other end of the spectrum, others advocate using a single instance of Vim like running a single Emacs daemon. In my initial dive into Vim, I tried the single-instance, Emacs way of doing things. I set autochdir out of necessity and pretended each buffer had its own working directory.

At least for me, this isn’t the right way to use Vim, and it all comes down to working directories. I want Vim to be anchored at the project root with one Vim instance per project. Everything is smoother when it happens in the context of the project’s root directory, from opening files, to running shell commands (ctags in particular), to invoking the build system. With autochdir, these actions are difficult to do correctly, particularly the last two.

Invoking the build

I suspect Emacs’ model of per-buffer working directories has, in a Sapir-Whorf sort of way, been responsible for leading developers towards poorly-designed, recursive Makefiles. Without a global concept of working directory, it’s inconvenient to invoke the build system (M-x compile) in some particular grandparent directory that is the root of the project. If each directory has its own Makefile, it usually makes sense to invoke make in the same directory as the file being edited.

Over the years I’ve been reinventing the same solution to this problem, and it wasn’t until I spent time with Vim and its alternate working directory model that I truly understood the problem. Emacs itself has long had a solution lurking deep in its bowels, unseen by daylight: dominating files. The function I’m talking about is locate-dominating-file:

(locate-dominating-file FILE NAME)

Look up the directory hierarchy from FILE for a directory containing NAME. Stop at the first parent directory containing a file NAME, and return the directory. Return nil if not found. Instead of a string, NAME can also be a predicate taking one argument (a directory) and returning a non-nil value if that directory is the one for which we’re looking.

The trouble of invoking the build system at the project root is that Emacs doesn’t really have a concept of a project root. It doesn’t know where it is or how to find it. The vi model inherited by Vim is to leave the working directory at the project root. While Vim can simulate Emacs’ working directory model, Emacs cannot (currently) simulate Vim’s model.

Instead, by identifying a file name unique to the project’s root (i.e. a “dominating” file) such as Makefile or build.xml, then locate-dominating-file can discover the project root. All that’s left is wrapping M-x compile so that default-directory is temporarily adjusted to the project’s root.

That looks very roughly like this (and needs more work):

(defun my-compile ()
  (interactive)
  (let ((default-directory (locate-dominating-file "." "Makefile")))
    (compile "make")))

It’s a pattern I’ve used again and again and again, working against the same old friction. By running one Vim instance per project at the project’s root, I get the correct behavior for free.

A Tutorial on Portable Makefiles

In my first decade writing Makefiles, I developed the bad habit of liberally using GNU Make’s extensions. I didn’t know the line between GNU Make and the portable features guaranteed by POSIX. Usually it didn’t matter much, but it would become an annoyance when building on non-Linux systems, such as on the various BSDs. I’d have to specifically install GNU Make, then remember to invoke it (i.e. as gmake) instead of the system’s make.

I’ve since become familiar and comfortable with make’s official specification, and I’ve spent the last year writing strictly portable Makefiles. Not only are my builds now portable across all unix-like systems, my Makefiles are cleaner and more robust. Many of the common make extensions — conditionals in particular — lead to fragile, complicated Makefiles and are best avoided anyway. It’s important to be able to trust your build system to do its job correctly.

This tutorial should be suitable for make beginners who have never written their own Makefiles before, as well as experienced developers who want to learn how to write portable Makefiles. Regardless, in order to understand the examples you must be familiar with the usual steps for building programs on the command line (compiler, linker, object files, etc.). I’m not going to suggest any fancy tricks nor provide any sort of standard starting template. Makefiles should be dead simple when the project is small, and grow in a predictable, clean fashion alongside the project.

I’m not going to cover every feature. You’ll need to read the specification for yourself to learn it all. This tutorial will go over the important features as well as the common conventions. It’s important to follow established conventions so that people using your Makefiles will know what to expect and how to accomplish the basic tasks.

If you’re running Debian, or a Debian derivative such as Ubuntu, the bmake and freebsd-buildutils packages will provide the bmake and fmake programs respectively. These alternative make implementations are very useful for testing your Makefiles’ portability, should you accidentally make use of a GNU Make feature. It’s not perfect since each implements some of the same extensions as GNU Make, but it will catch some common mistakes.

What’s in a Makefile?

I am free, no matter what rules surround me. If I find them tolerable, I tolerate them; if I find them too obnoxious, I break them. I am free because I know that I alone am morally responsible for everything I do. ―Robert A. Heinlein

At make’s core are one or more dependency trees, constructed from rules. Each vertex in the tree is called a target. The final products of the build (executable, document, etc.) are the tree roots. A Makefile specifies the dependency trees and supplies the shell commands to produce a target from its prerequisites.

In this illustration, the “.c” files are source files that are written by hand, not generated by commands, so they have no prerequisites. The syntax for specifying one or more edges in this dependency tree is simple:

target [target...]: [prerequisite...]

While technically multiple targets can be specified in a single rule, this is unusual. Typically each target is specified in its own rule. To specify the tree in the illustration above:

game: graphics.o physics.o input.o
graphics.o: graphics.c
physics.o: physics.c
input.o: input.c

The order of these rules doesn’t matter. The entire Makefile is parsed before any actions are taken, so the tree’s vertices and edges can be specified in any order. There’s one exception: the first non-special target in a Makefile is the default target. This target is selected implicitly when make is invoked without choosing a target. It should be something sensible, so that a user can blindly run make and get a useful result.

A target can be specified more than once. Any new prerequisites are appended to the previously-given prerequisites. For example, this Makefile is identical to the previous, though it’s typically not written this way:

game: graphics.o
game: physics.o
game: input.o
graphics.o: graphics.c
physics.o: physics.c
input.o: input.c

There are a handful of special targets that are used to change the behavior of make itself. All have uppercase names and start with a period. Names fitting this pattern are reserved for use by make. According to the standard, in order to get reliable POSIX behavior, the first non-comment line of the Makefile must be .POSIX. Since this is a special target, it’s not a candidate for the default target, so game will remain the default target:

.POSIX:
game: graphics.o physics.o input.o
graphics.o: graphics.c
physics.o: physics.c
input.o: input.c

In practice, even a simple program will have header files, and sources that include a header file should also have an edge on the dependency tree for it. If the header file changes, targets that include it should also be rebuilt.

.POSIX:
game: graphics.o physics.o input.o
graphics.o: graphics.c graphics.h
physics.o: physics.c physics.h
input.o: input.c input.h graphics.h physics.h

Adding commands to rules

We’ve constructed a dependency tree, but we still haven’t told make how to actually build any targets from its prerequisites. The rules also need to specify the shell commands that produce a target from its prerequisites.

If you were to create the source files in the example and invoke make, you will find that it actually does know how to build the object files. This is because make is initially configured with certain inference rules, a topic which will be covered later. For now, we’ll add the .SUFFIXES special target to the top, erasing all the built-in inference rules.

Commands immediately follow the target/prerequisite line in a rule. Each command line must start with a tab character. This can be awkward if your text editor isn’t configured for it, and it will be awkward if you try to copy the examples from this page.

Each line is run in its own shell, so be mindful of using commands like cd, which won’t affect later lines.
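
For example, in this contrived pair of rules (hypothetical target and file names), the first compiles foo.c in the original directory because the cd runs in its own shell, while the second chains the commands in a single shell and behaves as intended:

# The cd below has no effect on the next line:
broken:
    cd subdir
    cc -c foo.c

# Chain the commands in a single shell instead:
works:
    cd subdir && cc -c foo.c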

The simplest thing to do is literally specify the same commands you’d type at the shell:

.POSIX:
.SUFFIXES:
game: graphics.o physics.o input.o
    cc -o game graphics.o physics.o input.o
graphics.o: graphics.c graphics.h
    cc -c graphics.c
physics.o: physics.c physics.h
    cc -c physics.c
input.o: input.c input.h graphics.h physics.h
    cc -c input.c

Invoking make and choosing targets

I tried to walk into Target, but I missed. ―Mitch Hedberg

When invoking make, it accepts zero or more targets from the dependency tree, and it will build these targets — that is, run the commands in the target’s rule — if the target is out-of-date. A target is out-of-date if it doesn’t exist or if it is older than any of its prerequisites.

# build the "game" binary (default target)
$ make

# build just the object files
$ make graphics.o physics.o input.o

This effect cascades up the dependency tree and causes further targets to be rebuilt until all of the requested targets are up-to-date. There’s a lot of room for parallelism since different branches of the tree can be updated independently. It’s common for make implementations to support parallel builds with the -j option. This is non-standard, but it’s a fantastic feature that doesn’t require anything special in the Makefile to work correctly.

Similar to parallel builds is make’s -k (“keep going”) option, which is standard. This tells make not to stop on the first error, and to continue updating targets that are unaffected by the error. This is nice for fully populating Vim’s quickfix list or Emacs’ compilation buffer.

It’s common to have multiple targets that should be built by default. If the first rule selects the default target, how do we solve the problem of needing multiple default targets? The convention is to use phony targets. These are called “phony” because there is no corresponding file, and so phony targets are never up-to-date. It’s convention for a phony “all” target to be the default target.

I’ll make game a prerequisite of a new “all” target. More real targets could be added as necessary to turn them into defaults. Users of this Makefile will also expect make all to build the entire project.

Another common phony target is “clean” which removes all of the built files. Users will expect make clean to delete all generated files.

.POSIX:
.SUFFIXES:
all: game
game: graphics.o physics.o input.o
    cc -o game graphics.o physics.o input.o
graphics.o: graphics.c graphics.h
    cc -c graphics.c
physics.o: physics.c physics.h
    cc -c physics.c
input.o: input.c input.h graphics.h physics.h
    cc -c input.c
clean:
    rm -f game graphics.o physics.o input.o

Customize the build with macros

So far the Makefile hardcodes cc as the compiler, and doesn’t use any compiler flags (warnings, optimization, hardening, etc.). The user should be able to easily control all these things, but right now they’d have to edit the entire Makefile to do so. Perhaps the user has both gcc and clang installed, and wants to choose one or the other without changing which is installed as cc.

To solve this, make has macros that expand into strings when referenced. The convention is to use the macro named CC when talking about the C compiler, CFLAGS when talking about flags passed to the C compiler, LDFLAGS for flags passed to the C compiler when linking, and LDLIBS for flags about libraries when linking. The Makefile should supply defaults as needed.

A macro is expanded with $(...). It’s valid (and normal) to reference a macro that hasn’t been defined, which will be an empty string. This will be the case with LDFLAGS below.

Macro values can contain other macros, which will be expanded recursively each time the macro is expanded. Some make implementations allow the name of the macro being expanded to itself be a macro, which is Turing-complete, but this behavior is non-standard.

.POSIX:
.SUFFIXES:
CC     = cc
CFLAGS = -W -O
LDLIBS = -lm

all: game
game: graphics.o physics.o input.o
    $(CC) $(LDFLAGS) -o game graphics.o physics.o input.o $(LDLIBS)
graphics.o: graphics.c graphics.h
    $(CC) -c $(CFLAGS) graphics.c
physics.o: physics.c physics.h
    $(CC) -c $(CFLAGS) physics.c
input.o: input.c input.h graphics.h physics.h
    $(CC) -c $(CFLAGS) input.c
clean:
    rm -f game graphics.o physics.o input.o

Macros are overridden by macro definitions given as command line arguments in the form name=value. This allows the user to select their own build configuration. This is one of make’s most powerful and under-appreciated features.

$ make CC=clang CFLAGS='-O3 -march=native'

If the user doesn’t want to specify these macros on every invocation, they can (cautiously) use make’s -e flag to set overriding macro definitions from the environment.

$ export CC=clang
$ export CFLAGS=-O3
$ make -e all

Some make implementations have other special kinds of macro assignment operators beyond simple assignment (=). These are unnecessary, so don’t worry about them.

Inference rules so that you can stop repeating yourself

The road itself tells us far more than signs do. ―Tom Vanderbilt, Traffic: Why We Drive the Way We Do

There’s repetition across the three different object files. Wouldn’t it be nice if there was a way to communicate this pattern? Fortunately there is, in the form of inference rules. An inference rule says that a target with a certain extension, built from a prerequisite with another certain extension, is produced a certain way. This will make more sense with an example.

In an inference rule, the target names the two extensions involved: the prerequisite’s extension followed by the target’s extension. The $< macro expands to the prerequisite, which is essential to making inference rules work generically. Unfortunately this macro is not (portably) available in target rules, as much as that would be useful.

For example, here’s an inference rule that teaches make how to build an object file from a C source file. This particular rule is one that is pre-defined by make, so you’ll never need to write this one yourself. I’ll include it for completeness.

.c.o:
    $(CC) $(CFLAGS) -c $<

These extensions must be added to .SUFFIXES before they will work. With that, the commands for the rules about object files can be omitted.

.POSIX:
.SUFFIXES:
CC     = cc
CFLAGS = -W -O
LDLIBS = -lm

all: game
game: graphics.o physics.o input.o
    $(CC) $(LDFLAGS) -o game graphics.o physics.o input.o $(LDLIBS)
graphics.o: graphics.c graphics.h
physics.o: physics.c physics.h
input.o: input.c input.h graphics.h physics.h
clean:
    rm -f game graphics.o physics.o input.o

.SUFFIXES: .c .o
.c.o:
    $(CC) $(CFLAGS) -c $<

The first empty .SUFFIXES clears the suffix list. The second one adds .c and .o to the now-empty suffix list.

Other target conventions

Conventions are, indeed, all that shield us from the shivering void, though often they do so but poorly and desperately. ―Robert Aickman

Users usually expect an “install” target that installs the built program, libraries, man pages, etc. By convention this target should use the PREFIX and DESTDIR macros.

The PREFIX macro should default to /usr/local, and since it’s a macro the user can override it to install elsewhere, such as in their home directory. The user should override it for both building and installing, since the prefix may need to be built into the binary (e.g. -DPREFIX=$(PREFIX)).

The DESTDIR macro is used for staged installs: everything gets installed under a fake root directory for the sake of packaging. Unlike PREFIX, it is not baked into the build, since the installed program won’t actually run from that staging directory.

.POSIX:
CC     = cc
CFLAGS = -W -O
LDLIBS = -lm
PREFIX = /usr/local

all: game
install: game
    mkdir -p $(DESTDIR)$(PREFIX)/bin
    mkdir -p $(DESTDIR)$(PREFIX)/share/man/man1
    cp -f game $(DESTDIR)$(PREFIX)/bin
    gzip < game.1 > $(DESTDIR)$(PREFIX)/share/man/man1/game.1.gz
game: graphics.o physics.o input.o
    $(CC) $(LDFLAGS) -o game graphics.o physics.o input.o $(LDLIBS)
graphics.o: graphics.c graphics.h
physics.o: physics.c physics.h
input.o: input.c input.h graphics.h physics.h
clean:
    rm -f game graphics.o physics.o input.o

$ make PREFIX=$HOME/.local install

You may also want to provide an “uninstall” phony target that does the opposite.
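
For this example it might look something like the following, mirroring the install rule above:

uninstall:
    rm -f $(DESTDIR)$(PREFIX)/bin/game
    rm -f $(DESTDIR)$(PREFIX)/share/man/man1/game.1.gz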

Other common targets are “mostlyclean” (like “clean” but don’t delete some slow-to-build targets), “distclean” (delete even more than “clean”), “test” or “check” (run the test suite), and “dist” (create a package).

Complexity and growing pains

One of make’s big weak points is scaling up as a project grows in size.

Recursive Makefiles

As your growing project is broken into subdirectories, you may be tempted to put a Makefile in each subdirectory and invoke them recursively.

Don’t use recursive Makefiles. It breaks the dependency tree across separate instances of make and typically results in a fragile build. There’s nothing good about it. Have one Makefile at the root of your project and invoke make there. You may have to teach your text editor how to do this.

When talking about files in subdirectories, just include the subdirectory in the name. Everything will work the same as far as make is concerned, including inference rules.

src/graphics.o: src/graphics.c
src/physics.o: src/physics.c
src/input.o: src/input.c

Out-of-source builds

Keeping your object files separate from your source files is a nice idea. When it comes to make, there’s good news and bad news.

The good news is that make can do this. You can pick whatever file names you like for targets and prerequisites.

obj/input.o: src/input.c

The bad news is that inference rules are not compatible with out-of-source builds. You’ll need to repeat the same commands for each rule as if inference rules didn’t exist. This is tedious for large projects, so you may want to have some sort of “configure” script, even if hand-written, to generate all this for you. This is essentially what CMake is all about. That, plus dependency management.
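
For instance, the rule above would spell out its command explicitly, something like this (note that combining -c and -o is technically unspecified by POSIX, though every compiler in practice supports it):

obj/input.o: src/input.c
    $(CC) $(CFLAGS) -c -o obj/input.o src/input.c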

Dependency management

Another problem with scaling up is tracking the project’s ever-changing dependencies across all the source files. Missing a dependency means the build may not be correct unless you make clean first.

If you go the route of using a script to generate the tedious parts of the Makefile, both GCC and Clang have a nice feature for generating all the Makefile dependencies for you (-MM, -MT), at least for C and C++. There are lots of tutorials for doing this dependency generation on the fly as part of the build, but it’s fragile and slow. Much better to do it all up front and “bake” the dependencies into the Makefile so that make can do its job properly. If the dependencies change, rebuild your Makefile.

For example, here’s what it looks like invoking gcc’s dependency generator against the imaginary input.c for an out-of-source build:

$ gcc $CFLAGS -MM -MT '$(BUILD)/input.o' input.c
$(BUILD)/input.o: input.c input.h graphics.h physics.h

Notice the output is in Makefile’s rule format.

Unfortunately this feature strips the leading paths from the target, so, in practice, using it is always more complicated than it should be (e.g. it requires the use of -MT).

Microsoft’s Nmake

Microsoft has an implementation of make called Nmake, which comes with Visual Studio. It’s nearly a POSIX-compatible make, but necessarily breaks from the standard in some places. Their cl.exe compiler uses .obj as the object file extension and .exe for binaries, both of which differ from the unix world, so it has different built-in inference rules. Windows also lacks a Bourne shell and the standard unix tools, so all of the commands will necessarily be different.

There’s no equivalent of rm -f on Windows, so good luck writing a proper “clean” target. No, del /f isn’t the same.

So while it’s close to POSIX make, it’s not practical to write a Makefile that will simultaneously work properly with both POSIX make and Nmake. These need to be separate Makefiles.

May your Makefiles be portable

It’s nice to have reliable, portable Makefiles that just work anywhere. Code to the standards and you don’t need feature tests or other sorts of special treatment.

Introducing the Pokerware Secure Passphrase Generator

I recently developed Pokerware, an offline passphrase generator that operates in the same spirit as Diceware. The primary difference is that it uses a shuffled deck of playing cards as its entropy source rather than dice. Draw some cards and use them to select a uniformly random word from a list. Unless you’re some sort of tabletop gaming nerd, a deck of cards is more readily available than five 6-sided dice, which would typically need to be borrowed from the Monopoly board collecting dust on the shelf, then rolled two at a time.

There are various flavors of the two different word lists available in the repository.

Hardware random number generators are difficult to verify and may not actually be as random as they promise, either intentionally or unintentionally. For the particularly paranoid, Diceware and Pokerware are an easily verifiable alternative for generating secure passphrases for cryptographic purposes. At any time, a deck of 52 playing cards is in one of 52! possible arrangements. That’s more than 225 bits of entropy. If you give your deck a thorough shuffle, it will be in an arrangement that has never been seen before and will never be seen again. Pokerware draws on some of these bits to generate passphrases.

The Pokerware list has 5,304 words (12.4 bits per word), compared to Diceware’s 7,776 words (12.9 bits per word). My goal was to invent a card-drawing scheme that would uniformly select from a list in the same sized ballpark as Diceware. Much smaller and you’d have to memorize more words for the same passphrase strength. Much larger and the words on the list would be more difficult to memorize, since the list would contain longer and less frequently used words. Diceware strikes a nice balance at five dice.

One important difference for me is that I like my Pokerware word lists a lot more than the two official Diceware lists. My lists only have simple, easy-to-remember words (for American English speakers, at least), without any numbers or other short non-words. Pokerware has two official lists, “formal” and “slang,” since my early testers couldn’t agree on which was better. Rather than make a difficult decision, I took the usual route of making no decision at all.

The “formal” list is derived in part from Google’s Ngram Viewer, with my own additional filters and tweaking. It’s called “formal” because the ngrams come from formal publications and represent more formal kinds of speech.

The “slang” list is derived from every reddit comment between December 2005 and May 2017, tamed by the same additional filters. I have this data on hand, so I may as well put it to use. I figured more casually-used words would be easier to remember. Due to my extra filtering, there’s actually a lot of overlap between these lists, so the differences aren’t too significant.

If you have your own word list, perhaps in a different language, you can use the Makefile in the repository to build your own Pokerware lookup table, both plain text and PDF. The PDF is generated using Groff macros.

Passphrase generation instructions

  1. Thoroughly shuffle the deck.

  2. Draw two cards. Sort them by value, then suit. Suits are in alphabetical order: Clubs, Diamonds, Hearts, Spades.

  3. Draw additional cards until you get a card that doesn’t match the face value of either of your initial two cards. Observe its suit.

  4. Using your two cards and observed suit, look up a word in the table.

  5. Place all cards back in the deck, shuffle, and repeat from step 2 until you have the desired number of words. Each word is worth 12.4 bits of entropy.

A word of warning about step 4: If you use software to do the word list lookup, beware that it might save your search/command history — and therefore your passphrase — to a file. For example, the less pager will store search history in ~/.lesshst. It’s easy to prevent that one:

$ LESSHISTFILE=- less pokerware-slang.txt

Example word generation

Suppose in step 2 you draw King of Hearts (KH/K♥) and Queen of Clubs (QC/Q♣).

In step 3 you first draw King of Diamonds (KD/K♦), discarding it because it matches the face value of one of your cards from step 2.

Next you draw Four of Spades (4S/4♠), taking spades as your extra suit.

In order, this gives you Queen of Clubs, King of Hearts, and Spades: QCKHS or Q♣K♥♠. This corresponds to “wizard” in the formal word list and would be the first word in your passphrase.

A deck of cards as an office tool

I now have an excuse to keep a deck of cards out on my desk at work. I’ve been using Diceware — or something approximating it since I’m not so paranoid about hardware RNGs — for passwords for over 8 years now. From now on I’ll deal new passwords from an in-reach deck of cards. Though typically I need to tweak the results to meet outdated character-composition requirements.

Integer Overflow into Information Disclosure

Last week I was discussing CVE-2017-7529 with my intern. Specially crafted input to Nginx causes an integer overflow which has the potential to leak sensitive information. But how could an integer overflow be abused to trick a program into leaking information? To answer this question, I put together the simplest practical example I could imagine.

This small C program converts a vector image from a custom format (described below) into a Netpbm image, a conveniently simple format. The program defensively and carefully parses its input, but still makes a subtle, fatal mistake. This mistake not only leads to sensitive information disclosure, but, with a more sophisticated attack, could be used to execute arbitrary code.

After getting the hang of the interface for the program, I encourage you to take some time to work out an exploit yourself. Regardless, I’ll reveal a functioning exploit and explain how it works.

A new vector format

The input format is line-oriented and very similar to Netpbm itself. The first line is the header, starting with the magic number V2 (ASCII) followed by the image dimensions. The target output format is Netpbm’s “P2” (text gray scale) format, so the “V2” parallels it. The file must end with a newline.

V2 <width> <height>

What follows is drawing commands, one per line. For example, the s command sets the value of a particular pixel.

s <x> <y> <00–ff>

Since it’s not important for the demonstration, this is the only command I implemented. It’s easy to imagine additional commands to draw lines, circles, Bezier curves, etc.

Here’s an example (example.txt) that draws a single white point in the middle of the image:

V2 256 256
s 127 127 ff

The rendering tool reads from standard input and writes to standard output:

$ render < example.txt > example.pgm

Here’s what it looks like rendered:

However, you will notice that when you run the rendering tool, it prompts you for a username and password. This is silly, of course, but it’s an excuse to get “sensitive” information into memory. It will accept any username/password combination where the username and password don’t match each other. The key is this: It’s possible to craft a valid image that leaks the entered password.

Tour of the implementation

Without spoiling anything yet, let’s look at how this program works. The first thing to notice is that I’m using a custom “obstack” allocator instead of malloc() and free(). Real-world allocators have some defenses against this particular vulnerability. Plus a specific exploit would have to target a specific libc. By using my own allocator, the exploit will mostly be portable, making for a better and easier demonstration.

The allocator interface should be pretty self-explanatory, except for two details. This is an obstack allocator, so freeing an object also frees every object allocated after it. Also, it doesn’t call malloc() in the background. At initialization you give it a buffer from which to allocate all memory.

struct mstack {
    char *top;
    char *max;
    char buf[];
};

struct mstack *mstack_init(void *, size_t);
void          *mstack_alloc(struct mstack *, size_t);
void           mstack_free(struct mstack *, void *);

There are no vulnerabilities in these functions (I hope!). It’s just here for predictability.
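
To make that behavior concrete, here’s a minimal sketch of how such an obstack could be implemented. This is my own illustration (it ignores alignment entirely), not the actual implementation, which may differ in its details:

struct mstack *
mstack_init(void *buf, size_t size)
{
    /* The caller supplies the entire buffer up front. */
    struct mstack *m = buf;
    if (size < sizeof(*m))
        return 0;
    m->top = m->buf;
    m->max = (char *)buf + size;
    return m;
}

void *
mstack_alloc(struct mstack *m, size_t size)
{
    /* Bump-allocate from the front of the remaining space. */
    if ((size_t)(m->max - m->top) < size)
        return 0;
    void *p = m->top;
    m->top += size;
    return p;
}

void
mstack_free(struct mstack *m, void *p)
{
    /* Roll the stack back: everything allocated after p is freed too. */
    m->top = p;
}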

Next here’s the “authentication” function. It reads a username and password combination from /dev/tty. It’s only an excuse to get a flag in memory for this capture-the-flag game. The username and password must be less than 32 characters each.

int
authenticate(struct mstack *m)
{
    FILE *tty = fopen("/dev/tty", "r+");
    if (!tty) {
        perror("/dev/tty");
        return 0;
    }

    char *user = mstack_alloc(m, 32);
    if (!user) {
        fclose(tty);
        return 0;
    }
    fputs("User: ", tty);
    fflush(tty);
    if (!fgets(user, 32, tty))
        user[0] = 0;

    char *pass = mstack_alloc(m, 32);
    int result = 0;
    if (pass) {
        fputs("Password: ", tty);
        fflush(tty);
        if (fgets(pass, 32, tty))
            result = strcmp(user, pass) != 0;
    }

    fclose(tty);
    mstack_free(m, user);
    return result;
}

Next here’s a little version of calloc() for the custom allocator. Hmm, I wonder why this is called “naive” …

void *
naive_calloc(struct mstack *m, unsigned long nmemb, unsigned long size)
{
    void *p = mstack_alloc(m, nmemb * size);
    if (p)
        memset(p, 0, nmemb * size);
    return p;
}

Next up is a paranoid wrapper for strtoul() that defensively checks its inputs. If it’s out of range of an unsigned long, it bails out. If there’s trailing garbage, it bails out. If there’s no number at all, it bails out. If you make prolonged eye contact, it bails out.

unsigned long
safe_strtoul(char *nptr, char **endptr, int base)
{
    errno = 0;
    unsigned long n = strtoul(nptr, endptr, base);
    if (errno) {
        perror(nptr);
        exit(EXIT_FAILURE);
    } else if (nptr == *endptr) {
        fprintf(stderr, "Expected an integer\n");
        exit(EXIT_FAILURE);
    } else if (!isspace(**endptr)) {
        fprintf(stderr, "Invalid character '%c'\n", **endptr);
        exit(EXIT_FAILURE);
    }
    return n;
}

The main() function parses the header using this wrapper and allocates some zeroed memory:

    unsigned long width = safe_strtoul(p, &p, 10);
    unsigned long height = safe_strtoul(p, &p, 10);
    unsigned char *pixels = naive_calloc(m, width, height);
    if (!pixels) {
        fputs("Not enough memory\n", stderr);
        exit(EXIT_FAILURE);
    }

Then there’s a command processing loop, also using safe_strtoul(), which carefully checks each pixel’s coordinates against width and height.
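
The loop itself isn’t excerpted here, but a rough sketch of its shape, my own reconstruction rather than the program’s actual code, might look like this:

    char line[256];
    while (fgets(line, sizeof(line), stdin)) {
        if (line[0] == 's') {
            char *p = line + 1;
            unsigned long x = safe_strtoul(p, &p, 10);
            unsigned long y = safe_strtoul(p, &p, 10);
            unsigned long v = safe_strtoul(p, &p, 16);
            if (x >= width || y >= height || v > 255) {
                fputs("Value out of range\n", stderr);
                exit(EXIT_FAILURE);
            }
            pixels[y * width + x] = v;
        }
    }

Finally the program writes the result out in Netpbm P2 (.pgm) format: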

    printf("P2\n%ld %ld 255\n", width, height);
    for (unsigned long y = 0; y < height; y++) {
        for (unsigned long x = 0; x < width; x++)
            printf("%d ", pixels[y * width + x]);
        putchar('\n');
    }

The vulnerability is in something I’ve shown above. Can you find it?

Exploiting the renderer

Did you find it? If you’re on a platform with 64-bit long, here’s your exploit:

V2 16 1152921504606846977

And here’s an exploit for 32-bit long:

V2 16 268435457

Here’s how it looks in action. The most obvious result is that the program crashes:

$ echo V2 16 1152921504606846977 | ./mstack > capture.txt
User: coolguy
Password: mysecret
Segmentation fault

Here are the initial contents of capture.txt:

P2
16 1152921504606846977 255
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
109 121 115 101 99 114 101 116 10 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Where did those junk numbers come from in the image data? Plug them into an ASCII table and you’ll get “mysecret”. Despite allocating the image with naive_calloc(), the password has found its way into the image! How could this be?

What happened is that width * height overflows an unsigned long. (Well, technically speaking, unsigned integers are defined not to overflow in C, wrapping around instead, but it’s really the same thing.) In naive_calloc(), the overflow results in a value of 16, so it only allocates and clears 16 bytes. The requested allocation “succeeds” despite far exceeding the available memory. The caller has been given a lot less memory than expected, and the memory believed to have been allocated contains a password.
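
To make the wraparound concrete: 1152921504606846977 is 2^60 + 1, so width * height is 16 × (2^60 + 1) = 2^64 + 16, which wraps to just 16 in a 64-bit unsigned long. The 32-bit header works the same way: 268435457 is 2^28 + 1, and 16 × (2^28 + 1) wraps to 16 in a 32-bit unsigned long.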

The final part that writes the output never computes width * height as a single product, so there’s no overflow to check for there. Its nested loops simply march through the original, impossible image dimensions, reading far past the end of the 16-byte allocation. That’s where the leaked password, and eventually the crash, comes from.

How do we fix this? Add an overflow check at the beginning of the naive_calloc() function (making it no longer naive). This is what the real calloc() does.

    if (nmemb && size > -1UL / nmemb)
        return 0;
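
Here -1UL is just ULONG_MAX, so the check refuses any nmemb and size whose product would wrap around.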

The frightening takeaway is that this check is very easy to forget. It’s a subtle bug with potentially disastrous consequences.

In practice, this sort of program wouldn’t have sensitive data resident in memory. Instead an attacker would target the program’s stack with those s commands — specifically the return pointers — and perform a ROP attack against the application. With the exploit header above and a platform where long is the same size as size_t, the program will behave as if all available memory has been allocated to the image, so the s command could be used to poke custom values anywhere in memory. This is a much more complicated exploit, and it has to contend with ASLR and a random stack gap, but it’s feasible.

Rolling Shutter Simulation in C

The most recent Smarter Every Day (#172) explains a phenomenon that results from rolling shutter. You’ve likely seen this effect in some of your own digital photographs. When a CMOS digital camera captures a picture, it reads one row of the sensor at a time. If the subject of the picture is a fast-moving object (relative to the camera), then the subject will change significantly while the image is being captured, giving strange, unreal results:

In the Smarter Every Day video, Destin illustrates the effect by simulating rolling shutter using a short video clip. In each frame of the video, a few additional rows are locked in place, showing the effect in slow motion, making it easier to understand.

At the end of the video he thanks a friend for figuring out how to get After Effects to simulate rolling shutter. After thinking about this for a moment, I figured I could easily accomplish this myself with just a bit of C, without any libraries. The video above this paragraph is the result.

I previously described a technique to edit and manipulate video without any formal video editing tools. A unix pipeline is sufficient for doing minor video editing, especially without sound. The program at the front of the pipe decodes the video into a raw, uncompressed format, such as YUV4MPEG or PPM. The tools in the middle losslessly manipulate this data to achieve the desired effect (watermark, scaling, etc.). Finally, the tool at the end encodes the video into a standard format.

$ decode video.mp4 | xform-a | xform-b | encode out.mp4

For the “decode” program I’ll be using ffmpeg now that it’s back in the Debian repositories. You can throw a video in virtually any format at it and it will write PPM frames to standard output. For the encoder I’ll be using the x264 command line program, though ffmpeg could handle this part as well. Without any filters in the middle, this example will just re-encode a video:

$ ffmpeg -i input.mp4 -f image2pipe -vcodec ppm pipe:1 | \
    x264 -o output.mp4 /dev/stdin

The filter tools in the middle only need to read and write the raw image format. They’re a little bit like shaders, and they’re easy to write. In this case, I’ll write a C program that simulates rolling shutter. The filter could be written in any language that can read and write binary data from standard input to standard output.

Update: It appears that input PPM streams are a rather recent feature of libavformat (a.k.a lavf, used by x264). Support for PPM input first appeared in libavformat 3.1 (released June 26th, 2016). If you’re using an older version of libavformat, you’ll need to stick ppmtoy4m in front of x264 in the processing pipeline.

$ ffmpeg -i input.mp4 -f image2pipe -vcodec ppm pipe:1 | \
    ppmtoy4m | \
    x264 -o output.mp4 /dev/stdin

Video filtering in C

In the past, my go-to for raw video data has been loose PPM frames and YUV4MPEG streams (via ppmtoy4m). Fortunately, over the years a lot of tools have gained the ability to manipulate streams of PPM images, which is a much more convenient format. Despite being raw video data, YUV4MPEG is still a fairly complex format with lots of options and annoying colorspace concerns. PPM is simple RGB without complications. The header is just text:

P6
<width> <height>
<maxdepth>
<width * height * 3 binary RGB data>

The maximum depth is virtually always 255. A smaller value reduces the image’s dynamic range without reducing the size. A larger value involves byte-order issues (endian). For video frame data, the file will typically look like:

P6
1920 1080
255
<frame RGB>

Unfortunately the format is actually a little more flexible than this. Except for the new line (LF, 0x0A) after the maximum depth, the whitespace is arbitrary and comments starting with # are permitted. Since the tools I’m using won’t produce comments, I’m going to ignore that detail. I’ll also assume the maximum depth is always 255.

Here’s the structure I used to represent a PPM image, just one frame of video. I’m using a flexible array member to pack the data at the end of the structure.

struct frame {
    size_t width;
    size_t height;
    unsigned char data[];
};

Next a function to allocate a frame:

static struct frame *
frame_create(size_t width, size_t height)
{
    struct frame *f = malloc(sizeof(*f) + width * height * 3);
    f->width = width;
    f->height = height;
    return f;
}

We’ll need a way to write the frames we’ve created.

static void
frame_write(struct frame *f)
{
    printf("P6\n%zu %zu\n255\n", f->width, f->height);
    fwrite(f->data, f->width * f->height, 3, stdout);
}

Finally, a function to read a frame, reusing an existing buffer if possible. The most complex part of the whole program is just parsing the PPM header. The %*c in the scanf() specifically consumes the line feed immediately following the maximum depth.

static struct frame *
frame_read(struct frame *f)
{
    size_t width, height;
    if (scanf("P6 %zu%zu%*d%*c", &width, &height) < 2) {
        free(f);
        return 0;
    }
    if (!f || f->width != width || f->height != height) {
        free(f);
        f = frame_create(width, height);
    }
    fread(f->data, width * height, 3, stdin);
    return f;
}

Since this program will only be part of a pipeline, I’m not worried about checking the results of fwrite() and fread(). The process will be killed by the shell if something goes wrong with the pipes. However, if we’re out of video data and get an EOF, scanf() will fail, indicating the EOF, which is normal and can be handled cleanly.

An identity filter

That’s all the infrastructure we need to build an identity filter that passes frames through unchanged:

int main(void)
{
    struct frame *frame = 0;
    while ((frame = frame_read(frame)))
        frame_write(frame);
}

Processing a frame is just a matter of adding some stuff to the body of the while loop.
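
For example, here’s a hypothetical filter, not from the original post, that inverts every pixel. The only change is inside the while body:

int
main(void)
{
    struct frame *frame = 0;
    while ((frame = frame_read(frame))) {
        size_t n = frame->width * frame->height * 3;
        /* Invert every channel of every pixel. */
        for (size_t i = 0; i < n; i++)
            frame->data[i] = 255 - frame->data[i];
        frame_write(frame);
    }
}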

A rolling shutter filter

For the rolling shutter filter, in addition to the input frame we need an image to hold the result of the rolling shutter. Each input frame will be copied into the rolling shutter frame, but a little less will be copied from each frame, locking a little bit more of the image in place.

int
main(void)
{
    int shutter_step = 3;
    size_t shutter = 0;
    struct frame *f = frame_read(0);
    struct frame *out = frame_create(f->width, f->height);
    while (shutter < f->height && (f = frame_read(f))) {
        size_t offset = shutter * f->width * 3;
        size_t length = f->height * f->width * 3 - offset;
        memcpy(out->data + offset, f->data + offset, length);
        frame_write(out);
        shutter += shutter_step;
    }
    free(out);
    free(f);
}

The shutter_step controls how many rows are captured per frame of video. Generally capturing one row per frame is too slow for the simulation. For a 1080p video, that’s 1,080 frames for the entire simulation: 18 seconds at 60 FPS or 36 seconds at 30 FPS. If this program were to accept command line arguments, controlling the shutter rate would be one of the options.
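
A minimal sketch of that option (my own addition, not part of the actual program) would change main to take int argc, char **argv and read the step from the first argument:

    int shutter_step = 3;
    if (argc > 1)
        shutter_step = atoi(argv[1]);   /* e.g. ./rolling-shutter 5 */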

Putting it all together:

$ ffmpeg -i input.mp4 -f image2pipe -vcodec ppm pipe:1 | \
    ./rolling-shutter | \
    x264 -o output.mp4 /dev/stdin

Here are some of the results for different shutter rates: 1, 3, 5, 8, 10, and 15 rows per frame. Feel free to right-click and “View Video” to see the full resolution video.

Source and original input

This post contains the full source in parts, but here it is all together:

Here’s the original video, filmed by my wife using her Nikon D5500, in case you want to try it for yourself:

It took much longer to figure out the string-pulling contraption to slowly spin the fan at a constant rate than it took to write the C filter program.

On Hacker News, morecoffee shared a video of the second order effect (direct link), where the rolling shutter speed changes over time.

A deeper analysis of rolling shutter: Playing detective with rolling shutter photos.
