The Day I Fell in Love with Fuzzing

This article was discussed on Hacker News and on reddit.

In 2007 I wrote a pair of modding tools, binitools, for a space trading and combat simulation game named Freelancer. The game stores its non-art assets in the format of “binary INI” files, or “BINI” files. The motivation for the binary format over traditional INI files was probably performance: it’s faster to load and read these files than it is to parse arbitrary text in INI format.

Much of the in-game content can be changed simply by modifying these files — changing item names, editing commodity prices, tweaking ship statistics, or even adding new ships to the game. The binary nature makes them unsuitable for in-place modification, so the natural approach is to convert them to text INI files, make the desired modifications using a text editor, then convert back to the BINI format and replace the file in the game’s installation.

I didn’t reverse engineer the BINI format, nor was I the first person to create tools to edit them. The existing tools weren’t to my tastes, and I had my own vision for how they should work — an interface more closely following the Unix tradition, despite the target being a Windows game.

When I got started, I had just learned how to use yacc (really Bison) and lex (really flex), as well as Autoconf, so I went all-in with these newly-discovered tools. It was exciting to try them out in a real-world situation, though I slavishly aped the practices of other open source projects without really understanding why things were the way they were. Due to the use of yacc/lex and the configure script build, compiling the project required a full, Unix-like environment. This is all visible in the original version of the source.

The project was moderately successful in two ways. First, I was able to use the tools to modify the game. Second, other people were using the tools, since the binaries I built show up in various collections of Freelancer modding tools online.

The Rewrite

That’s the way things were until mid-2018 when I revisited the project. Ever look at your own old code and wonder what the heck you were thinking? My INI format was far more rigid and strict than necessary, I was doing questionable things when writing out binary data, and the build wasn’t even working correctly.

With an additional decade of experience under my belt, I knew I could do way better if I were to rewrite these tools today. So, over the course of a few days, I did, from scratch. That’s what’s visible in the master branch today.

I like to keep things simple, which meant no more Autoconf, and instead a simple, portable Makefile. No more yacc or lex, and instead a hand-coded parser. Using only conforming, portable C. The result was so simple that I can build using Visual Studio in a single, short command, so the Makefile isn’t all that necessary. With one small tweak (replacing stdint.h with a typedef), I can even build and run binitools in DOS.

The new version is faster, leaner, cleaner, and simpler. It’s far more flexible about its INI input, so it’s easier to use. But is it more correct?

Fuzzing

I’ve been interested in fuzzing for years, especially american fuzzy lop, or afl. However, I wasn’t having success with it. I’d fuzz some of the tools I use regularly, and it wouldn’t find anything of note, at least not before I gave up. I fuzzed my JSON library, and somehow it turned up nothing. Surely my JSON parser couldn’t be that robust already, could it? Fuzzing just wasn’t accomplishing anything for me. (As it turns out, my JSON library is quite robust, thanks in large part to various contributors!)

So I’ve got this relatively new INI parser, and while it can successfully parse and correctly re-assemble the game’s original set of BINI files, it hasn’t really been exercised that much. Surely there’s something in here for a fuzzer to find. Plus I don’t even have to write a line of code in order to run afl against it. The tools already read from standard input by default, which is perfect.

Assuming you’ve got the necessary tools installed (make, gcc, afl), here’s how easy it is to start fuzzing binitools:

$ make CC=afl-gcc
$ mkdir in out
$ echo '[x]' > in/empty
$ afl-fuzz -i in -o out -- ./bini

The bini utility takes INI as input and produces BINI as output, so it’s far more interesting to fuzz than its inverse, unbini. Since unbini parses relatively simple binary data, there are (probably) no bugs for the fuzzer to find. I did try anyway just in case.

In my example above, I swapped out the default compiler for afl’s GCC wrapper (CC=afl-gcc). It calls GCC in the background, but in doing so adds its own instrumentation to the binary. When fuzzing, afl-fuzz uses that instrumentation to monitor the program’s execution path. The afl whitepaper explains the technical details.

I also created input and output directories, placing a minimal, working example into the input directory, which gives afl a starting point. As afl runs, it mutates a queue of inputs and observes the changes on the program’s execution. The output directory contains the results and, more importantly, a corpus of inputs that cause unique execution paths. In other words, the fuzzer output will be lots of inputs that exercise many different edge cases.

The most exciting and dreaded result is a crash. The first time I ran it against binitools, bini had many such crashes. Within minutes, afl was finding a number of subtle and interesting bugs in my program, which was incredibly useful. It even discovered an unlikely stale pointer bug by exercising different orderings for various memory allocations. This particular bug was the turning point that made me realize the value of fuzzing.

Not all the bugs it found led to crashes. I also combed through the outputs to see what sorts of inputs were succeeding, what was failing, and observe how my program handled various edge cases. It was rejecting some inputs I thought should be valid, accepting some I thought should be invalid, and interpreting some in ways I hadn’t intended. So even after I fixed the crashing inputs, I still made tweaks to the parser to fix each of these troublesome inputs.

Building a test suite

Once I combed out all the fuzzer-discovered bugs, and I agreed with the parser on how all the various edge cases should be handled, I turned the fuzzer’s corpus into a test suite — though not directly.

I had run the fuzzer in parallel — a process that is explained in the afl documentation — so I had lots of redundant inputs. By redundant I mean that the inputs are different but have the same execution path. Fortunately afl has a tool to deal with this: afl-cmin, the corpus minimization tool. It eliminates all the redundant inputs.

Second, many of these inputs were longer than necessary in order to invoke their unique execution path. There’s afl-tmin, the test case minimizer, which I used to further shrink my test corpus.

I sorted the valid from invalid inputs and checked them into the repository. Have a look at all the wacky inputs invented by the fuzzer starting from my single, minimal input:

This essentially locks down the parser, and the test suite ensures a particular build behaves in a very specific way. This is most useful for ensuring that builds on other platforms and by other compilers are indeed behaving identically with respect to their outputs. My test suite even revealed a bug in diet libc, as binitools doesn’t pass the tests when linked against it. If I were to make non-trivial changes to the parser, I’d essentially need to scrap the current test suite and start over, having afl generate an entire new corpus for the new parser.

Fuzzing has certainly proven itself to be a powerful technique. It found a number of bugs that I likely wouldn’t have otherwise discovered on my own. I’ve since gotten more savvy on its use and have used it on other software — not just software I’ve written myself — and discovered more bugs. It’s got a permanent slot on my software developer toolbelt.

A JavaScript Typed Array Gotcha

JavaScript’s prefix increment and decrement operators can be surprising when applied to typed arrays. It caught me by surprise when I was porting some C code over to JavaScript. Just using your brain to execute this code, what do you believe is the value of r?

let array = new Uint8Array([255]);
let r = ++array[0];

The increment and decrement operators originated in the B programming language. Its closest living relative today is C, and, as far as these operators are concerned, C can be considered an ancestor of JavaScript. So what is the value of r in this similar C code?

uint8_t array[] = {255};
int r = ++array[0];

Of course, if they were the same then there would be nothing to write about, so that should make it easier to guess if you aren’t sure. The answer: In JavaScript, r is 256. In C, r is 0.

What happened to me was that I wrote an 80-bit integer increment routine in C like this:

uint8_t array[10];
/* ... */
for (int i = 9; i >= 0; i--)
    if (++array[i])
        break;

But I was getting the wrong result over in JavaScript from essentially the same code:

let array = new Uint8Array(10);
/* ... */
for (let i = 9; i >= 0; i--)
    if (++array[i])
        break;

So what’s going on here?

JavaScript specification

The ES5 specification says this about the prefix increment operator:

  1. Let expr be the result of evaluating UnaryExpression.

  2. Throw a SyntaxError exception if the following conditions are all true: [omitted]

  3. Let oldValue be ToNumber(GetValue(expr)).

  4. Let newValue be the result of adding the value 1 to oldValue, using the same rules as for the + operator (see 11.6.3).

  5. Call PutValue(expr, newValue).

  6. Return newValue.

So, oldValue is 255. This is a double precision float because all numbers in JavaScript (outside of the bitwise operations) are double precision floating point. Add 1 to this value to get 256, which is newValue. When newValue is stored in the array via PutValue(), it’s converted to an unsigned 8-bit integer, which truncates it to 0.

However, newValue is returned, not the value that was actually stored in the array!

Since JavaScript is dynamically typed, this difference didn’t actually matter until typed arrays were introduced. I suspect that if typed arrays had been in JavaScript from the beginning, the specified behavior would be more in line with C.

This behavior isn’t limited to the prefix operators. Consider assignment:

let array = new Uint8Array([255]);
let r = (array[0] = array[0] + 1);
let s = (array[0] += 1);

Both r and s will still be 256. The result of the assignment operators is a similar story:

LeftHandSideExpression = AssignmentExpression is evaluated as follows:

  1. Let lref be the result of evaluating LeftHandSideExpression.

  2. Let rref be the result of evaluating AssignmentExpression.

  3. Let rval be GetValue(rref).

  4. Throw a SyntaxError exception if the following conditions are all true: [omitted]

  5. Call PutValue(lref, rval).

  6. Return rval.

Again, the result of the expression is independent of how it was stored with PutValue().

C specification

I’ll be referencing the original C89/C90 standard. The C specification requires a little more work to get to the bottom of the issue. Starting with 3.3.3.1 (Prefix increment and decrement operators):

The value of the operand of the prefix ++ operator is incremented. The result is the new value of the operand after incrementation. The expression ++E is equivalent to (E+=1).

Later in 3.3.16.2 (Compound assignment):

A compound assignment of the form E1 op = E2 differs from the simple assignment expression E1 = E1 op (E2) only in that the lvalue E1 is evaluated only once.

Then finally in 3.3.16 (Assignment operators):

An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment, but is not an lvalue.

So the result is explicitly the value after assignment. Let’s look at this step by step after rewriting the expression.

int r = (array[0] = array[0] + 1);

In C, all integer operations are performed with at least int precision. Smaller integers are implicitly promoted to int before the operation. The value of array[0] is 255, and, since uint8_t is smaller than int, it gets promoted to int. Additionally, the literal constant 1 is also an int, so there are actually two reasons for this promotion.

So since these are int values, the result of the addition is 256, like in JavaScript. To store the result, this value is then demoted to uint8_t and truncated to 0. Finally, this post-assignment 0 is the result of the expression, not the right-hand result as in JavaScript.
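
Putting those pieces together, here is a tiny, complete C program; compiled and run, it prints 0 for both the stored element and r:

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    uint8_t array[] = {255};
    /* array[0] is promoted to int, incremented to 256, then truncated
       back to 0 when stored; r gets the post-assignment value. */
    int r = ++array[0];
    printf("array[0] = %d, r = %d\n", array[0], r);
    return 0;
}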

Specifications are useful

These situations are why I prefer programming languages that have a formal and approachable specification. If there’s no specification and I’m observing undocumented, idiosyncratic behavior, is this just some subtle quirk of the current implementation — e.g. something that might change without notice in the future — or is it intended behavior that I can rely upon for correctness?

A Survey of $RANDOM

Most Bourne shell clones support a special RANDOM environment variable that evaluates to a random value between 0 and 32,767 (i.e. 15 bits). Assignment to the variable seeds the generator. This variable is an extension and did not appear in the original Unix Bourne shell. Despite this, the different Bourne-like shells that implement it have converged to the same interface, but only the interface. Each implementation differs in interesting ways. In this article we’ll explore how $RANDOM is implemented in various Bourne-like shells.

Unfortunately I was unable to determine the origin of $RANDOM. Nobody was doing a good job tracking source code changes before the mid-1990s, so that history appears to be lost. Bash was first released in 1989, but the earliest version I could find was 1.14.7, released in 1996. KornShell was first released in 1983, but the earliest source I could find was from 1993. In both cases $RANDOM already existed. My guess is that it first appeared in one of these two shells, probably KornShell.

Update: Quentin Barnes has informed me that his 1986 copy of KornShell (a.k.a. ksh86) implements $RANDOM. This predates Bash and makes it likely that this feature originated in KornShell.

Bash

Of all the shells I’m going to discuss, Bash has the most interesting history. It never made use of srand(3) / rand(3) and instead uses its own generator — which is generally what I prefer. Prior to Bash 4.0, it used the crummy linear congruential generator (LCG) found in the C89 standard:

static unsigned long rseed = 1;

static int
brand ()
{
  rseed = rseed * 1103515245 + 12345;
  return ((unsigned int)((rseed >> 16) & 32767));
}

For some reason it was naïvely decided that $RANDOM should never produce the same value twice in a row. The caller of brand() filters the output and discards repeats before returning to the shell script. This actually reduces the quality of the generator further since it increases correlation between separate outputs.
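
The filter amounts to something like the following sketch (a paraphrase of the idea, not Bash’s literal code):

/* Paraphrase of the repeat filter, not Bash's literal code. */
static int last_value = -1;

static int
brand_no_repeat ()
{
  int value;

  do
    value = brand ();
  while (value == last_value);
  last_value = value;
  return value;
}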

When the shell starts up, rseed is seeded from the PID and the current time in seconds. These values are literally summed and used as the seed.

/* Note: not the literal code, but equivalent. */
rseed = getpid() + time(0);

Subshells, which fork and initially share an rseed, are given similar treatment:

rseed = rseed + getpid() + time(0);

Notice there’s no hashing or mixing of these values, so there’s no avalanche effect. That would have prevented shells that start around the same time from having related initial random sequences.

With Bash 4.0, released in 2009, the algorithm was changed to a Park–Miller multiplicative LCG from 1988:

static int
brand ()
{
  long h, l;

  /* can't seed with 0. */
  if (rseed == 0)
    rseed = 123459876;
  h = rseed / 127773;
  l = rseed % 127773;
  rseed = 16807 * l - 2836 * h;
  return ((unsigned int)(rseed & 32767));
}

There’s actually a subtle mistake in this implementation compared to the generator described in the paper. This function will generate different numbers than the paper, and it will generate different numbers on different hosts! More on that later.

This algorithm is a much better choice than the previous LCG. There were many more options available in 2009 compared to 1989, but, honestly, this generator is pretty reasonable for this application. Bash is so slow that you’re never practically going to generate enough numbers for the small state to matter. Since the Park–Miller algorithm is older than Bash, they could have used this in the first place.

I considered submitting a patch to switch to something more modern. However, given Bash’s constraints, it’s easier said than done. Portability to weird systems is still a concern, and I expect they’d reject a patch that started making use of long long in the PRNG. They still support pre-ANSI C compilers that don’t have 64-bit arithmetic.

However, what still really could be improved is seeding. In Bash 4.x here’s what it looks like:

static void
seedrand ()
{
  struct timeval tv;

  gettimeofday (&tv, NULL);
  sbrand (tv.tv_sec ^ tv.tv_usec ^ getpid ());
}

Seeding is both better and worse. It’s better that it’s seeded from a higher resolution clock (microseconds), so two shells started close in time have more variation. However, it’s “mixed” with XOR, which, in this case, is worse than addition.

For example, imagine two Bash shells started in quick succession such that both tv_usec and getpid() increase by one. Those increments are likely to cancel each other out in the XOR, leaving both shells with the same seed.
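
A quick way to convince yourself with made-up numbers: when both quantities happen to be even, incrementing each flips only its lowest bit, and the two flips cancel under XOR while they would not cancel under addition:

#include <stdio.h>

int
main (void)
{
  /* Hypothetical, even-valued timestamp and PID. */
  unsigned long t = 1551234566;
  unsigned long p = 4000;
  printf ("%lu %lu\n", t ^ p, (t + 1) ^ (p + 1));  /* identical */
  printf ("%lu %lu\n", t + p, (t + 1) + (p + 1));  /* differ by 2 */
  return 0;
}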

Instead, each of those quantities should be hashed before mixing. Here’s a rough example using my triple32() hash (adapted to glorious GNU-style pre-ANSI C):

static unsigned long
triple32 (x)
     unsigned long x;
{
  x ^= x >> 17;
  x *= 0xed5ad4bbUL;
  x &= 0xffffffffUL;
  x ^= x >> 11;
  x *= 0xac4c1b51UL;
  x &= 0xffffffffUL;
  x ^= x >> 15;
  x *= 0x31848babUL;
  x &= 0xffffffffUL;
  x ^= x >> 14;
  return x;
}

static void
seedrand ()
{
  struct timeval tv;

  gettimeofday (&tv, NULL);
  sbrand (triple32 (tv.tv_sec) ^
          triple32 (triple32 (tv.tv_usec) ^ getpid ()));
}

I had said there’s a mistake in the Bash implementation of Park–Miller. Take a closer look at the types and the assignment to rseed:

  /* The variables */
  long h, l;
  unsigned long rseed;

  /* The assignment */
  rseed = 16807 * l - 2836 * h;

The result of the subtraction can be negative, and that negative value is converted to unsigned long. The C standard says ULONG_MAX + 1 is added to make the value positive. ULONG_MAX varies by platform — typically long is either 32 bits or 64 bits — so the results also vary. Here’s how the paper defined it:

  unsigned long test;

  test = 16807 * l - 2836 * h;
  if (test > 0)
    rseed = test;
  else
    rseed = test + 2147483647;

As far as I can tell, this mistake doesn’t hurt the quality of the generator. It does mean, however, that the same seed produces different sequences on 32-bit and 64-bit hosts, which is easy to demonstrate:

$ 32/bash -c 'RANDOM=127773; echo $RANDOM $RANDOM'
29932 13634

$ 64/bash -c 'RANDOM=127773; echo $RANDOM $RANDOM'
29932 29115

Zsh

In contrast to Bash, Zsh is the most straightforward: defer to rand(3). Its $RANDOM can return the same value twice in a row, assuming that rand(3) does.

zlong
randomgetfn(UNUSED(Param pm))
{
    return rand() & 0x7fff;
}

void
randomsetfn(UNUSED(Param pm), zlong v)
{
    srand((unsigned int)v);
}

A cool consequence of deferring to rand(3) is that you can override it with a custom generator if you want:

int
rand(void)
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

Usage:

$ gcc -shared -fPIC -o rand.so rand.c
$ LD_PRELOAD=./rand.so zsh -c 'echo $RANDOM $RANDOM $RANDOM'
4 4 4

This trick also applies to the rest of the shells below.

KornShell (ksh)

KornShell originated in 1983, but it was finally released under an open source license in 2005. There’s a clone of KornShell called Public Domain Korn Shell (pdksh) that’s been forked a dozen different ways, but I’ll get to that next.

KornShell defers to rand(3), but it does some additional naïve filtering on the output. When the shell starts up, it generates 10 values from rand(). If any of them is larger than 32,767, then all subsequently generated numbers will be shifted right by three bits.

#define RANDMASK 0x7fff

    for (n = 0; n < 10; n++) {
        // Don't use lower bits when rand() generates large numbers.
        if (rand() > RANDMASK) {
            rand_shift = 3;
            break;
        }
    }

Why not just look at RAND_MAX? I guess they didn’t think of it.

Update: Quentin Barnes pointed out that RAND_MAX didn’t exist until POSIX standardization in 1988. The constant first appeared in Unix in 1990. This KornShell code either predates the standard or needed to work on systems that predate the standard.

Like Bash, repeated values are not allowed. I suspect one shell got this idea from the other.

    do {
        cur = (rand() >> rand_shift) & RANDMASK;
    } while (cur == last);

Who came up with this strange idea first?

OpenBSD’s Public Domain Korn Shell (pdksh)

I picked the OpenBSD variant of pdksh since it’s the only pdksh fork I ever touch in practice, and its $RANDOM is the most interesting of the pdksh forks — at least since 2014.

Like Zsh, pdksh simply defers to rand(3). However, OpenBSD’s rand(3) is infamously and proudly non-standard. By default it returns non-deterministic, cryptographic-quality results seeded from system entropy (via the misnamed arc4random(3)), à la /dev/urandom. Its $RANDOM inherits this behavior.

    setint(vp, (int64_t) (rand() & 0x7fff));

However, if a value is assigned to $RANDOM in order to seed it, it reverts to its old pre-2014 deterministic generation via srand_deterministic(3).

    srand_deterministic((unsigned int)intval(vp));

OpenBSD’s deterministic rand(3) is the crummy LCG from the C89 standard, just like Bash 3.x. So if you assign to $RANDOM, you’ll get nearly the same results as Bash 3.x and earlier — the only difference being that it can repeat numbers.

That’s a slick upgrade to the old interface without breaking anything, making it my favorite version of $RANDOM in any shell.

Why Aren't There C Conferences?

This article was discussed on Hacker News.

Most widely-used programming languages have at least one regular conference dedicated to discussing it. Heck, even Lisp has one. It’s a place to talk about the latest developments of the language, recent and upcoming standards, and so on. However, C is a notable exception. Despite its role as the foundation of the entire software ecosystem, there aren’t any regular conferences about C. I have a couple of theories about why.

First, C is so fundamental and ubiquitous that a conference about C would be too general. There are so many different uses ranging across embedded development, operating system kernels, systems programming, application development, and, most recently, web development (WebAssembly). It’s just not a cohesive enough topic. Any conference that might be about C is instead focused on some particular subset of its application. It’s not a C conference, it’s a database conference, or an embedded conference, or a Linux conference, or a BSD conference, etc.

Second, C has a tendency to be conservative, changing and growing very slowly. This is a feature, and one that is often undervalued by developers. (In fact, I’d personally like to see a future revision that makes the C language specification smaller and simpler, rather than accumulate more features.) The last major revision to C happened in 1999 (C99). There was a minor revision in 2011 (C11), and an even smaller revision in 2018 (C17). If there were a C conference, recent changes to the language wouldn’t be a very fruitful topic.

However, the tooling has advanced significantly in recent years, especially with the advent of LLVM and Clang. This is largely driven by the C++ community, and C has significantly benefited as a side effect due to its overlap. Those are topics worthy of conferences, but these are really C++ conferences.

The closest thing we have to a C conference every year is CppCon. A lot of CppCon isn’t really just about C++, and the subjects of many of the talks are easily applied to C, since C++ builds so much upon C. In a sense, a subset of CppCon could be considered a C conference. That’s what I’m looking for when I watch the CppCon presentations each year on YouTube.

Starting last year, I began a list of all the talks that I thought would be useful to C programmers. Some are entirely relevant to C, others just have significant portions that are relevant to C. When someone asks about where they can find a C conference, I send them my list.

I’m sharing them here so you can bookmark this page and never return again.

2017

Here’s the list for CppCon 2017. These are roughly ordered from highest to lowest recommendation:

2018

The final CppCon 2018 videos were uploaded this week, so my 2018 listing can be complete:

There were three talks strictly about C++ that I thought were interesting from a language design perspective. So I think they’re worth recommending, too. (In fact, they’re a sort of ammo against using C++ due to its insane complexity.)

Bonus

Finally, here are a few more good presentations from other C++ conferences which you can just pretend are about C:

A JIT Compiler Skirmish with SELinux

This is a debugging war story.

Once upon a time I wrote a fancy data conversion utility. The input was a complex binary format defined by a data dictionary supplied at run time by the user alongside the input data. Since the converter was typically used to process massive quantities of input, and the nature of that input wasn’t known until run time, I wrote an x86-64 JIT compiler to speed it up. The converter generated a fast, native binary parser in memory according to the data dictionary specification. Processing data now took much less time and everyone rejoiced.

Then along came SELinux, Sheriff of Pedantry. Not liking all the shenanigans with page protections, SELinux huffed and puffed and made mprotect(2) return EACCES (“Permission denied”). Believing I was following all the rules and so this would never happen, I foolishly did not check the result and the converter was now crashing for its users. What made SELinux so unhappy, and could this somehow be resolved?

Allocating memory

Before going further, let’s back up and review how this works. Suppose I want to generate code at run time and execute it. In the old days this was as simple as writing some machine code into a buffer and jumping to that buffer — e.g. by converting the buffer to a function pointer and calling it.

#include <stdio.h>
#include <stdlib.h>

typedef int (*jit_func)(void);

/* NOTE: This doesn't work anymore! */
jit_func
jit_compile(int retval)
{
    unsigned char *buf = malloc(6);
    if (buf) {
        /* mov eax, retval */
        buf[0] = 0xb8;
        buf[1] = retval >>  0;
        buf[2] = retval >>  8;
        buf[3] = retval >> 16;
        buf[4] = retval >> 24;
        /* ret */
        buf[5] = 0xc3;
    }
    return (jit_func)buf;
}

int
main(void)
{
    jit_func f = jit_compile(1001);
    printf("f() = %d\n", f());
    free(f);
}

This situation was far too easy for malicious actors to abuse. An attacker could supply instructions of their own choosing — i.e. shell code — as input and exploit a buffer overflow vulnerability to execute the input buffer. These exploits were trivial to craft.

Modern systems have hardware checks to prevent this from happening. Memory containing instructions must have its execute protection bit set before those instructions can be executed. This is useful both for making attackers work harder and for catching bugs in programs — no more executing data by accident.

This is further complicated by the fact that memory protections have page granularity. You can’t adjust the protections for a 6-byte buffer. You do it for the entire surrounding page — typically 4kB, but sometimes as large as 2MB. This requires replacing that malloc(3) with a more careful allocation strategy. There are a few ways to go about this.

Anonymous memory mapping

The most common and most sensible is to create an anonymous memory mapping: a file memory map that’s not actually backed by a file. The mmap(2) function has a flag specifically for this purpose: MAP_ANONYMOUS.

#include <sys/mman.h>

void *
anon_alloc(size_t len)
{
    int prot = PROT_READ | PROT_WRITE;
    int flags = MAP_ANONYMOUS | MAP_PRIVATE;
    void *p = mmap(0, len, prot, flags, -1, 0);
    return p != MAP_FAILED ? p : 0;
}

void
anon_free(void *p, size_t len)
{
    munmap(p, len);
}

Unfortunately, MAP_ANONYMOUS is not part of POSIX. If you’re being super strict with your includes — as I tend to be — this flag won’t be defined, even on systems where it’s supported.

#define _POSIX_C_SOURCE 200112L
#include <sys/mman.h>
// MAP_ANONYMOUS undefined!

To get the flag, you must use the _BSD_SOURCE, or, more recently, the _DEFAULT_SOURCE feature test macro to explicitly enable that feature.

#define _POSIX_C_SOURCE 200112L
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS */
#include <sys/mman.h>

The POSIX way to do this is to instead map /dev/zero. So, wanting to be Mr. Portable, this is what I did in my tool. Take careful note of this.

#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

void *
anon_alloc(size_t len)
{
    int fd = open("/dev/zero", O_RDWR);
    if (fd == -1)
        return 0;
    int prot = PROT_READ | PROT_WRITE;
    int flags = MAP_PRIVATE;
    void *p = mmap(0, len, prot, flags, fd, 0);
    close(fd);
    return p != MAP_FAILED ? p : 0;
}

Aligned allocation

Another, less common (and less portable) strategy is to lean on the existing C memory allocator, being careful to allocate on page boundaries so that the page protections don’t affect other allocations. The classic allocation functions, like malloc(3), don’t allow for this kind of control. However, there are a couple of aligned allocation alternatives.

The first is posix_memalign(3):

int posix_memalign(void **ptr, size_t alignment, size_t size);

By choosing page alignment and a size that’s a multiple of the page size, it’s guaranteed to return whole pages. When done, pages are freed with free(3). Though, unlike unmapping, the original page protections must first be restored since those pages may be reused.

#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <unistd.h>

void *
anon_alloc(size_t len)
{
    void *p;
    long pagesize = sysconf(_SC_PAGE_SIZE); // TODO: cache this
    size_t roundup = (len + pagesize - 1) / pagesize * pagesize;
    return posix_memalign(&p, pagesize, roundup) ? 0 : p;
}
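
For completeness, a matching free routine under this scheme might look like the following sketch (same headers as above, plus sys/mman.h for mprotect(2)):

void
anon_free(void *p, size_t len)
{
    long pagesize = sysconf(_SC_PAGE_SIZE);
    size_t roundup = (len + pagesize - 1) / pagesize * pagesize;
    /* Restore the default protections before handing the pages back,
       since free(3) may reuse them for ordinary allocations. */
    mprotect(p, roundup, PROT_READ | PROT_WRITE);
    free(p);
}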

If you’re using C11, there’s also aligned_alloc(3). This is the most uncommon of all since most C programmers refuse to switch to a new standard until it’s at least old enough to drive a car.

Changing page protections

So we’ve allocated our memory, but it’s not going to start in an executable state. Why? Because a W^X (“write xor execute”) policy is becoming increasingly common. Attempting to set both write and execute protections at the same time may be denied. (In fact, there’s an SELinux policy for this.)

As a JIT compiler, we need to write to a page and execute it. Again, there are two strategies. The complicated strategy is to map the same memory at two different places, one with the execute protection, one with the write protection. This allows the page to be modified as it’s being executed without violating W^X.
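
As a rough sketch of that dual-mapping idea on Linux (an illustration of the technique, not what my converter does), one shared memory object from memfd_create(2) can be mapped twice with different protections, assuming a glibc recent enough to provide memfd_create:

/* Sketch: one memory object, two views: write in one, execute the other. */
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/mman.h>

struct dualmap {
    unsigned char *rw;  /* view for writing instructions */
    unsigned char *rx;  /* view for executing them */
};

static int
dualmap_alloc(struct dualmap *m, size_t len)
{
    int fd = memfd_create("jit", 0);
    if (fd == -1 || ftruncate(fd, len) == -1) {
        if (fd != -1)
            close(fd);
        return -1;
    }
    m->rw = mmap(0, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    m->rx = mmap(0, len, PROT_READ | PROT_EXEC,  MAP_SHARED, fd, 0);
    close(fd);
    return (m->rw != MAP_FAILED && m->rx != MAP_FAILED) ? 0 : -1;
}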

The simpler and more secure strategy is to write the machine instructions, then swap the page over to executable using mprotect(2) once it’s ready. This is what I was doing in my tool.

unsigned char *buf = anon_alloc(len);
/* ... write instructions into the buffer ... */
mprotect(buf, len, PROT_EXEC);
jit_func func = (jit_func)buf;
func();

At a high level, that’s pretty close to what I was actually doing. That includes neglecting to check the result of mprotect(2). This worked fine and dandy for several years, until suddenly (shown here in the style of strace):

mprotect(ptr, len, PROT_EXEC) = -1 EACCES (Permission denied)

Then the program would crash trying to execute the buffer. Suddenly it wasn’t allowed to make this buffer executable. My program hadn’t changed. What had changed was the SELinux security policy on this particular system.

Asking for help

The problem is that I don’t administer this (Red Hat) system. I can’t access the logs and I didn’t set the policy. I don’t have any insight on why this call was suddenly being denied. To make this more challenging, the folks that manage this system didn’t have the necessary knowledge to help with this either.

So to figure this out, I need to treat it like a black box and probe at system calls until I can figure out just what SELinux policy I’m up against. I only have practical experience administering Debian and its derivatives (like Ubuntu), which means I’ve hardly ever had to deal with SELinux. I’m flying fairly blind here.

Since my real application is large and complicated, I code up a minimal example, around a dozen lines of code: allocate a single page of memory, write a single return (ret) instruction into it, set it as executable, and call it. The program checks for errors, and I can run it under strace if that’s not insightful enough. This program is also something simple I could provide to the system administrators, since they were willing to turn some of the knobs to help narrow down the problem.

However, here’s where I made a major mistake. Assuming the problem was solely in mprotect(2), and wanting to keep this as absolutely simple as possible, I used posix_memalign(3) to allocate that page. I saw the same EACCES as before, and assumed I was demonstrating the same problem. Take note of this, too.
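
A reconstruction of that minimal example might look something like this (not the literal program, but the same shape, using posix_memalign(3) as described above):

#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

int
main(void)
{
    long pagesize = sysconf(_SC_PAGE_SIZE);
    void *page;
    if (posix_memalign(&page, pagesize, pagesize)) {
        perror("posix_memalign");
        return 1;
    }
    ((unsigned char *)page)[0] = 0xc3;               /* ret */
    if (mprotect(page, pagesize, PROT_EXEC) == -1) {
        perror("mprotect");                          /* EACCES here */
        return 1;
    }
    ((void (*)(void))page)();                        /* call the stub */
    puts("mprotect(PROT_EXEC) succeeded");
    return 0;
}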

Finding a resolution

Eventually I’d need to figure out what policy was blocking my JIT compiler, then see if there was an alternative route. The system loader still worked after all, and I could plainly see that with strace. So it wasn’t a blanket policy that completely blocked the execute protection. Perhaps the loader was given an exception?

However, the very first order of business was to actually check the result from mprotect(2) and do something more graceful rather than crash. In my case, that meant falling back to executing a byte-code virtual machine. I added the check, and now the program ran slower instead of crashing.

The program runs on both Linux and Windows, and the allocation and page protection management is abstracted. On Windows it uses VirtualAlloc() and VirtualProtect() instead of mmap(2) and mprotect(2). Neither implementation checked that the protection change succeeded, so I fixed the Windows implementation while I was at it.

Thanks to Mingw-w64, I actually do most of my Windows development on Linux. And, thanks to Wine, I mean everything, including running and debugging. Calling VirtualProtect() in Wine would ultimately call mprotect(2) in the background, which I expected would be denied. So running the Windows version with Wine under this SELinux policy would be the perfect test. Right?

Except that mprotect(2) succeeded under Wine! The Windows version of my JIT compiler was working just fine on Linux. Huh?

This system doesn’t have Wine installed. I had built and packaged it myself. This Wine build definitely has no SELinux exceptions. Not only did the Wine loader work correctly, it can change page protections in ways my own Linux programs could not. What’s different?

Debugging this with all these layers is starting to look silly, but this is exactly why doing Windows development on Linux is so useful. I run my program under Wine under strace:

$ strace wine ./mytool.exe

I study the system calls around mprotect(2). Perhaps there’s some stricter alignment issue? No. Perhaps I need to include PROT_READ? No. The only difference I can find is they’re using the MAP_ANONYMOUS flag. So, armed with this knowledge, I modify my minimal example to allocate 1024 pages instead of just one, and suddenly it works correctly. I was most of the way to figuring this all out.

Inside glibc allocation

Why did increasing the allocation size change anything? This is a typical Linux system, so my program is linked against the GNU C library, glibc. This library allocates memory from two places depending on the allocation size.

For small allocations, glibc uses brk(2) to extend the executable image — i.e. to extend the .bss section. These resources are not returned to the operating system after they’re freed with free(3). They’re reused.

For large allocations, glibc uses mmap(2) to create a new, anonymous mapping for that allocation. When freed with free(3), that memory is unmapped and its resources are returned to the operating system.

By increasing the allocation size, it became a “large” allocation and was backed by an anonymous mapping. Even though I didn’t use mmap(2) directly, to the operating system this was indistinguishable from what Wine was doing (and succeeding at).

Consider this little example program:

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    printf("%p\n", malloc(1));
    printf("%p\n", malloc(1024 * 1024));
}

When not compiled as a Position Independent Executable (PIE), here’s what the output looks like. The first pointer is near where the program was loaded, low in memory. The second pointer is a randomly selected address high in memory.

0x1077010
0x7fa9b998e010

And if you run it under strace, you’ll see that the first allocation comes from brk(2) and the second comes from mmap(2).

Two SELinux policies

With a little bit of research, I found the two SELinux policies at play here. In my minimal example, I was blocked by allow_execheap.

/selinux/booleans/allow_execheap

This prohibits programs from setting the execute protection on any “heap” page.

The POSIX specification does not permit it, but the Linux implementation of mprotect allows changing the access protection of memory on the heap (e.g., allocated using malloc). This error indicates that heap memory was supposed to be made executable. Doing this is really a bad idea. If anonymous, executable memory is needed it should be allocated using mmap which is the only portable mechanism.

Obviously this is pretty loose since I was still able to do it with posix_memalign(3), which, technically speaking, allocates from the heap. So this policy applies to pages mapped by brk(2).

The second policy was allow_execmod.

/selinux/booleans/allow_execmod

The program mapped from a file with mmap and the MAP_PRIVATE flag and write permission. Then the memory region has been written to, resulting in copy-on-write (COW) of the affected page(s). This memory region is then made executable […]. The mprotect call will fail with EACCES in this case.

I don’t understand what purpose this policy serves, but this is what was causing my original problem. Pages mapped to /dev/zero are not actually considered anonymous by Linux, at least as far as this policy is concerned. I think this is a mistake, and that mapping the special /dev/zero device should result in effectively anonymous pages.

From this I learned a little lesson about baking assumptions — that mprotect(2) was solely at fault — into my minimal debugging examples. And the fix was ultimately easy: I just had to suck it up and use the slightly less pure MAP_ANONYMOUS flag.

The Missing Computer Skills of High School Students

This article was discussed on Hacker News and discussed on Reddit.

It’s been just over four years since I started mentoring high school students at work, and I recently began mentoring my fourth such student. That’s enough students for me to start observing patterns. Of course, each arrives with different computer knowledge and experience, but there have been two consistent and alarming gaps. One is a concept and the other is a skill, both of which I expect an advanced high schooler, especially one interested in computers, to have before they arrive. These gaps persist despite students taking computer classes at school.

Files, Directories, and Paths

The vital gap in concepts is files, directories, and, broadly speaking, paths. Students do initially arrive with a basic notion of files and directories (i.e. “folders”) and maybe some rough idea that there’s a hierarchy to it all. But they’ve never learned the notation: a file’s location specified as a sequence of directory components, which may be either relative or absolute. More specifically, they’ve never been exposed to the concepts of . (current directory) or .. (parent directory).

The first thing I do with my students is walk them through a Linux installation process and then sit them in front of a terminal. Since most non-option arguments are file paths, shell commands are rather limited if you don’t know anything about paths. You can’t navigate between directories or talk about files outside of your home directory. So one of the first things I have to teach is how paths work. We go through exercises constructing and reasoning about paths, and it takes some time and practice for students to really “get” them.

And this takes longer than you might think! Even once the student has learned the basic concepts, it still takes practice to internalize and reason about them. It’s been a consistent enough issue that I really should assemble a booklet to cover it, and perhaps some sort of interactive practice. I could just hand this to the student so they can learn on their own as they do with other materials.

Paths aren’t just important for command line use. They come up in everyday programming whenever programs need to access files. In some contexts they even have security implications, regardless of the programming language. For example, care must be taken handling and validating paths supplied from an untrusted source. A web application may need to translate a path-like string in a query into a file path, and not understanding how .. works can make this dangerous. Or not understanding how paths need to be normalized before being compared.
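
To make that concrete, here is a contrived sketch, with hypothetical paths rather than code from any real application, of the kind of naive path construction that goes wrong:

#include <stdio.h>

int
main(void)
{
    /* Hypothetical document root and attacker-supplied "filename". */
    const char *base = "/var/www/files/";
    const char *request = "../../../etc/passwd";
    char path[4096];

    /* Naive concatenation: once the ".." components are resolved, the
       result escapes the intended directory entirely. */
    snprintf(path, sizeof(path), "%s%s", base, request);
    printf("%s\n", path);
    return 0;
}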

I consider paths to fall under file and directory basics, part of the baseline for a person to be considered computer literate.

Touch Typing

The other major gap is touch typing. None of my students have been touch typists, and it slows them all down far more than they realize. I spend a lot of time next to them at the keyboard as they drive, so I’ve seen the costs myself. In a couple of cases, the students have to finger peck while looking at the keyboard.

An important step in mastering the use of computers is quickly iterating on new ideas and concepts — trying out and playing with things as they are learned. Being a slow typist not only retards this process; the tedium of poor, plodding keyboarding actively discourages experimentation, becoming a barrier. Advanced computer use isn’t much fun if you can’t type well.

To be fair, I’ve only been a proper touch typist for under two years. I wish I had learned it much earlier in life, and I really only have myself to blame that it took so long. Fortunately I had developed my own pseudo touch-typing style that required neither finger pecking nor looking at the keyboard. My main issue was accuracy, not that typing was tedious or slow.

The bad news is that, unlike paths, this is completely outside my control. First, one of the major guidelines of the mentoring program is that we’re not supposed to spend a lot of time on basic skills. Learning to touch type takes several weeks of daily effort. That’s just too much time that we don’t have. Second, this won’t work anyway unless the student is motivated to learn it. I have no idea how to provide that sort of motivation. (And if the student is motivated, they’ll do it on their own time anyway.) I think that’s where schools get stuck.

The really bad news is that this problem is going to get worse. The mobile revolution happened, and, for most people, mobile devices are gradually replacing the home computer, even laptops. I already know one student who doesn’t have access to a general purpose computer at home. The big difference between a tablet and a laptop is that a tablet is purely for consumption.

In the future, kids will be less and less exposed to keyboards, and productive computing in general. Keyboards are, and will continue to be for the foreseeable future, a vital tool for professionals. I wonder if the future will be a bit like, say, the 1980s again, where only a small fraction of kids will be exposed to a proper computer. Only instead of a PC clone, Commodore, or Apple II, it’s a Raspberry Pi.

Conclusions

I want to make something really clear: I’m not blaming the students for these gaps. It’s not in any way their fault. What they’re taught and exposed to is, at this point in life, largely outside of their control.

I lay most of the blame on the schools. My mentees have all taken high school programming classes of some sort, but these classes somehow manage to skip over the fundamentals. Instead it’s rote learning of some particular IDE without real understanding. Finally I can relate to all those mathematicians complaining about how math class is taught!

What can be done? If you’re a parent, make sure your kid has access to a general purpose computer, even if it’s only a Raspberry Pi or one of its clones, along with a keyboard and mouse. (Of course, if you’re reading this article you’re not one of the parents that needs this advice.) It’s good exposure all around.

After reflecting on this recently, I think one of the shortcomings of my mentoring is that I don’t spend enough time — generally none at all — at the keyboard driving with my mentee as the passenger, where they can really observe me in action. Usually it’s me approaching them to check on their progress, and the opportunity just isn’t there. Perhaps it would be motivating to see how efficient and fun computing can be at higher levels of proficiency — to show them how touch typing and a powerful text editor can lead to such a dramatic difference in experience. It would be the answer to that question of, “Why should I learn this?”

From Vimperator to Tridactyl

Earlier this month I experienced a life-changing event — or so I thought it would be. It was fully anticipated, and I had been dreading the day for almost a year, wondering what I was going to do. Could I overcome these dire straits? Would I ever truly accept the loss, or will I become a cranky old man who won’t stop talking about how great it all used to be?

So what was this big event? On September 5th, Mozilla officially and fully ended support for XUL extensions (XML User Interface Language), a.k.a. “legacy” extensions. The last Firefox release to support these extensions was Firefox 52 ESR, the browser I had been using for some time. A couple days later, Firefox 60 ESR entered Debian Stretch to replace it.

The XUL extension API was never well designed. It was clunky, quirky, and the development process for extensions was painful, requiring frequent restarts. It was bad enough that I was never interested in writing my own extensions. Poorly-written extensions unfairly gave Firefox a bad name, causing memory leaks and other issues, and Firefox couldn’t tame the misbehavior.

Yet this extension API was incredibly powerful, allowing for rather extreme UI transformations that really did turn Firefox into a whole new browser. For the past 15 years I wasn’t using Firefox so much as a highly customized browser based on Firefox. It’s how Firefox has really stood apart from everyone else, including Chrome.

The wide open XUL extension API was getting in the way of Firefox moving forward. Continuing to support it required sacrifices that Mozilla was less and less willing to make. To replace it, they introduced the WebExtensions API, modeled very closely after Chrome’s extension API. These extensions are sandboxed, much less trusted, and the ecosystem more closely resembles the “app store” model (Ugh!). This is great for taming poorly-behaved extensions, but they are far less powerful and capable.

The powerful, transformative extension I’d been using the past decade was Vimperator — and occasionally with temporary stints in its fork, Pentadactyl. It overhauled most of Firefox’s interface, turning it into a Vim-like modal interface. In normal mode I had single keys bound to all sorts of useful functionality.

The problem is that Vimperator is an XUL extension, and it’s not possible to fully implement it using the WebExtensions API. It needs capabilities that WebExtensions will likely never provide. Losing XUL extensions would mean being thrown back 10 years in terms of my UI experience. The possibility of having to use the web without it sounded unpleasant.

Fortunately there was a savior on the horizon already waiting for me: Tridactyl! It is essentially a from-scratch rewrite of Vimperator using the WebExtensions API. To my complete surprise, these folks have managed to recreate around 85% of what I had within the WebExtensions limitations. It will never be 100%, but it’s close enough to keep me happy.

What matters to me

There are some key things Vimperator gave me that I was afraid of losing.

I keep all my personal configuration dotfiles under source control. It’s a shame that Firefox, despite being so flexible, has never supported this approach to configuration. Fortunately Vimperator filled this gap with its .vimperatorrc file, which could not only be used to configure the extension but also access nearly everything on the about:config page. It’s the killer feature Firefox never had.

Since WebExtensions are sandboxed, they cannot (normally) access files. Fortunately there’s a workaround: native messaging. It’s a tiny, unsung backdoor that closes the loop on some vital features. Tridactyl makes it super easy to set up (:installnative), and doing so enables the .tridactylrc file to be loaded on startup. Due to WebExtensions limitations it’s not nearly as powerful as the old .vimperatorrc, but it covers most of my needs.

In Vimperator, when a text input is focused I could press CTRL+i to pop up my $EDITOR (Vim, Emacs, etc.) to manipulate the input much more comfortably. This is so, so nice when writing long form content on the web. The alternative is to copy-paste back and forth, which is tedious and error prone.

Since WebExtensions are sandboxed, they cannot (normally) start processes. Again, native messaging comes to the rescue and allows Tridactyl to reproduce this feature perfectly.

In Vimperator I could press f or F to enter a special mode that allowed me to simulate a click to a page element, usually a hyperlink. This could be used to navigate without touching the mouse. It’s really nice for “productive” browsing, where my fingers are already on home row due to typing (programming or writing), and I need to switch to a browser to look something up. I rarely touch the mouse when I’m in productive mode.

This actually mostly works fine under WebExtensions, too. However, due to sandboxing, WebExtensions aren’t active on any of Firefox’s “meta” pages (configuration, errors, etc.), or Mozilla’s domains. This means no mouseless navigation on these pages.

The good news is that Tridactyl has better mouseless browsing than Vimperator. Its “tag” overlay is alphabetic rather than numeric, so it’s easier to type. When it’s available, the experience is better.

In normal mode, which is the usual state Vimperator/Tridactyl is in, I’ve got useful functionality bound to single keys. There’s little straining for the CTRL key. I use d to close a tab, u to undo it. In my own configuration I use w and e to change tabs, and x and c to move through the history. I can navigate to any “quickmark” in three keystrokes. It’s all very fast and fluid.

Since WebExtensions are sandboxed, extensions have limited ability to capture these keystrokes. If the wrong browser UI element is focused, they don’t work. If the current page is one of those extension-restricted pages, these keys don’t work.

The worse problem of all, by far, is that WebExtensions are not active until the current page has loaded. This is the most glaring flaw in WebExtensions, and I’m surprised it still hasn’t been addressed. It negatively affects every single extension I use. What this means for Tridactyl is that for a second or so after navigating a link, I can’t interact with the extension, and the inputs are completely lost. This is incredibly frustrating. I have to wait on slow, remote servers to respond before regaining control of my own browser, and I often forget about this issue, which results in a bunch of eaten keystrokes. (Update: Months have passed and I’ve never gotten used to this issue. It irritates me a hundred times every day. This is by far Firefox’s worst design flaw.)

Other extensions

I’m continuing to use uBlock Origin. Nothing changes. As I’ve said before, an ad-blocker is by far the most important security tool on your computer. If you practice good computer hygiene, malicious third-party ads/scripts are the biggest threat vector for your system. A website telling you to turn off your ad-blocker should be regarded as suspiciously as being told to turn off your virus scanner (for all you Windows users who are still using one).

The opposite of mouseless browsing is keyboardless browsing. When I’m not being productive, I’m often not touching the keyboard, and navigating with just the mouse is most comfortable. However, clicking little buttons is not. So instead of clicking the backward and forward buttons, I prefer to swipe the mouse, i.e. make a gesture.

I previously used FireGestures, an XUL extension. I’m now using Gesturefy. (Update: Gesturefy doesn’t support ESR either.) I also considered Foxy Gestures, but it doesn’t currently support ESR releases. Unfortunately all mouse gesture WebExtensions suffer from the page load problem: any gesture given before the page loads is lost. It’s less of any annoyance than with Tridactyl, but it still trips me up. They also don’t work on extension-restricted pages.

Firefox 60 ESR is the first time I’m using a browser supported by uMatrix — another blessing from the author of uBlock Origin (Raymond Hill) — so I’ve been trying it out. Effective use requires some in-depth knowledge of how the web works, such as the same-origin policy, etc. It’s not something I’d recommend for most people.

GreaseMonkey was converted to the WebExtensions API a while back. As a result it’s a bit less capable than it used to be, and I had to adjust a couple of my own scripts before they’d work again. I use it as a “light extension” system.

XUL alternatives

Many people have suggested using one of the several Firefox forks that’s maintaining XUL compatibility. I haven’t taken this seriously for a couple of reasons:

Even the Debian community gave up on that idea long ago, and they’ve made a special exception that allows recent versions of Firefox and Chrome into the stable release. Web browsers are huge and complex because web standards are huge and complex (a situation that concerns me in the long term). The vulnerabilities that pop up regularly are frightening.

In Back to the Future Part II, Biff Tannen was thinking too small. Instead of a sports almanac, he should have brought a copy of the CVE database.

This is why I also can’t just keep using an old version of Firefox. If I was unhappy with, say, the direction of Emacs 26, I could keep using Emacs 25 essentially forever, frozen in time. However, Firefox is internet software. Internet software decays and must be maintained.

Most importantly, the Vimperator extension is no longer maintained. There’s no reason to stick around this ghost town.

Special Tridactyl customizations

The syntax for .tridactylrc is a bit different than .vimperatorrc, so I couldn’t just reuse my old configuration file. Key bindings are simple enough to translate, and quickmarks are configured almost the same way. However, it took me some time to figure out the rest.

With Vimperator I’d been using Firefox’s obscure “bookmark keywords” feature, where a bookmark is associated with a single word. In Vimperator I’d use this as a prefix when opening a new tab to change the context of the location I was requesting.

For example, to visit the Firefox subreddit I’d press o to start opening a new tab, then r firefox. I had r registered via .vimperatorrc as the bookmark keyword for the URL template https://old.reddit.com/r/%s.

WebExtensions doesn’t expose bookmark keywords, and keywords are likely to be removed in a future Firefox release. So instead someone showed me this trick:

set searchurls.r   https://old.reddit.com/r/%s
set searchurls.w   https://en.wikipedia.org/w/index.php?search=%s
set searchurls.wd  https://en.wiktionary.org/wiki/?search=%s

These lines in .tridactylrc recreate the old functionality. Works like a charm!
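For instance, pressing o and typing w same-origin policy should search Wikipedia, and o followed by r firefox lands on the Firefox subreddit just like the old keyword bookmark did.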

Another initial annoyance is that WebExtensions only exposes the X clipboard (XA_CLIPBOARD), not the X selection (XA_PRIMARY). However, I nearly always use the X selection for copy-paste, so it was like I didn’t have any clipboard access. (Honestly, I’d prefer XA_CLIPBOARD didn’t exist at all.) Again, native messaging routes around the problem nicely, and it’s trivial to configure:

set yankto both
set putfrom selection
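As the names suggest, yanks land in both the clipboard and the selection, and pastes are read from the selection, which restores the copy-paste workflow I’m used to.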

There’s an experimental feature, guiset, that removes most of Firefox’s UI elements so that it looks nearly like the old Vimperator. As of this writing, this feature works poorly, so I’m not using it. It’s really not important to me anyway.

Today’s status

So I’m back to about 85% of the functionality I had before the calamity, which is far better than I had imagined. Other than the frequent minor annoyances, I’m pretty satisfied.

In exchange I get better mouseless browsing and much better performance. I’m not kidding, the difference Firefox Quantum makes is night and day. In my own case, Firefox 60 ESR is using one third of the memory of Firefox 52 ESR (Update: after more experience with it, I realize it’s just as much of a memory hog as before), and I’m not experiencing the gradual memory leak. This really makes a difference on my laptop with 4GB of RAM.

So was it worth giving up that 15% capability for these improvements? Perhaps it was. Now that I’ve finally made the leap, I’m feeling a lot better about the whole situation.

Brute Force Incognito Browsing

Both Firefox and Chrome have a feature for creating temporary private browsing sessions. Firefox calls it Private Browsing and Chrome calls it Incognito Mode. Both work essentially the same way. A temporary browsing session is started without carrying over most existing session state (cookies, etc.), and no state (cookies, browsing history, cached data, etc.) is preserved after ending the session. Depending on the configuration, some browser extensions will be enabled in the private session, and their own internal state may be preserved.

The most obvious use is for visiting websites that you don’t want listed in your browsing history. Another use for more savvy users is to visit websites with a fresh, empty cookie file. For example, some news websites use a cookie to track the number of visits and require a subscription after a certain number of “free” articles. Manually deleting cookies is a pain (especially without a specialized extension), but opening the same article in a private session is two clicks away.

For web development there’s yet another use. A private session is a way to view your website from the perspective of a first-time visitor. You’ll be logged out and will have little or no existing state.

However, sometimes it just doesn’t go far enough. Some of those news websites have adapted, and in addition to counting the number of visits, they’ve figured out how to detect private sessions and block them. I haven’t looked into how they do this — maybe something to do with local storage, or detecting previously cached content. Sometimes I want a private session that’s truly, fully isolated. The existing private session features either aren’t isolated enough or behave differently from a normal session, which is how they’re detected.

Some time ago I put together a couple of scripts to brute force my own private sessions when I need them, generally for testing websites in a guaranteed fresh, fully-functioning instance. It also lets me run multiple such sessions in parallel. My scripts don’t rely on any private session feature of the browser, so the behavior is identical to a real browser, making it undetectable.

The downside is that, for better or worse, no browser extensions are carried over. In some ways this can be considered a feature, but a lot of the time I would like my ad-blocker to carry over. Your ad-blocker is probably the most important security software on your computer, so you should hesitate to give it up.

Another downside is that both Firefox and Chrome have some irritating first-time behaviors that can’t be disabled. The intent is to be newbie-friendly but it just gets in my way. For example, both bug me about logging into their browser platforms. Firefox starts with two tabs. Chrome creates a popup to ask me to configure a printer. Both start with a junk URL in the location bar so I can’t just middle-click paste (i.e. the X11 selection clipboard) into it. It’s definitely not designed for my use case.

Firefox

Here’s my brute force private session script for Firefox:

#!/bin/sh -e
# create a throwaway profile directory under the user's cache directory
DIR="${XDG_CACHE_HOME:-$HOME/.cache}"
mkdir -p -- "$DIR"
TEMP="$(mktemp -d -- "$DIR/firefox-XXXXXX")"
# clean up the profile no matter how Firefox exits
trap "rm -rf -- '$TEMP'" INT TERM EXIT
# -no-remote: don't join an existing Firefox instance
firefox -profile "$TEMP" -no-remote "$@"

It creates a temporary directory under $XDG_CACHE_HOME and tells Firefox to use the profile in that directory. No such profile exists, of course, so Firefox creates a fresh profile.

In theory I could just create a new profile alongside the default within my existing ~/.mozilla directory. However, I’ve never liked Firefox’s profile feature, especially the intentionally unpredictable way it stores the profile itself: behind a random path. I also don’t trust it to be fully isolated and to fully clean up when I’m done.

Before starting Firefox, I register a trap with the shell to clean up the profile directory regardless of what happens. It doesn’t matter if Firefox exits cleanly, if it crashes, or if I CTRL-C it to death.

The -no-remote option prevents the new Firefox instance from joining onto an existing Firefox instance, which it really prefers to do even though it’s technically supposed to be a different profile.

Note the "$@", which passes arguments through to Firefox — most often the URL of the site I want to test.
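If the script is saved somewhere on $PATH, say as firefox-temp (a name I’ve made up for this example), starting a disposable session is a one-liner:

firefox-temp https://example.com/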

Chromium

I don’t actually use Chrome but rather the open source version, Chromium. I think this script will also work with Chrome.

#!/bin/sh -e
# create a throwaway user data directory under the user's cache directory
DIR="${XDG_CACHE_HOME:-$HOME/.cache}"
mkdir -p -- "$DIR"
TEMP="$(mktemp -d -- "$DIR/chromium-XXXXXX")"
# clean up the profile no matter how Chromium exits
trap "rm -rf -- '$TEMP'" INT TERM EXIT
# skip first-run prompts and silence Chromium's noisy output
chromium --user-data-dir="$TEMP" \
         --no-default-browser-check \
         --no-first-run \
         "$@" >/dev/null 2>&1

It’s essentially the same as the Firefox script; only the browser arguments have changed. I tell it not to ask about being the default browser, and --no-first-run disables some of the irritating first-time behaviors.

Chromium is very noisy on the command line, so I also redirect all output to /dev/null.
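If losing the ad-blocker is a dealbreaker, one possible workaround (just a sketch, not something the scripts above do) is Chromium’s --load-extension option, which loads an unpacked extension into the throwaway profile. The uBlock Origin path below is a placeholder:

# --load-extension takes a path to an unpacked extension (placeholder path here)
chromium --user-data-dir="$TEMP" \
         --no-default-browser-check \
         --no-first-run \
         --load-extension="$HOME/src/uBlock/dist/build/uBlock0.chromium" \
         "$@" >/dev/null 2>&1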

If you’re on Debian like me, its version of Chromium comes with a --temp-profile option that handles the throwaway profile automatically. So the script can be simplified:

#!/bin/sh -e
chromium --temp-profile \
         --no-default-browser-check \
         --no-first-run \
         "$@" >/dev/null 2>&1

In my own use case, these scripts have fully replaced the built-in private session features. In fact, since Chromium is not my primary browser, my brute force private session script is how I usually launch it. I only run it to test things, and I always want to test using a fresh profile.
