QOI is now my favorite asset format

December 18, 2022

nullprogram.com/blog/2022/12/18/

This article was discussed on Hacker News.

The Quite OK Image (QOI) format was announced late last year and finalized into a specification a month later. Initially dismissive, a revisit has shifted my opinion to impressed. The format hits a sweet spot in the trade-off space between complexity, speed, and compression ratio. Also considering its alpha channel support, QOI has become my default choice for embedded image assets. It’s not perfect, but at the very least it’s a solid foundation.

Since I’m now working with QOI images, I need a good QOI viewer, and so I added support to my ill-named pbmview tool, which I wrote to serve the same purpose for Netpbm. I will continue to use Netpbm as an output format, especially for raw video output, but no longer will I use it for an embedded asset (nor re-invent yet another RLE over Netpbm).

I was dismissive because the website claimed, and still claims today, QOI images are “a similar size” to PNG. However, for the typical images where I would use PNG, QOI is around 3x larger, and some outliers are far worse. The 745 PNGs on my blog — a perfect test corpus for my own needs — convert to QOIs 2.8x larger on average. The official QOI benchmark has much better results, 1.3x larger, but that’s because it includes a lot of photography where PNG and QOI both do poorly, making QOI seem more comparable.

However, as I said, QOI’s strength is its trade-off sweet spot. The specification is one page, and an experienced developer can write a complete implementation from scratch in a single sitting. My own implementation is about 100 lines of libc-free C for each of the encoder and decoder. With error checking removed, my decoder is ~600 bytes of x86 object code — a great story for embedding alongside assets. It’s more complex than Netpbm or farbfeld, but it’s far simpler than BMP. I’ve already begun experimenting with converting assets to QOI, and the results have so far exceeded my expectations.

To my surprise, the encoder was easier to write than the decoder. The format is so straightforward such that two different encoders will produce the identical files. There’s little room for specialized optimization, and no meaningful “compression level” knob.

Criticism

There are a lot of dimensions on which QOI could be improved, but most cases involve trade-offs, e.g. more complexity for better compression. The areas where QOI could have been strictly better, the dimensions on which it is not on the Pareto frontier, are more meaningful criticisms — missed opportunities. My criticisms of this kind:

Big endian fields are an odd choice for a 2020s file format. Little endian dominates the industry, and it would have made for a slightly smaller decoder footprint on typical machines today if QOI used little endian.
The header has two flags and spends an entire byte on each. It should have instead had a flag byte, with two bits assigned to these flags. One flag indicates if the alpha channel is important, and the other selects between two color spaces (sRGB, linear). Both flags are only advisory.
The 4-channel encoded pixel format is ABGR (or RGBA), placing the alpha channel next to the blue channel. This is somewhat unconventional. A decoder is likely to use a single load into 32-bit integer, and ideally it’s already in the desired format or close to it. A few times already I’ve had to shuffle the RGB bytes within the 32-bit sample to be compatible with some other format. QOI channel ordering is arbitrary, and I would have chosen ARGB (when viewed as little endian).
The QOI hash function operates on channels individually, with individual overflow, making it slower and larger than necessary. The hash function should have been over a packed 32-bit sample. I would have used a multiplication by a carefully-chosen 32-bit integer, then a right shift using the highest 6 bits of the result for the index.

More subjective criticisms that might count as having trade-offs:

Given a “flag byte” (mentioned above) it would have been free to assign another flag bit indicating pre-multiplied alpha, also still advisory. You want to use pre-multiplied alpha for your assets, and the option store them this way would help.
There’s an 8-byte end-of-stream marker — a bit excessive — deliberately an invalid encoding so that reads past the end of the image will result in a decoding error. I probably would have chosen a dead simple 32-bit checksum of packed 32-bit images samples, even if literally a sum.

Of course, you’re not obligated to follow QOI exactly to spec for your own assets, so you could always use a modified QOI with one or more of these tweaks. That’s what I meant about it being a solid foundation: You don’t have to start from scratch with some custom RLE. Since the format is so simple, you can easily build your own tools — as I’ve already begun doing myself — so you don’t need to rely on tools supporting your QOI fork.

Minimalist API

I’m really happy with my QOI implementation, particularly since it’s another example of a minimalist C API: no allocating, no input or output, and no standard library use. As usual, the expectation is that it’s in the same translation unit where it’s used, so it’s likely inlined into callers.

The encoder is streaming — it accepts and returns only a little bit of input and output at a time. It has three functions and one struct with no “public” fields:

struct qoiencoder qoiencoder(void *buf, int w, int h, const char *flags);
int qoiencode(struct qoiencoder *, void *buf, unsigned color);
int qoifinish(struct qoiencoder *, void *buf);

The first function initializes an encoder and writes a fixed-length header into the QOI buffer. The flags field is a mode string, like fopen. I would normally use bit flags, but this is a little experiment. The second function encodes a single pixel into the QOI buffer, returning the number of bytes written (possibly zero). The last flushes any encoding state and writes the end-of-stream marker. There are no errors. My typical use so far looks like:

char buf[16];
struct qoiencoder q = qoiencoder(buf, width, height, "a");
fwrite(buf, QOIHDRLEN, 1, file);
for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        // ... compute 32-bit ABGR sample at (x, y) ...
        fwrite(buf, qoiencode(&q, buf, abgr), 1, file);
    }
}
fwrite(buf, qoifinish(&q, buf), 1, file);
fflush(file);
return ferror(file);

This appends encoder outputs to a buffered stream, but it could just as well accumulate directly into a larger buffer, advancing the write pointer a little after each call.

The decoder is two functions, but its struct has some “public” fields.

struct qoidecoder {
    int width, height;
    _Bool alpha, srgb, error;
    // ...
};
struct qoidecoder qoidecoder(const void *buf, int len);
static unsigned qoidecode(struct qoidecoder *);

The input is not streamed and the entire buffer must be loaded into memory at once — not too bad since it’s compressed, and perhaps even already loaded as part of the executable image — but the output is streamed, delivering one packed 32-bit ABGR sample per call. The decoder makes no assumptions about the output format, and the caller unpacks samples and stores them in whatever format is appropriate (shader texture, etc.).

To make it easier to use, my decoder range checks to guarantee that width and height can be multiplied without overflow. Unlike encoding, there may be errors due to invalid input, including that failed range check. The decoder error flag is “sticky” and the decoder returns zero samples when in an error state, so callers can wait to check for errors until the end. (Though if you’re only decoding embedded assets, then there are no practical errors, and checks can be removed/ignored.)

Example usage, copied almost verbatim from a real program:

int loadimage(Image *image, const uint8_t *qoi, int len)
{
    struct qoidecoder q = qoidecoder(qoi, len);
    if (/* image dimensions too large */) {
        return 0;
    }
    image->width  = q.width;
    image->height = q.height;
    int count = q.width * q.height;
    for (int i = 0; i < count; i++) {
        unsigned abgr = qoidecode(&q);
        image->data[4*i+0] = abgr >> 16;
        image->data[4*i+1] = abgr >>  8;
        image->data[4*i+2] = abgr >>  0;
        image->data[4*i+3] = abgr >> 24;
    }
    return !q.error;
}

Note the aforementioned awkward RGB shuffle.

It’s safe to say that I’m excited about QOI, and that it now has a permanent slot on my developer toolbelt.

Have a comment on this article? Start a discussion in my public inbox by sending an email to ~skeeto/public-inbox@lists.sr.ht [mailing list etiquette] , or see existing discussions.