nullprogram.com/blog/2018/06/10/
In the past year I’ve written a number of minimalist C libraries,
particularly header libraries. The distinction for “minimalist” is, of
course, completely arbitrary and subjective. My definition in this
context isn’t about the library’s functionality being stupidly
trivial or even necessarily simple. I’m talking about
interface (API) complexity and the library’s run time requirements.
Complex functionality can, in some cases, be tucked behind a simple
interface.
In this article I’ll give my definition for minimalist C API, then take
you through some of my own recent examples.
Minimalist properties
A minimalist C library would generally have these properties.
(1) Small number of functions, perhaps even as little as one.
This one’s pretty obvious. More functions means more surface area in
the interface. Since these functions typically interact, the
relationship between complexity and number of functions will be
superlinear.
(2) No dynamic memory allocations.
The library mustn’t call malloc()
internally. It’s up to the caller
to allocate memory for the library. What’s nice about this is that
it’s completely up to the application exactly how memory is allocated.
Maybe it’s using a custom allocator, or it’s not linked against the
standard library.
A common approach is for the application to provide allocation functions
to the library — e.g. function pointers at run time, or define functions
with specific, expected names. The library would call these instead of
malloc()
and free()
. While that’s perfectly reasonable, it’s not
really minimalist, so I’m not including this technique.
Instead a minimalist API is designed such that it’s natural for the
application to make the allocations itself. Perhaps the library only
needs a single, fixed allocation for all its operations. Or maybe the
application specifies its requirements and the library communicates
how much memory is needed to meet those requirements. I’ll give
specific examples shortly.
One nice result of this property is that it eliminates one of the
common failure conditions: the out of memory error. If the library
doesn’t allocate memory, then it can’t run out of it!
Another convenient, minor outcome is the lack of casts from void *
to
the appropriate type (e.g. on the return from malloc()
). These casts
are implicit in C but must be made explicit in C++. Often, completely by
accident, my minimalist C libraries can be compiled as C++ without any
changes. This is only a minor benefit since these casts could be made
explicit in C, too, if C++ compatibility was desired. It’s just ugly.
In simple terms, the library mustn’t use functions from stdio.h
—
with the exception of the sprintf()
family. Like with memory
allocation, it leaves input and output to the application, letting it
decide exactly how, where, and when information comes and goes.
Like with memory allocation, maybe the application prefers not to use
the C standard library’s buffered IO. Perhaps the application is using
cooperative or green threads, and it would be bad for the
library to block internally on IO.
Also like avoiding memory allocation, a library that doesn’t perform IO
can’t have IO errors. Combined, this means it’s quite possible that a
minimalist library may have no error cases at all. Eliminating those
error handling paths makes the library a lot simpler. The one major
error condition left that’s difficult to eliminate are those pesky
integer overflow checks.
Communicating IO preferences to libraries can be a real problem with
C, since the standard library lacks generic input and output. Putting
FILE *
pointers directly into an API mingles it with the C standard
library in potentially bad ways. Passing file names as strings is an
option, but this limits IO to files — versus, say, sockets. On POSIX
systems, at least it could talk about IO in terms of file descriptors,
but even that’s not entirely flexible — e.g. output to a memory
buffer, or anything not sufficiently file-like.
Again, a common way to deal with this is for the application to
provide IO function pointers to the library. But a minimalist
library’s API would be designed such that not even this is needed,
instead operating strictly on buffers. I’ll also have a couple
examples of this shortly.
With IO and memory allocation out of the picture, another frequent,
accidental result is no dependency on the C standard library. The only
significant functionality left in the standard library are the
mathematical functions (math.h
), float parsing, and a few of
the string functions (string.h
), like memset()
and memmove()
.
These are valuable since they’re handled specially by the
compiler.
(4) Define at most one structure, and perhaps even none.
More types means more complexity, perhaps even more so than having lots
of functions. Some minimalist libraries can be so straightforward that
they can operate solely on simple, homogeneous buffers. I’ll show some
examples of this, too.
As I said initially, minimalism is about interface, not implementation.
The library is free to define as many structures internally as it needs
since the application won’t be concerned with them.
One common way to avoid complicated types in an API is to make them
opaque. The structures aren’t defined in the API, and instead the
application only touches pointers, making them like handles.
struct foo;
struct foo *foo_create(...);
int foo_method(struct foo *, ...);
void foo_destroy(struct foo *);
However, this is difficult to pull off when the library doesn’t allocate
its own memory.
Bitmap library
The first example is a library for creating bitmap (BMP) images. As you
may already know, I strongly prefer Netpbm, which is so simple
that it doesn’t even need a library. But nothing is quite so universally
supported as BMP.
24-bit BMP (Bitmap) ANSI C header library
This library is a perfect example of minimalist properties 2, 3, and 4.
It also doesn’t use any of the C standard library, though only by
accident.
It’s not a general purpose BMP library. It only supports 24-bit true
color, ignoring most BMP features such as palettes. Color is
represented as a 24-bit integer, packed 0xRRGGBB
.
unsigned long bmp_size(long width, long height);
void bmp_init(void *, long width, long height);
void bmp_set(void *, long x, long y, unsigned long color);
unsigned long bmp_get(const void *, long x, long y);
Strictly speaking, even the bmp_get()
function could be tossed since
the library is not intended to load external bitmap images. The
application really shouldn’t need to read back previously set pixels.
There is no allocation, no IO, and no data structures. The application
indicates the dimensions of image it wants to create, and the library
says how large of a buffer it needs. The remaining functions all operate
on this opaque buffer. To write the image out, the application only
needs to dump the buffer to a file.
Here’s a complete, strict error checking example of its usage:
#define RED 0xff0000UL
#define BLUE 0x0000ffUL
unsigned long size = bmp_size(width, height);
if (!size || size > SIZE_MAX) die("invalid dimensions");
void *bmp = calloc(size, 1);
if (!bmp) die("out of memory");
bmp_init(bmp, width, height);
/* Checkerboard pattern */
for (long y = 0; y < height; y++)
for (long x = 0; x < width; x++)
bmp_set(bmp, x, y, x % 2 == y % 2 ? RED : BLUE);
if (!fwrite(bmp, size, 1, out))
die("output error");
free(bmp);
The only library function that can fail is bmp_size()
. When the
given image dimensions would overflow one of the BMP header fields, it
returns zero to indicate as such.
In bmp_set()
, how does it know the dimensions of the image so that
it can find the pixel? It reads that from the buffer just like a BMP
reader would — and in a endian-agnostic manner. There are
no bounds checks — that’s the caller’s job — so it only needs to read
the image’s width in order to find the pixel’s location.
Since IO is under control of the application, it can always choose load
the original buffer contents back from a file, allowing a minimal sort
of BMP loading. However, this only works for trusted input as there
are no validation checks on the buffer.
32-bit integer hash set library
The second example is an integer hash set library. It uses closed
hashing. I initially wrote this for r/dailyprogrammer solution and
then formalized it into a little reusable library.
C99 32-bit integer hash set header library
Here’s the entire API:
int set32_z(uint32_t max);
void set32_insert(uint32_t *table, int z, uint32_t v);
void set32_remove(uint32_t *table, int z, uint32_t v);
int set32_contains(uint32_t *table, int z, uint32_t v);
Again, it’s a good example of properties 2, 3, and 4. Like the BMP
library, the application indicates the maximum number of integers it
will store in the hash set, and the library returns the power of two
number of uint32_t
it needs to allocate (and zero-initialize).
In this API I’m just barely skirting not defining a data structure.
The caller must pass both the table pointer and the power of two size,
and these two values would normally be bundled together into a
structure.
int z = set32_z(max);
unsigned long long n = 1ULL << z;
if (n > SIZE_MAX) die("table too large");
uint32_t *table = calloc(sizeof(*table), n);
if (!table) die("out of memory");
set32_insert(table, z, value);
if (set32_contains(table, z, value))
/* ... */;
set32_remove(table, z, value);
free(table);
Iteration is straightforward, which is why it’s not in the API: visit
each element in the allocated buffer. Zeroes are empty slots.
If a different maximum number of elements is needed, the application
initializes a new, separate table, then iterates over the old table
inserting each integer in turn.
Perhaps the most interesting part of the API is that it has no errors.
No function can fail.
Also, like the BMP library, it accidentally doesn’t use the standard
library, except for a typedef
from stdint.h
.
Fantasy name generator
Nearly a decade ago I cloned in Perl the RinkWorks Fantasy Name
Generator. This version was slow and terrible, and I’m sometimes
tempted to just delete it.
A few years later I rewrote it in JavaScript using an entirely
different approach. In order to improve performance, it has a template
compilation step. The compiled template is a hierarchical composition of
simple generator objects. It’s much faster, and easily enabled some
extensions to the syntax.
Germán Méndez Bravo ported the JavaScript version to C++. This
C++ implementation was recently adopted into IVAN, a
roguelike game.
This recent commotion made me realize something: I hadn’t yet
implemented it in C! So I did.
Fantasy name generator ANSI C header library
The entire API is just a single function with four possible return
values. It’s a perfect example of minimalist property 1.
#define NAMEGEN_SUCCESS 0
#define NAMEGEN_TRUNCATED 1 /* Output was truncated */
#define NAMEGEN_INVALID 2 /* Pattern is invalid */
#define NAMEGEN_TOO_DEEP 3 /* Exceeds maximum nesting depth */
int namegen(char *dest,
size_t len,
const char *pattern,
unsigned long *seed);
There’s no template compilation step, and it generates names straight
from the template.
There are three kinds of errors.
-
If the output buffer wasn’t large enough, it warns about the name
being truncated.
-
The template could be invalid — e.g. incorrectly paired brackets.
-
The template could have too much nesting. I decided to hard code the
maximum nesting depth to a generous 32 levels. This limitation makes
the generator a lot simpler without any practical impact. It also
protects against unbounded memory usage — particularly stack
overflows — by arbitrarily complex patterns. This means it’s
perfectly safe to generate names from untrusted, arbitrarily long
input patterns.
Here’s a usage example:
char name[64];
unsigned long seed = 0xb9584b61UL;
namegen(name, sizeof(name), "!sV'i (the |)!id", &seed);
/* name = "Engia'pin the Doltolph" */
The generator supports UTF-8, almost by accident. (I’d have to go out of
my way not to support it.)
Despite the lack of a compilation step, which requires the parsing the
template for each generated name, it’s an order of magnitude faster
than the C++ version, which caught me by surprise. The high
performance is due to name generation being a single pass over the
template using reservoir sampling.
Internally it maintains a stack of “reset” pointers, each pointing
into the output buffer where the current nesting level began its
output. Each time it hits an alternation (|
), it generates a random
number and decides whether or not to use the new option. The first
time it’s a 1/2 chance it chooses the new option. The second time, a
1/3 chance. The third time a 1/4 chance, and so on. When the new
option is selected, the reset pointer is used to “undo” any previous
output for the current nesting level.
The reservoir sampling means it needs to generate more random numbers
(once per option) than the JavaScript and C++ version (once per nesting
level). However, it uses its own, fast internal PRNG rather than
rand()
. Generating these random numbers is basically free.
Not using rand()
means that, like the previous libraries, it doesn’t
need anything from the standard library. It also has better quality
results since the typical standard library rand()
is total rubbish,
both in terms of speed and quality (and typically has a PLT
penalty). Finally it means the results are identical across all
platforms for the same template and seed, which is one reason it’s part
of the API.
Another slight performance boost comes from the representation of
pattern substitutions, i.e. i
will select a random “idiot” name from a
fixed selection of strings. The obvious representation is an array of
string pointers, as seen in the C++ version. However, there are a lot of
these little strings, which makes for a lot of pointers cluttering up
the relocation table. Instead, I packed it all into few small
pointerless tables, which on x86-64 are accessed efficiently via
RIP-relative addressing. It’s efficient, though not friendly to
modification.
I’m very happy with how this library turned out.
UTF-7 encoder and decoder
The last example is a UTF-7 encoder and decoder. UTF-7 is a method
for encoding arbitrary Unicode text within ASCII text, created as a
nasty hack to allow Unicode messages to be sent over ASCII-limited email
infrastructure. The gist of it is that the Unicode parts of a message
are encoded as UTF-16, then base64 encoded, then interpolated into the
ASCII stream between delimiters.
Einstein (allegedly) said “If you can’t explain it to a six year old,
you don’t understand it yourself.” The analog for programming is to
replace the six year old with a computer, and explaining an idea to a
computer is done by writing a program. I wanted to understand UTF-7, so
I implemented it.
A UTF-7 stream encoder and decoder in ANSI C
Here’s the entire API. It’s modeled a little after the zlib API.
/* utf7_encode() special code points */
#define UTF7_FLUSH -1L
/* return codes */
#define UTF7_OK -1
#define UTF7_FULL -2
#define UTF7_INCOMPLETE -3
#define UTF7_INVALID -4
struct utf7 {
char *buf;
size_t len;
/* then some "private" internal fields */
};
void utf7_init(struct utf7 *, const char *indirect);
int utf7_encode(struct utf7 *, long codepoint);
long utf7_decode(struct utf7 *);
Finally a library that defines a structure! The other fields (not shown)
hold important state information, but the application is only concerned
with buf
and len
: an input or output buffer. The same structure is
used for encoding and decoding, though only for one task at a time.
Following the minimalist library principle, there is no memory
allocation. When encoding a UTF-7 stream, the application’s job is to
point buf
to an output buffer, indicating its length with len
. Then
it feeds code points one at a time into the encoder. When the output is
full, it returns UTF7_FULL
. The application must provide a new buffer
and try again.
This example usage is more complicated than I anticipated it would be.
Properly pumping code points through the encoder requires a loop (or
at least a second attempt).
char buffer[1024];
struct utf7 ctx;
utf7_init(&ctx, 0);
ctx.buf = buffer;
ctx.len = sizeof(buffer));
/* Assumes "wide character" input is Unicode */
for (;;) {
wint_t c = fgetwc(stdin);
if (c == WEOF)
break;
while (utf7_encode(ctx, c) != UTF7_OK) {
/* Flush output and reset buffer */
fwrite(buffer, sizeof(buffer), 1, stdout);
ctx.buf = buffer;
ctx.len = sizeof(buffer));
}
}
/* Flush all pending output */
while (utf7_encode(ctx, UTF7_FLUSH) != UTF7_OK) {
fwrite(buffer, sizeof(buffer), 1, stdout);
ctx.buf = buffer;
ctx.len = sizeof(buffer));
}
/* Write remaining output */
fwrite(buffer, sizeof(buffer) - ctx.len, 1, stdout);
/* Check for errors */
if (fflush(stdout))
die("output error");
if (ferror(stdin))
die("input error");
Flushing (UTF7_FLUSH
) is necessary since, due to base64 encoding,
adjacent Unicode characters usually share a base64 character. Just
because a code point was absorbed into the encoder doesn’t mean it was
written into the output buffer. The encoding for that character may
depend on the next character to come. The special “flush” input forces
this out. It’s valid to flush in the middle of a stream, though this
may penalize encoding efficiency (e.g. the output may be
larger than necessary).
It’s not possible for the encoder to fail, so there are no error
conditions to worry about from the library.
Decoding is a different matter. It works almost in reverse from the
encoder: buf
points to the input and the decoder is pumped to
return one code point at a time. It returns one of:
-
A non-negative value: a valid code point (including ASCII).
-
UTF7_OK
: Input was exhausted. Stopping here would be valid. This is
what you should get when there’s no more input.
-
UTF7_INVALID
: The input was invalid. buf
points at the invalid byte.
-
UTF7_INCOMPLETE
: Input was exhausted, but more is expected. If
there is no more input, then the input must have been truncated,
which is an error.
So there are two possible errors for two kinds of invalid input.
Parsing errors are unavoidable when parsing input.
Again, this library accidentally doesn’t require the standard library.
It doesn’t even depend on the compiler’s locale being
compatible with ASCII since none of its internal tables use string or
character literals. It behaves exactly the same across all conforming
platforms.
More examples
I had a few more examples in mind, but this article has gone on long
enough.
Instead I’ll save these for other articles!