nullprogram.com/blog/2023/12/17/
This article was discussed on Hacker News and on reddit.
Users of mature C libraries conventionally get to choose how memory is
allocated — that is, when it cannot be avoided entirely. The C
standard never laid down a convention — perhaps for the better —
so each library re-invents an allocator interface. Not all are created
equal, and most repeat a few fundamental mistakes. Often the interface is
merely a token effort, checking off "custom allocator support" without real
consideration of its use. This article describes the critical features of
a practical allocator interface, and demonstrates why they're important.
Before diving into the details, here’s the checklist for library authors:
- All allocation functions accept a user-defined context pointer.
- The “free” function accepts the original allocation size.
- The “realloc” function accepts both old and new size.
Context pointer
The standard library allocator keeps its state in global variables. This
makes for a simple interface, but comes with significant performance and
complexity costs. These costs likely motivate custom allocator use in the
first place, in which case slavishly duplicating the standard interface is
essentially the worst possible option. Unfortunately this is typical:
#define LIB_MALLOC malloc
#define LIB_FREE free
With these I could observe the library's allocations, and I could swap in a
functionally equivalent library — jemalloc, mimalloc, etc. — but that's
about it. Better than nothing, I suppose, but only just. Function pointer
callbacks are slightly better:
typedef struct {
void *(*malloc)(size_t);
void (*free)(void *);
} allocator;
session *session_new(..., allocator);
At least I could use different allocators at different times, and there
are even tricks to bind a context pointer to the callback. It
also works when the library is dynamically linked.
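For the curious, one such trick is to smuggle the context through a thread-local variable so the callbacks can keep their context-free signatures. This is a hedged sketch, not any particular library's mechanism; the `bump` allocator and every name in it are hypothetical:

```c
#include <stddef.h>

// Hypothetical bump allocator standing in for "any custom allocator".
// Alignment is ignored for brevity.
typedef struct { char *beg, *end; } bump;

// The trick: smuggle the context through a thread-local variable so the
// callbacks fit the context-free signatures above. It works, but it's
// clunky, which is exactly why a real context parameter is better.
static _Thread_local bump *lib_ctx;

static void *ctx_malloc(size_t size)
{
    bump *b = lib_ctx;
    if ((size_t)(b->end - b->beg) < size) {
        return 0;  // out of memory
    }
    void *p = b->beg;
    b->beg += size;
    return p;
}

static void ctx_free(void *ptr)
{
    (void)ptr;  // bump allocators free nothing
}
```

Before each library call, the application points `lib_ctx` at the desired allocator, then hands the library `ctx_malloc` and `ctx_free` as its callbacks.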
Either case barely qualifies as custom allocator support, and they’re
useless when it matters most. Only a small ingredient is needed to make
these interfaces useful: a context pointer.
// NOTE: Better, but still not great
typedef struct {
void *(*malloc)(size_t, void *ctx);
void (*free)(void *, void *ctx);
void *ctx;
} allocator;
Users can choose from where the library allocates at any given time.
It liberates the allocator from global variables (or janky workarounds),
and multithreading woes. The default can still hook up to the standard
library through stubs that fit these interfaces.
static void *lib_malloc(size_t size, void *ctx)
{
(void)ctx;
return malloc(size);
}
static void lib_free(void *ptr, void *ctx)
{
    (void)ctx;
    free(ptr);
}
static allocator lib_allocator = {lib_malloc, lib_free, 0};
Note that the context pointer comes after the "standard" arguments. All
things being equal, "extra" arguments should go after standard ones. In
the most common calling conventions this allows stub implementations to
be merely an unconditional jump. It's as though the stubs are a kind of
subtype of the original functions.
lib_malloc:
jmp malloc
lib_free:
jmp free
Typically the decision is completely arbitrary, and so this minutia tips
the balance.
Context pointer example
So what’s the big deal? It means we can trivially plug in, say, a tiny
arena allocator. To demonstrate, consider this fictional string
set and partial JSON API, each of which supports a custom allocator. For
simplicity — I’m attempting to balance substance and brevity — they share
an allocator interface. (Note: Because subscripts and sizes should be
signed, and we’re now breaking away from the standard library
allocator, I will use ptrdiff_t
for the rest of the examples.)
typedef struct {
void *(*malloc)(ptrdiff_t, void *ctx);
void (*free)(void *, void *ctx);
void *ctx;
} allocator;
typedef struct set set;
set *set_new(allocator *);
set *set_free(set *);
bool set_add(set *, char *);
typedef struct json json;
json *json_load(char *buf, ptrdiff_t len, allocator *);
json *json_free(json *);
ptrdiff_t json_length(json *);
json *json_subscript(json *, ptrdiff_t i);
json *json_getfield(json *, char *field);
double json_getnumber(json *);
char *json_getstring(json *);
set and json objects retain a copy of the allocator object for all
allocations made through that object. Given nothing, they default to the
standard library using the pass-through definitions above. Used together
with the standard library allocator:
typedef struct {
double sum;
bool ok;
} sum_result;
sum_result sum_unique(char *buf, ptrdiff_t len)
{
    sum_result r = {0};
    json *namevals = json_load(buf, len, 0);
    if (!namevals) {
        return r;  // parse error
    }
    ptrdiff_t arraylen = json_length(namevals);
    if (arraylen < 0) {
        json_free(namevals);
        return r;  // not an array
    }
    set *seen = set_new(0);
    for (ptrdiff_t i = 0; i < arraylen; i++) {
        json *element = json_subscript(namevals, i);
        json *name    = json_getfield(element, "name");
        json *value   = json_getfield(element, "value");
        if (!name || !value) {
            set_free(seen);
            json_free(namevals);
            return r;  // invalid element
        } else if (set_add(seen, json_getstring(name))) {
            r.sum += json_getnumber(value);
        }
    }
    set_free(seen);
    json_free(namevals);
    r.ok = 1;
    return r;
}
Which, given this JSON input:
[
{"name": "foo", "value": 123},
{"name": "bar", "value": 456},
{"name": "foo", "value": 1000}
]
Would return 579.0
. Because it’s using standard library allocation, it
must carefully clean up before returning. There’s also no out-of-memory
handling because, in practice, programs typically do not get to observe
and respond to the standard allocator running out of memory.
We can improve and simplify it with an arena allocator:
typedef struct {
char *beg;
char *end;
jmp_buf *oom;
} arena;
void *arena_malloc(ptrdiff_t size, void *ctx)
{
arena *a = ctx;
ptrdiff_t available = a->end - a->beg;
ptrdiff_t alignment = -size & 15;
if (size > available-alignment) {
longjmp(*a->oom, 1);
}
return a->end -= size + alignment;
}
void arena_free(void *ptr, void *ctx)
{
// nothing to do (yet!)
}
I’m allocating from the end rather than the beginning because it will make
a later change simpler. Applying that to the function:
sum_result sum_unique(char *buf, ptrdiff_t len, arena scratch)
{
    sum_result r = {0};
    allocator a = {0};
    a.malloc = arena_malloc;
    a.free   = arena_free;
    a.ctx    = &scratch;
    json *namevals = json_load(buf, len, &a);
    if (!namevals) {
        return r;  // parse error
    }
    ptrdiff_t arraylen = json_length(namevals);
    if (arraylen < 0) {
        return r;  // not an array
    }
    set *seen = set_new(&a);
    for (ptrdiff_t i = 0; i < arraylen; i++) {
        json *element = json_subscript(namevals, i);
        json *name    = json_getfield(element, "name");
        json *value   = json_getfield(element, "value");
        if (!name || !value) {
            return r;  // invalid element
        } else if (set_add(seen, json_getstring(name))) {
            r.sum += json_getnumber(value);
        }
    }
    r.ok = 1;
    return r;
}
Calls to set_free and json_free are no longer necessary because the arena
automatically frees these on any return, in O(1). I almost feel bad that
the library authors bothered to write them! It also handles allocation
failure without introducing it to sum_unique. We may even deliberately
restrict the memory available to this function — perhaps because the input
is untrusted, and we want to quickly abort denial-of-service attacks — by
giving it a small arena, relying on out-of-memory to reject pathological
inputs.
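To make that out-of-memory path concrete, here is a self-contained sketch of the caller's side, with arena_malloc repeated from above so it stands alone. The function name, budget, and allocation sizes are illustrative, not part of any library:

```c
#include <setjmp.h>
#include <stddef.h>

typedef struct {
    char    *beg;
    char    *end;
    jmp_buf *oom;
} arena;

// Repeated from above so this sketch stands alone.
void *arena_malloc(ptrdiff_t size, void *ctx)
{
    arena *a = ctx;
    ptrdiff_t available = a->end - a->beg;
    ptrdiff_t alignment = -size & 15;
    if (size > available-alignment) {
        longjmp(*a->oom, 1);
    }
    return a->end -= size + alignment;
}

// Caller-side setup with a deliberately tiny budget. Returns 1 if the
// work completed, 0 if the input blew past the budget.
int run_with_budget(void)
{
    static char mem[1<<8];  // 256-byte cap for untrusted input
    jmp_buf oom;
    arena scratch = {mem, mem+sizeof(mem), &oom};
    if (setjmp(oom)) {
        return 0;  // rejected: arena exhausted
    }
    arena_malloc(100, &scratch);
    arena_malloc(100, &scratch);
    arena_malloc(100, &scratch);  // trips the longjmp
    return 1;
}
```

The work itself never checks for failure; exhaustion unwinds straight back to the setjmp, and the caller reports the rejection.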
There are so many possibilities unlocked by the context pointer.
Provide the original allocation size when freeing
When an application frees an object it always has the original, requested
allocation size on hand. After all, it’s a necessary condition to use the
object correctly. In the simplest case it’s the size of the freed object’s
type: a static quantity. If it’s an array, then it’s a multiple of the
tracked capacity: a dynamic quantity. In any case the size is either known
statically or tracked dynamically by the application.
Yet free() does not accept a size, meaning that the allocator must track
the information redundantly! That's a needless burden on custom
allocators, and with a bit of care a library can lift it.
This was noticed in C++, and WG21 added sized deallocation in
C++14. It's now the default on two of the three major implementations (and
probably not the two you'd guess). In other words, object size is so
readily available that it can mostly be automated away. Notable exception:
operator new[] and operator delete[] with trivial destructors. With
non-trivial destructors, operator new[] must track the array length for
its own purposes on top of libc bookkeeping. In other words, array
allocations have their size stored in at least three different places!
That means the “free” interface should look like this:
void lib_free(void *ptr, ptrdiff_t len, void *ctx);
And calls inside the library might look like:
lib_free(p, sizeof(*p), ctx);
lib_free(a, sizeof(*a)*len, ctx);
Now that arena_free
has size information, it can free an allocation if
it was the most recent:
void arena_free(void *ptr, ptrdiff_t size, void *ctx)
{
arena *a = ctx;
if (ptr == a->end) {
ptrdiff_t alignment = -size & 15;
a->end += size + alignment;
}
}
If the library allocates short-lived objects to compute some value, then
discards in reverse order, the memory can be reused. The arena doesn’t
have to do anything special. The library merely needs to share its
knowledge with the allocator.
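To see the reuse in action, here is a self-contained check with the arena definitions repeated from above; nothing in it is new beyond the demonstration itself:

```c
#include <setjmp.h>
#include <stddef.h>

typedef struct { char *beg, *end; jmp_buf *oom; } arena;

// arena_malloc and the sized arena_free, repeated from above.
void *arena_malloc(ptrdiff_t size, void *ctx)
{
    arena *a = ctx;
    ptrdiff_t available = a->end - a->beg;
    ptrdiff_t alignment = -size & 15;
    if (size > available-alignment) {
        longjmp(*a->oom, 1);
    }
    return a->end -= size + alignment;
}

void arena_free(void *ptr, ptrdiff_t size, void *ctx)
{
    arena *a = ctx;
    if (ptr == a->end) {    // most recent allocation: recycle it
        a->end += size + (-size & 15);
    }
}
```

Allocate, free with the size, then allocate again: the second allocation returns the same pointer, with no special effort from the arena.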
Beyond arena allocation, an allocator could use the size to locate the
allocation’s size class and, say, push it onto a freelist of its size
class. Size-class freelists compose well with arenas, and an
implementation is short and simple when the caller of “free” communicates
object size.
Another idea: During testing, use a debug allocator that tracks object
size and validates the reported size against its own bookkeeping. This can
help catch mistakes sooner.
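Such a debug allocator might be sketched like this, assuming the sized "free" interface above. The fixed-size table and every name here are illustrative, not a real library:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical debug allocator: remembers every live allocation's size
// and cross-checks the size reported to "free".
typedef struct {
    void     *ptr[256];
    ptrdiff_t len[256];
    int       count;
} debug_allocator;

void *debug_malloc(ptrdiff_t size, void *ctx)
{
    debug_allocator *d = ctx;
    assert(d->count < 256);
    void *p = malloc(size);
    d->ptr[d->count] = p;
    d->len[d->count] = size;
    d->count++;
    return p;
}

void debug_free(void *ptr, ptrdiff_t size, void *ctx)
{
    debug_allocator *d = ctx;
    for (int i = 0; i < d->count; i++) {
        if (d->ptr[i] == ptr) {
            assert(d->len[i] == size);  // caller misreported the size?
            d->count--;
            d->ptr[i] = d->ptr[d->count];  // swap-remove the entry
            d->len[i] = d->len[d->count];
            free(ptr);
            return;
        }
    }
    assert(!"free of a pointer this allocator never returned");
}
```

Run the test suite against this allocator and a mismatched size, or a stray pointer, fails loudly at the faulty call site rather than corrupting state somewhere downstream.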
Provide the old size when resizing an allocation
Resizing an allocation requires a lot from an allocator, and it should be
avoided if possible. At the very least it cannot be done at all without
knowing the original allocation size. An allocator can’t simply no-op it
like it can with “free.” With the standard library interface, allocators
have no choice but to redundantly track object sizes when “realloc” is
required.
So, just as with “free,” the allocator should be given the old object
size!
void *lib_realloc(void *ptr, ptrdiff_t old, ptrdiff_t new, void *ctx);
At the very least, an allocator could implement “realloc” with “malloc”
and memcpy
:
void *arena_realloc(void *ptr, ptrdiff_t old, ptrdiff_t new, void *ctx)
{
assert(new > old);
void *r = arena_malloc(new, ctx);
return memcpy(r, ptr, old);
}
Of the three checklist items, this is the most neglected. Exercise for the
reader: The last-allocated object can be resized in place, using memmove
instead. If this is frequently expected, allocate from the front, adjust
arena_free as needed, and extend the allocation in place as discussed in a
previous addendum, without any copying.
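Here is a hedged sketch of one answer to the memmove half of that exercise, keeping the end-allocating arena from above. Freeing the most recent allocation before reallocating means the contents are copied at most once; memmove is required because the old and new regions overlap:

```c
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

typedef struct { char *beg, *end; jmp_buf *oom; } arena;

// Repeated from above so this sketch stands alone.
void *arena_malloc(ptrdiff_t size, void *ctx)
{
    arena *a = ctx;
    ptrdiff_t available = a->end - a->beg;
    ptrdiff_t alignment = -size & 15;
    if (size > available-alignment) {
        longjmp(*a->oom, 1);
    }
    return a->end -= size + alignment;
}

// If ptr is the most recent allocation, retract it first, reallocate,
// and slide the contents into place with memmove.
void *arena_realloc(void *ptr, ptrdiff_t old, ptrdiff_t new, void *ctx)
{
    arena *a = ctx;
    ptrdiff_t keep = old < new ? old : new;
    if (ptr == a->end) {
        a->end += old + (-old & 15);       // retract, as in arena_free
        void *r = arena_malloc(new, ctx);  // may longjmp on OOM
        return memmove(r, ptr, keep);      // regions may overlap
    }
    void *r = arena_malloc(new, ctx);
    return memcpy(r, ptr, keep);
}
```

Growing the last allocation consumes only the difference between the old and new padded sizes, rather than the full new size.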
Real world examples
Let’s examine real world examples to see how well they fit the checklist.
First up is uthash, a popular, easy-to-use, intrusive hash table:
#define uthash_malloc(sz) my_malloc(sz)
#define uthash_free(ptr, sz) my_free(ptr)
No "realloc," so it trivially checks (3). It optionally provides the old
size to "free," which checks (2). However, it misses (1), the most
important, greatly limiting its usefulness.
Next is the venerable zlib. It has function pointers with these
prototypes on its z_stream object:
void *zlib_malloc(void *ctx, unsigned items, unsigned size);
void zlib_free(void *ctx, void *ptr);
The context pointer checks (1), and I can confirm from experience that
it’s genuinely useful with a custom allocator. No “realloc” so it passes
(3) automatically. It misses (2), but in practice this hardly matters: It
allocates everything up front, and frees at the very end, meaning a no-op
“free” is quite sufficient.
Finally there’s the Lua programming language with this economical,
single-function interface:
typedef void *(*lua_Alloc)(void *ud, void *ptr, size_t osize, size_t nsize);
It packs all three allocator functions into one function. It includes a
context pointer (1), a free size (2), and two realloc sizes (3). It’s a
simple allocator’s best friend!
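For completeness, implementing this interface on top of the standard library takes only a few lines; the following is essentially the example allocator from the Lua reference manual:

```c
#include <stdlib.h>

// nsize == 0 means "free" (return NULL); otherwise behave like realloc.
// When ptr is NULL, osize describes the kind of object being created
// rather than a size, so a simple allocator can ignore it.
static void *l_alloc(void *ud, void *ptr, size_t osize, size_t nsize)
{
    (void)ud; (void)osize;
    if (nsize == 0) {
        free(ptr);
        return NULL;
    }
    return realloc(ptr, nsize);
}
```

A custom allocator replaces the free/realloc pair with its own dispatch on the same three cases, with the context arriving through ud.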