nullprogram.com/blog/2024/04/14/
My techniques with arena allocation and strings are
oriented around C. I’m always looking for a better way, and lately I’ve
been experimenting with building them using C++ features. What are the
trade-offs? Are the benefits worth the costs? In this article I lay out my
goals, review implementation possibilities, and discuss my findings.
Following along will require familiarity with those previous two articles.
Some of C++ is beyond my mental capabilities, and so I cannot wield those
parts effectively. Other parts I can wrap my head around, but it
requires substantial effort and the inevitable mistakes are difficult to
debug. So a general goal is to minimize contact with that complexity, only
touching a few higher-value features that I can use confidently.
Existing practice is unimportant. I’ve seen where that goes. Like the C
standard library, the C++ standard library offers me little. Its
concepts regarding ownership and memory management are irreconcilable
(move semantics, smart pointers, etc.), so I have to build from scratch
anyway. So absolutely no including C++ headers. The most valuable features
are built right into the language, so I won’t need to include library
definitions.
No public
or private
. Still no const
beyond what is required
to access certain features. This means I can toss out a bunch of keywords
like class
, friend
, etc. It eliminates noisy, repetitive code and
interfaces — getters, setters, separate const
and non-const
— which in
my experience means fewer defects.
No references beyond mandatory cases. References hide addresses being
taken — or merely implies it, when it’s actually an expensive copy — which
is an annoying experience when reading unfamiliar C++. After all, for
arenas the explicit address-taking (permanent) or copying (scratch) is a
critical part of communicating the interfaces.
In theory constexpr
could be useful, but it keeps falling short when I
try it out, so I’m ignoring it. I’ll elaborate in a moment.
Minimal template use. They blow up compile times and code size, they’re
noisy, and in practice they make debug builds (i.e. -O0
) much slower
(typically ~10x) because there’s no optimization to clean up the mess.
I’ll only use them for a few foundational purposes, such as allocation.
(Though this article is about the fundamental stuff.)
No methods aside from limited use of operator overloads. I want to keep a
C style, plus methods just look ugly without references: obj->func()
vs.
func(obj)
. (Why are we still writing ->
in the 21st century?) Function
overloading can instead differentiate “methods.” Overloads are acceptable
in moderation, especially because I’m paying for it (symbol decoration)
whether or not I take advantage.
Finally, no exceptions of course. I assume -fno-exceptions
, or the local
equivalent, is active.
Allocation
Let’s start with allocation. Since writing that previous article, I’ve
streamlined arena allocation in C:
#define new(a, t, n) (t *)alloc(a, sizeof(t), _Alignof(t), n)
typedef struct {
byte *beg;
byte *end;
} arena;
static byte *alloc(arena *a, size objsize, size align, size count)
{
assert(count >= 0);
size pad = (uptr)a->end & (align - 1);
assert(count < (a->end - a->beg - pad)/objsize); // oom
return memset(a->end -= objsize*count + pad, 0, objsize*count);
}
(As needed, replace the second assert
with whatever out of memory policy
is appropriate.) Then allocating, say, a 10k-element hash table
(i.e. to keep it off the stack):
i16 *seen = new(&scratch, i16, 1<<14);
With C++, I initially tried placement new with the arena as the
“place” for the allocation:
void *operator new(size_t, arena *); // avoid this
Then to create a single object:
object *o = new (&scratch) object{};
This exposes the constructor, but everything else about it is poor. It
relies on complex, finicky rules governing new
overloads, especially for
alignment handling. It’s difficult to tell what’s happening, and it’s too
easy to make mistakes that compile. That doesn’t even count the mess that
is array new[]
.
I soon learned it’s better to replace the new
macro with a template,
which can actually see what it’s doing. I can’t call it new
in C++, so I
settled on make
instead:
template<typename T>
static T *make(arena *a, size count = 1)
{
assert(count >= 0);
size objsize = sizeof(T);
size align = alignof(T);
size pad = (uptr)a->end & (align - 1);
assert(count < (a->end - a->beg - pad)/objsize); // oom
a->end -= objsize*count + pad;
T *r = (T *)a->end;
for (size i = 0; i < count; i++) {
new ((void *)&r[i]) T{};
}
return r;
}
Then allocating that hash table becomes:
i16 *seen = make<i16>(&scratch, 10000);
Or a single object, relying on the default argument:
object *o = make<object>(&scratch);
Due to placement new, merely for invoking the constructor, these objects
aren’t just zero-initialized, but value-initialized. It can only construct
objects that define an empty initializer, but in exchange unlocks some
interesting possibilities:
struct mat3 {
f32 data[9] = {
1, 0, 0,
0, 1, 0,
0, 0, 1,
};
};
struct list {
node *head = 0;
node **tail = &head;
};
When a zero-initialized state isn’t ideal, objects can still initialize to
a more useful state straight out of the arena. The second case is even
self-referencing, which is specifically supported through placement new.
Otherwise you’d need a special-written copy or move constructor.
make
could accept constructor arguments and perfect forward them to a
constructor. However, that’s too far into the dark arts for my comfort,
plus it requires a correct definition of std::forward
. In practice that
means #include
-ing it, and whatever comes in with it. Or ask an expert
capable of writing such a definition from scratch, though both are
probably too busy.
Update 1: One of those experts, Jonathan Müller, kindly reached out to
say that a static cast is sufficient. This is easy to do:
template<typename T, typename ...A>
static T *make(arena *a, size count = 1, A &&...args)
{
// ...
new ((void *)&r[i]) T{(A &&)args...};
// ...
}
Update 2: I later realized that because I do not care about copy or
move semantics, I also don’t care about perfect forwarding. I can simply
expand the parameter pack without casting or &&
. I also don’t want the
extra restrictions on braced initializer conversions, so better to use
parentheses with new
.
template<typename T, typename ...A>
static T *make(arena *a, size count = 1, A ...args)
{
// ...
new ((void *)&r[i]) T(args...);
// ...
}
One small gotcha: placement new doesn’t work out of the box, and you need
to provide a definition. That means including <new>
or writing one out.
Fortunately it’s trivial, but the prototype must exactly match, including
size_t
:
void *operator new(size_t, void *p) { return p; }
Overall I feel the template is a small improvement over the macro.
Strings
Recall my basic C string type, with a macro to wrap literals:
#define countof(a) (size)(sizeof(a) / sizeof(*(a)))
#define s8(s) (s8){(u8 *)s, countof(s)-1}
typedef struct {
u8 *data;
size len;
} s8;
Since it doesn’t own the underlying buffer — region-based allocation has
already solved the ownership problem — this is what C++ long-windedly
calls a std::string_view
. In C++ we won’t need the countof
macro for
strings, but it’s still generally useful. Converting it to a template,
which is theoretically more robust (rejects pointers), but comes with a
non-zero cost:
template<typename T, size N>
size countof(T (&)[N])
{
return N;
}
The reference — here a reference to an array — is unavoidable, so it’s one
of the rare cases. The same concept applies as an s8
constructor to
replace the macro:
struct s8 {
u8 *data = 0;
size len = 0;
s8() = default;
template<size N>
s8(const char (&s)[N]) : data{(u8 *)s}, len{N-1} {}
};
I’ve explicitly asked to keep a default zero-initialized (empty) string
since it’s useful — and necessary to directly allocate strings using
make
, e.g. an array of strings. const
is required because string
literals are const
in C++, but it’s immediately stripped off for the
sake of simplicity. The new constructor allows:
Or even more usefully:
void print(bufout *, s8);
// ...
print(stdout, "hello world\n");
Define operator==
and it’s more useful yet:
b32 operator==(s8 s)
{
return len==s.len && (!len || !memcmp(data, s.data, len));
}
Now this works, and it’s cheap and fast even in debug builds:
s8 key = ...;
if (key == "HOME") {
// ...
}
That’s more ergonomic than the macro and comparison function. operator[]
also improves ergonomics, to subscript a string without going through the
data
member:
u8 &operator[](size i)
{
assert(i >= 0);
assert(i < len);
return data[i];
}
The reference is again necessary to make subscripts assignable. Since
s8span
— make a string spanning two pointers — so often appears in my
programs, a constructor seems appropriate, too:
s8(u8 *beg, u8 *end)
{
assert(beg <= end);
data = beg;
len = end - beg;
}
By the way, these assertions I’ve been using are great for catching
mistakes quickly and early, and they complement fuzz testing.
I’m not sold on it, but an idea for the future: C++23’s multi-index
operator[]
as a slice operator:
s8 operator[](size beg, size end)
{
assert(beg >= 0);
assert(beg <= end);
assert(end <= len);
return {data+beg, data+end};
}
Then:
s8 msg = "foo bar baz";
msg = msg[4,7]; // msg = "bar"
I could keep going with, say, iterators and such, but each will be more
specialized and less useful. (I don’t care about range-based for
loops.)
Downside: static initialization
The new string stuff is neat, but I hit a wall trying it out: These fancy
constructors do not reliably construct at compile time, not even with a
constexpr
qualifier in two of the three major C++ implementations. A
static lookup table that contains a string is likely constructed at run
time in at least some builds. For example, this table:
static s8 keys[] = {"foo", "bar", "baz"};
Requires run-time construction in real world cases I care about, requiring
C++ magic and linking runtime gunk. The constructor is therefore a strict
downgrade from the macro, which works perfectly in these lookup tables.
Once a non-default constructor is defined, I’ve been unable to find an
escape hatch back to the original, dumb, reliable behavior.
Update: Jonathan Müller points out the reinterpret cast is forbidden
in a constexpr
function, so it’s not required to happen at compile time.
After some thought, I’ve figured out a workaround using a union:
struct s8 {
union {
u8 *data = 0;
const char *cdata;
};
size len = 0;
template<size N>
constexpr s8(const char (&s)[N]) : cdata{s}, len{N-1} {}
// ...
}
In all three C++ implementations, in all configurations, this reliably
constructs strings at compile time. The other semantics are unchanged.
Other features
Having a generic dynamic array would be handy, and more ergonomic than my
dynamic array macro:
template<typename T>
struct slice {
T *data = 0;
size len = 0;
size cap = 0;
slice<T> = default;
template<size N>
slice<T>(T (&a)[N]) : data{a}, len{N}, cap{N} {}
T &operator[](size i) { ... }
}
template<typename T>
slice<T> append(arena *, slice<T>, T);
On the other hand, hash maps are mostly solved, so I wouldn’t
bother with a generic map.
Function overloads would simplify naming. For example, this in C:
prints8(bufout *, s8);
printi32(bufout *, i32);
printf64(bufout *, f64);
printvec3(bufout *, vec3);
Would hide that stuff behind the scenes in the symbol decoration:
print(bufout *, s8);
print(bufout *, i32);
print(bufout *, f64);
print(bufout *, vec3);
Same goes for a hash()
function on different types.
C++ has better null pointer semantics than C. Addition or subtraction of
zero with a null pointer produces a null pointer, and subtracting null
pointers results in zero. This eliminates some boneheaded special case
checks required in C, though not all: memcpy
, for instance, arbitrarily
still does not accept null pointers even in C++.
Ultimately worth it?
The static data problem is a real bummer, but perhaps it’s worth it for
the other features. I still need to put it all to the test in a real,
sizable project.