Arenas and the almighty concatenation operator

I continue to streamline an arena-based paradigm, and stumbled upon a concise technique for dynamic growth — an efficient, generic “concatenate anything to anything” within an arena built atop a core of 9-ish lines of code. The key insight originated from a reader suggestion about dynamic arrays. The subject of concatenation can be a string, dynamic array, or even something else. The “system” is extensible, and especially useful for path handling.

Guidelines for computing sizes and subscripts

Occasionally we need to compute the size of an object that does not yet exist, or a subscript that may fall out of bounds. It’s easy to miss the edge cases where results overflow, creating a nasty, subtle bug, even in the presence of type safety. Ideally such computations happen in specialized code, such as inside an allocator (calloc, reallocarray), and not outside by the allocatee (e.g. ahead of a call to malloc). Mitigations exist with different trade-offs: arbitrary precision, or using a wider fixed integer, e.g. 128-bit integers on 64-bit hosts. In the typical case, working only with fixed size-type integers, I’ve come up with a set of guidelines to avoid overflows in the edge cases.
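
As a rough illustration of the kind of edge case involved, here is a minimal sketch in C. It is not taken from the article: the helper names checked_size and in_bounds are hypothetical, and the choice of signed ptrdiff_t sizes is an assumption. The idea is to test against the limit with division and subtraction, which cannot overflow, before doing the multiplication or addition that might.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: stores count*size in *out and returns 1 when the
// product fits in ptrdiff_t, otherwise returns 0 and leaves *out alone.
static int checked_size(ptrdiff_t count, ptrdiff_t size, ptrdiff_t *out)
{
    assert(count >= 0);
    assert(size > 0);
    if (count > PTRDIFF_MAX / size) {
        return 0;  // count*size would overflow, so never compute it
    }
    *out = count * size;
    return 1;
}

// Hypothetical helper: is the range [off, off+len) inside [0, cap)?
// Written with subtraction so that off+len is never computed.
static int in_bounds(ptrdiff_t off, ptrdiff_t len, ptrdiff_t cap)
{
    assert(cap >= 0);
    return off >= 0 && off <= cap && len >= 0 && len <= cap - off;
}
```

Because off is known to lie in [0, cap] by the time cap - off is evaluated, that subtraction cannot overflow, whereas the more obvious off + len <= cap could.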

Speculations on arenas and custom strings in C++

My techniques with arena allocation and strings are oriented around C. I’m always looking for a better way, and lately I’ve been experimenting with building them using C++ features. What are the trade-offs? Are the benefits worth the costs? In this article I lay out my goals, review implementation possibilities, and discuss my findings. Following along will require familiarity with those previous two articles.

Protecting paths in macro expansions by extending UTF-8

After a year I’ve finally come up with an elegant solution to a vexing u-config problem. The pkg-config format uses macros to generate build flags through recursive expansion. Some flags embed file system paths, but to the macro system it’s all strings. The output is also ultimately just one big string, which the receiving shell splits into fields. If a path contains spaces, or shell metacharacters, u-config must escape them so that shells treat them as part of a token. But how can u-config itself distinguish incidental spaces in paths from deliberate spaces between flags? What about other shell metacharacters in paths? My solution is to extend UTF-8 to encode metadata that survives macro expansion.

An improved chkstk function on Windows

If you’ve spent much time developing with Mingw-w64 you’ve likely seen the symbol ___chkstk_ms, perhaps in an error message. It’s a little piece of runtime provided by GCC via libgcc which ensures enough of the stack is committed for the caller’s stack frame. The “function” uses a custom ABI and is implemented in assembly. So is the subject of this article, a slightly improved implementation soon to be included in w64devkit as libchkstk (-lchkstk).

Two handy GDB breakpoint tricks

Over the past couple months I’ve discovered a couple of handy tricks for working with GDB breakpoints. I figured these out on my own, and I’ve not seen either discussed elsewhere, so I really ought to share them.

So you want custom allocator support in your C library

This article was discussed on Hacker News and on reddit.

Users of mature C libraries conventionally get to choose how memory is allocated — that is, when it cannot be avoided entirely. The C standard never laid down a convention — perhaps for the better — so each library re-invents an allocator interface. Not all are created equal, and most repeat a few fundamental mistakes. Often the interface is merely a token effort, to check off that it’s “supported” without actual consideration to its use. This article describes the critical features of a practical allocator interface, and demonstrates why they’re important.

My personal C coding style as of late 2023

This article was discussed on Hacker News and on reddit.

This has been a ground-breaking year for my C skills, and paradigm shifts in my technique have provoked me to reconsider my habits and coding style. It’s been my largest personal style change in years, so I’ve decided to take a snapshot of its current state and my reasoning. These changes have produced significant productivity and organizational benefits, so while most of it is certainly subjective, it likely includes a few objective improvements. I’m not saying everyone should write C this way, and when I contribute code to a project I follow their local style. This is about what works well for me.

A simple, arena-backed, generic dynamic array for C

Previously I presented an arena-friendly hash map applicable to any programming language where one might use arena allocation. In this third article I present a generic, arena-backed dynamic array. The details are specific to C, as the most appropriate mechanism depends on the language (e.g. templates, generics). Just as in the previous two articles, the goal is to demonstrate an idea so simple that a full implementation fits on one terminal pager screen — a concept rather than a library.
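
To give a feel for how small such a thing can be, here is a minimal sketch in C. It is not the article's code: the names Arena, arena_alloc, grow, push, and Ints are all hypothetical, and it assumes that every array header is laid out as a data pointer followed by two ptrdiff_t fields, and that all object pointers share one representation. Growth doubles the capacity and copies into a fresh arena allocation, simply abandoning the old storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { char *beg, *end; } Arena;  // hypothetical bump arena

// Bump-allocate zeroed memory; returns NULL when the arena is exhausted.
static void *arena_alloc(Arena *a, ptrdiff_t size, ptrdiff_t align)
{
    ptrdiff_t pad = (ptrdiff_t)(-(uintptr_t)a->beg & (uintptr_t)(align - 1));
    if (size > (a->end - a->beg) - pad) return 0;
    void *p = a->beg + pad;
    a->beg += pad + size;
    return memset(p, 0, (size_t)size);
}

// Type-erased growth. ASSUMPTION: slice points at a struct shaped like
// { T *data; ptrdiff_t len; ptrdiff_t cap; } for some element type T.
static int grow(void *slice, ptrdiff_t size, Arena *a)
{
    assert(size > 0);
    struct { char *data; ptrdiff_t len, cap; } h;
    memcpy(&h, slice, sizeof(h));
    ptrdiff_t cap = h.cap ? h.cap : 2;
    if (cap > PTRDIFF_MAX / 2 / size) return 0;  // 2*cap*size would overflow
    cap *= 2;
    char *data = arena_alloc(a, cap * size, _Alignof(max_align_t));
    if (!data) return 0;
    if (h.len) memcpy(data, h.data, (size_t)(h.len * size));
    h.data = data;
    h.cap = cap;
    memcpy(slice, &h, sizeof(h));
    return 1;
}

// Evaluates to a pointer to a new zeroed element, or NULL if out of memory.
#define push(s, a) \
    (((s)->len < (s)->cap || grow((s), (ptrdiff_t)sizeof(*(s)->data), (a))) \
        ? &(s)->data[(s)->len++] : 0)

// Any struct with this shape works with push; this one holds ints.
typedef struct { int *data; ptrdiff_t len, cap; } Ints;
```

Usage would look like `Ints xs = {0}; *push(&xs, &arena) = 42;`, with a NULL check on the result if the arena may run out. The macro is what makes it generic: the element size comes from the type of the data field, so no per-type code is generated.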

An easy-to-implement, arena-friendly hash map

My last article had tips for arena allocation. This next article demonstrates a technique for building bespoke hash maps that compose nicely with arena allocation. In addition, they’re fast, simple, and automatically scale to any problem that could reasonably be solved with an in-memory hash map. To avoid resizing (both to better support arenas and to simplify implementation), they have slightly above average memory requirements. The design, which we’re calling a hash-trie, is the result of fruitful collaboration with NRK, whose sibling article includes benchmarks. It’s my new favorite data structure, and has proven incredibly useful. With a couple of well-placed acquire/release atomics, we can even turn it into a lock-free concurrent hash map.
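
For readers who want a concrete picture before clicking through, here is a minimal single-threaded sketch of a hash-trie in C. The details are assumptions made for illustration rather than the article's exact code: a 4-ary trie that consumes two hash bits per level, C-string keys stored by pointer, int values, an FNV-1a hash, and the hypothetical names Arena, arena_alloc, Node, hash64, and upsert. Nodes are only ever added, never moved or resized, which is why it pairs well with an arena.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { char *beg, *end; } Arena;  // hypothetical bump arena

// Bump-allocate zeroed memory; returns NULL when the arena is exhausted.
static void *arena_alloc(Arena *a, ptrdiff_t size, ptrdiff_t align)
{
    ptrdiff_t pad = (ptrdiff_t)(-(uintptr_t)a->beg & (uintptr_t)(align - 1));
    if (size > (a->end - a->beg) - pad) return 0;
    void *p = a->beg + pad;
    a->beg += pad + size;
    return memset(p, 0, (size_t)size);
}

typedef struct Node Node;
struct Node {
    Node       *child[4];  // selected by the top two bits of the hash
    const char *key;       // not copied: the caller keeps it alive
    int         value;
};

// FNV-1a, chosen here only because it is short.
static uint64_t hash64(const char *s)
{
    uint64_t h = 0xcbf29ce484222325u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 0x100000001b3u;
    }
    return h;
}

// Find key, inserting it when absent. With perm == NULL this is a pure
// lookup that returns NULL for a missing key. Also returns NULL if the
// arena is out of memory.
static int *upsert(Node **m, const char *key, Arena *perm)
{
    assert(key);
    for (uint64_t h = hash64(key); *m; h <<= 2) {
        if (!strcmp(key, (*m)->key)) {
            return &(*m)->value;
        }
        m = &(*m)->child[h >> 62];  // descend using the next two bits
    }
    if (!perm) return 0;
    Node *n = arena_alloc(perm, sizeof(Node), _Alignof(Node));
    if (!n) return 0;
    n->key = key;
    *m = n;
    return &n->value;
}
```

An empty map is just a null Node pointer, so `Node *map = 0; *upsert(&map, "apple", &arena) = 1;` is a complete insertion. Once the hash bits are used up the walk keeps following child[0], which stays correct but degrades toward a linked list for the colliding keys.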

null program

Chris Wellons

wellons@nullprogram.com (PGP)
~skeeto/public-inbox@lists.sr.ht (view)