Giving C++ std::regex a C makeover

Suppose you’re working in C using one of the major toolchains (that is, one that is mainly a C++ implementation) and you need regular expressions. You could integrate a library, but the C++ standard library included with your compiler has a regex implementation just within reach. As a resourceful engineer, you’d find it prudent to use an asset already in hand. But it’s a C++ interface, and you’re using C instead of C++ for a reason, perhaps to avoid dealing with C++. Have no worries. This article is about wrapping std::regex in a tidy C interface which not only hides all the C++ machinery, but utterly tames it. The result is less a practical tool than a potpourri of interesting techniques.

Deep list copy: More than meets the eye

I recently came across a take-home C programming test which had more depth and complexity than I suspect the interviewer intended. While considering it, I also came up with a novel, or at least unconventional, solution. The problem is to deep copy a linked list where each node references a random list element in addition to usual linkage — similar to LeetCode problem 138. This reference is one of identity rather than value, which has murky consequences.
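
For readers unfamiliar with the problem, here is a sketch of the conventional interleaving solution to LeetCode 138. It is not the unconventional solution the article develops. The `Node` layout and the `copy_list` name are hypothetical. Each copy is temporarily spliced in right after its original, which lets the random references be translated by identity without a lookup table, and the original linkage is restored at the end.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node: usual linkage plus a reference to an arbitrary
   element of the same list, or NULL. */
typedef struct Node {
    int          value;
    struct Node *next;
    struct Node *random;
} Node;

/* Conventional three-pass deep copy. Each copy is temporarily spliced
   in directly after its original, so original->next locates the copy
   and random references can be translated without a lookup table.
   Returns NULL on allocation failure with the original list restored. */
Node *copy_list(Node *head)
{
    /* Pass 1: splice a fresh copy after every original node. */
    for (Node *n = head; n; n = n->next->next) {
        Node *c = malloc(sizeof(*c));
        if (!c) {
            /* Unsplice and free the copies made so far. */
            for (Node *m = head; m != n; m = m->next) {
                Node *done = m->next;
                m->next = done->next;
                free(done);
            }
            return NULL;
        }
        c->value  = n->value;
        c->random = NULL;
        c->next   = n->next;
        n->next   = c;
    }

    /* Pass 2: point each copy at the copy of its original's target,
       preserving identity rather than value. */
    for (Node *n = head; n; n = n->next->next) {
        n->next->random = n->random ? n->random->next : NULL;
    }

    /* Pass 3: unweave the two lists, restoring the original linkage. */
    Node *copy = head ? head->next : NULL;
    for (Node *n = head; n; n = n->next) {
        Node *c = n->next;
        assert(c->value == n->value);  /* invariant: copy follows original */
        n->next = c->next;
        c->next = c->next ? c->next->next : NULL;
    }
    return copy;
}
```

Note that a node whose random reference points at itself yields a copy that points at itself, not at the original: the copy preserves the list's shape, not just its values.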

Symbol inspection tools for w64devkit: vc++filt and peports

I introduced two new tools to w64devkit, vc++filt and peports (pronounced like purports), which aid manual symbol inspection and complement one another. As of this writing, the latter is not yet in a release, but it’s feature-complete and trivial to build if you want to try it out early. This article explains the motivation and purpose for each.

Arenas and the almighty concatenation operator

I continue to streamline an arena-based paradigm, and recently stumbled upon a concise technique for dynamic growth: an efficient, generic “concatenate anything to anything” operation within an arena, built atop a core of 9-ish lines of code. The key insight originated from a reader suggestion about dynamic arrays. The subject of concatenation can be a string, a dynamic array, or even something else. The “system” is extensible, and especially useful for path handling.
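
The summary doesn’t show the technique itself, so the following is only a guess at one plausible shape, a minimal sketch assuming a bump allocator and length-delimited strings. `Arena`, `Str`, `S`, `arena_alloc`, and `concat` are hypothetical names, and aborting on exhaustion is a simplifying assumption. The idea sketched is that when the left operand is the most recent allocation, it can be extended in place instead of copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bump allocator: [beg, end) is the unused region. */
typedef struct {
    char *beg;
    char *end;
} Arena;

/* Hypothetical length-delimited string, not NUL-terminated. */
typedef struct {
    char     *data;
    ptrdiff_t len;
} Str;

/* Wrap a string literal as a Str, excluding the terminator. */
#define S(s) (Str){s, sizeof(s) - 1}

/* Allocate len bytes; abort when exhausted (an assumption of this
   sketch, not a recommendation). */
static char *arena_alloc(Arena *a, ptrdiff_t len)
{
    assert(len >= 0);
    if (len > a->end - a->beg) {
        abort();
    }
    char *p = a->beg;
    a->beg += len;
    return p;
}

/* Concatenate tail onto head. If head ends exactly at the bump
   pointer, it is extended in place and only tail is copied.
   Otherwise head is first copied to the top of the arena. Either way
   the result ends at the bump pointer, so the next concatenation onto
   it copies only its new tail. */
static Str concat(Arena *a, Str head, Str tail)
{
    if (!head.data || head.data + head.len != a->beg) {
        Str copy = {arena_alloc(a, head.len), head.len};
        if (head.len) memcpy(copy.data, head.data, (size_t)head.len);
        head = copy;
    }
    char *dst = arena_alloc(a, tail.len);
    if (tail.len) memcpy(dst, tail.data, (size_t)tail.len);
    head.len += tail.len;
    return head;
}
```

Building a path such as `usr/lib` piecewise with `s = concat(&a, s, S("/"))` then copies each piece exactly once, provided nothing else is allocated from the arena in between.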

Guidelines for computing sizes and subscripts

Occasionally we need to compute the size of an object that does not yet exist, or a subscript that may fall out of bounds. It’s easy to miss the edge cases where results overflow, creating a nasty, subtle bug, even in the presence of type safety. Ideally such computations happen in specialized code, such as inside an allocator (calloc, reallocarray), and not outside it by the caller (as with malloc). Mitigations exist with different trade-offs: arbitrary precision, or using a wider fixed-size integer, e.g. 128-bit integers on 64-bit hosts. For the typical case, working only with fixed-size integers, I’ve come up with a set of guidelines to avoid overflows in the edge cases.
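
As an illustration of the general idea, and not the article’s specific guidelines, here is a sketch assuming signed `ptrdiff_t` sizes. Both `alloc_array` and `range_fits` are hypothetical helpers. Each check rearranges the arithmetic so that it only divides or subtracts from a value already known to be in range, and never computes a product or sum that might overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical calloc-like helper: the size computation happens in
   one place, inside the allocator, rather than at every call site.
   Dividing the limit by size before comparing means count * size is
   only computed once it is known not to overflow. Returns NULL for a
   negative count, overflow, or allocation failure. */
void *alloc_array(ptrdiff_t count, ptrdiff_t size)
{
    assert(size > 0);
    if (count < 0 || count > PTRDIFF_MAX / size) {
        return NULL;  /* product would overflow ptrdiff_t */
    }
    return malloc((size_t)(count * size));
}

/* Does the range [off, off+n) fit in a buffer of length len?
   Subtracting from the known-valid len, instead of adding n to off,
   keeps every intermediate value in range. */
int range_fits(ptrdiff_t off, ptrdiff_t n, ptrdiff_t len)
{
    assert(len >= 0);
    return off >= 0 && n >= 0 && off <= len && n <= len - off;
}
```

The naive check `off + n <= len` looks equivalent, but overflows when `n` is huge; the subtracted form cannot.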

Speculations on arenas and custom strings in C++

My techniques with arena allocation and strings are oriented around C. I’m always looking for a better way, and lately I’ve been experimenting with building them using C++ features. What are the trade-offs? Are the benefits worth the costs? In this article I lay out my goals, review implementation possibilities, and discuss my findings. Following along will require familiarity with my two previous articles on those topics.

Protecting paths in macro expansions by extending UTF-8

After a year I’ve finally come up with an elegant solution to a vexing u-config problem. The pkg-config format uses macros to generate build flags through recursive expansion. Some flags embed file system paths, but to the macro system it’s all strings. The output is also ultimately just one big string, which the receiving shell splits into fields. If a path contains spaces or shell metacharacters, u-config must escape them so that shells treat them as part of a token. But how can u-config itself distinguish incidental spaces in paths from deliberate spaces between flags? What about other shell metacharacters in paths? My solution is to extend UTF-8 to encode metadata that survives macro expansion.

An improved chkstk function on Windows

If you’ve spent much time developing with Mingw-w64 you’ve likely seen the symbol ___chkstk_ms, perhaps in an error message. It’s a little piece of runtime provided by GCC via libgcc which ensures enough of the stack is committed for the caller’s stack frame. The “function” uses a custom ABI and is implemented in assembly. So is the subject of this article, a slightly improved implementation soon to be included in w64devkit as libchkstk (-lchkstk).

Two handy GDB breakpoint tricks

Over the past couple of months I’ve discovered two handy tricks for working with GDB breakpoints. I figured these out on my own, and I’ve not seen either discussed elsewhere, so I really ought to share them.

So you want custom allocator support in your C library

This article was discussed on Hacker News and on reddit.

Users of mature C libraries conventionally get to choose how memory is allocated (that is, when allocation cannot be avoided entirely). The C standard never laid down a convention, perhaps for the better, so each library re-invents an allocator interface. Not all are created equal, and most repeat a few fundamental mistakes. Often the interface is merely a token effort, to check off that it’s “supported” without actual consideration of its use. This article describes the critical features of a practical allocator interface, and demonstrates why they’re important.
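
To make the discussion concrete, here is a hypothetical interface. It is not claimed to be the one the article recommends. It assumes two properties such interfaces commonly offer: a context pointer passed to every call, and the allocation size passed back on release. `Allocator`, `lib_strdup`, `lib_strfree`, and the counting allocator are all illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator interface for a C library. The context
   pointer lets one set of functions serve many independent heaps,
   and the size passed to release means the allocator need not
   record it. */
typedef struct {
    void *(*alloc)(ptrdiff_t size, void *ctx);
    void  (*release)(void *ptr, ptrdiff_t size, void *ctx);
    void  *ctx;
} Allocator;

/* Hypothetical library function: every allocation goes through the
   caller-supplied allocator. Returns NULL on allocation failure. */
char *lib_strdup(const char *s, Allocator *a)
{
    ptrdiff_t size = (ptrdiff_t)strlen(s) + 1;
    char *copy = a->alloc(size, a->ctx);
    if (copy) {
        memcpy(copy, s, (size_t)size);
    }
    return copy;
}

/* Release a string from lib_strdup, passing back its size. */
void lib_strfree(char *s, Allocator *a)
{
    if (s) {
        a->release(s, (ptrdiff_t)strlen(s) + 1, a->ctx);
    }
}

/* Example user allocator: wraps malloc and tracks live bytes through
   the context pointer. */
static void *counting_alloc(ptrdiff_t size, void *ctx)
{
    assert(size >= 0);
    void *p = malloc((size_t)size);
    if (p) {
        *(ptrdiff_t *)ctx += size;
    }
    return p;
}

static void counting_release(void *ptr, ptrdiff_t size, void *ctx)
{
    *(ptrdiff_t *)ctx -= size;
    free(ptr);
}
```

Because the counter lives behind `ctx` rather than in a global, two instances of the library can each be given separate accounting.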

null program

Chris Wellons

wellons@nullprogram.com (PGP)
~skeeto/public-inbox@lists.sr.ht (view)