How to build a WaitGroup from a 32-bit integer

Go has a nifty synchronization utility called a WaitGroup, on which one or more goroutines can wait for concurrent task completion. In other languages, the usual convention for task completion is to join the threads doing the work. In Go, goroutines aren’t values and lack handles, so a WaitGroup replaces joins. Building a WaitGroup from typical, portable primitives is a messy affair involving constructors, destructors, and lifetime management. However, on at least Linux and Windows, we can build a WaitGroup out of a zero-initialized integer, much like my 32-bit queue and 32-bit barrier.
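
To give a taste of the shape it takes, here is a rough Linux-only sketch, a simplified illustration rather than the final code: an atomic counter plus a futex, with memory orderings and error handling pared down.

    #include <limits.h>
    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* The zero-initialized counter tracks outstanding tasks. */
    static void waitgroup_add(_Atomic int *wg, int delta)
    {
        atomic_fetch_add_explicit(wg, delta, memory_order_release);
    }

    static void waitgroup_done(_Atomic int *wg)
    {
        if (atomic_fetch_sub_explicit(wg, 1, memory_order_acq_rel) == 1) {
            /* That was the last task: wake everyone parked on the counter. */
            syscall(SYS_futex, wg, FUTEX_WAKE, INT_MAX, 0, 0, 0);
        }
    }

    static void waitgroup_wait(_Atomic int *wg)
    {
        for (;;) {
            int c = atomic_load_explicit(wg, memory_order_acquire);
            if (!c) {
                return;
            }
            /* Sleep until the counter changes from c; the loop handles
             * spurious wakeups. */
            syscall(SYS_futex, wg, FUTEX_WAIT, c, 0, 0, 0);
        }
    }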


Illuminating synchronization edges for ThreadSanitizer

Sanitizers are powerful development tools which complement debuggers and fuzzing. I typically have at least one sanitizer active during development. They’re particularly useful during code review, where they can identify issues before I’ve even begun examining the code carefully — sometimes in mere minutes under fuzzing. Accordingly, it’s a good idea to have your own code in good agreement with sanitizers before review. For ThreadSanitizer (TSan), that means dealing with false positives in programs relying on synchronization invisible to TSan.
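
As a preview of the kind of fix involved, consider a sketch of my own, not an excerpt from the article: a message handed to another thread through some channel TSan cannot model, say a pipe or a custom futex-based queue. The publish() and receive() functions and struct msg below are hypothetical stand-ins; the two __tsan_* functions are exported by the TSan runtime and declared by hand, and in builds without TSan you would stub them out as no-ops.

    /* Exported by the TSan runtime; declared by hand here. Define these
     * as no-op macros in builds without -fsanitize=thread. */
    void __tsan_release(void *addr);
    void __tsan_acquire(void *addr);

    struct msg;                      /* hypothetical message type */
    void publish(struct msg *);      /* hypothetical TSan-invisible handoff */
    struct msg *receive(void);

    void producer_send(struct msg *m)
    {
        __tsan_release(m);           /* the happens-before edge starts here... */
        publish(m);
    }

    struct msg *consumer_recv(void)
    {
        struct msg *m = receive();
        __tsan_acquire(m);           /* ...and ends here, so the consumer's
                                        reads of *m are not flagged as races */
        return m;
    }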


The quick and practical "MSI" hash table

I generally prefer C, so I’m accustomed to building whatever I need on the fly, such as heaps, linked lists, and especially hash tables. Few programs need more than a small subset of a data structure’s features, so a bespoke implementation can be smaller, simpler, and more efficient than a general-purpose one that must handle every edge case. A typical hash table tutorial will describe a relatively lengthy program, but in practice, bespoke hash tables are only a few lines of code. Over the years I’ve worked out some basic principles for hash table construction that aid in quick and efficient implementation. This article covers the technique and philosophy behind what I’ve come to call the “mask-step-index” (MSI) hash table, which is my standard approach.
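
The core is small enough to preview here. In rough sketch form (the function name is mine, and the surrounding table is left to the caller): the table has 1<<exp slots, the low bits of the hash choose the starting index, and the high bits choose an odd step, so the probe sequence eventually visits every slot.

    #include <stdint.h>

    /* Compute the next candidate slot for this hash. Begin the loop with
     * idx = (int32_t)hash and keep calling until an empty slot or a
     * matching key turns up. */
    static int32_t msi_probe(uint64_t hash, int exp, int32_t idx)
    {
        uint32_t mask = ((uint32_t)1 << exp) - 1;
        uint32_t step = (uint32_t)(hash >> (64 - exp)) | 1;  /* odd step: full cycle */
        return (int32_t)((idx + step) & mask);
    }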


My new debugbreak command

I previously mentioned the Windows feature where pressing F12 in a debuggee window causes it to break in the debugger. It works with any debugger — GDB, RemedyBG, Visual Studio, etc. — since the hotkey simply raises a breakpoint structured exception. It’s been surprisingly useful, and I’ve wanted it available in more contexts, such as console programs or even on Linux. The result is a new debugbreak command, now included in w64devkit. Though, of course, you already have everything you need to build it and try it out right now. I’ve also worked out a Linux implementation.
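
For a flavor of the Linux side, here is a minimal sketch, an illustration rather than necessarily what ships: delivering SIGTRAP to the target stops it in whatever debugger is attached, much like the F12 hotkey raises a breakpoint exception on Windows.

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fputs("usage: debugbreak PID\n", stderr);
            return 1;
        }
        pid_t pid = (pid_t)strtol(argv[1], 0, 10);
        if (kill(pid, SIGTRAP)) {
            perror("debugbreak");
            return 1;
        }
        return 0;
    }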


Assertions should be more debugger-oriented

Prompted by a 20-minute video, over the past month I’ve improved my debugger skills. I’d shamefully acquired a bad habit: avoiding a debugger until exhausting dumber, insufficient methods. My first choice should be a debugger, but I had allowed a bit of friction to dissuade me. With some thoughtful practice and deliberate effort clearing the path, my bad habit is finally broken — at least when a good debugger is available. It feels like I’ve leveled up and, like touch typing, this was a skill I’d neglected far too long. One friction point was the less-than-optimal assert feature in basically every programming language implementation. It ought to work better with debuggers.
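
To make that concrete, here is the sort of assertion I have in mind, as an illustrative sketch (the macro name is mine): on failure it traps at the failing line rather than printing a message and calling abort(), so the debugger stops right there with the whole stack and all locals intact.

    #if defined(_MSC_VER)
    #  include <intrin.h>
    #  define ASSERT(c) do { if (!(c)) __debugbreak(); } while (0)
    #else
    #  define ASSERT(c) do { if (!(c)) __builtin_trap(); } while (0)
    #endif

Under a debugger, a failed ASSERT stops execution right at the failing line; without one, the process still dies on the spot instead of limping along.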


My take on "where's all the code"

This article was discussed on Lobsters.

Earlier this month Ted Unangst researched compiling the OpenBSD kernel 50% faster, which involved stubbing out the largest, extraneous branches of the source tree. To find the lowest-hanging fruit, he wrote a tool called watc (where’s all the code) that displays an interactive “usage” summary of a source tree oriented around line count. A followup post about exploring the tree in parallel got me thinking about the problem, especially since I had just written about a concurrent queue. Turning it over in my mind, I saw opportunities for interesting data structures and memory management, and so I wanted to write my own version of the tool, watc.c, which is the subject of this article.


A lock-free, concurrent, generic queue in 32 bits

This article was discussed on Hacker News.

While considering concurrent queue design I came up with a generic, lock-free queue that fits in a 32-bit integer. The queue is “generic” in that a single implementation supports elements of any arbitrary type, despite being implemented in C. It’s lock-free in that there is guaranteed system-wide progress. It can store up to 32,767 elements at a time — more than enough for message queues, which must always be bounded. I will first present a single-consumer, single-producer queue, then expand support to multiple consumers at a cost. Like my lightweight barrier, I’m not presenting this as a packaged solution, but rather as a technique you can apply when circumstances call for it.
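
To sketch the single-producer, single-consumer core (a simplified illustration with names of my own, not the exact code from the article): the low half of the integer holds the producer’s head index, the high half holds the consumer’s tail index, and each side reserves a slot, copies its element into a caller-owned array, then commits.

    #include <stdatomic.h>
    #include <stdint.h>

    #define QEXP  7                  /* example: a 128-slot queue */
    #define QMASK ((1u << QEXP) - 1)

    /* Returns the slot to fill, or -1 if the queue is full. */
    static int queue_push(_Atomic uint32_t *q)
    {
        uint32_t r = atomic_load_explicit(q, memory_order_acquire);
        uint32_t head = r & 0xffff, tail = r >> 16;
        return (uint16_t)(head - tail) > QMASK ? -1 : (int)(head & QMASK);
    }

    /* Publishes the filled slot: bump head while preserving tail. */
    static void queue_push_commit(_Atomic uint32_t *q)
    {
        uint32_t r = atomic_load_explicit(q, memory_order_relaxed);
        uint32_t n;
        do {
            n = (r & 0xffff0000u) | ((r + 1) & 0xffffu);
        } while (!atomic_compare_exchange_weak_explicit(
                     q, &r, n, memory_order_release, memory_order_relaxed));
    }

    /* Returns the slot to read, or -1 if the queue is empty. */
    static int queue_pop(_Atomic uint32_t *q)
    {
        uint32_t r = atomic_load_explicit(q, memory_order_acquire);
        uint32_t head = r & 0xffff, tail = r >> 16;
        return head == tail ? -1 : (int)(tail & QMASK);
    }

    /* Releases the consumed slot: bump tail (the high half). */
    static void queue_pop_commit(_Atomic uint32_t *q)
    {
        atomic_fetch_add_explicit(q, 0x10000, memory_order_release);
    }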


Luhn algorithm using SWAR and SIMD

Ever been so successful that credit card processing was your bottleneck? Perhaps you’ve wondered, “If only I could compute check digits three times faster using the same hardware!” Me neither. But if that ever happens someday, then this article is for you. I will show how to compute the Luhn algorithm in parallel using SIMD within a register, or SWAR.
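
For reference, here is a routine scalar implementation of the rule being parallelized (my own baseline, not the SWAR code): starting from the rightmost digit, double every second digit, subtract 9 from any doubled value above 9, and require the total to be divisible by 10.

    #include <stdbool.h>
    #include <string.h>

    /* Returns true if the digit string s passes the Luhn check. */
    static bool luhn_valid(const char *s)
    {
        size_t len = strlen(s);
        int sum = 0;
        for (size_t i = 0; i < len; i++) {
            int d = s[i] - '0';
            if ((len - i) % 2 == 0) {  /* every second digit from the right */
                d *= 2;
                if (d > 9) {
                    d -= 9;
                }
            }
            sum += d;
        }
        return sum % 10 == 0;
    }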


A flexible, lightweight, spin-lock barrier

This article was discussed on Hacker News.

The other day I wanted to try the famous memory reordering experiment for myself. It’s the double-slit experiment of concurrency, where a program can observe an “impossible” result on common hardware, as though a thread had time-traveled. While getting thread timing as tight as possible, I designed a possibly-novel thread barrier. It’s purely spin-locked, the entire footprint is a zero-initialized integer, it automatically resets, it can be used across processes, and the entire implementation is just three to four lines of code.
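
As a taste of the idea, a rough sketch rather than the exact final version: a single zero-initialized counter counts arrivals forever, dividing by the thread count gives the current phase, and each thread spins until its phase is over. This sketch assumes the thread count is a power of two so that 32-bit wraparound stays harmless.

    #include <stdatomic.h>

    /* Block until all n threads have arrived. Resets automatically. */
    static void barrier_wait(_Atomic unsigned *barrier, unsigned n)
    {
        unsigned arrival = atomic_fetch_add_explicit(barrier, 1,
                                                     memory_order_acq_rel);
        unsigned phase = arrival / n;
        while (atomic_load_explicit(barrier, memory_order_acquire) / n == phase) {
            /* spin: the phase advances once the nth thread arrives */
        }
    }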


Compressing and embedding a Wordle word list

Wordle is all the rage, resulting in an explosion of hobbyist clones, with new ones appearing every day. At the current rate, I estimate that by the end of 2022, 99% of all new software releases will be Wordle clones. That’s no surprise since the rules are simple, it’s more fun to implement and study than to actually play, and the hard part is building a decent user interface. Such implementations go back at least 30 years. Implementers get to decide on a platform, language, and the particular subject of this article: how to handle the word list. Is it a separate file/database or embedded in the program? If embedded, is it worth compressing? In this article I’ll present a simple, tailored Wordle list compression strategy that beats general purpose compressors.
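
To give a flavor of what a tailored encoding buys, here is an illustration of the general idea, not necessarily the scheme in the article (pack5 and put_varint are names of my own): every word is exactly five lowercase letters, so it packs into 24 bits since 26^5 < 2^24. Sort the packed values and store only the gaps between neighbors as varints, and most entries shrink to a byte or two.

    #include <stdint.h>

    /* Pack a five-letter word such as "crane" into a 24-bit integer. */
    static uint32_t pack5(const char *w)
    {
        uint32_t v = 0;
        for (int i = 0; i < 5; i++) {
            v = v*26 + (uint32_t)(w[i] - 'a');
        }
        return v;
    }

    /* Append a gap as a LEB128-style varint: 7 bits per byte, high bit
     * set on every byte but the last. Returns the new output pointer. */
    static unsigned char *put_varint(unsigned char *out, uint32_t delta)
    {
        while (delta >= 0x80) {
            *out++ = (unsigned char)(delta | 0x80);
            delta >>= 7;
        }
        *out++ = (unsigned char)delta;
        return out;
    }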


null program

Chris Wellons

wellons@nullprogram.com (PGP)
~skeeto/public-inbox@lists.sr.ht