Slim Reader/Writer Locks are neato

I’m 18 years late, but Slim Reader/Writer Locks have a fantastic interface: pointer-sized (“slim”), zero-initialized, and non-allocating. Lacking cleanup, they compose naturally with arena allocation. Sounds like a futex? That’s because they’re built on futexes introduced at the same time. They’re also complemented by condition variables with the same desirable properties. My only quibble is that slim locks could easily have been 32-bit objects, but it hardly matters. This article, while treating Win32 as a foreign interface, discusses a paper-thin C++ wrapper interface around lock and condition variables, in my own style.

If you’d like to see/try a complete, working demonstration before diving into the details: demo.cpp. We’re going to build this from the ground up, so let’s establish a few primitive integer definitions:

using b32 = signed;
using i32 = signed;
using uz  = decltype(0uz);

Think of uz as like uintptr_t. This implementation will support both 32-bit and 64-bit targets, and we’ll need it as the basis for locks and condition variables:

enum Lock : uz;
enum Cond : uz;

Opaque enums provide additional type safety: They have the properties of an integer, including trivial destruction, but are distinct types which compilers forbid mixing with other integers. We can’t, say, accidentally cross condition variable and lock parameters — my main concern. Aside from zero-initialization, we do not actually care about the values of these variables, so enumerators are unnecessary. (Caveat: GDB cannot display opaque enums, which is slightly irritating.)

The documentation doesn’t explicitly mention zero initialization, but the official *_INIT constants are defined as zero. That locks in zero at the ABI level, so we can count on it.

All the functions we’ll need are exported by kernel32.dll. Locks have two variations on lock/unlock: “exclusive” (write) and “shared” (read). There are also “try” versions, but I won’t be using them.

#define W32(r, p) extern "C" __declspec(dllimport) r __stdcall p noexcept
W32(void, AcquireSRWLockExclusive(Lock *));
W32(void, AcquireSRWLockShared(Lock *));
W32(void, ReleaseSRWLockExclusive(Lock *));
W32(void, ReleaseSRWLockShared(Lock *));

Declaring Win32 functions in C++ is a mouthful, and everything must be written in just the right order, but it’s mostly tucked away in a macro. Usually there’s a stack discipline to these locks, so an RAII scoped guard is in order:

struct Guard {
    Lock *l;
    Guard(Lock *l) : l{l} { AcquireSRWLockExclusive(l); }
    ~Guard()              { ReleaseSRWLockExclusive(l); }
};

struct RGuard {
    Lock *l;
    RGuard(Lock *l) : l{l} { AcquireSRWLockShared(l); }
    ~RGuard()              { ReleaseSRWLockShared(l); }
};

Dead simple. (What about rule of three? Instead of working around this language design flaw, reach into the distant future where it’s been fixed: -Werror=deprecated-copy-dtor.) Usage might look like:

struct Example {
    Lock lock = {};
    i32  value;
};

i32 incr(Example *e)
{
    Guard g(&e->lock);
    return ++e->value;
}

Note the = {} to guarantee the lock is always ready for use. It gets more interesting with condition variables in the mix. That’s three more functions:

W32(b32,  SleepConditionVariableSRW(Cond *, Lock *, i32, b32));
W32(void, WakeAllConditionVariable(Cond *));
W32(void, WakeConditionVariable(Cond *));

The last parameter on SleepConditionVariableSRW indicates if the lock was acquired shared. Why do locks have distinct acquire and release functions while condition variables use a flag for the same purpose? Beats me. I’ll unfold it into two functions, selected by type, with a default infinite wait:

b32 wait(Cond *c, Guard *g, i32 ms = -1)
{
    return SleepConditionVariableSRW(c, g->l, ms, 0);
}

b32 wait(Cond *c, RGuard *g, i32 ms = -1)
{
    return SleepConditionVariableSRW(c, g->l, ms, 1);
}

Usage might look like:

for (RGuard g(&lock); remaining;) {
    wait(&done, &g);
}

The other side is nothing more than a rename (but could also be accomplished through linking):

void signal(Cond *c)
{
    WakeConditionVariable(c);
}

void broadcast(Cond *c)
{
    WakeAllConditionVariable(c);
}

And a couple examples of its usage:

if (Guard g(&lock); !--remaining) {
    signal(&done);
}

// Or:

Guard g(&lock);
ready = true;
broadcast(&init);
while (remaining) {
    wait(&done, &g);
}

A satisfying, powerful synchronization interface with hardly any code!

Have a comment on this article? Start a discussion in my public inbox by sending an email to ~skeeto/public-inbox@lists.sr.ht [mailing list etiquette] , or see existing discussions.

null program

Chris Wellons

wellons@nullprogram.com (PGP)
~skeeto/public-inbox@lists.sr.ht (view)