A more robust raw OpenBSD syscall demo

Ted Unangst published “dude, where are your syscalls?” on flak yesterday, with a neat demonstration of OpenBSD’s pinsyscall security feature, whereby only pre-registered addresses are allowed to make system calls. Whether it strengthens or weakens security is up for debate, but either way it’s an interesting, low-level programming challenge. The original demo is fragile for multiple reasons, and requires manually locating and entering addresses for each build. In this article I show how to fix it. To prove that the fix is robust, I ported an entire, real application to use raw system calls on OpenBSD.

The original program uses ARM64 assembly. I’m a lot more comfortable with x86-64 assembly, plus that’s the hardware I have readily on hand. So the assembly language will be different, but all the concepts apply to both these architectures. Almost none of these OpenBSD system interfaces are formally documented (or stable for that matter), and I had to dig around the OpenBSD source tree to figure it out (along with a helpful jart nudge). So don’t be afraid to get your hands dirty.

There are lots of subtle problems in the original demo, so let’s go through the program piece by piece, starting with the entry point:

void
start()
{
        w("hello\n", 6);
        x();
}

This function is registered as the entry point in the ELF image, so it has no caller. That means no return address on the stack, so the stack is not aligned for a function. In toy programs that goes unnoticed, but compilers generate code assuming the stack is aligned. In a real application this is likely to crash deep on the first SIMD register spill.
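To make the alignment rule concrete, here’s a small sketch of the arithmetic. It assumes the x86-64 System V convention, where rsp + 8 must be a multiple of 16 at function entry, and that the kernel hands the entry point a 16-byte aligned rsp. The helper name is hypothetical, purely for illustration.

```c
#include <stdint.h>

// Hypothetical check: is this rsp value what a compiled x86-64 function
// expects on entry? Assumes the System V rule that rsp + 8 is a multiple
// of 16 at that moment, i.e. the caller was aligned before its call
// instruction pushed an 8-byte return address.
static int aligned_for_function(uintptr_t rsp_at_entry)
{
    return (rsp_at_entry + 8) % 16 == 0;
}
```

A 16-byte aligned value such as 0x7ffc1000 fails this check, while the same value less the 8 bytes of a pushed return address passes.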

We could fix this with a force_align_arg_pointer attribute, at least for architectures that support it, but I prefer to write the entry point in assembly. Especially so we can access the command line arguments and environment variables, which is necessary in a real application. That happens to work the same as it does on Linux, so here’s my old, familiar entry point:

asm (
    "        .globl _start\n"
    "_start: mov   %rsp, %rdi\n"
    "        call  start\n"
);

Per the ABI, the first argument passes through rdi, so I pass a copy of the stack pointer, rsp, as it appeared on entry. Entry point arguments argc, argv, and envp are all pushed on the stack at rsp, so the first real function can retrieve it all from just the stack pointer. The original demo won’t use it, though. Using call to pass control pushes a return address, which will never be used, and aligns the stack for the first real function. I name it _start because that’s what the linker expects, which makes things go a little smoother, so it’s rather convenient that the original didn’t use this name.
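In a real application, the first function might unpack that stack pointer along these lines. This is a minimal sketch, assuming the usual initial stack layout of argc, then the argv pointers and a null terminator, then the envp pointers and a null terminator. The struct and function names are hypothetical.

```c
// Hypothetical container for the entry point arguments.
struct args {
    long   argc;
    char **argv;
    char **envp;
};

// Unpack the initial process stack as passed along by _start.
// Assumed layout: argc, argv[0..argc-1], NULL, envp[0..], NULL.
static struct args unpack(long *sp)
{
    struct args a;
    a.argc = sp[0];                // count sits right at the stack pointer
    a.argv = (char **)(sp + 1);    // argv pointers follow immediately
    a.envp = a.argv + a.argc + 1;  // envp begins after argv's terminator
    return a;
}
```

With this, start would accept a long * parameter and call unpack on it before doing anything else.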

Next up, the “write” function:

int
w(void *what, size_t len) {
        __asm(
"       mov x2, x1;"
"       mov x1, x0;"
"       mov w0, #1;"
"       mov x8, #4;"
"       svc #0;"
        );
        return 0;
}

There are two serious problems with this assembly block. First, the function arguments are not necessarily in those registers by the time control reaches the basic assembly block. The function prologue could move them around. Even more so if this function was inlined. This is exactly the problem extended inline assembly is intended to solve. Second, it clobbers a number of registers. Compilers assume this does not happen when generating their own code. This sort of assembly falls apart the moment it comes into contact with a non-zero optimization level.

Solving this is just a matter of using inline assembly properly:

long w(void *what, long len)
{
    char err;
    long rax = 4;  // SYS_write
    asm volatile (
        "syscall"
        : "+a"(rax), "+d"(len), "=@ccc"(err)
        : "D"(1), "S"(what)
        : "rcx", "r11", "memory"
    );
    return err ? -rax : rax;
}

I’ve enhanced it a bit, returning a Linux-style negative errno on error. In the BSD ecosystem, syscall errors are indicated using the carry flag, which here is output into err via =@ccc. When set, the return value is an errno. Further, the OpenBSD kernel uses both rax and rdx for return values, so I’ve also listed rdx as an input+output despite not consuming the result. Despite all these changes, this function is not yet complete! We’ll get back to it later.
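The return convention is easier to see in isolation. Here’s a sketch of just the final expression, with the carry flag and rax passed as plain arguments so it can be examined without making a system call. The function name is hypothetical.

```c
// Hypothetical helper: fold the BSD convention (carry flag set on error,
// errno in rax) into a single Linux-style value. Non-negative results
// pass through, and errors come back as a negative errno.
static long sysret(long rax, int carry)
{
    return carry ? -rax : rax;
}
```

For instance, a successful 6-byte write comes back as 6, while a write to a closed file descriptor sets carry with rax holding 9 (EBADF), which comes back as -9. Callers then need only test for a negative result.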

The “exit” function, x, is just fine:

void
x() {
        __asm(
"       mov x8, #1;"
"       svc #0;"
        );
}

It doesn’t set an exit status, so it passes garbage instead, but otherwise this works. No inputs, plus clobbers and outputs don’t matter when control never returns. In a real application I might write it:

__attribute((noreturn))
void x(int status)
{
    asm volatile ("syscall" :: "a"(1), "D"(status));
    __builtin_unreachable();
}

This function will need a little additional work later, too.

The ident section is basically fine as-is:

__asm(" .section \".note.openbsd.ident\", \"a\"\n"
"       .p2align 2\n"
"       .long   8\n"
"       .long   4\n"
"       .long   1\n"
"       .ascii \"OpenBSD\\0\"\n"
"       .long   0\n"
"       .previous\n");

The compiler assumes the current section remains the same at the end of the assembly block, which here is accomplished with .previous. However, that clobbers the assembler’s remembered “other” section, and so may interfere with surrounding code that uses .previous. Better to use .pushsection and .popsection for good stack discipline. There are many such examples in the OpenBSD source tree.

asm (
    ".pushsection .note.openbsd.ident, \"a\"\n"
    ".long  8, 4, 1, 0x6e65704f, 0x00445342, 0\n"
    ".popsection\n"
);
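The two hexadecimal constants are just the 8-byte name "OpenBSD\0" written as little-endian 32-bit words, sitting after the note’s name size (8), descriptor size (4), and type (1), and before the 4-byte descriptor (0). A short sketch to check the arithmetic, using a hypothetical helper name:

```c
#include <stdint.h>

// Hypothetical helper: assemble four bytes into the 32-bit value that a
// .long directive would need in order to emit them in this order on a
// little-endian target.
static uint32_t le32(const char *s)
{
    const unsigned char *b = (const unsigned char *)s;
    return (uint32_t)b[0]       | (uint32_t)b[1] <<  8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}
```

Applied to "Open" this gives 0x6e65704f, and applied to "BSD" plus its null terminator it gives 0x00445342.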

Now the trickiest part, the pinsyscall table:

struct whats {
        unsigned int offset;
        unsigned int sysno;
} happening[] __attribute__((section(".openbsd.syscalls"))) = {
        { 0x104f4, 4 },
        { 0x10530, 1 },
};

Those offsets — offsets from the beginning of the ELF image — were entered manually, and it kind of ruins the whole demo. We don’t have a good way to get at those offsets from C, or any high level language. However, we can solve that by tweaking the inline assembly with some labels:

__attribute((noinline))
long w(void *what, long len)
{
    // ...
    asm volatile (
        "_w: syscall"
        // ...
    );
    // ...
}

__attribute((noinline,noreturn))
void x(int status)
{
    asm volatile (
        "_x: syscall"
        // ...
    );
    // ...
}

Very importantly, I’ve added noinline. Without it, the compiler could inline these functions at multiple call sites, producing additional copies of the syscall instruction at addresses which of course won’t be registered. This also prevents duplicate labels from causing assembler errors. Once we have the labels, we can use them in an assembly block listing the allowed syscall instructions:

asm (
    ".pushsection .openbsd.syscalls\n"
    ".long  _x, 1\n"
    ".long  _w, 4\n"
    ".popsection\n"
);

That lets the linker solve the offsets problem, which is its main job after all. With these changes the demo works reliably, even under high optimization levels. I suggest these flags:

$ cc -static -nostdlib -no-pie -o where where.c

Disabling PIE with -no-pie is necessary in real applications or else strings won’t work. You can apply more flags to strip it down further, but these are the flags generally necessary to compile these sorts of programs on at least OpenBSD 7.6.

So, how do I know this stuff works in general? Because I ported my ultra portable pkg-config clone, u-config, to use raw OpenBSD syscalls: openbsd_main.c. Everything still works at high optimization levels.

$ cc -static -nostartfiles -no-pie -o pkg-config openbsd_main.c libmemory.a
$ ./pkg-config --cflags --libs libcurl
-I/usr/local/include -L/usr/local/lib -lcurl

Because the new syscall wrappers behave just like Linux system calls, it leverages the linux_noarch.c platform, and the whole port is ~70 lines of code. A few more flags (-fno-stack-protector, -Oz, -s, etc.), and it squeezes into a slim 21.6K static binary.

Despite making no libc calls, it’s not possible to stop compilers from fabricating (hallucinating?) string function calls, so the build above depends on external definitions. In the command above, libmemory.a comes from libmemory.c found in w64devkit. Alternatively, and on topic, you could link the OpenBSD libc string functions by omitting libmemory.a from the build.

$ cc -static -nostartfiles -no-pie -o pkg-config openbsd_main.c

Though it pulls in a lot of bloat (~8x size increase), and teasing out the necessary objects isn’t trivial.

null program

Chris Wellons
