Manual Control Flow Guard in C

January 21, 2017

nullprogram.com/blog/2017/01/21/

Recent versions of Windows have a new exploit mitigation feature called Control Flow Guard (CFG). Before an indirect function call — e.g. function pointers and virtual functions — the target address checked against a table of valid call addresses. If the address isn’t the entry point of a known function, then the program is aborted.

If an application has a buffer overflow vulnerability, an attacker may use it to overwrite a function pointer and, by the call through that pointer, control the execution flow of the program. This is one way to initiate a Return Oriented Programming (ROP) attack, where the attacker constructs a chain of gadget addresses — a gadget being a couple of instructions followed by a return instruction, all in the original program — using the indirect call as the starting point. The execution then flows from gadget to gadget so that the program does what the attacker wants it to do, all without the attacker supplying any code.

The two most widely practiced ROP attack mitigation techniques today are Address Space Layout Randomization (ASLR) and stack protectors. The former randomizes the base address of executable images (programs, shared libraries) so that process memory layout is unpredictable to the attacker. The addresses in the ROP attack chain depend on the run-time memory layout, so the attacker must also find and exploit an information leak to bypass ASLR.

For stack protectors, the compiler allocates a canary on the stack above other stack allocations and sets the canary to a per-thread random value. If a buffer overflows to overwrite the function return pointer, the canary value will also be overwritten. Before the function returns by the return pointer, it checks the canary. If the canary doesn’t match the known value, the program is aborted.

CFG works similarly — performing a check prior to passing control to the address in a pointer — except that instead of checking a canary, it checks the target address itself. This is a lot more sophisticated, and, unlike a stack canary, essentially requires coordination by the platform. The check must be informed on all valid call targets, whether from the main program or from shared libraries.

While not (yet?) widely deployed, a worthy mention is Clang’s SafeStack. Each thread gets two stacks: a “safe stack” for return pointers and other safely-accessed values, and an “unsafe stack” for buffers and such. Buffer overflows will corrupt other buffers but will not overwrite return pointers, limiting the effect of their damage.

An exploit example

Consider this trivial C program, demo.c:

int
main(void)
{
    char name[8];
    gets(name);
    printf("Hello, %s.\n", name);
    return 0;
}

It reads a name into a buffer and prints it back out with a greeting. While trivial, it’s far from innocent. That naive call to gets() doesn’t check the bounds of the buffer, introducing an exploitable buffer overflow. It’s so obvious that both the compiler and linker will yell about it.

For simplicity, suppose the program also contains a dangerous function.

void
self_destruct(void)
{
    puts("**** GO BOOM! ****");
}

The attacker can use the buffer overflow to call this dangerous function.

To make this attack simpler for the sake of the article, assume the program isn’t using ASLR (e.g. without -fpie/-pie, or with -fno-pie/-no-pie). For this particular example, I’ll also explicitly disable buffer overflow protections (e.g. _FORTIFY_SOURCE and stack protectors).

$ gcc -Os -fno-pie -D_FORTIFY_SOURCE=0 -fno-stack-protector \
      -o demo demo.c

First, find the address of self_destruct().

$ readelf -a demo | grep self_destruct
46: 00000000004005c5  10 FUNC  GLOBAL DEFAULT 13 self_destruct

This is on x86-64, so it’s a 64-bit address. The size of the name buffer is 8 bytes, and peeking at the assembly I see an extra 8 bytes allocated above, so there’s 16 bytes to fill, then 8 bytes to overwrite the return pointer with the address of self_destruct.

$ echo -ne 'xxxxxxxxyyyyyyyy\xc5\x05\x40\x00\x00\x00\x00\x00' > boom
$ ./demo < boom
Hello, xxxxxxxxyyyyyyyy?@.
**** GO BOOM! ****
Segmentation fault

With this input I’ve successfully exploited the buffer overflow to divert control to self_destruct(). When main tries to return into libc, it instead jumps to the dangerous function, and then crashes when that function tries to return — though, presumably, the system would have self-destructed already. Turning on the stack protector stops this exploit.

$ gcc -Os -fno-pie -D_FORTIFY_SOURCE=0 -fstack-protector \
      -o demo demo.c
$ ./demo < boom
Hello, xxxxxxxxaaaaaaaa?@.
*** stack smashing detected ***: ./demo terminated
======= Backtrace: =========
... lots of backtrace stuff ...

The stack protector successfully blocks the exploit. To get around this, I’d have to either guess the canary value or discover an information leak that reveals it.

The stack protector transformed the program into something that looks like the following:

int
main(void)
{
    long __canary = __get_thread_canary();
    char name[8];
    gets(name);
    printf("Hello, %s.\n", name);
    if (__canary != __get_thread_canary())
        abort();
    return 0;
}

However, it’s not actually possible to implement the stack protector within C. Buffer overflows are undefined behavior, and a canary is only affected by a buffer overflow, allowing the compiler to optimize it away.

Function pointers and virtual functions

After the attacker successfully self-destructed the last computer, upper management has mandated password checks before all self-destruction procedures. Here’s what it looks like now:

void
self_destruct(char *password)
{
    if (strcmp(password, "12345") == 0)
        puts("**** GO BOOM! ****");
}

The password is hardcoded, and it’s the kind of thing an idiot would have on his luggage, but assume it’s actually unknown to the attacker. Especially since, as I’ll show shortly, it won’t matter. Upper management has also mandated stack protectors, so assume that’s enabled from here on.

Additionally, the program has evolved a bit, and now uses a function pointer for polymorphism.

struct greeter {
    char name[8];
    void (*greet)(struct greeter *);
};

void
greet_hello(struct greeter *g)
{
    printf("Hello, %s.\n", g->name);
}

void
greet_aloha(struct greeter *g)
{
    printf("Aloha, %s.\n", g->name);
}

There’s now a greeter object and the function pointer makes its behavior polymorphic. Think of it as a hand-coded virtual function for C. Here’s the new (contrived) main:

int
main(void)
{
    struct greeter greeter = {.greet = greet_hello};
    gets(greeter.name);
    greeter.greet(&greeter);
    return 0;
}

(In a real program, something else provides greeter and picks its own function pointer for greet.)

Rather than overwriting the return pointer, the attacker has the opportunity to overwrite the function pointer on the struct. Let’s reconstruct the exploit like before.

$ readelf -a demo | grep self_destruct
54: 00000000004006a5  10 FUNC  GLOBAL DEFAULT  13 self_destruct

We don’t know the password, but we do know (from peeking at the disassembly) that the password check is 16 bytes. The attack should instead jump 16 bytes into the function, skipping over the check (0x4006a5 + 16 = 0x4006b5).

$ echo -ne 'xxxxxxxx\xb5\x06\x40\x00\x00\x00\x00\x00' > boom
$ ./demo < boom
**** GO BOOM! ****

Neither the stack protector nor the password were of any help. The stack protector only protects the return pointer, not the function pointer on the struct.

This is where the Control Flow Guard comes into play. With CFG enabled, the compiler inserts a check before calling the greet() function pointer. It must point to the beginning of a known function, otherwise it will abort just like the stack protector. Since the middle of self_destruct() isn’t the beginning of a function, it would abort if this exploit is attempted.

However, I’m on Linux and there’s no CFG on Linux (yet?). So I’ll implement it myself, with manual checks.

Function address bitmap

As described in the PDF linked at the top of this article, CFG on Windows is implemented using a bitmap. Each bit in the bitmap represents 8 bytes of memory. If those 8 bytes contains the beginning of a function, the bit will be set to one. Checking a pointer means checking its associated bit in the bitmap.

For my CFG, I’ve decided to keep the same 8-byte resolution: the bottom three bits of the target address will be dropped. The next 24 bits will be used to index into the bitmap. All other bits in the pointer will be ignored. A 24-bit bit index means the bitmap will only be 2MB.

These 24 bits is perfectly sufficient for 32-bit systems, but it means on 64-bit systems there may be false positives: some addresses will not represent the start of a function, but will have their bit set to 1. This is acceptable, especially because only functions known to be targets of indirect calls will be registered in the table, reducing the false positive rate.

Note: Relying on the bits of a pointer cast to an integer is unspecified and isn’t portable, but this implementation will work fine anywhere I would care to use it.

Here are the CFG parameters. I’ve made them macros so that they can easily be tuned at compile-time. The cfg_bits is the integer type backing the bitmap array. The CFG_RESOLUTION is the number of bits dropped, so “3” is a granularity of 8 bytes.

typedef unsigned long cfg_bits;
#define CFG_RESOLUTION  3
#define CFG_BITS        24

Given a function pointer f, this macro extracts the bitmap index.

#define CFG_INDEX(f) \
    (((uintptr_t)f >> CFG_RESOLUTION) & ((1UL << CFG_BITS) - 1))

The CFG bitmap is just an array of integers. Zero it to initialize.

struct cfg {
    cfg_bits bitmap[(1UL << CFG_BITS) / (sizeof(cfg_bits) * CHAR_BIT)];
};

Functions are manually registered in the bitmap using cfg_register().

void
cfg_register(struct cfg *cfg, void *f)
{
    unsigned long i = CFG_INDEX(f);
    size_t z = sizeof(cfg_bits) * CHAR_BIT;
    cfg->bitmap[i / z] |= 1UL << (i % z);
}

Because functions are registered at run-time, it’s fully compatible with ASLR. If ASLR is enabled, the bitmap will be a little different each run. On the same note, it may be worth XORing each bitmap element with a random, run-time value — along the same lines as the stack canary value — to make it harder for an attacker to manipulate the bitmap should he get the ability to overwrite it by a vulnerability. Alternatively the bitmap could be switched to read-only (e.g. mprotect()) once everything is registered.

And finally, the check function, used immediately before indirect calls. It ensures f was previously passed to cfg_register() (except for false positives, as discussed). Since it will be invoked often, it needs to be fast and simple.

void
cfg_check(struct cfg *cfg, void *f)
{
    unsigned long i = CFG_INDEX(f);
    size_t z = sizeof(cfg_bits) * CHAR_BIT;
    if (!((cfg->bitmap[i / z] >> (i % z)) & 1))
        abort();
}

And that’s it! Now augment main to make use of it:

struct cfg cfg;

int
main(void)
{
    cfg_register(&cfg, self_destruct);  // to prove this works
    cfg_register(&cfg, greet_hello);
    cfg_register(&cfg, greet_aloha);

    struct greeter greeter = {.greet = greet_hello};
    gets(greeter.name);
    cfg_check(&cfg, greeter.greet);
    greeter.greet(&greeter);
    return 0;
}

And now attempting the exploit:

$ ./demo < boom
Aborted

Normally self_destruct() wouldn’t be registered since it’s not a legitimate target of an indirect call, but the exploit still didn’t work because it called into the middle of self_destruct(), which isn’t a valid address in the bitmap. The check aborts the program before it can be exploited.

In a real application I would have a global cfg bitmap for the whole program, and define cfg_check() in a header as an inline function.

Despite being possible implement in straight C without the help of the toolchain, it would be far less cumbersome and error-prone to let the compiler and platform handle Control Flow Guard. That’s the right place to implement it.

Update: Ted Unangst pointed out OpenBSD performing a similar check in its mbuf library. Instead of a bitmap, the function pointer is replaced with an index into an array of registered function pointers. That approach is cleaner, more efficient, completely portable, and has no false positives.

Have a comment on this article? Start a discussion in my public inbox by sending an email to ~skeeto/public-inbox@lists.sr.ht [mailing list etiquette] , or see existing discussions.