Occasionally I’ve needed to key a hash table with C pointers. I don’t
care about the contents of the object itself — especially if it might
change — just its pointer identity. For example, suppose I’m using
null-terminated strings as keys and I know these strings will always
be interned in a common table. These strings can be compared directly
by their pointer values (str_a == str_b) rather than, more slowly,
by their contents (strcmp(str_a, str_b) == 0). The intern table
ensures that these expressions both have the same result.
As a key in a hash table, or other efficient map/dictionary data
structure, I’ll need to turn pointers into numerical values. However,
C pointers aren’t integers. Following certain rules it’s permitted
to cast pointers to integers and back, but doing so will reduce the
program’s portability. The most important consideration is that the
integer form isn’t guaranteed to have any meaningful or stable
value. In other words, even in a conforming implementation, the same
pointer might cast to two different integer values. This would break
any algorithm that isn’t keenly aware of the implementation details.
To show why this is, I’m going to be citing the relevant parts of the
C99 standard (ISO/IEC 9899:1999). The draft for C99 is freely
available (and what I use myself since I’m a cheapass). My purpose is
not to discourage you from casting pointers to integers and using
the result. The vast majority of the time this works fine and as you
would expect. I just think it’s an interesting part of the language,
and C/C++ programmers should be aware of potential the trade-offs.
Integer to pointer casts
What does the standard have to say about casting pointers to integers?
An integer may be converted to any pointer type. Except as
previously specified, the result is implementation-defined, might
not be correctly aligned, might not point to an entity of the
referenced type, and might be a trap representation.
It also includes a footnote:
The mapping functions for converting a pointer to an integer or an
integer to a pointer are intended to be consistent with the
addressing structure of the execution environment.
Casting an integer to a pointer depends entirely on the
implementation. This is intended for things like memory mapped
hardware. The programmer may need to access memory as a specific
physical address, which would be encoded in the source as an integer
constant and cast to a pointer of the appropriate type.
It may also be used by a loader and dynamic linker to compute the
virtual address of various functions and variables, then cast to a
pointer before use.
Both cases are already dependent on implementation defined behavior,
so there’s nothing lost in relying on these casts.
An integer constant expression of 0 is a special case. It casts to a
NULL pointer in all implementations (§18.104.22.168¶3). However, a NULL
pointer doesn’t necessarily point to address zero, nor is it
necessarily a zero bit pattern (i.e. beware memset and calloc on
memory with pointers). It’s just guaranteed never to compare equally
with a valid object, and it is undefined behavior to dereference.
Pointer to integer casts
What about the other way around? §22.214.171.124¶6:
Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any
Like before, it’s implementation defined. However, the negatives are a
little stronger: the cast itself may be undefined behavior. I
speculate this is tied to integer overflow. The last part makes
pointer to integer casts optional for an implementation. This is one
way that the hash table above would be less portable.
When the cast is always possible, an implementation can provide an
integer type wide enough to hold any pointer value. §126.96.36.199¶1:
The following type designates a signed integer type with the
property that any valid pointer to void can be converted to this
type, then converted back to pointer to void, and the result will
compare equal to the original pointer:
The following type designates an unsigned integer type with the
property that any valid pointer to void can be converted to this
type, then converted back to pointer to void, and the result will
compare equal to the original pointer:
These types are optional.
The take-away is that the integer has no meaningful value. The only
guarantee is that the integer can be cast back into a void pointer
that will compare equally. It would be perfectly legal for an
implementation to pass these assertions (and still sometimes fail).
Since the bits don’t have any particular meaning, arithmetic
operations involving them will also have no meaning. When a pointer
might map to two different integers, the hash values might not match
up, breaking hash tables that rely on them. Even with uintptr_t
provided, casting pointers to integers isn’t useful without also
relying on implementation defined properties of the result.
Reasons for this pointer insanity
What purpose could such strange pointer-to-integer casts serve?
A security-conscious implementation may choose to annotate pointers
with additional information by setting unused bits. It might be for
baggy bounds checks or, someday, in an undefined behavior
sanitizer. Before dereferencing annotated pointers, the
metadata bits would be checked for validity, and cleared/set before
use as an address. Or it may map the same object at multiple virtual
addresses) to avoid setting/clearing the metadata bits,
providing interoperability with code unaware of the annotations. When
pointers are compared, these bits would be ignored.
When these annotated pointers are cast to integers, the metadata bits
will be present, but a program using the integer wouldn’t know their
meaning without tying itself closely to that implementation.
Completely unused bits may even be filled with random garbage when
cast. It’s allowed.
You may have been thinking before about using a union or char * to
bypass the cast and access the raw pointer bytes, but you’d run into
the same problems on the same implementations.
The standard makes a distinction between strictly conforming
programs (§4¶5) and conforming programs (§4¶7). A strictly
conforming program must not produce output depending on implementation
defined behavior nor exceed minimum implementation limits. Very few
programs fit in this category, including any program using uintptr_t
since it’s optional. Here are more examples of code that isn’t
On the other hand, a conforming program is allowed to depend on
implementation defined behavior. Relying on meaningful, stable values
for pointers cast to uintptr_t/intptr_t is conforming even if your
program may exhibit bugs on some implementations.
For over a decade now, GNU Make has almost exclusively been my build
system of choice, either directly or indirectly. Unfortunately this
means I unnecessarily depend on some GNU extensions — an annoyance
when porting to the BSDs. In an effort to increase the portability of
my Makefiles, I recently read the POSIX make specification. I
learned two important things: 1) POSIX make is so barren it’s not
really worth striving for, and 2) make’s macro assignment mechanism
If you want to see it in action for yourself before reading further,
here’s a Makefile that implements Conway’s Game of Life (40x40) using
only macro assignments.
Run it with any make program in an ANSI terminal. It must literally
be named life.mak. Beware: if you run it longer than a few minutes,
your computer may begin thrashing.
make -f life.mak
It’s 100% POSIX-compatible except for the sleep 0.1 (fractional
sleep), which is only needed for visual effect.
A POSIX workaround
Unlike virtually every real world implementation, POSIX make doesn’t
support conditional parts. For example, you might want your Makefile’s
behavior to change depending on the value of certain variables. In GNU
Make it looks like this:
If the goal is to write a strictly POSIX Makefile, how could I work
around the lack of conditional parts and maintain a similar interface?
The selection of macro/variable to evaluate can be dynamically
selected, allowing for some useful tricks. First define the option’s
USE_FOO = 0
Then define both sets of flags:
EXTRA_FLAGS_0 = -Wbar
EXTRA_FLAGS_1 = -ffoo -lfoo
Now dynamically select one of these macros for assignment to
EXTRA_FLAGS = $(EXTRA_FLAGS_$(USE_FOO))
The assignment on the command line overrides the assignment in the
Makefile, so the user gets to override USE_FOO.
$ make # EXTRA_FLAGS = -Wbar
$ make USE_FOO=0 # EXTRA_FLAGS = -Wbar
$ make USE_FOO=1 # EXTRA_FLAGS = -ffoo -lfoo
Before reading the POSIX specification, I didn’t realize that the
left side of an assignment can get the same treatment. For example,
if I really want the “if defined” behavior back, I can use the macro
to mangle the left-hand side. For example,
The BSD flavors of make have a -V option for printing variables,
which is an easy way to retrieve output. I used an “immediate”
assignment (:=) for result since some versions of make won’t
evaluate the expression before -V printing.
$ make -f sqrt.mak -V result n=8
Without -V, a default target could be used instead:
There are no math operators, so performing arithmetic requires some
creativity. For example, integers could be represented as a
series of x characters. The number 4 is xxxx, the number 6 is
xxxxxx, etc. Addition is concatenation (note: macros can have + in
A = xxx
B = xxxx
A+B = $(A)$(B)
However, since there’s no way to “slice” a value, subtraction isn’t
possible. A more realistic approach to arithmetic would require lookup
Branching could be achieved through more lookup tables. For example,
$ make n=5 op=sqrt # 2.236068
$ make n=5 op=square # 25
Or using the DEBUG trick above, use the condition to mask out the
results of the unwanted branch. This is similar to the mov paper.
result := $(op)($(n)) = $($(op)_$(n))
result$(verbose) := $($(op)_$(n))
And its usage:
$ make n=5 op=square # 25
$ make n=5 op=square verbose=1 # square(5) = 25
What about loops?
Looping is a tricky problem. However, one of the most common build
(anti?)patterns is the recursive Makefile. Borrowing from the
mov paper, which used an unconditional jump to restart the program
from the beginning, for a Makefile Turing-completeness I can invoke
the Makefile recursively, restarting the program with a new set of
Remember the print target above? I can loop by invoking make again
with new inputs in this target,
Before going any further, now that loops have been added, the natural
next question is halting. In reality, the operating system will take
care of that after some millions of make processes have carelessly
been invoked by this horribly inefficient scheme. However, we can do
better. The program can clobber the MAKE variable when it’s ready to
halt. Let’s formalize it.
Suppose we want to count down to 0. There will be an initial count:
count = 6
A decrement table:
6 = 5
5 = 4
4 = 3
3 = 2
2 = 1
1 = 0
0 = loop
The last line will be used to halt by clearing the name on the right
side. This is three star territory.
The result (current iteration) loop value is computed from the lookup
result = $($(count))
The next loop value is passed via args. If loop was cleared above,
this result will be discarded.
args = count=$(result)
With all that in place, invoking the Makefile will print a countdown
from 5 to 0 and quit. This is the general structure for the Game of
Life macro program.
Game of Life
A universal Turing machine has been implemented in Conway’s Game of
Life. With all that heavy lifting done, one of the easiest
methods today to prove a language’s Turing-completeness is to
implement Conway’s Game of Life. Ignoring the criminal inefficiency of
it, the Game of Life Turing machine could be run on the Game of Life
simulation running on make’s macro assignments.
In the Game of Life program — the one linked at the top of this
article — each cell is stored in a macro named xxyy, after its
position. The top-left most cell is named 0000, then going left to
right, 0100, 0200, etc. Providing input is a matter of assigning each
of these macros. I chose X for alive and - for dead, but, as
you’ll see, any two characters permitted in macro names would work as
$ make 0000=X 0100=- 0200=- 0300=X ...
The next part should be no surprise: The rules of the Game of Life are
encoded as a 512-entry lookup table. The key is formed by
concatenating the cell’s value along with all its neighbors, with
itself in the center.
Modern operating systems run processes within virtual memory using a
piece of hardware called a memory management unit (MMU). The MMU
contains a page table that defines how virtual memory maps onto
physical memory. The operating system is responsible for maintaining
this page table, mapping and unmapping virtual memory to physical
memory as needed by the processes it’s running. If a process accesses
a page that is not currently mapped, it will trigger a page fault
and the execution of the offending thread will be paused until the
operating system maps that page.
This functionality allows for a neat hack: A physical memory address
can be mapped to multiple virtual memory addresses at the same time. A
process running with such a mapping will see these regions of memory
as aliased — views of the same physical memory. A store to one of
these addresses will simultaneously appear across all of them.
Some useful applications of this feature include:
An extremely fast, large memory “copy” by mapping the source memory
overtop the destination memory.
Trivial interoperability between code instrumented with baggy
bounds checking [PDF] and non-instrumented code. A few bits
of each pointer are reserved to tag the pointer with the size of its
memory allocation. For compactness, the stored size is rounded up to
a power of two, making it “baggy.” Instrumented code checks this tag
before making a possibly-unsafe dereference. Normally, instrumented
code would need to clear (or set) these bits before dereferencing or
before passing it to non-instrumented code. Instead, the allocation
could be mapped simultaneously at each location for every possible
tag, making the pointer valid no matter its tag bits.
Two responses to my last post on hotpatching suggested
that, instead of modifying the instruction directly, memory
containing the modification could be mapped over top of the code. I
would copy the code to another place in memory, safely modify it in
private, switch the page protections from write to execute (both for
W^X and for other hardware limitations), then map it over
the target. Restoring the original behavior would be as simple as
unmapping the change.
Both POSIX and Win32 allow user space applications to create these
aliased mappings. The original purpose for these APIs is for shared
memory between processes, where the same physical memory is mapped
into two different processes’ virtual memory. But the OS doesn’t stop
us from mapping the shared memory to a different address within the
POSIX Memory Mapping
On POSIX systems (Linux, *BSD, OS X, etc.), the three key functions
are shm_open(3), ftruncate(2), and mmap(2).
First, create a file descriptor to shared memory using shm_open. It
has very similar semantics to open(2).
The name works much like a filesystem path, but is actually a
different namespace (though on Linux it is a tmpfs mounted at
/dev/shm). Resources created here (O_CREAT) will persist until
explicitly deleted (shm_unlink(3)) or until the system reboots. It’s
an oversight in POSIX that a name is required even if we never intend
to access it by name. File descriptors can be shared with other
processes via fork(2) or through UNIX domain sockets, so a name
isn’t strictly required.
OpenBSD introduced shm_mkstemp(3) to solve this problem,
but it’s not widely available. On Linux, as of this writing, the
O_TMPFILE flag may or may not provide a fix (it’s
The portable workaround is to attempt to choose a unique name, open
the file with O_CREAT | O_EXCL (either atomically create the file or
fail), shm_unlink the shared memory object as soon as possible, then
cross our fingers. The shared memory object will still exist (the file
descriptor keeps it alive) but will not longer be accessible by name.
The shared memory object is brand new (O_EXCL) and is therefore of
zero size. ftruncate sets it to the desired size. This does not
need to be a multiple of the page size. Failing to allocate memory
will result in a bus error on access.
Finally mmap the shared memory into place just as if it were a file.
We can choose an address (aligned to a page) or let the operating
system choose one for use (NULL). If we don’t plan on making any more
mappings, we can also close the file descriptor. The shared memory
object will be freed as soon as it completely unmapped (munmap(2)).
At this point both a and b have different addresses but point (via
the page table) to the same physical memory. Changes to one are
reflected in the other. So this:
*a=0xdeafbeef;printf("%p %p 0x%x\n",a,b,*b);
Will print out something like:
0x6ffffff0000 0x6fffffe0000 0xdeafbeef
It’s also possible to do all this only with open(2) and mmap(2) by
mapping the same file twice, but you’d need to worry about where to
put the file, where it’s going to be backed, and the operating system
will have certain obligations about syncing it to storage somewhere.
Using POSIX shared memory is simpler and faster.
Windows Memory Mapping
Windows is very similar, but directly supports anonymous shared
memory. The key functions are CreateFileMapping, and
First create a file mapping object from an invalid handle value. Like
POSIX, the word “file” is used without actually involving files.
In this post I’m going to do a silly, but interesting, exercise that
should never be done in any program that actually matters. I’m going
write a program that changes one of its function definitions while
it’s actively running and using that function. Unlike last
time, this won’t involve shared libraries, but it will require
x86_64 and GCC. Most of the time it will work with Clang, too, but
it’s missing an important compiler option that makes it stable.
If you want to see it all up front, here’s the full source:
Here’s the function that I’m going to change:
It’s dead simple, but that’s just for demonstration purposes. This
will work with any function of arbitrary complexity. The definition
will be changed to this:
I was only going change the string, but I figured I should make it a
little more interesting.
Here’s how it’s going to work: I’m going to overwrite the beginning of
the function with an unconditional jump that immediately moves control
to the new definition of the function. It’s vital that the function
prototype does not change, since that would be a far more complex
But first there’s some preparation to be done. The target needs to
be augmented with some GCC function attributes to prepare it for its
redefinition. As is, there are three possible problems that need to be
I want to hotpatch this function while it is being used by another
thread without any synchronization. It may even be executing the
function at the same time I clobber its first instructions with my
jump. If it’s in between these instructions, disaster will strike.
The solution is the ms_hook_prologue function attribute. This tells
GCC to put a hotpatch prologue on the function: a big, fat, 8-byte NOP
that I can safely clobber. This idea originated in Microsoft’s Win32
API, hence the “ms” in the name.
The prologue NOP needs to be updated atomically. I can’t let the
other thread see a half-written instruction or, again, disaster. On
x86 this means I have an alignment requirement. Since I’m
overwriting an 8-byte instruction, I’m specifically going to need
8-byte alignment to get an atomic write.
The solution is the aligned function attribute, ensuring the
hotpatch prologue is properly aligned.
The final problem is that there must be exactly one copy of this
function in the compiled program. It must never be inlined or
cloned, since these won’t be hotpatched.
As you might have guessed, this is primarily fixed with the noinline
function attribute. Since GCC may also clone the function and call
that instead, so it also needs the noclone attribute.
Even further, if GCC determines there are no side effects, it may
cache the return value and only ever call the function once. To
convince GCC that there’s a side effect, I added an empty inline
assembly string (__asm("")). Since puts() has a side effect
(output), this isn’t truly necessary for this particular example, but
I’m being thorough.
It’s 8-byte aligned and it has the 8-byte NOP: that lea instruction
does nothing. It copies rsp into itself and changes no flags. Why
not 8 1-byte NOPs? I need to replace exactly one instruction with
exactly one other instruction. I can’t have another thread in between
Next, let’s take a look at the function that will perform the
hotpatch. I’ve written a generic patching function for this purpose.
This part is entirely specific to x86.
It takes the address of the function to be patched and the address of
the function to replace it. As mentioned, the target must be 8-byte
aligned (enforced by the assert). It’s also important this function is
only called by one thread at a time, even on different targets. If
that was a concern, I’d wrap it in a mutex to create a critical
There are a number of things going on here, so let’s go through them
one at a time:
Make the function writeable
The .text segment will not be writeable by default. This is for both
security and safety. Before I can hotpatch the function I need to make
the function writeable. To make the function writeable, I need to make
its page writable. To make its page writeable I need to call
mprotect(). If there was another thread monkeying with the page
attributes of this page at the same time (another thread calling
hotpatch()) I’d be in trouble.
It finds the page by rounding the target address down to the nearest
4096, the assumed page size (sorry hagepages). Warning: I’m being a
bad programmer and not checking the result of mprotect(). If it
fails, the program will crash and burn. It will always fail systems
with W^X enforcement, which will likely become the standard in the
future. Under W^X (“write XOR execute”), memory can either
be writeable or executable, but never both at the same time.
What if the function straddles pages? Well, I’m only patching the
first 8 bytes, which, thanks to alignment, will sit entirely inside
the page I just found. It’s not an issue.
At the end of the function, I mprotect() the page back to
Create the instruction
I’m assuming the replacement function is within 2GB of the original in
virtual memory, so I’ll use a 32-bit relative jmp instruction. There’s
no 64-bit relative jump, and I only have 8 bytes to work within
anyway. Looking that up in the Intel manual, I see this:
Fortunately it’s a really simple instruction. It’s opcode 0xE9 and
it’s followed immediately by the 32-bit displacement. The instruction
is 5 bytes wide.
To compute the relative jump, I take the difference between the
functions, minus 5. Why the 5? The jump address is computed from the
position after the jump instruction and, as I said, it’s 5 bytes
I put 0xE9 in a byte array, followed by the little endian
displacement. The astute may notice that the displacement is signed
(it can go “up” or “down”) and I used an unsigned integer. That’s
because it will overflow nicely to the right value and make those
Finally, the instruction byte array I just computed is written over
the hotpatch NOP as a single, atomic, 64-bit store.
*(uint64_t *)target = instruction.value;
Other threads will see either the NOP or the jump, nothing in between.
There’s no synchronization, so other threads may continue to execute
the NOP for a brief moment even through I’ve clobbered it, but that’s
I fire off the other thread to keep it pinging at hello(). In the
main thread, it waits until I hit enter to give the program input,
after which it calls hotpatch() and changes the function called by
the “worker” thread. I’ve now changed the behavior of the worker
thread without its knowledge. In a more practical situation, this
could be used to update parts of a running program without restarting
or even synchronizing.
These related articles have been shared with me since publishing this
When developing minimal, freestanding Windows programs, it’s
obviously beneficial to take full advantage of dynamic libraries that
are already linked rather than duplicate that functionality in the
application itself. Every Windows process automatically, and
involuntarily, has kernel32.dll and ntdll.dll loaded into its process
space before it starts. As discussed previously, kernel32.dll provides
the Windows API (Win32). The other, ntdll.dll, provides the Native
API for user space applications, and is the focus of this article.
The Native API is a low-level API, a foundation for the implementation
of the Windows API and various components that don’t use the Windows
API (drivers, etc.). It includes a runtime library (RTL) suitable for
replacing important parts of the C standard library, unavailable to
freestanding programs. Very useful for a minimal program.
Unfortunately, using the Native API is a bit of a minefield. Not all
of the documented Native API functions are actually exported by
ntdll.dll, making them inaccessible both for linking and
GetProcAddress(). Some are exported, but not documented as such.
Others are documented as exported but are not documented when (which
release of Windows). If a particular function wasn’t exported until
Windows 8, I don’t want to use when supporting Windows 7.
This is further complicated by the Microsoft Windows SDK, where many
of these functions are just macros that alias C runtime functions.
Naturally, MinGW closely follows suit. For example, in both cases,
here is how the Native API function RtlCopyMemory is “declared.”
This is certainly not useful for freestanding programs, though it has
a significant benefit for hosted programs: The C compiler knows the
semantics of memcpy() and can properly optimize around it. Any C
compiler worth its salt will replace a small or aligned, fixed-sized
memcpy() or memmove() with the equivalent inlined code. For
On x86_64 (GCC 4.9.3, -Os), this memmove() call is replaced with
two instructions. This isn’t possible when calling an opaque function
in a non-standard dynamic library. The side effects could be anything.
movaps xmm0, [rsp + 48]
movaps [rsp + 32], xmm0
These Native API macro aliases are what have allowed certain Wine
issues to slip by unnoticed for years. Very few user space
applications actually call Native API functions, even when addressed
directly by name in the source. The development suite is pulling a
bait and switch.
Like last time I danced at the edge of the compiler, this has
caused headaches in my recent experimentation with freestanding
executables. The MinGW headers assume that the programs including them
will link against a C runtime. Dirty hack warning: To work around it,
I have to undo the definition in the MinGW headers and make my own.
For example, to use the real RtlMoveMemory():
As of this writing, the same approach is not reliable with
RtlCopyMemory(), the cousin to memcpy(). As far as I can tell, it
was only exported starting in Windows 7 SP1 and Wine 1.7.46 (June
2015). Use RtlMoveMemory() instead. The overlap-handling overhead is
negligible compared to the function call overhead anyway.
As a side note: one reason besides minimalism for not implementing
your own memmove() is that it’s not legal to implement efficiently
in straight C. According to the language specification, your
implementation of memmove() would not be permitted to compare its
pointer arguments with <, >, <=, or >=. That would lead to
undefined behavior if they pointed to unrelated objects (ISO/IEC
9899:2011 §6.5.8¶5). So the only legal approach is to allocate a
temporary buffer, copy the source buffer into it, then copy it into
the destination buffer.
I keep mentioning Wine because I’ve been careful to ensure my
applications run correctly with it. So far it’s worked perfectly
with both Windows API and Native API functions. Thanks to the hard
work behind the Wine project, despite being written sharply against
the Windows API, these tiny programs remain relatively portable (x86
and ARM). It’s a good fit for graphical applications (games), but I
would never write a command line application like this. The command
line has always been a second class citizen on Windows.
Mostly for my own future reference, here are export lists for two
different versions of kernel32.dll and ntdll.dll:
As I collect more of these export lists, I’ll be able to paint a full
picture of when particular functions first appeared as exports. These
lists were generated with objdump -p <path_to_dll>.
Now that I’ve got these Native API issues sorted out, I’ve
significantly expanded the capabilities of my tiny, freestanding
programs without adding anything to their size. Functions like
RtlUnicodeToUTF8N() and RtlUTF8ToUnicodeN() will surely be handy.
Recently I’ve been experimenting with freestanding C programs on
Windows. Freestanding refers to programs that don’t link, either
statically or dynamically, against a standard library (i.e. libc).
This is typical for operating systems and similar, bare metal
situations. Normally a C compiler can make assumptions about the
semantics of functions provided by the C standard library. For
example, the compiler will likely replace a call to a small,
fixed-size memmove() with move instructions. Since a freestanding
program would supply its own, it may have different semantics.
My usual go to for C/C++ on Windows is Mingw-w64, which has
greatly suited my needs the past couple of years. It’s packaged on
Debian, and, when combined with Wine, allows me to fully develop
Windows applications on Linux. Being GCC, it’s also great for
cross-platform development since it’s essentially the same compiler as
the other platforms. The primary difference is the interface to the
operating system (POSIX vs. Win32).
However, it has one glaring flaw inherited from MinGW: it links
against msvcrt.dll, an ancient version of the Microsoft C runtime
library that currently ships with Windows. Besides being dated and
quirky, it’s not an official part of Windows and never has
been, despite its inclusion with every release since Windows 95.
Mingw-w64 doesn’t have a C library of its own, instead patching over
some of the flaws of msvcrt.dll and linking against it.
Since so much depends on msvcrt.dll despite its unofficial nature,
it’s unlikely Microsoft will ever drop it from future releases of
Windows. However, if strict correctness is a concern, we must ask
Mingw-w64 not to link against it. An alternative would be
PlibC, though the LGPL licensing is unfortunate. Another is
Cygwin, which is a very complete POSIX environment, but is heavy and
Sometimes I’d prefer to be more direct: skip the C library altogether
and talk directly to the operating system. On Windows that’s the Win32
API. Ultimately I want a tiny, standalone .exe that only links against
Linux vs. Windows
The most important benefit of a standard library like libc is a
portable, uniform interface to the host system. So long as the
standard library suits its needs, the same program can run anywhere.
Without it, the programs needs an implementation of each
On Linux, operating system requests at the lowest level are made
directly via system calls. This requires a bit of assembly language
for each supported architecture (int 0x80 on x86, syscall on
x86_64, swi on ARM, etc.). The POSIX functions of the various Linux
libc implementations are built on top of this mechanism.
For example, here’s a function for a 1-argument system call on x86_64.
The situation is simpler on Windows. Its low level system calls are
undocumented and unstable, changing across even minor updates. The
formal, stable interface is through the exported functions in
kernel32.dll. In fact, kernel32.dll is essentially a standard library
on its own (making the term “freestanding” in this case dubious). It
includes functions usually found only in user-space, like string
manipulation, formatted output, font handling, and heap management
(similar to malloc()). It’s not POSIX, but it has analogs to much of
the same functionality.
The standard entry for a C program is main(). However, this is not
the application’s true entry. The entry is in the C library, which
does some initialization before calling your main(). When main()
returns, it performs cleanup and exits. Without a C library, programs
don’t start at main().
On Linux the default entry is the symbol _start. It’s prototype
would look like so:
Returning from this function leads to a segmentation fault, so it’s up
to your application to perform the exit system call rather than
On Windows, the entry depends on the type of application. The two
relevant subsystems today are the console and windows subsystems.
The former is for console applications (duh). These programs may still
create windows and such, but must always have a controlling console.
The latter is primarily for programs that don’t run in a console,
though they can still create an associated console if they like. In
Mingw-w64, give -mconsole (default) or -mwindows to the linker to
choose the subsystem.
Unlike Linux’s _start, Windows programs can safely return from these
functions, similar to main(), hence the int return. The WINAPI
macro means the function may have a special calling convention,
depending on the platform.
On any system, you can choose a different entry symbol or address
using the --entry option to the GNU linker.
One problem I’ve run into is Mingw-w64 generating code that calls
__chkstk_ms() from libgcc. I believe this is a long-standing bug,
since -ffreestanding should prevent these sorts of helper functions
from being used. The workaround I’ve found is to disable stack
Notice I manually linked against kernel32.dll. The stripped final
result is only 4kB, mostly PE padding. There are techniques to trim
this down even further, but for a substantial program it
wouldn’t make a significant difference.
From here you could create a GUI by linking against user32.dll and
gdi32.dll (both also part of Win32) and calling the appropriate
functions. I already ported my OpenGL demo to a freestanding
.exe, dropping GLFW and directly using Win32 and WGL. It’s much less
portable, but the final .exe is only 4kB, down from the original 104kB
(static linking against GLFW).
It’s been two years since I last wrote about Elfeed, my
Atom/RSS feed reader for Emacs. I’ve used it every single
day since, and I continue to maintain it with help from the community.
So far 18 people besides me have contributed commits. Over the last
couple of years it’s accumulated some new features, some more obvious
Every time I mark a new release, I update the ChangeLog at the top of
elfeed.el which lists what’s new. Since it’s easy to overlook many of
the newer useful features, I thought I’d list the more important ones
Custom Entry Colors
You can now customize entry faces through elfeed-search-face-alist.
This variable maps tags to faces. An entry inherits the face of any
tag it carries. Previously “unread” was a special tag that got a bold
face, but this is now implemented as nothing more than an initial
entry in the alist.
I’ve been using it to mark different kinds of content (videos,
podcasts, comics) with different colors.
You can specify the starting tags for entries from particular feeds
directly in the feed listing. This has been a feature for awhile now,
but it’s not something you’d want to miss. It started out as a feature
in my personal configuration that eventually migrated into Elfeed
For example, your elfeed-feeds may initially look like this,
especially if you imported from OPML.
If you wanted certain tags applied to entries from each, you would
need to putz around with elfeed-make-tagger. For the most common
case — apply certain tags to all entries from a URL — it’s much
simpler to specify the information as part of the listing itself,
Today I only use custom tagger functions in my own configuration to
filter within a couple of particularly noisy feeds.
Metadata is more for Elfeed extensions (i.e. elfeed-org)
than regular users. You can attach arbitrary, readable
metadata to any Elfeed object (entry, feed). This metadata is
automatically stored in the database. It’s a plist.
Metadata is accessed entirely through one setf-able function:
elfeed-meta. For example, you might want to track when you’ve read
something, not just that you’ve read it. You could use this to
selectively update certain feeds or just to evaluate your own habits.
Two things motivated this feature. First, without a plist, if I added
more properties in the future, I would need to change the database
format to support them. I modified the database format to add
metadata, requiring an upgrade function to quietly upgrade older
databases as they were loaded. I’d really like to avoid this in the
Second, I wanted to make it easy for extension authors to store their
own data. I still imagine an extension someday to update feeds
intelligently based on their history. For example, the database
doesn’t track when the feed was last fetched, just the date of the
most recent entry (if any). A smart-update extension could use
metadata to tag feeds with this information.
Elfeed itself already uses two metadata keys: :failures on feeds and
:title on both. :failures counts the total number of times
fetching that feed resulted in an error. You could use this get a
listing of troublesome feeds like so,
The :title property allows for a custom title for both feeds and
entries in the search buffer listing, assuming you’re using the
default function (see below). It overrides the title provided by the
feed itself. This is different than elfeed-entry-title and
elfeed-feed-title, which is kept in sync with feed content. Metadata
is not kept in sync with the feed itself.
You can invert filter components by prefixing them with !. For
example, say you’re looking at all my posts from the past 6 months:
But say you’re tired of me and decide you want to see every entry from
the past 6 months excluding my posts.
Normally you limit the number of results by date, but you can now
limit the result by count using #n. For example, to see my most
recent 12 posts regardless of date,
This is used internally in the live filter to limit the number of
results to the height of the screen. If you noticed that live
filtering has been much more responsive in the last few months, this is
Elfeed properly integrates with Emacs’ bookmarks (thanks to
groks). You can bookmark the current filter with M-x
bookmark-set (C-x r m). By default, Emacs will persist bookmarks
between sessions. To revisit a filter in the future, M-x
bookmark-jump (C-x r b).
Since this requires no configuration, this may serve as an easy
replacement for manually building “view” toggles — filters bound to
certain keys — which I know many users have done, including me.
If you’ve updated very recently, you probably noticed Elfeed got a
brand new header. Previously it faked a header by writing to the first
line of the buffer. This is because somehow I had no idea Emacs had
official support for buffer headers (despite notmuch using them all
The new header includes additional information, such as the current
filter, the number of unread entries, the total number of entries, and
the number of unique feeds currently in view. You’ll see this as
<unread>/<total>:<feeds> in the middle of the header.
As of this writing, the new header has not been made part of a formal
release. So if you’re only tracking stable releases, you won’t see
this for awhile longer.
As you already know, in the search buffer listing you can press G to
update your feeds. But did you know you it takes a prefix argument?
Run as C-u G, it only updates feeds with entries currently listed in
As of this writing, this is another feature not yet in a formal
release. I’d been wanting something like this for awhile but couldn’t
think of a reasonable interface. Directly prompting the user for feeds
is neither elegant nor composable. However, groks suggested the
prefix argument, which composes perfectly with Elfeed’s
In addition to custom faces, there are a number of ways to customize
Choose the sort order with elfeed-sort-order.
Set a custom date format with elfeed-search-date-format.
Adjust field widths with elfeed-search-*-width.
Or override everything with elfeed-search-print-entry-function.
Gergely Nagy has been throwing lots of commits at me over the last
couple of weeks to open up lots of Elfeed’s behavior to customization,
so there are more to come.
Thank You, Emacs Community
Apologies about any features I missed or anyone I forgot to mention
who’s made contributions. The above comes from my ChangeLogs, the
commit log, the GitHub issue listing, and my own memory, so I’m likely
to have forgotten some things. A couple of these features I had
forgotten about myself!
To use it, you’ll need Poppler’s pdftotext command line
program — used to build an index of the PDF — and a copy of the
complete Volume 2 of Intel’s instruction set manual. There’s only one
command to worry about: M-x x86-lookup.
Minimize documentation friction
This package should be familiar to anyone who’s used
javadoc-lookup, one of my older packages. It has a common
underlying itch: the context switch to read API documentation while
coding should have as little friction as possible, otherwise I’m
discouraged from doing it. In an ideal world I wouldn’t ever need to
check documentation because it’s already in my head. By visiting
documentation frequently with ease, it’s going to become familiar that
much faster and I’ll be reaching for it less and less, approaching the
I picked up x86 assembly [about a year ago][x86] and for the first few
months I struggled to find a good online reference for the instruction
set. There are little scraps here and there, but not much of
substance. The big exception is Félix Cloutier’s reference,
which is an amazingly well-done HTML conversion of Intel’s PDF
manuals. Unfortunately I could never get it working locally to
generate my own. There’s also the X86 Opcode and Instruction
Reference, but it’s more for machines than humans.
Besides, I often work without an Internet connection, so offline
documentation is absolutely essential. (You hear that Microsoft? Not
only do I avoid coding against Win32 because it’s badly designed, but
even more so because you don’t offer offline documentation anymore!
The friction to API reference your documentation is enormous.)
I avoided the official x86 documentation for awhile, thinking it would
be too opaque, at least until I became more accustomed to the
instruction set. But really, it’s not bad! With a handle on the
basics, I would encourage anyone to dive into either Intel’s or AMD’s
manuals. The reason there’s not much online in HTML form is
because these manuals are nearly everything you need.
I chose Intel’s manuals for x86-lookup because I’m more familiar with
it, it’s more popular, it’s (slightly) easier to parse, it’s offered
as a single PDF, and it’s more complete. The regular expression for
finding instructions is tuned for Intel’s manual and it won’t work
with AMD’s manuals.
For a couple months prior to writing x86-lookup, I had a couple of
scratch functions to very roughly accomplish the same thing. The
tipping point for formalizing it was that last month I wrote my own
x86 assembler. A single mnemonic often has a dozen or more different
opcodes depending on the instruction’s operands, and there are often
several ways to encode the same operation. I was frequently looking up
opcodes, and navigating the PDF quickly became a real chore. I only
needed about 80 different opcodes, so I was just adding them to the
assembler’s internal table manually as needed.
How does it work?
Say you want to look up the instruction RDRAND.
Initially Emacs has no idea what page this is on, so the first step is
to build an index mapping mnemonics to pages. x86-lookup runs the
pdftotext command line program on the PDF and loads the result into a
The killer feature of pdftotext is that it emits FORM FEED (U+0012)
characters between pages. Think of these as page breaks. By counting
form feed characters, x86-lookup can track the page for any part of
the document. In fact, Emacs is already set up to do this with its
forward-page and backward-page commands. So to build the index,
x86-lookup steps forward page-by-page looking for mnemonics, keeping
note of the page. Since this process typically takes about 10 seconds,
the index is cached in a file (see x86-lookup-cache-directory) for
future use. It only needs to happen once for a particular manual on a
The mnemonic listing is slightly incomplete, so x86-lookup expands
certain mnemonics into the familiar set. For example, all the
conditional jumps are listed under “Jcc,” but this is probably not
what you’d expect to look up. I compared x86-lookup’s mnemonic listing
against NASM/nasm-mode’s mnemonics to ensure everything was accounted
for. Both packages benefited from this process.
Once the index is built, pdftotext is no longer needed. If you’re
desperate and don’t have this program available, you can borrow the
index file from another computer. But you’re on your own for figuring
So to look up RDRAND, x86-lookup checks the index for the page number
and invokes a PDF reader on that page. This is where not all PDF
readers are created equal. There’s no convention for opening a PDF to
a particular page and each PDF reader differs. Some don’t even support
it. To deal with this, x86-lookup has a function specialized for
different PDF readers. Similar to browse-url-browser-function,
x86-lookup has x86-lookup-browse-pdf-function.
By default it tries to open the PDF for viewing within Emacs (did you
know Emacs is a PDF viewer?), falling back to on options if the
feature is unavailable. I welcome pull requests for any PDF readers
not yet supported by x86-lookup. Perhaps this functionality deserves
its own package.
That’s it! It’s a simple feature that has already saved me a lot of
time. If you’re ever programming in x86 assembly, give x86-lookup a