nullprogram.com/blog/2016/10/27/
In a C program, suppose I have a table of color names of similar
length. There are two straightforward ways to construct this table.
The most common would be an array of char *
.
char *colors_ptr[] = {
"red",
"orange",
"yellow",
"green",
"blue",
"violet"
};
The other is a two-dimensional char
array.
char colors_2d[][7] = {
"red",
"orange",
"yellow",
"green",
"blue",
"violet"
};
The initializers are identical, and the syntax by which these tables
are used is the same, but the underlying data structures are very
different. For example, suppose I had a lookup() function that
searches the table for a particular color.
int
lookup(const char *color)
{
int ncolors = sizeof(colors) / sizeof(colors[0]);
for (int i = 0; i < ncolors; i++)
if (strcmp(colors[i], color) == 0)
return i;
return -1;
}
Thanks to array decay — array arguments are implicitly converted to
pointers (§6.9.1-10) — it doesn’t matter if the table is char
colors[][7]
or char *colors[]
. It’s a little bit misleading because
the compiler generates different code depending on the type.
Memory Layout
Here’s what colors_ptr
, a jagged array, typically looks like in
memory.
The array of six pointers will point into the program’s string table,
usually stored in a separate page. The strings aren’t in any
particular order and will be interspersed with the program’s other
string constants. The type of the expression colors_ptr[n]
is char *
.
On x86-64, suppose the base of the table is in rax
, the index of the
string I want to retrieve is rcx
, and I want to put the string’s
address back into rax
. It’s one load instruction.
Contrast this with colors_2d
: six 7-byte elements in a row. No
pointers or addresses. Only strings.
The strings are in their defined order, packed together. The type of
the expression colors_2d[n]
is char [7]
, an array rather than a
pointer. If this was a large table used by a hot function, it would
have friendlier cache characteristics — both in locality and
predictability.
In the same scenario before with x86-64, it takes two instructions to
put the string’s address in rax
, but neither is a load.
imul rcx, rcx, 7
add rax, rcx
In this particular case, the generated code can be slightly improved
by increasing the string size to 8 (e.g. char colors_2d[][8]
). The
multiply turns into a simple shift and the ALU no longer needs to be
involved, cutting it to one instruction. This looks like a load due to
the LEA (Load Effective Address), but it’s not.
Relocation
There’s another factor to consider: relocation. Nearly every process
running on a modern system takes advantage of a security feature
called Address Space Layout Randomization (ASLR). The virtual address
of code and data is randomized at process load time. For shared
libraries, it’s not just a security feature, it’s essential to their
basic operation. Libraries cannot possibly coordinate their preferred
load addresses with every other library on the system, and so must be
relocatable.
If the program is compiled with GCC or Clang configured for position
independent code — -fPIC
(for libraries) or -fpie
+ -pie
(for
programs) — extra work has to be done to support colors_ptr
. Those
are all addresses in the pointer array, but the compiler doesn’t know
what those addresses will be. The compiler fills the elements with
temporary values and adds six relocation entries to the binary, one
for each element. The loader will fill out the array at load time.
However, colors_2d
doesn’t have any addresses other than the address
of the table itself. The loader doesn’t need to be involved with each
of its elements. Score another point for the two-dimensional array.
On x86-64, in both cases the table itself typically doesn’t need a
relocation entry because it will be RIP-relative (in the small code
model). That is, code that uses the table will be at a fixed
offset from the table no matter where the program is loaded. It won’t
need to be looked up using the Global Offset Table (GOT).
In case you’re ever reading compiler output, in Intel syntax the
assembly for putting the table’s RIP-relative address in rax
looks
like so:
;; NASM:
lea rax, [rel address]
;; Some others:
lea rax, [rip + address]
Or in AT&T syntax:
lea address(%rip), %rax
Virtual Memory
Besides (trivially) more work for the loader, there’s another
consequence to relocations: Pages containing relocations are not
shared between processes (except after fork()). When loading a
program, the loader doesn’t copy programs and libraries to memory so
much as it memory maps their binaries with copy-on-write semantics. If
another process is running with the same binaries loaded (e.g.
libc.so), they’ll share the same physical memory so long as those
pages haven’t been modified by either process. Modifying the page
creates a unique copy for that process.
Relocations modify parts of the loaded binary, so these pages aren’t
shared. This means colors_2d
has the possibility of being shared
between processes, but colors_ptr
(and its entire page) definitely
does not. Shucks.
This is one of the reasons why the Procedure Linkage Table (PLT)
exists. The PLT is an array of function stubs for shared library
functions, such as those in the C standard library. Sure, the loader
could go through the program and fill out the address of every
library function call, but this would modify lots and lots of code
pages, creating a unique copy of large parts of the program. Instead,
the dynamic linker lazily supplies jump addresses for PLT
function stubs, one per accessed library function.
However, as I’ve written it above, it’s unlikely that even colors_2d
will be shared. It’s still missing an important ingredient: const.
Const
They say const isn’t for optimization but, darnit, this
situation keeps coming up. Since colors_ptr
and colors_2d
are both
global, writable arrays, the compiler puts them in the same writable
data section of the program, and, in my test program, they end up
right next to each other in the same page. The other relocations doom
colors_2d
to being a local copy.
Fortunately it’s trivial to fix by adding a const:
const char colors_2d[][7] = { /* ... */ };
Writing to this memory is now undefined behavior, so the compiler is
free to put it in read-only memory (.rodata
) and separate from the
dirty relocations. On my system, this is close enough to the code to
wind up in executable memory.
Note, the equivalent for colors_ptr
requires two const qualifiers,
one for the array and another for the strings. (Obviously the const
doesn’t apply to the loader.)
const char *const colors_ptr[] = { /* ... */ };
String literals are already effectively const, though the C
specification (unlike C++) doesn’t actually define them to be this
way. But, like setting your relationship status on Facebook, declaring
it makes it official.
It’s just micro-optimization
These little details are all deep down the path of micro-optimization
and will rarely ever matter in practice, but perhaps you learned
something broader from all this. This stuff fascinates me.