nullprogram.com/blog/2016/11/15/
Suppose you want to bake binary data directly into a program’s
executable. It could be image pixel data (PNG, BMP, JPEG), a text
file, or some sort of complex data structure. Perhaps the purpose is
to build a single executable with no extraneous data files — easier to
install and manage, though harder to modify. Or maybe you’re lazy and
don’t want to worry about handling the various complications and
errors that arise when reading external data: Where to find it, and
what to do if you can’t find it or can’t read it. This article is
about two different approaches I’ve used a number of times for C
programs.
The linker approach
The simpler, less portable option is to have the linker do it. Both
the GNU linker and the gold linker (ELF only) can create
object files from arbitrary files using the --format
(-b
) option
set to binary
(raw data). It’s combined with --relocatable
(-r
)
to make it linkable with the rest of the program. MinGW supports all
of this, too, so it’s fairly portable so long as you stick to GNU
Binutils.
For example, to create an object file, my_msg.o
with the
contents of the text file my_msg.txt
:
$ ld -r -b binary -o my_file.o my_msg.txt
(Update: You probably also want to use -z noexecstack
.)
The object file will have three symbols, each named after the input
file. Unfortunately there’s no control over the symbol names, section
(.data), alignment, or protections (e.g. read-only). You’re completely
at the whim of the linker, short of objcopy tricks.
$ nm my_msg.o
000000000000000e D _binary_my_msg_txt_end
000000000000000e A _binary_my_msg_txt_size
0000000000000000 D _binary_my_msg_txt_start
To access these in C, declare them as global variables like so:
extern char _binary_my_msg_txt_start[];
extern char _binary_my_msg_txt_end[];
extern char _binary_my_msg_txt_size;
The size symbol, _binary_my_msg_txt_size
, is misleading. The “A”
from nm means it’s an absolute symbol, not relocated. It doesn’t refer
to an integer that holds the size of the raw data. The value of the
symbol itself is the size of the data. That is, take the address of it
and cast it to an integer.
size_t size = (size_t)&_binary_my_msg_txt_size;
Alternatively — and this is my own preference — just subtract the
other two symbols. It’s cleaner and easier to understand.
size_t size = _binary_my_msg_txt_end - _binary_my_msg_txt_start;
Here’s the “Hello, world” for this approach (hello.c
).
#include <stdio.h>
extern char _binary_my_msg_txt_start[];
extern char _binary_my_msg_txt_end[];
extern char _binary_my_msg_txt_size;
int
main(void)
{
size_t size = _binary_my_msg_txt_end - _binary_my_msg_txt_start;
fwrite(_binary_my_msg_txt_start, size, 1, stdout);
return 0;
}
The program has to use fwrite()
rather than fputs()
because the
data won’t necessarily be null-terminated. That is, unless a null is
intentionally put at the end of the text file itself.
And for the build:
$ cat my_msg.txt
Hello, world!
$ ld -r -b binary -o my_msg.o my_msg.txt
$ gcc -o hello hello.c my_msg.o
$ ./hello
Hello, world!
If this was binary data, such as an image file, the program would
instead read the array as if it were a memory mapped file. In fact,
that’s what it really is: the raw data memory mapped by the loader
before the program started.
How about a data structure dump?
This could be taken further to dump out some kinds of data structures.
For example, this program (table_gen.c
) fills out a table of the
first 90 Fibonacci numbers and dumps it to standard output.
#include <stdio.h>
#define TABLE_SIZE 90
long long table[TABLE_SIZE] = {1, 1};
int
main(void)
{
for (int i = 2; i < TABLE_SIZE; i++)
table[i] = table[i - 1] + table[i - 2];
fwrite(table, sizeof(table), 1, stdout);
return 0;
}
Build and run this intermediate helper program as part of the overall
build.
$ gcc -std=c99 -o table_gen table_gen.c
$ ./table_gen > table.bin
$ ld -r -b binary -o table.o table.bin
And then the main program (print_fib.c
) might look like:
#include <stdio.h>
extern long long _binary_table_bin_start[];
extern long long _binary_table_bin_end[];
int
main(void)
{
long long *start = _binary_table_bin_start;
long long *end = _binary_table_bin_end;
for (long long *x = start; x < end; x++)
printf("%lld\n", *x);
return 0;
}
However, there are some good reasons not to use this feature in this
way:
-
The format of table.bin
is specific to the host architecture
(byte order, size, padding, etc.). If the host is the same as the
target then this isn’t a problem, but it will prohibit
cross-compilation.
-
The linker has no information about the alignment requirements of
the data. To the linker it’s just a byte buffer. In the final
program the long long
array will not necessarily aligned properly
for its type, meaning the above program might crash. The Right Way
is to never dereference the data directly but rather memcpy()
it
into a properly-aligned variable, just as if the data was an
unaligned buffer.
-
The data structure cannot use any pointers. Pointer values are
meaningless to other processes and will be no different than
garbage.
Towards a more portable approach
There’s an easy way to address all three of these problems and
eliminate the reliance on GNU linkers: serialize the data into C code.
It’s metaprogramming, baby.
In the Fibonacci example, change the fwrite()
in table_gen.c
to
this:
printf("int table_size = %d;\n", TABLE_SIZE);
printf("long long table[] = {\n");
for (int i = 0; i < TABLE_SIZE; i++)
printf(" %lldLL,\n", table[i]);
printf("};\n");
The output of the program becomes text:
int table_size = 90;
long long table[] = {
1LL,
1LL,
2LL,
3LL,
/* ... */
1779979416004714189LL,
2880067194370816120LL,
}
And print_fib.c
is changed to:
#include <stdio.h>
extern int table_size;
extern long long table[];
int
main(void)
{
for (int i = 0; i < table_size; i++)
printf("%lld\n", table[i]);
return 0;
}
Putting it all together:
$ gcc -std=c99 -o table_gen table_gen.c
$ ./table_gen > table.c
$ gcc -std=c99 -o print_fib print_fib.c table.c
Any C compiler and linker could do all of this, no problem, making it
more portable. The intermediate metaprogram isn’t a barrier to cross
compilation. It would be compiled for the host (typically identified
through HOST_CC
) and the rest is compiled for the target (e.g.
CC
).
The output of table_gen.c
isn’t dependent on any architecture,
making it cross-compiler friendly. There are also no alignment
problems because it’s all visible to compiler. The type system isn’t
being undermined.
Dealing with pointers
The Fibonacci example doesn’t address the pointer problem — it has no
pointers to speak of. So let’s step it up to a trie using the trie
from the previous article. As a reminder, here it is:
#define TRIE_ALPHABET_SIZE 4
#define TRIE_TERMINAL_FLAG (1U << 0)
struct trie {
struct trie *next[TRIE_ALPHABET_SIZE];
struct trie *p;
int i;
unsigned flags;
};
Dumping these structures out raw would definitely be useless since
they’re almost entirely pointer data. So instead, fill out an array of
these structures, referencing the array itself to build up the
pointers (later filled in by either the linker or the loader). This
code uses the in-place breadth-first traversal technique from
the previous article.
void
trie_serialize(struct trie *t, const char *name)
{
printf("struct trie %s[] = {\n", name);
struct trie *head = t;
struct trie *tail = t;
t->p = NULL;
size_t count = 0;
while (head) {
printf(" {{");
for (int i = 0; i < TRIE_ALPHABET_SIZE; i++) {
struct trie *next = head->next[i];
const char *comma = i ? ", " : "";
if (next) {
/* Add child to the queue. */
tail->p = next;
tail = next;
next->p = NULL;
/* Print the pointer to the child. */
printf("%s%s + %zu", comma, name, ++count);
} else {
printf("%s0", comma);
}
}
printf("}, 0, 0, %u},\n", head->flags & TRIE_TERMINAL_FLAG);
head = head->p;
}
printf("};\n");
}
Remember that list of strings from before?
Which looks like this?
That serializes to this C code:
struct trie root[] = {
{{root + 1, 0, root + 2, 0}, 0, 0, 0},
{{root + 3, root + 4, 0, 0}, 0, 0, 0},
{{root + 5, 0, 0, root + 6}, 0, 0, 0},
{{root + 7, 0, 0, 0}, 0, 0, 0},
{{0, 0, root + 8, 0}, 0, 0, 0},
{{root + 9, 0, 0, root + 10}, 0, 0, 0},
{{0, root + 11, 0, 0}, 0, 0, 0},
{{root + 12, 0, 0, 0}, 0, 0, 0},
{{0, 0, 0, root + 13}, 0, 0, 0},
{{0, 0, 0, 0}, 0, 0, 1},
{{0, 0, 0, 0}, 0, 0, 1},
{{0, 0, 0, root + 14}, 0, 0, 0},
{{root + 15, 0, 0, 0}, 0, 0, 0},
{{0, 0, 0, 0}, 0, 0, 1},
{{0, 0, 0, 0}, 0, 0, 1},
{{0, 0, 0, 0}, 0, 0, 1},
};
This trie can be immediately used at program startup without
initialization, and it can even have new nodes inserted into it. It’s
not without its downsides, particularly because it’s a trie:
-
It’s really going to blow up the size of the binary, especially
when it holds lots of strings. These nodes are anything but
compact.
-
If the code is compiled to be position-independent (-fPIC
), each
of those nodes is going to hold multiple dynamic relocations,
further exploding the size of the binary and preventing the trie
from being shared between processes. It’s 24 bytes per
relocation on x86-64. This will also slow down program start up
time. With just a few thousand strings, the simple test program was
taking 5x longer to start (25ms instead of 5ms) than with an empty
trie.
-
Even without being position-independent, the linker will have to
resolve all the compile-time relocations. I was able to overwhelm
linker and run it out of memory with just some tens of thousands of
strings. This would make for a decent linker stress test.
This technique obviously doesn’t scale well with trie data. You’re
better off baking in the flat string list and building the trie at run
time — though you could compute the exact number of needed nodes at
compile time and statically allocate them (in .bss). I’ve personally
had much better luck with other sorts of lookup tables.
It’s a useful tool for the C programmer’s toolbelt.