nullprogram.com/blog/2019/06/30/
This article was discussed on Hacker News.
One of the frequent challenges in C is that pointers are nothing but a
memory address. A callee who is passed a pointer doesn’t truly know
anything other than the type of object being pointed at, which says some
things about alignment and how that pointer can be used… maybe. If it’s
a pointer to void (void *
) then not even that much is known.
The number of consecutive elements being pointed at is also not known.
It could be as few as zero, so dereferencing would be illegal. This can
be true even when the pointer is not null. Pointers can go one past the
end of an array, at which point it points to zero elements. For example:
void foo(int *);
void bar(void)
{
int array[4];
foo(array + 4); // pointer one past the end
}
In some situations, the number of elements is known, at least to the
programmer. For example, the function might have a contract that says it
must be passed at least N elements, or exactly N elements. This
could be communicated through documentation.
/** Foo accepts 4 int values. */
void foo(int *);
Or it could be implied by the function’s prototype. Despite the
following function appearing to accept an array, that’s actually a
pointer, and the “4” isn’t relevant to the prototype.
C99 introduced a feature to make this a formal part of the prototype,
though, unfortunately, I’ve never seen a compiler actually use this
information.
void foo(int[static 4]); // >= 4 elements, cannot be null
Another common pattern is for the callee to accept a count parameter.
For example, the POSIX write()
function:
ssize_t write(int fd, const void *buf, size_t count);
The necessary information describing the buffer is split across two
arguments. That can become tedious, and it’s also a source of serious
bugs if the two parameters aren’t in agreement (buffer overflow,
information disclosure, etc.). Wouldn’t it be nice if this
information was packed into the pointer itself? That’s essentially the
definition of a fat pointer.
Fat pointers via bit hacks
If we assume some things about the target platform, we can encode fat
pointers inside a plain pointer with some dirty pointer
tricks, exploiting unused bits in the pointer value. For
example, currently on x86-64, only the lower 48 bits of a pointer are
actually used. The other 16 bits could carefully be used for other
information, like communicating the number of elements or bytes:
// NOTE: x86-64 only!
unsigned char buf[1000];
uintptr addr = (uintptr_t)buf & 0xffffffffffff;
uintptr pack = (sizeof(buf) << 48) | addr;
void *fatptr = (void *)pack;
The other side can unpack this to get the components back out. Obviously
16 bits for the count will often be insufficient, so this would more
likely be used for baggy bounds checks.
Further, if we know something about the alignment — say, that it’s
16-byte aligned — then we can also encode information in the least
significant bits, such as a type tag.
Fat pointers via a struct
That’s all fragile, non-portable, and rather limited. A more robust
approach is to lift pointers up into a richer, heavier type, like a
structure.
struct fatptr {
void *ptr;
size_t len;
};
Functions accepting these fat pointers no longer need to accept a count
parameter, and they’d generally accept the fat pointer by value.
fatptr_write(int fd, struct fatptr);
In typical C implementations, the structure fields would be passed
practically, if not exactly, same way as the individual parameters would
have been passed, so it’s really no less efficient. (Update June 2024:
Pengji Zhang pointed out that this applies only to the 2-element struct
fatptr
, and not to 3-element slice headers discussed below.)
To help keep this straight, we might employ some macros:
#define COUNTOF(array) \
(sizeof(array) / sizeof(array[0]))
#define FATPTR(ptr, count) \
(struct fatptr){ptr, count}
#define ARRAYPTR(array) \
FATPTR(array, COUNTOF(array))
/* ... */
unsigned char buf[40];
fatptr_write(fd, ARRAYPTR(buf));
There are obvious disadvantages of this approach, like type confusion
due to that void pointer, the inability to use const
, and just being
weird for C. I wouldn’t use it in a real program, but bear with me.
Before I move on, I want to add one more field to that fat pointer
struct: capacity.
struct fatptr {
void *ptr;
size_t len;
size_t cap;
};
This communicates not how many elements are present (len
), but how
much additional space is left in the buffer. This allows callees know
how much room is left for, say, appending new elements.
// Fix the remainder of an int buffer with a value.
void
fill(struct fatptr ptr, int value)
{
int *buf = ptr.ptr;
for (size_t i = ptr.len; i < ptr.cap; i++) {
buf[i] = value;
}
}
Since the callee modifies the fat pointer, it should be returned:
struct fatptr
fill(struct fatptr ptr, int value)
{
int *buf = ptr.ptr;
for (size_t i = ptr.len; i < ptr.cap; i++) {
buf[i] = value;
}
ptr.len = ptr.cap;
return ptr;
}
Congratulations, you’ve got slices! Except that in Go they’re a proper
part of the language and so doesn’t rely on hazardous hacks or tedious
bookkeeping. The fatptr_write()
function above is nearly functionally
equivalent to the Writer.Write()
method in Go, which accepts a slice:
type Writer interface {
Write(p []byte) (n int, err error)
}
The buf
and count
parameters are packed together as a slice, and
fd
parameter is instead the receiver (the object being acted upon by
the method).
Go slices
Go famously has pointers, including internal pointers, but not pointer
arithmetic. You can take the address of (nearly) anything, but
you can’t make that pointer point at anything else, even if you took the
address of an array element. Pointer arithmetic would undermine Go’s
type safety, so it can only be done through special mechanisms in the
unsafe
package.
But pointer arithmetic is really useful! It’s handy to take an address
of an array element, pass it to a function, and allow that function to
modify a slice (wink, wink) of the array. Slices are pointers that
support exactly this sort of pointer arithmetic, but safely. Unlike
the &
operator which creates a simple pointer, the slice operator
derives a fat pointer.
func fill([]int, int) []int
var array [8]int
// len == 0, cap == 8, like &array[0]
fill(array[:0], 1)
// array is [1, 1, 1, 1, 1, 1, 1, 1]
// len == 0, cap == 4, like &array[4]
fill(array[4:4], 2)
// array is [1, 1, 1, 1, 2, 2, 2, 2]
The fill
function could take a slice of the slice, effectively moving
the pointer around with pointer arithmetic, but without violating memory
safety due to the additional “fat pointer” information. In other words,
fat pointers can be derived from other fat pointers.
Slices aren’t as universal as pointers, at least at the moment. You can
take the address of any variable using &
, but you can’t take a slice
of any variable, even if it would be logically sound.
var foo int
// attempt to make len = 1, cap = 1 slice backed by foo
var fooslice []int = foo[:] // compile-time error!
That wouldn’t be very useful anyway. However, if you really wanted to
do this, the unsafe
package can accomplish it. I believe the resulting
slice would be perfectly safe to use:
// Convert to one-element array, then slice
fooslice = (*[1]int)(unsafe.Pointer(&foo))[:]
Update: Chris Siebenmann speculated about why this requires
unsafe
.
Of course, slices are super flexible and have many more uses that look
less like fat pointers, but this is still how I tend to reason about
slices when I write Go.