nullprogram.com/blog/2024/07/31/
I recently came across a take-home C programming test which had more depth
and complexity than I suspect the interviewer intended. While considering
it, I also came up with a novel, or at least unconventional, solution. The
problem is to deep copy a linked list where each node references a random
list element in addition to usual linkage — similar to LeetCode problem
138. This reference is one of identity rather than value, which has
murky consequences.
typedef struct node node;
struct node {
node *next;
node *ref; // arbitrary node in the list, or null
};
node *deepcopy(node *);
In the copy, nodes have individual lifetimes allocated using malloc
which the caller is responsible for freeing. While thickheaded, this
is conventional, and I cannot blame the test’s designer for sticking to
familiar textbook concepts. My special solution handles this constraint in
stride. (In a well-written program the whole list would have a single
lifetime likely shared with yet more objects.)
Ignoring ref
, copying the normal list linkage is trivial. Walk the
original list, allocate a new node each iteration, and append it to the
result. The hard part is resolving ref
. Given an arbitrary node pointer,
we must determine to which of the original list nodes it points, then find
the node at the matching position in the new list. Naively we could scan
the old list to search for a match:
node *old = oldlist;
node *new = newlist;
for (;;) {
if (old->ref) {
node *findold = oldlist;
node *findnew = newlist;
for (;;) {
if (old->ref == findold) {
new->ref = findnew;
break;
}
findold = findold->next;
findnew = findnew->next;
}
}
old = old->next;
new = new->next;
}
The nested loops are obviously quadratic time. That won’t scale well. To
do better we need some way to map, by identity, old list nodes onto new
list nodes. However, pointers do not necessarily have a value on which we
could key a map. Other languages do not even expose such a concept,
or at least hide it behind some “unsafe” mechanism. In that case it seems
the best we could do is quadratic time.
Solution by temporary mutation
If we’re free to temporarily modify the original list, then we can use
memory as a map. After all, memory itself is a kind of pointer-to-object
map! Since we only get one such map per process, we’ll need to commandeer
the original list during the copy. The trick is to interleave the two
lists when constructing the new list:
old1 -> new1 -> old2 -> new2 -> ... -> null
That might look like (note the double-skip per iteration):
for (node *old = oldlist; old; old = old->next->next) {
node *new = malloc(sizeof(*new));
new->ref = 0;
new->next = old->next;
old->next = new;
}
When we have a pointer to an old list node, the node itself points to the
matching new list node.
for (node *old = oldlist; old; old = old->next->next) {
if (old->ref) {
old->next->ref = old->ref->next;
}
}
Then before returning we’d need to deinterleave the lists, restoring the
old list and separating it from the result. This solution is linear time
and doesn’t require dealing with the concept of identity. Though modifying
the original list isn’t always possible. That won’t work if it’s accessed
concurrently — shared with another thread, accessed in a signal handler,
or something else reentrant — or if it’s in read-only memory.
Solution by intrusive hash map
If we can obtain a stable value from a pointer, i.e. uintptr_t
— in
practice virtually always true — then there’s an interesting O(n log n)
solution using an intrusive map which doesn’t modify the original
list. This is my own novel solution. The result will be simultaneously a
linked list and a hash map, and the caller won’t even know it! Because the
map is built into the list, with a caller-managed lifetime, we won’t free
anything before returning.
To start, linked list nodes are embedded at the front of hash trie nodes.
The caller will see this initial field, but not the hash trie fields.
Being at the front, the caller can still free
them by this “internal”
pointer, which allows the hash trie to be invisible.
typedef struct map map;
struct map {
node new;
map *child[4];
node *old;
};
The “key” is old
and the “value” is new
. Lookup and insert use the
usual “upsert” construction oriented around zero-initialization:
node *upsert(map **m, node *old)
{
if (!old) {
return 0; // map null to null
}
uint64_t hash = (uintptr_t)old * 1111111111111111111ull;
for (; *m; hash <<= 2) {
if (old == (*m)->old) {
return &(*m)->new;
}
m = &(*m)->child[hash>>62];
}
*m = calloc(1, sizeof(map));
(*m)->old = old;
return &(*m)->new;
}
If the matching node doesn’t yet exist, the function creates it. Also note
how it returns an internal pointer. With “upsert” semantics, loop copying
is trivialized:
node *deepcopy(node *head)
{
map *m = 0;
for (node *old = head; old; old = old->next) {
node *new = upsert(&m, old);
new->next = upsert(&m, old->next);
new->ref = upsert(&m, old->ref);
}
return upsert(&m, head);
}
These easy-to-implement hash tries continue to be generally useful and
elegant, even with traditional memory management. Cloneable, runnable
source with tests is available as a gist if you’d like to play
around with it yourself.