Download the entire source here: serial.el
I was thinking recently about how software serial codes might work in
proprietary software. They're a self-consistent series of letters and
numbers entered during installation. The software can, without phoning
home, verify the validity of the code. This sort of thing is just DRM
that relies completely on obscurity for security, so it's only a
matter of time before the secret is out, breaking the restriction. But
it's still fun to speculate about them.
In general, serial codes have these two properties.
- Self-consistent. The serial code can be validated without
querying into a database. There is a mathematical consistency,
using either a secret algorithm or secret values. Imagine the
set of all possible serial numbers (the key space), valid and
invalid, and valid serial numbers are distributed
pseudo-randomly throughout that set.
- Short. Short enough for a human to manage manually. They're
generally typed in by hand.
Sometimes serial codes are an entire file, one that would take hours
to type in by hand, but I'm not considering those right now. They have
a trivial solution: use a cryptographic digital signature, because
length isn't an issue. The validation algorithm just checks the
signature against the key on file.
A digital signature is the obvious solution for a serial code, but to
protect these against even the shallowest brute-force attacks it
requires using numbers long enough to break criteria #2.
Here's how I would do it. A serial number consists of two parts: a
code and a checksum. The code is some random number. The checksum is a
number derived from the code.
For my checksum I'm just going to use a hash function: mix the code
component and some secret password together, then hash it to produce
the checksum. To generate a serial number all we have to do is
generate a random number as our code, generate the checksum from it,
then encode them together.
Since we want to keep it short, we have to choose how many characters
of the serial code to dedicate to each component. It's like a slider
we can slide between each side of the encoded serial number. As we
allocate more characters to the code component, the density of valid
serial numbers in the key space increases. If we go all the way and
shrink the checksum to zero characters, then every possible serial
number is valid. The denser the valid serial numbers, the easier it is
to stumble across one during brute force.
In the other direction, the more characters we allocate to the
checksum the more distinct the serial numbers become. If we allocate
all characters to the checksum, the other extreme, then no
serial number is valid. The more distinct the serial numbers are, the
easier they are to find by brute force.
The difficulty of finding a valid serial number by brute force at
various split positions is going to look something like this. The
exact curve is a function of the checksum algorithm and the knowledge
of the attacker.
Since it looks like this, I'm just going to split it right in the
middle: allocate half the characters to the code and half to the
checksum. To begin, I'm going to encode the serial number as
hexadecimal digits. First let's define a little bit about what our
serial codes look like. I'm going to implement this in my favorite
try-stuff language, Elisp.
I decided to use 12 characters, 6 bytes, to represent each
component. This makes for a 24-character serial code. A bit long, but
we'll use a trick to shorten that later.
Next, define the hash and mixing functions. For the hash I'm just
going to use SHA1 and truncate the output. For the password mixing
function, we're just going to concatenate them. There are stronger
ways to perform this mix, such as
concat is good enough for a DRM scheme.
Below, first function is just a random number generator. It could be
fancier, like the UUID generating
function, but this is good enough. The
function was defined in a previous
The serial codes it generates look like this.
Next we need a function that accepts these numbers and checks their
self-consistency. Simple enough: extract the code component, generate
the checksum, and check against the checksum component.
This quick test below always returns true, so it's working at least
Those serial numbers could be shorter because we're limiting ourselves
a-f. Instead, we could base64
encode the serial numbers to cram more information in each character.
Now the serial codes from
serial-gen look like this.
One problem with these is it's easy to mix up the zeros,
Os, ones, and
ls. A filter could be added to the
generator to reject and regenerate any serial number containing these
confusing characters, but that's outside the scope of this post.