nullprogram.com/blog/2010/11/16/
Download the entire source here: serial.el
I was thinking recently about how software serial codes might work in
proprietary software. They're a self-consistent series of letters and
numbers entered during installation. The software can, without phoning
home, verify the validity of the code. This sort of thing is just DRM
that relies completely on obscurity for security, so it's only a
matter of time before the secret is out, breaking the restriction. But
it's still fun to speculate about them.
In general, serial codes have these two properties.
- Self-consistent. The serial code can be validated without
querying into a database. There is a mathematical consistency,
using either a secret algorithm or secret values. Imagine the
set of all possible serial numbers (the key space), valid and
invalid, and valid serial numbers are distributed
pseudo-randomly throughout that set.
- Short. Short enough for a human to manage manually. They're
generally typed in by hand.
Sometimes serial codes are an entire file, one that would take hours
to type in by hand, but I'm not considering those right now. They have
a trivial solution: use a cryptographic digital signature, because
length isn't an issue. The validation algorithm just checks the
signature against the key on file.
A digital signature is the obvious solution for a serial code, but to
protect these against even the shallowest brute-force attacks it
requires using numbers long enough to break criteria #2.
Here's how I would do it. A serial number consists of two parts: a
code and a checksum. The code is some random number. The checksum is a
number derived from the code.
For my checksum I'm just going to use a hash function: mix the code
component and some secret password together, then hash it to produce
the checksum. To generate a serial number all we have to do is
generate a random number as our code, generate the checksum from it,
then encode them together.
Since we want to keep it short, we have to choose how many characters
of the serial code to dedicate to each component. It's like a slider
we can slide between each side of the encoded serial number. As we
allocate more characters to the code component, the density of valid
serial numbers in the key space increases. If we go all the way and
shrink the checksum to zero characters, then every possible serial
number is valid. The denser the valid serial numbers, the easier it is
to stumble across one during brute force.
In the other direction, the more characters we allocate to the
checksum the more distinct the serial numbers become. If we allocate
all characters to the checksum, the other extreme, then no
serial number is valid. The more distinct the serial numbers are, the
easier they are to find by brute force.
The difficulty of finding a valid serial number by brute force at
various split positions is going to look something like this. The
exact curve is a function of the checksum algorithm and the knowledge
of the attacker.
Since it looks like this, I'm just going to split it right in the
middle: allocate half the characters to the code and half to the
checksum. To begin, I'm going to encode the serial number as
hexadecimal digits. First let's define a little bit about what our
serial codes look like. I'm going to implement this in my favorite
try-stuff language, Elisp.
I decided to use 12 characters, 6 bytes, to represent each
component. This makes for a 24-character serial code. A bit long, but
we'll use a trick to shorten that later.
Next, define the hash and mixing functions. For the hash I'm just
going to use SHA1 and truncate the output. For the password mixing
function, we're just going to concatenate them. There are stronger
ways to perform this mix, such as
with HMAC,
but concat
is good enough for a DRM scheme.
Below, first function is just a random number generator. It could be
fancier, like the UUID generating
function, but this is good enough. The compose
function was defined in a previous
post.
The serial codes it generates look like this.
34ef80635ca7340f1f10e170
8471cc8e87378f15377cf038
2ba0b74bfb86262d5400264e
5e02185457970a0b89656e9b
cf0d611512f1aacb22d8095d
Next we need a function that accepts these numbers and checks their
self-consistency. Simple enough: extract the code component, generate
the checksum, and check against the checksum component.
This quick test below always returns true, so it's working at least
partially right.
Those serial numbers could be shorter because we're limiting ourselves
to 0-9
and a-f
. Instead, we could base64
encode the serial numbers to cram more information in each character.
Now the serial codes from serial-gen
look like this.
Nsu8TXFC4IdvlBxc
b5HduiyZmNmcq4fQ
vUd6KuIyDgmBgDWW
dRMW6/rctm2Z5TKn
ol4cP+5b3Y7vqWEZ
One problem with these is it's easy to mix up the zeros,
O
s, ones, and l
s. A filter could be added to the
generator to reject and regenerate any serial number containing these
confusing characters, but that's outside the scope of this post.