Software Serial Codes
Download the entire source here: serial.el
I was thinking recently about how software serial codes might work in proprietary software. They're a self-consistent series of letters and numbers entered during installation. The software can, without phoning home, verify the validity of the code. This sort of thing is just DRM that relies completely on obscurity for security, so it's only a matter of time before the secret is out, breaking the restriction. But it's still fun to speculate about them.
In general, serial codes have these two properties.
- Self-consistent. The serial code can be validated without querying into a database. There is a mathematical consistency, using either a secret algorithm or secret values. Imagine the set of all possible serial numbers (the key space), valid and invalid, and valid serial numbers are distributed pseudo-randomly throughout that set.
- Short. Short enough for a human to manage manually. They're generally typed in by hand.
Sometimes serial codes are an entire file, one that would take hours to type in by hand, but I'm not considering those right now. They have a trivial solution: use a cryptographic digital signature, because length isn't an issue. The validation algorithm just checks the signature against the key on file.
A digital signature is the obvious solution for a serial code, but to protect these against even the shallowest brute-force attacks it requires using numbers long enough to break criteria #2.
Here's how I would do it. A serial number consists of two parts: a code and a checksum. The code is some random number. The checksum is a number derived from the code.
For my checksum I'm just going to use a hash function: mix the code component and some secret password together, then hash it to produce the checksum. To generate a serial number all we have to do is generate a random number as our code, generate the checksum from it, then encode them together.
Since we want to keep it short, we have to choose how many characters of the serial code to dedicate to each component. It's like a slider we can slide between each side of the encoded serial number. As we allocate more characters to the code component, the density of valid serial numbers in the key space increases. If we go all the way and shrink the checksum to zero characters, then every possible serial number is valid. The denser the valid serial numbers, the easier it is to stumble across one during brute force.
In the other direction, the more characters we allocate to the checksum the more distinct the serial numbers become. If we allocate all characters to the checksum, the other extreme, then no serial number is valid. The more distinct the serial numbers are, the easier they are to find by brute force.
The difficulty of finding a valid serial number by brute force at various split positions is going to look something like this. The exact curve is a function of the checksum algorithm and the knowledge of the attacker.
Since it looks like this, I'm just going to split it right in the middle: allocate half the characters to the code and half to the checksum. To begin, I'm going to encode the serial number as hexadecimal digits. First let's define a little bit about what our serial codes look like. I'm going to implement this in my favorite try-stuff language, Elisp.
I decided to use 12 characters, 6 bytes, to represent each component. This makes for a 24-character serial code. A bit long, but we'll use a trick to shorten that later.
Next, define the hash and mixing functions. For the hash I'm just
going to use SHA1 and truncate the output. For the password mixing
function, we're just going to concatenate them. There are stronger
ways to perform this mix, such as
concat is good enough for a DRM scheme.
Below, first function is just a random number generator. It could be
fancier, like the UUID generating
function, but this is good enough. The
function was defined in a previous
The serial codes it generates look like this.
34ef80635ca7340f1f10e170 8471cc8e87378f15377cf038 2ba0b74bfb86262d5400264e 5e02185457970a0b89656e9b cf0d611512f1aacb22d8095d
Next we need a function that accepts these numbers and checks their self-consistency. Simple enough: extract the code component, generate the checksum, and check against the checksum component.
This quick test below always returns true, so it's working at least partially right.
Those serial numbers could be shorter because we're limiting ourselves
a-f. Instead, we could base64
encode the serial numbers to cram more information in each character.
Now the serial codes from
serial-gen look like this.
Nsu8TXFC4IdvlBxc b5HduiyZmNmcq4fQ vUd6KuIyDgmBgDWW dRMW6/rctm2Z5TKn ol4cP+5b3Y7vqWEZ
One problem with these is it's easy to mix up the zeros,
Os, ones, and
ls. A filter could be added to the
generator to reject and regenerate any serial number containing these
confusing characters, but that's outside the scope of this post.