nullprogram.com/blog/2008/08/09/
Stream
ciphers are one of the two types
of
symmetric key algorithms, which is when the same key is used for
encryption and decryption. They follow this simple concept: take a
good pseudo-random number generator and combine, usually by XOR, its
output with your plaintext stream. This is very similiar to
the
one-time pad, but the random pad is pseudo-random rather than
truly random. The key is the seed (or part of one) for
the PRNG.
Probably the most well known stream cipher is the extremely simple,
yet cryptographically
strong, Arcfour
algorithm. The official name is actually RC4, which comes from RSA
Security where it was developed. It was a trade secret until someone
anonymously leaked the algorithm to the public. The name RC4 is still
trademarked, though, so Arcfour is generally used instead, meaning
"Alleged RC4" (alleged because RSA Security never confirmed the
algorithm as being RC4). You have almost certainly used the
cipher yourself, because it is used in applications such as WEP and
SSL.
The algorithm has two parts: the key schedule algorithm and
pseudo-random generation algorithm. The key schedule uses the key,
and possible a non-secret initialization vector, to set up the state
of the PRNG. The state is an array of length 256 holding all of the
values from 0 to 255 in some order, along with two integers
(initialized to 0 after the key schedule). The algorithm looks like
this,
for i from 0 to 255
S[i] := i
endfor
j := 0
for i from 0 to 255
j := (j + S[i] + key[i mod keylength]) mod 256
swap(S[i],S[j])
endfor
The PRNG deals with one byte at a time, emitting a stream of bytes,
i := 0
j := 0
while GeneratingOutput:
i := (i + 1) mod 256
j := (j + S[i]) mod 256
swap(S[i],S[j])
output S[(S[i] + S[j]) mod 256]
endwhile
If you implement this in C and use the char
type, you can
toss the modulus parts because they will just work automatically.
Now you just XOR your message with the output of the PRNG. The
Wikipedia article
probably explains it better than I can, so check it out if you still
don't follow.
Now, Arcfour has some flaws. Specifically, the algorithm itself
doesn't specify how an initialization vector is applied, which is
important. Using a plan key twice is bad it allows an adversary to get
information easily. For example, Lets say you have two
messages A
and B
. You use the same
key k
, which will produce the same keystream K. Now, you
create your two ciphertexts CA
and CB
CA
= A ^ K
CB
= B ^ K
But notice if the adversary has both ciphertexts,
CA
^ CB
= A ^ K ^ B ^ K = A ^ B
They are left with your two original messages XORed together. Let me
illustrate: we have two messages as bitmap images (here as PNGs for
the web),
Encrypt them using the same key. In this case, my key was "somekey".
If the adversary has both of these second "images", she can XOR them
together (having no knowledge of the key!) and get this,
An initialization vector (IV) is a set of bytes we combine with the
key. The IV is not a secret, as it is sent plaintext with the
ciphertext. If different IVs were used above with the same key, XORing
the ciphertext would result in nothing, because the keystreams are
totally different for each message.
However, the way the IV is combined is important too. Simply
concatenating the IV and the key can lead to weaknesses due to the way
the key schedule algorithm works. Something more secure would be a
cryptographic hash of the key and the IV. The reason WEP is broken is
because in its design the IV wasn't used properly.
Another weakness is that the initial bytes of the keystream have low
entropy. That is, some bits tend to be 0's or 1's consistently, which
can leak information to an adversary. This can be countered by tossing
the first few bytes of the keystream. Often the first 768 bytes are
dropped, but a conservative amount would be 3072 bytes. Another way to
deal with this is running the key schedule algorithm 10 or 20 times
(not reinitializing the S array between them of course) rather than
just once, which is the way
CipherSaber-2 does it.
Yet another weakness is that the keystream
becomes
distinguishable from random data after about a gigabyte of
output. That is, after about a gigabyte, the entropy of the overall
stream can become too low and compromise the security of the
message. A solution might be to change the IV each gigabyte.
I wrote an implementation of Arcfour in C, which you can get from my
Git repostiory with,
git clone git://github.com/skeeto/arcfour.git
It is written as a library that can be used in different
applications. Included are a couple programs that make use of it. I
strongly suggest writing your own implementation. It is really easy to
do, and you will automatically have the algorithm memorized once you
do it. The Wikipedia article has some test vectors you can use to test
it.