This is related to a project I am working on and will post here
soon. I imagine that, with a little more effort, this algorithm could
turn into a short amateur paper.
Suppose you want to use a computer to simulate the roll of two
six-sided dice (notated 2d6). The simplest approach would
be to replicate the results the same way you would roll dice:
independently and randomly generate two numbers between 1 and 6
inclusively. We easily can do this for any number of dice, we just
iterate and roll each die. Like this recursive function,
(defunroll (n sides)
"Roll n dice with given number of sides."
(if (zerop n) ()
(cons (1+ (random sides))
(roll (1- n) sides))))
However, generating a number between 1 and 6 wastes small amounts of
entropy. A six-sided die only takes about 2.58 bits of entropy to
generate. Since we can only use bits discretely we have to spend 3
bits, throwing out 0.42 bits. On top of that, when we pull out 3 bits
and they are out of range (0 or 7) we have to throw them out and try
again.
Let's say we wanted to roll 10 dice, or 100 dice, or 1000 dice? Do we
really need to generate that many numbers individually? That's a lot
of wasted entropy adding up, entropy which can be expensive to
gather. Well, we could instead use the probability distribution of
the roll so that only a single number needs to be generated.
For a 2d6 roll, there are 36 unique possible outcomes
(6^2). We could select a number between 0 and 35, then choose that
specific roll. This roll can be calculated with a series of division
and modulus operations (u for a number from a
uniform distribution) (also, note that the division is
integer division),
(defunroll-perm (n s u)
"Get uth permutation of n s-sided dice."
(if (zerop n) ()
(let ((perms (expt s (1- n))))
(cons (1+ (/ u perms))
(roll-perm (1- n) s (mod u perms))))))
If we're only interested in the sum, we could save memory by making
this tail recursive -- or iterative -- and summing the dice as we
calculate them. Ignoring the exponent, this is O(n), not
better than the simple algorithm in terms of growth rate. This
algorithm is more efficient when it comes to entropy, though.
Consider 3d6, with 216 possible outcomes, ideally with
the simple algorithm takes 3 3-bit rolls, consuming 9 bits. About 1.25
bits was not actually used (0.42 * 3). In the entropy-efficient
algorithm we need about 7.75 bits, so it only consumes 8 bits of
entropy. We saved a bit. That gap only gets larger with more dice. For
100d6 the simple algorithm uses 41 more bits than
necessary.
The efficient roll is basically defragmenting the individual rolls on
the entropy stream.
In non-ideal world, though, some cases don't work out well. In
12d6, almost half the numbers (compared to 25% in the
case of 1d6) from the uniform distribution will be out of range and a
lot more bits would be needed. On average, rolling dice individually
(or only some of them individually) for 12d6 will
be more efficient.
The efficient algorithm is only more efficient above a point near
where mod(log2(s), 1) < mod(log2(n^s), 1).
And, all of this doesn't come without a cost. You must pay the piper,
and this algorithm is paid with CPU and memory. Notice that exponent
there? That has to be done to exact precision (no floating point), and
it grows very quickly. If you want to roll more than a handful of
dice, you will be crunching some large numbers. Rolling just
100d6 means you have to work with a 78 digit
integer. 10000d6 is a 7782 digit integer. These can't be
done in floating point because the resolution of floating point is too
low: some rolls would not be possible.
The exponent could be memoized to
trade some of that CPU time for more memory usage. Still, pretty
costly. If you don't value your entropy, the tradeoff might not be
worth it.
I can't see a way around performing that calculation. We need to know
that big number exactly. Perhaps a mathematician might be able to
manipulate the formulas such that it's not so expensive.
If you're rolling lots of dice and you want to preserve binary
entropy, try it out. If you want to be really efficient queue up rolls
-- or generate them ahead of time -- so that the number of outcomes is
just below a power of two. In the case of d6, some good
number of dice to roll are 17 (~43.94 bits), 29 (~74.96), 41
(~105.983), 94 (~242.986 bits), 200 (~516.993), 253 (~653.995 bits),
306 (~791.99853 bits), and 971 (~2509.99859 bits). (Notice these get
closer and closer to an integer number of bits.)
I've been using lossless optimizers for awhile now for PNGs, but more
recently I have found some for other formats. Here's the ones I know
about. These are all intended to be lossless, so running them should
result in no information loss (well, except the SVG one).
For PNG, there are a number of choices, but my favorite is OptiPNG. It adjust the PNG
parameters and recompresses to find the optimal parameters. I run it
on almost all my images around here, and I tend to get around 10% to
30% reduction for images fresh off
Gimp,
Kolourpaint, and
ImageMagick.
For JPEG, I use jpegoptim. It
works by optimizing the Huffman tables (the lossless part of JPEG
compression). I only found this one recently, but I will be using it
all the time, like on our new thousands of wedding reception photos.
For PDF, I found something called QPDF. It's designed more for
other PDF transformations, but without any other parameters it will
simply losslessly optimize a PDF. From what I've seen so far it cuts
PDFs down by about a third.
For SVG, Scour is a
young project, only a few months old. I've been looking for an SVG
optimizer for some time, so this was exciting to find. Due to the type
of file it's working with, it's not quite entirely lossless. Visually,
it is lossless, but it will toss all metadata (comments, etc.), which
may be important. If you hand-crafted your SVG, you won't want to use
this tool. It's good for removing
Inkscape and Illustrator cruft, though.
I have yet to find a good (Free) GIF optimizer. Animated GIFs, with
lots of redundancy between frames, have a lot of potential for
optimization too. A video optimizer (for, say, Theora) would be
interesting; I imagine it might work similarly to jpegoptim. Audio
files (like Vorbis, FLAC, or MP3) probably don't have any room for
optimization. I could be wrong. For XHTML there is tidy if you want to
count that. All the other XML formats (ODF, RSS, etc.) could have
their own too. Or optimizers for archives, like zip and tar. For tar
it might rearrange things to better suit gzip, bzip2, or
lzma. Executable optimizers? Postscript optimizers? It goes on and on.
If you know about any more, especially for other file formats, let me
know.
A year ago while I was reading Applied
Cryptography I got an idea for using IRC as a random number
generator. The book mentioned using a rolling car wheel to generate
random numbers, by measuring its period. So why not IRC? Grab the
code,
It's based entirely on event timing: it logs in and sits on several
channels, then measures the time between events. When a new event
occurs it compares the time passed between this event and the last
event, and the time between the two events before it. If it's greater,
emit a 1, if less emit a 0. Simple.
For skew removal I used transition mappings
(rfc1750), invented by von Neumann. It looks at pairs of bits. If they
differ, pass the first bit, otherwise toss both bits. So if it comes
across "11" or "00", it tosses them. If it comes across "10" it emits
"1", and "01" it emits 0.
Here's an analysis by ent of 476 output bytes I generated overnight: irc.random.bytes.
Entropy = 7.515132 bits per byte.
Optimum compression would reduce the size
of this 476 byte file by 6 percent.
Chi square distribution for 476 samples is 274.79, and randomly
would exceed this value 25.00 percent of the times.
Arithmetic mean value of data bytes is 127.7248 (127.5 = random).
Monte Carlo value for Pi is 2.886075949 (error 8.13 percent).
Serial correlation coefficient is -0.062745 (totally uncorrelated = 0.0).
Eh, not awful, but not too great. It got a good score on the
Chi-square test, which I attribute to the skew filter. This generator
is also extremely slow, generating only a few bytes per hour. At
best, each event only generates a half of a bit, after skew
correction. It would probably be more profitable to take the hash of
the actual event with a timestamp and use that as the RNG.
Also, someone else running this generator on the same channels would
generate very similar numbers. Worse, someone eavesdropping on your
network connection knows mostly what numbers you generated. Worse yet,
someone actively modifying your connection could control your
generator and force whatever output is desired.
I was reading through the HTML 5
specification earlier tonight and I was delighted to come across an
obvious reference to the Flying
Spaghetti Monster, complete with notes about His "noodly
appendages". I'm guessing that someone silently slipped it in there
and the people that might be offended won't get the reference
anyway.
In the following example, a picture representing the flying
spaghetti monster emblem, with each of the left
noodly appendages and the right noodly appendages in
different images, so that the user can pick the left side or the
right side in an adventure.
For those who are unaware, it is the deity in a parody religion
Pastafarianism. It's often used to demonstrate how silly religion is.
Web pages aren't a static medium, like books, brochures, or
pamphlets.
The web is not print. Accordingly, the layout of web pages should
not be locked to some static width, but instead flow to fill the width
of the browser like a liquid. Web pages should normally have a
liquid layout.
One of the most obvious problems with the fixed layout occurs when the
browser window is stretched wider than the designer had intended. The
page looks like this,
I, as a user, have little control my viewing of the website. I'm stuck
reading through a keyhole. It gets much worse if the browser isn't as
wide as the designer intended: a horizontal scrollbar appears and
navigation becomes very difficult. My laptop runs at a resolution of
1024x768, and I frequently come across pages where this is an
issue. And according to Jakob Nielsen, in 2006 77% of
user's screens were 1024 pixels wide or less.
See the liquid for yourself right here: adjust the width of your
browser and watch this text flow to fill the screen. You can also
bring it in pretty far before you clip an image and the horizontal
scrollbar appears. The exact width depends only on the widest image
being displayed. This also comes into play if you adjust the font
size.
Using a liquid layout
allows the page to work well with a wide variety of screen widths,
and most importantly, gives users lots of control over how they view
the site. It's very unfortunate that (in my experience) most websites
employ a poor, fixed layout. Even web design "expert" websites will
ironically hand out web design tips from within these annoying
confines. One of the biggest culprits driving this is Wordpress, which
has this flawed layout by default.
The very worst offenders tend to be websites with little actual
content, like corporate websites or "artist" portfolios. The less
usable the page, the less I wanted to be there anyway.
So please drop the fancy, low-usability web designs for
something with much better usability. Your users will probably
appreciate it.
I use Adblock Plus to block
advertisements and, more importantly, invisible privacy-breaking
trackers (most people aren't even aware of these). I think ad-blocking
is actually easier than ever, because ads are served from a relatively
small number of domains, rather than from the websites
themselves. Instead of patterns matching parts of a path, I can just
block domains.
Adblock Plus emphasizes this by providing, by default, a pattern
matching the server root. Example,
http://ads.example.com/*
But sometimes advertising websites are trickier, and their sub-domain
is a fairly unique string,
http://ldp38fm.example.com/*
That pattern isn't very useful. I want something more like,
http://*.example.com/*
Unfortunately Adblock Plus doesn't provide this pattern automatically
yet, so I have to do it manually. I think this pattern is less obvious
because the URL format is actually broken. Notice have have two
matching globs (*) rather than just one, even though I am simply
blocking everything under a certain level.
Tim Berners-Lee
regrets the format of the URL, and I agree with him. This is what
URLs like http://ads.example.com/spy-tracker.jsshould look like,
http://com/example/ads/spy-tracker.js
It's a single coherent hierarchy with each level in order. This makes
so much more sense! If I wanted to block example.com and all it's
sub-domains, the pattern is much simpler and less error prone,
http://com/example/*
To anyone who ever reinvents the web: please get it right next time.
Update: There is significant further discussion in the comments.
Don't stop here! This isn't everything. Check out the archives
(on the left) for more posts. Or just have a look at
the index.