nullprogram.com/blog/2008/07/20/
I was at my fiancee's parent's house over Fourth of July weekend. Her
family likes to leave plenty of reading material right by the toilet,
which is something fairly new to me. They take their time on the john
quite seriously.
While I was in there I saw a large book
of Sudoku
puzzles. Since the toilet is a good spot to think (I like to call it
my
"
thinking chair"), I thought out an algorithm for solving
Sudokus. I then left the bathroom and implemented it in order to
verify that it worked.
The method is trial-and-error, which it does recursively: fill in the
next available spot with a valid number as defined by the rules
(cannot have the same number in a column, row, or partition), and
recurse. The function reports success (true) when a solution was
found, or failure (false), which means we try the next available
number. If no more valid numbers are available for testing at the
current position, then the puzzle is not solvable (we made an error at
a previous position), so we stop recursing and return failure.
More formally,
- Find an open position.
- Look at that position's row, column, and partition to find valid
numbers to fill in.
- Fill the position with one of the valid choices.
- Recurse using the new change.
- If the recursion reports a problem (returns false), try the next
valid number and repeat.
- If recursion reports success (true), stop guessing and return
success.
- If the list of valid numbers is exhausted, return failure (false).
Note that the recursion depth does not exceed 81, as it only recurses
once per blank square. The "game tree" is broad rather than deep. It
doesn't have to duplicate the puzzle matrix in memory either because
all operations can be done in place.
Here is the implementation in C I typed up just after I left the
bathroom,
I assumed that it would be slow solving the puzzles, having to search
a wide tree, but it turns out to be very fast. It solves normal
human-solvable puzzles in a couple of milliseconds. Wikipedia has a
near-worst case Sudoku that is designed to make algorithms like mine
perform their worst.
On my laptop, my program solves this in 15 seconds, which means that
it should take no more than 15 seconds to solve any given Sudoku
puzzle. This provides me a nice upper limit.
There is a way to "defeat" this particular puzzle. For example, say an
attacker was trying to perform a
denial-of-service (DoS) attack on your Sudoku solver by giving it
puzzles like this one (making your server spend lots of time solving
only a few puzzles). However, these puzzles assume a certain guessing
order. By simply randomizing the order of guessing, both in choosing
positions and the order that numbers are guessed, the attacker will
have a much harder time creating a difficult puzzle. The worst case
could very well be the best case. This is very similar to how
Perl
randomizes its hash array hash functions.
Now suppose we kept our guess order random then "solved" an empty
Sudoku puzzle. What we have is a solution to a randomly generated
Sudoku. To turn it into a puzzle, we just back it off a bit. A Sudoku
is only supposed to have a single unambiguous solution, so we can only
back off until just before the point where two solutions becomes
possible. If you imagine a solution tree, this would be backing up a
branch until you hit a fork.
Normally, Sudokus are symmetric (in the matrix sense), but completely
randomizing the position guessing order won't achieve this. To make
this work, the randomizing process can be adjusted to only select
random points on the upper triangle (including the diagonal). For each
point it selects not on the diagonal, the mirror point is
automatically selected next. This will preserve symmetry when
generating puzzles.
One issue remains: there seems to be no way to control the difficulty
of the puzzles it generates. Maybe a number of open spaces left behind
is a good metric? This will require some further study (and another
post!).