To my surprise, this turned out to be harder than expected. A straightforward scan with Boyer-Moore-Horspool across the entire text file is already pretty fast. On modern, COTS hardware it takes about 6 seconds. Comparing bytes is cheap and it’s largely an I/O-bound problem, which means building fancy indexes tends to make it slower because they demand even more I/O.
The challenge was inspired by The Pi-Search Page, which offers a search on the first 200 million digits. There’s also a little write-up about how their pi search works. I wanted to try to invent my own solution. I did eventually come up with something that worked, which can be found here. It’s written in plain old C.
You might want to give the challenge a shot on your own before continuing!
The first thing I tried was SQLite. I thought an index (B-tree) over fixed-length substrings would be efficient. A LIKE condition with a right-hand wildcard is sargable and would work well with the index. Here’s the schema I tried.
CREATE TABLE digits
(position INTEGER PRIMARY KEY, sequence TEXT NOT NULL)
There will be 1 row for each position, i.e. 1 billion rows. Using INTEGER PRIMARY KEY means position will be used directly for row IDs, saving some database space.
After the data has been inserted by sliding a window along pi, I build the index. It’s much faster to build an index after the data is in the database than to maintain it during insertion.
CREATE INDEX sequence_index ON digits (sequence, position)
This takes several hours to complete. When it’s done the database is a whopping 60GB! Remember I said that this is very much an I/O-bound problem? I wasn’t kidding. This doesn’t work well at all. Here’s a search for the example sequence.
SELECT position, sequence FROM digits
WHERE sequence LIKE '141592653%'
You get your answers after about 15 minutes of hammering on the disk.
Sometime later I realized that sequences of up to 18 digits could be encoded into an integer, so that TEXT column could be a much simpler INTEGER. Unfortunately this doesn’t really improve anything. I also tried this in PostgreSQL but it was even worse; I gave up after 24 hours of waiting on it. These databases are not built for such long, skinny tables, at least not without beefy hardware.
A couple weeks later I had another idea. A query is just a sequence of digits, so it can be trivially converted into a unique number. As before, pick a fixed length (n) for indexed sequences and an appropriate stride. The database would be one big file. To look up a sequence, treat that sequence as an offset into the database and seek into the database file to that offset times the stride. The total size of the database is 10^n * stride.
In this quick and dirty illustration, n=4 and stride=4 (far too small for that n).
For example, if the fixed length for sequences is 6 and the stride is 4,000 bytes, looking up “141592” is just a matter of seeking to byte 141,592 * 4,000 and reading in positions until some sort of sentinel. The stride must be long enough to store all the positions for any indexed sequence.
For this purpose, the digits of pi are practically random numbers. The good news is that this means a fixed stride will work well: any particular sequence appears just about as often as any other. The chance that a specific n-length sequence begins at a specific position is 1 / 10^n. There are 1 billion positions, so a particular sequence will have about 1e9 / 10^n positions associated with it, which is a good place to start for picking a stride.
The bad news is that building the index means jumping around the database essentially at random for each write. This defeats any sort of cache between the program and the hard drive, and it’s incredibly slow, even with mmap(). The workaround is to either do it entirely in RAM (which needs at least 6GB for 1 billion digits!) or to build it up over many passes. I didn’t try it on an SSD, but maybe the random access is more tolerable there.
Doing all the work in memory makes it easier to improve the database format anyway. It can be broken into an index section and a tables section. Instead of a fixed stride for the data, front-load the database with a similar index that points to the section (table) of the database file holding that sequence’s pi positions. Each of the 10^n sequences gets a single integer in the index at the front of the file. Looking up the positions for a sequence means parsing the sequence as a number, seeking to that offset near the beginning of the database, reading in another offset integer, and then seeking to that new offset. Now the database is compact and there are no concerns about stride.
No sentinel mark is needed either. The tables are concatenated in order in the table part of the database. To determine where to stop, take a peek at the next sequence’s start offset in the index. Its table immediately follows, so this doubles as an end offset. For convenience, one final integer in the index will point just beyond the end of the database, so the last sequence (99999…) doesn’t require special handling.
If the database is built for fixed-length sequences, how is a sequence of a different length searched? The two cases, shorter and longer, are handled differently.
If the sequence is shorter, fill in the remaining digits, …000 to …999, and look up each sequence. For example, if n=6 and we’re searching for “1415”, get all the positions for “141500”, “141501”, “141502”, …, “141599” and concatenate them. Fortunately the database already has them stored this way! Look up the offsets for “141500” and “141600” and grab everything in between. The downside is that the pi positions are only partially sorted, so they may require sorting before presenting to the user.
If the sequence is longer, the original digits file will be needed. Get the table for the sequence’s fixed-length prefix, then seek into the digits file checking each of the pi positions for a full match. This requires lots of extra seeking, but a long sequence will naturally have fewer positions to test. For example, if n=7 and we’re looking for “141592653”, look up the “1415926” table in the database and check each of its 106 positions.
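Both cases can be sketched around a helper, where table(seq) is a stand-in for the fixed-length database lookup described above (this is illustrative Python, not the actual C implementation):

```python
def search(table, digits, query, n=6):
    """Return sorted pi positions (0-based) where `query` begins.

    table(seq) -> positions for an n-digit sequence (stand-in for the
    database lookup); digits is the full digit string.
    """
    if len(query) <= n:
        # Shorter: gather every n-digit extension of the query.
        positions = []
        for suffix in range(10 ** (n - len(query))):
            seq = query + str(suffix).zfill(n - len(query))
            positions.extend(table(seq))
        return sorted(positions)  # tables are only partially sorted
    else:
        # Longer: verify each candidate against the digits file.
        return [p for p in table(query[:n])
                if digits[p:p + len(query)] == query]
```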
With this database searches are only a few milliseconds (though very much subject to cache misses). Here’s my program in action, from the repository linked above.
$ time ./pipattern 141592653
1: 14159265358979323
427238911: 14159265303126685
570434346: 14159265337906537
678096434: 14159265360713718
real 0m0.004s
user 0m0.000s
sys 0m0.000s
I call that challenge completed!
However, if we’re interested only in rendering a Voronoi diagram as a bitmap, there’s a trivial brute force algorithm: for every pixel of output, determine the closest seed vertex and color that pixel appropriately. It’s slow, especially as the number of seed vertices goes up, but it works perfectly and it’s dead simple!
Does this strategy seem familiar? It sure sounds a lot like an OpenGL fragment shader! With a shader, I can push the workload off to the GPU, which is intended for this sort of work. Here’s basically what it looks like.
/* voronoi.frag */
uniform vec2 seeds[32];
uniform vec3 colors[32];
void main() {
    float dist = distance(seeds[0], gl_FragCoord.xy);
    vec3 color = colors[0];
    for (int i = 1; i < 32; i++) {
        float current = distance(seeds[i], gl_FragCoord.xy);
        if (current < dist) {
            color = colors[i];
            dist = current;
        }
    }
    gl_FragColor = vec4(color, 1.0);
}
If you have a WebGL-enabled browser, you can see the results for yourself here. Now, as I’ll explain below, what you see here isn’t really this shader, but the result looks identical. There are two different WebGL implementations included, but only the smarter one is active. (There’s also a really slow HTML5 canvas fallback.)
You can click and drag points around the diagram with your mouse. You can add and remove points with left and right clicks. And if you press the “a” key, the seed points will go for a random walk, animating the whole diagram. Here’s a (HTML5) video showing it off.
Unfortunately, there are some serious problems with this approach. It has to do with passing seed information as uniforms.
The number of seed vertices is hardcoded. The shader language requires uniform arrays to have lengths known at compile-time. If I want to increase the number of seed vertices, I need to generate, compile, and link a new shader to replace it. My implementation actually does this: the number is replaced with a %%MAX%% template that I fill in using a regular expression before sending the program off to the GPU.
The number of available uniform bindings is very constrained, even on high-end GPUs: GL_MAX_FRAGMENT_UNIFORM_VECTORS. This value is allowed to be as small as 16! A typical value on high-end graphics cards is a mere 221. Each array element counts as a binding, so our shader may be limited to as few as 8 seed vertices. Even on nice GPUs, we’re absolutely limited to 110 seed vertices. An alternative approach might be passing seed and color information as a texture, but I didn’t try this.
There’s no way to bail out of the loop early, at least with OpenGL ES 2.0 (WebGL) shaders. We can’t break or do any sort of branching on the loop variable. Even if we only have 4 seed vertices, we still have to compare against the full count. The GPU has plenty of time available, so this wouldn’t be a big issue, except that we need to skip over the “unused” seeds somehow. They need to be given unreasonable position values. Infinity would be an unreasonable value (infinitely far away), but GLSL floats aren’t guaranteed to be able to represent infinity. We can’t even know what the maximum floating-point value might be. If we pick something too large, we get an overflow garbage value, such as 0 (!!!) in my experiments.
Because of these limitations, this is not a very good way of going about computing Voronoi diagrams on a GPU. Fortunately there’s a much much better approach!
With the above implemented, I was playing around with the fragment shader, going beyond solid colors. For example, I changed the shade/color based on distance from the seed vertex. A result of this was this “blood cell” image, a difference of a couple lines in the shader.
That’s when it hit me! Render each seed as a cone pointed towards the camera in an orthographic projection, coloring each cone according to the seed’s color. The Voronoi diagram would work itself out automatically in the depth buffer. That is, rather than do all this distance comparison in the shader, let OpenGL do its normal job of figuring out the scene geometry.
Here’s a video (GIF) I made that demonstrates what I mean.
Not only is this much faster, it’s also far simpler! Rather than being limited to a hundred or so seed vertices, this version could literally do millions of them, limited only by the available memory for attribute buffers.
There’s a catch, though. There’s no way to perfectly represent a cone in OpenGL. (And if there were, we’d be back at the brute force approach above anyway.) The cone must be built out of primitive triangles, sort of like pizza slices, using GL_TRIANGLE_FAN mode. Here’s a cone made of 16 triangles.
Unlike the previous brute force approach, this is an approximation of the Voronoi diagram. The more triangles, the better the approximation, converging on the precision of the initial brute force approach. I found that for this project, about 64 triangles was indistinguishable from brute force.
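Generating the fan’s vertex list is a few lines in any language. Here’s a sketch in Python (the apex-at-depth-0, rim-at-depth-1 convention is an assumption for illustration; a fan of k triangles needs k + 2 vertices, which matches the 66 vertices for 64 triangles used below):

```python
import math

def cone_vertices(triangles, radius=1.0):
    """Vertices for a GL_TRIANGLE_FAN cone: apex toward the camera,
    rim pushed back in depth so nearer cones win the depth test."""
    verts = [(0.0, 0.0, 0.0)]           # apex (assumed nearest depth)
    for i in range(triangles + 1):      # rim; first and last coincide
        angle = 2 * math.pi * i / triangles
        verts.append((radius * math.cos(angle),
                      radius * math.sin(angle),
                      1.0))             # rim pushed to far depth
    return verts
```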
At this point things are looking pretty good. On my desktop, I can maintain 60 frames-per-second for up to about 500 seed vertices moving around randomly (“a”). After this, it becomes draw-bound because each seed vertex requires a separate glDrawArrays() call to OpenGL. The workaround for this is an OpenGL extension called instancing. The WebGL extension for instancing is ANGLE_instanced_arrays.
The cone model was already sent to the GPU during initialization, so, without instancing, the draw loop only has to bind the uniforms and call draw for each seed. This code uses my Igloo WebGL library to simplify the API.
var cone = programs.cone.use()
    .attrib('cone', buffers.cone, 3);
for (var i = 0; i < seeds.length; i++) {
    cone.uniform('color', seeds[i].color)
        .uniform('position', seeds[i].position)
        .draw(gl.TRIANGLE_FAN, 66); // 64 triangles == 66 verts
}
It’s driving this pair of shaders.
/* cone.vert */
attribute vec3 cone;
uniform vec2 position;
void main() {
    gl_Position = vec4(cone.xy + position, cone.z, 1.0);
}
/* cone.frag */
uniform vec3 color;
void main() {
    gl_FragColor = vec4(color, 1.0);
}
Instancing works by adjusting how attributes are stepped. Normally the vertex shader runs once per element, but instead we can ask that some attributes step once per instance, or even once per multiple instances. Uniforms are then converted to vertex attribs and the “loop” runs implicitly on the GPU. The instanced glDrawArrays() call takes one additional argument: the number of instances to draw.
ext = gl.getExtension("ANGLE_instanced_arrays"); // only once
var cone = programs.cone.use()
    .attrib('cone', buffers.cone, 3)
    .attrib('position', buffers.positions, 2)
    .attrib('color', buffers.colors, 3);
/* Tell OpenGL these iterate once (1) per instance. */
ext.vertexAttribDivisorANGLE(cone.vars['position'], 1);
ext.vertexAttribDivisorANGLE(cone.vars['color'], 1);
ext.drawArraysInstancedANGLE(gl.TRIANGLE_FAN, 0, 66, seeds.length);
The ugly ANGLE names are because this is an extension, not part of WebGL itself. As such, my program will fall back to use multiple draw calls when the extension is not available. It’s only there for a speed boost.
Here are the new shaders. Notice the uniforms are gone.
/* cone-instanced.vert */
attribute vec3 cone;
attribute vec2 position;
attribute vec3 color;
varying vec3 vcolor;
void main() {
    vcolor = color;
    gl_Position = vec4(cone.xy + position, cone.z, 1.0);
}
/* cone-instanced.frag */
varying vec3 vcolor;
void main() {
    gl_FragColor = vec4(vcolor, 1.0);
}
On the same machine, the instancing version can do a few thousand seed vertices (an order of magnitude more) at 60 frames-per-second, after which it becomes bandwidth saturated. This is because, for the animation, every vertex position is updated on the GPU on each frame. At this point it’s overcrowded anyway, so there’s no need to support more.
Write a program that simulates the spreading of a rumor among a group of people. At any given time, each person in the group is in one of three categories:
- IGNORANT - the person has not yet heard the rumor
- SPREADER - the person has heard the rumor and is eager to spread it
- STIFLER - the person has heard the rumor but considers it old news and will not spread it
At the very beginning, there is one spreader; everyone else is ignorant. Then people begin to encounter each other.
So the encounters go like this:
- If a SPREADER and an IGNORANT meet, IGNORANT becomes a SPREADER.
- If a SPREADER and a STIFLER meet, the SPREADER becomes a STIFLER.
- If a SPREADER and a SPREADER meet, they both become STIFLERS.
- In all other encounters nothing changes.
Your program should simulate this by repeatedly selecting two people randomly and having them “meet.”
There are three questions we want to answer:
- Will everyone eventually hear the rumor, or will it die out before everyone hears it?
- If it does die out, what percentage of the population hears it?
- How long does it take? i.e. How many encounters occur before the rumor dies out?
I wrote a very thorough version to produce videos of the simulation in action.
It accepts some command line arguments, so you don’t need to edit any code just to try out some simple things.
And here are a couple of videos. Each individual is a cell in a 2D grid. IGNORANT is black, SPREADER is red, and STIFLER is white. Note that this is not a cellular automaton, because cell neighborship does not come into play.
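The core of such a simulation is tiny. Here’s a minimal Python sketch of the encounter rules (not my thorough version, which handles the grid and video output):

```python
import random

IGNORANT, SPREADER, STIFLER = range(3)

def rumor(n=10000, rng=random):
    """Simulate one rumor until no spreaders remain.

    Returns (meetups, fraction of the population that heard it)."""
    people = [SPREADER] + [IGNORANT] * (n - 1)
    spreaders = 1
    meetups = 0
    while spreaders > 0:
        a, b = rng.sample(range(n), 2)  # two distinct people meet
        meetups += 1
        pa, pb = people[a], people[b]
        if pa == SPREADER and pb == IGNORANT:
            people[b] = SPREADER; spreaders += 1
        elif pb == SPREADER and pa == IGNORANT:
            people[a] = SPREADER; spreaders += 1
        elif pa == SPREADER and pb == SPREADER:
            people[a] = people[b] = STIFLER; spreaders -= 2
        elif pa == SPREADER and pb == STIFLER:
            people[a] = STIFLER; spreaders -= 1
        elif pb == SPREADER and pa == STIFLER:
            people[b] = STIFLER; spreaders -= 1
    knowing = sum(p != IGNORANT for p in people) / n
    return meetups, knowing
```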
Here are the statistics for ten different rumors.
Rumor(n=10000, meetups=132380, knowing=0.789)
Rumor(n=10000, meetups=123944, knowing=0.7911)
Rumor(n=10000, meetups=117459, knowing=0.7985)
Rumor(n=10000, meetups=127063, knowing=0.79)
Rumor(n=10000, meetups=124116, knowing=0.8025)
Rumor(n=10000, meetups=115903, knowing=0.7952)
Rumor(n=10000, meetups=137222, knowing=0.7927)
Rumor(n=10000, meetups=134354, knowing=0.797)
Rumor(n=10000, meetups=113887, knowing=0.8025)
Rumor(n=10000, meetups=139534, knowing=0.7938)
Except for very small populations, the simulation always terminates very close to 80% rumor coverage. I don’t understand (yet) why this is, but I find it very interesting.
let isn’t a special form. That is, it’s not a hard-coded language feature; it’s built on top of lambda. In any lexically-scoped Lisp, the expression,
(let ((x 10)
      (y 20))
  (* x y))
can also be written as,
((lambda (x y)
   (* x y))
 10 20)
BrianScheme’s let is just a macro that transforms into a lambda expression. This is also what made it so important to implement lambda lifting, to optimize these otherwise-expensive forms.
It’s possible to achieve a similar effect in GNU Octave (but not Matlab, due to its flawed parser design). The language permits simple lambda expressions, much like Python.
> f = @(x) x + 10;
> f(4)
ans = 14
It can be used to create a scope in a language that’s mostly devoid of scope. For example, I can avoid assigning a value to a temporary variable just because I need to use it in two places. This one-liner generates a random 3D unit vector.
(@(v) v / norm(v))(randn(1, 3))
The anonymous function is called inside the same expression where it’s created. In practice, doing this is stupid. It’s confusing and there’s really nothing to gain by being clever, doing it in one line instead of two. Most importantly, there’s no macro system that can turn this into a new language feature. However, I enjoyed using this technique to create a one-liner that generates n random unit vectors.
n = 1000;
p = (@(v) v ./ repmat(sqrt(sum(abs(v) .^ 2, 2)), 1, 3))(randn(n, 3));
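For comparison, the underlying computation (normalize rows of Gaussian samples) is a one-liner in NumPy too, no lambda trick required (assuming NumPy, of course; this is my translation, not part of the original post):

```python
import numpy as np

def random_unit_vectors(n, rng=np.random.default_rng()):
    """n random 3-D unit vectors: normalize rows of Gaussian samples."""
    v = rng.standard_normal((n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)
```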
Why was I doing this? I was using the Monte Carlo method to double-check my solution to this math problem:
What is the average straight line distance between two points on a sphere of radius 1?
I was also demonstrating to Gavin that simply choosing two angles is insufficient, because the points the angles select are not evenly distributed over the surface of the sphere. I generated this video, where the poles are clearly visible due to the uneven selection by two angles.
This took hours to render with gnuplot! Here are stylized versions: Dark and Light.
The other day I came across this neat visual trick: How to simulate liquid (Flash). It’s a really simple way to simulate some natural-looking liquid.
I made my own version in Java, using JBox2D for the physics simulation.
For those of you who don’t want to run a Java applet, here’s a video demonstration. Gravity is reversed every few seconds, causing the liquid to slosh up and down over and over. The two triangles on the sides help mix things up a bit. The video flips through the different components of the animation.
It’s not a perfect liquid simulation. The surface never settles down, so the liquid is lumpy, like curdled milk. There’s also a lack of cohesion, since JBox2D doesn’t provide cohesion directly. However, I think I could implement cohesion on my own by writing a custom contact.
JBox2D is a really nice, easy-to-use 2D physics library. I only had to read the first two chapters of the Box2D manual. Everything else can be figured out through the JBox2D Javadocs. It’s also available from the Maven repository, which is the reason I initially selected it. My only complaint so far is that the API doesn’t really follow best practice, but that’s probably because it follows the Box2D C++ API so closely.
I’m excited about JBox2D and I plan on using it again for some future project ideas. Maybe even a game.
The most computationally intensive part of the process isn’t the physics. That’s really quite cheap. It’s actually blurring, by far. Blurring involves convolving a kernel over the image — O(n^2) time. The graphics card would be ideal for that step, probably eliminating it as a bottleneck, but it’s unavailable to pure Java. I could have pulled in lwjgl, but I wanted to keep it simple, so that it could be turned into a safe applet.
As a result, it may not run smoothly on computers that are more than a couple of years old. I’ve been trying to come up with a cheaper alternative, such as rendering a transparent halo around each ball, but haven’t found anything yet. Even with that fix, thresholding would probably be the next bottleneck — something else the graphics card would be really good at.
I generated some noise, looked at it with surf(), and repeated until I found something useful. (Update June 2012: the function is called perlin() but it’s not actually Perlin noise.)
m = perlin(1024);
surf(m);
The generated terrain is really quite rough, so I decided to smooth it out by convolving it with a 2-dimensional Gaussian kernel.
k = fspecial('gaussian', 9);
ms = conv2(m, k, 'same');
It still wasn’t smooth enough. So I repeated the process a bit,
for i = 1:10
  ms = conv2(ms, k, 'same');
end
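In case a non-Octave version is useful, here’s a rough Python/NumPy equivalent of the kernel and the repeated 'same' convolution (the sigma of 0.5 mirrors fspecial’s default; treat the whole thing as a sketch, not a faithful port):

```python
import numpy as np

def gaussian_kernel(size=9, sigma=0.5):
    """2-D Gaussian kernel, roughly fspecial('gaussian', size)."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def smooth(m, k, passes=10):
    """Repeated conv2(m, k, 'same') with zero padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    for _ in range(passes):
        padded = np.pad(m, ((ph, ph), (pw, pw)))
        out = np.zeros_like(m, dtype=float)
        # Accumulate shifted, weighted copies: a plain 2-D convolution.
        for i in range(kh):
            for j in range(kw):
                out += k[i, j] * padded[i:i + m.shape[0], j:j + m.shape[1]]
        m = out
    return m
```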
Perfect! I used that for my presentation. However, I was having fun and decided to experiment more with this. I filtered it again another 1000 times and generated a surf() plot with a high-resolution colormap — the default colormap size caused banding.
colormap(copper(1024));
surf(ms, 'EdgeAlpha', 0);
axis('equal');
It produced this beautiful result!
I think it looks like a photograph from a high-powered microscope, or maybe the turbulent surface of some kind of creamy beverage being stirred.
At work when I need something Matlab-ish, I use Octave about half the time and Matlab the other half. In this case, I was using Matlab. Octave doesn’t support the EdgeAlpha property, nor the viewshed() function that I needed for my work. Matlab currently makes much prettier plots than Octave.
At Gavin's suggestion, I've been watching The Prisoner, a 1960s British television show. The main character is an ex-spy held prisoner in "the Village", an Orwellian, isolated, enclosed town. No one in the Village has a name; everyone is instead assigned a number. The main character's number is 6.
As far as I can tell, after number 2 the order of the numbers is not important. Number 56 is no more important than number 12. By using numbers to name things there is an implied ordering, even if the ordering is insignificant. It could be misleading to a newcomer.
Is there an unordered set that could be used to name things? More specifically, is there a set that cannot be ordered? If it is unorderable then there is no implicit ordering to cause confusion. It's easy to have an unorderable set in theory, but I think it is difficult to have one in practice.
Using letters is obviously out, as the alphabet has an order. Words and names made of letters can be sorted according to the alphabet. However, the ability to order words is almost never used outside of indexing. If words are used to name things, a newcomer is unlikely to assume relationships based on ordering. No one will assume Alan is more important than Bob.
Large numbers also tend to lack an assumed order. I don't think anyone assumes a larger or smaller social security number has meaning, or a larger or smaller phone number. However, these values are also known to be handed out in some semi-random way.
But can we do better? For at least English speakers, is it possible to create an unorderable set? If the items in the set have a vocal pronunciation, then they can probably be ordered by their phonetics. That could be avoided by using non-standard phonetic components, like clicks and pops, which won't have a standard ordering (in English, anyway).
A set has an order if there is a total, transitive, relational operator for the set. If such an operator does not exist then the set isn't linearly ordered. I want a set that can't easily have such an operator.
If a set of symbols were created, how might they be presented so as to show no ordering? The order of the symbols in the original presentation might be taken as the ordering, like how the alphabet is always presented in order. A circle could be used, but that is circularly ordered. I think there is also the issue of memorization. A human will have a much better time memorizing the symbols if they are memorized in some order. For example, try naming all the letters of the alphabet at random, without repeats. Or US states.
Thanks to modern day technology, with dynamic content, the set could be displayed in a random order each time it is viewed. For a web page, the server could select a random order, or a JavaScript program could reorder the images at random.
There could be partially ordered sets, like hierarchies and DAGs. The ordering in The Prisoner is one of these. There is number 1, then number 2, then everyone else. Is there a partially ordered set in use that has unique names at the same level?
The penalties incurred by intentionally prohibiting order would likely outweigh the benefits of the set. If it's not orderable, we can't index it, and it's difficult to deal with. I expect it's much easier to just use numbers and tell people that the order isn't important, or to use an obviously unordered set.
This is related to a project I am working on and will post here soon. I imagine that, with a little more effort, this algorithm could turn into a short amateur paper.
Suppose you want to use a computer to simulate the roll of two six-sided dice (notated 2d6). The simplest approach would be to replicate the results the same way you would roll dice: independently and randomly generate two numbers between 1 and 6 inclusive. We can easily do this for any number of dice; we just iterate and roll each die. Like this recursive function,
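A recursive roll along those lines might be sketched in Python as (mine, not the original code):

```python
import random

def roll(n, sides=6, rng=random):
    """Roll n dice independently and return their sum."""
    if n == 0:
        return 0
    return rng.randint(1, sides) + roll(n - 1, sides, rng)
```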
However, generating a number between 1 and 6 wastes small amounts of entropy. A six-sided die only takes about 2.58 bits of entropy to generate. Since we can only use bits discretely we have to spend 3 bits, throwing out 0.42 bits. On top of that, when we pull out 3 bits and they are out of range (0 or 7) we have to throw them out and try again.
Let's say we wanted to roll 10 dice, or 100 dice, or 1000 dice? Do we really need to generate that many numbers individually? That's a lot of wasted entropy adding up, entropy which can be expensive to gather. Well, we could instead use the probability distribution of the roll so that only a single number needs to be generated.
For a 2d6 roll, there are 36 unique possible outcomes (6^2). We could select a number between 0 and 35, then choose that specific roll. This roll can be calculated with a series of division and modulus operations (u for a number from a uniform distribution) (also, note that the division is integer division),
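Decoding a single uniform number into individual dice with division and modulus might look like this (a Python sketch, not the original code):

```python
def decode_roll(u, n, sides=6):
    """Turn one uniform number u in [0, sides**n) into n dice."""
    dice = []
    for _ in range(n):
        dice.append(u % sides + 1)  # peel off the low "digit" in base 6
        u //= sides
    return dice
```

Every value of u in the range maps to a distinct combination of dice, which is what makes the single draw equivalent to rolling each die separately.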
If we're only interested in the sum, we could save memory by making this tail recursive — or iterative — and summing the dice as we calculate them. Ignoring the exponent, this is O(n), no better than the simple algorithm in terms of growth rate. This algorithm is more efficient when it comes to entropy, though.
Consider 3d6, with 216 possible outcomes. Ideally the simple algorithm takes three 3-bit rolls, consuming 9 bits, of which about 1.26 bits (0.42 * 3) are not actually used. The entropy-efficient algorithm needs about 7.75 bits, so it only consumes 8 bits of entropy. We saved a bit. That gap only gets larger with more dice. For 100d6 the simple algorithm uses 41 more bits than necessary.
The efficient roll is basically defragmenting the individual rolls on the entropy stream.
In a non-ideal world, though, some cases don't work out well. In 12d6, almost half the numbers (compared to 25% in the case of 1d6) from the uniform distribution will be out of range and a lot more bits would be needed. On average, rolling dice individually (or only some of them individually) for 12d6 will be more efficient.
The efficient algorithm is only more efficient above a point near where mod(log2(s), 1) < mod(log2(s^n), 1), where s is the number of sides and n the number of dice.
And all of this doesn't come without a cost. You must pay the piper, and this algorithm is paid in CPU and memory. Notice that exponent there? It has to be computed to exact precision (no floating point), and it grows very quickly. If you want to roll more than a handful of dice, you will be crunching some large numbers. Rolling just 100d6 means working with a 78-digit integer. 10000d6 is a 7782-digit integer. These can't be done in floating point because the resolution of floating point is too low: some rolls would not be possible.
The exponent could be memoized to trade some of that CPU time for more memory usage. Still, pretty costly. If you don't value your entropy, the tradeoff might not be worth it.
I can't see a way around performing that calculation. We need to know that big number exactly. Perhaps a mathematician might be able to manipulate the formulas such that it's not so expensive.
If you're rolling lots of dice and you want to preserve binary entropy, try it out. If you want to be really efficient, queue up rolls — or generate them ahead of time — so that the number of outcomes is just below a power of two. In the case of d6, some good numbers of dice to roll are 17 (~43.94 bits), 29 (~74.96), 41 (~105.983), 94 (~242.986 bits), 200 (~516.993), 253 (~653.995 bits), 306 (~791.99853 bits), and 971 (~2509.99859 bits). (Notice these get closer and closer to an integer number of bits.)
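Those numbers can be found by checking how close n * log2(6) comes to an integer; for example, in Python:

```python
import math

def wasted_bits(n, sides=6):
    """Fraction of a bit lost to rounding when rolling n dice at once."""
    bits = n * math.log2(sides)
    return math.ceil(bits) - bits
```

Scanning n over some range and keeping the smallest wasted_bits values reproduces a list like the one above.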
A while ago I wanted to find every Hamiltonian path in the contiguous 48 states. That is, trips that visit each state exactly once. Writing a program to search for Hamiltonian paths is easy (I did this already). The most time-consuming part was actually putting together the data that specifies the graph to be searched. I hope someone somewhere finds it useful. Here is a map for reference,
It took me several passes before I stopped finding errors. I think I have it all right now, but there could still be some mistakes. If you see one, leave a comment and I'll fix it here. Here is the graph as an S-expression alist; the car (first) element in each list is a state, and the cdr (rest) is the unordered list of states that can be reached from it.
((me nh) (nh vt ma me) (vt ny ma nh) (ma ri ct ny nh vt) (ny pa nj ma ct vt) (ri ma ct) (ct ri ma ny) (nj pa ny de) (de md pa nj) (pa nj ny de md wv oh) (md pa de va wv) (va md wv ky tn nc) (nc va tn ga sc) (sc nc ga) (ga fl sc al nc tn) (al ms fl ga tn) (ms la ar tn al) (tn ms al ga nc va ky mo ar) (ky wv va tn mo il in oh) (wv md pa oh ky va) (oh pa wv ky in mi) (fl al ga) (mi wi oh in) (wi mn ia il mi) (il in ky mo ia wi) (in oh ky il mi) (mo il ky tn ar ok ks ne ia) (ar mo tn ms la tx ok) (la ms ar tx) (tx ok nm ar la) (ok ks mo ar tx nm co) (ks ok co ne mo) (ne sd ia mo ks co wy) (sd nd mn ia ne wy mt) (nd mt sd mn) (ia ne mo il wi mn sd) (mn wi ia sd nd) (mt id wy sd nd) (wy id ut co ne sd mt) (co ne ks ok nm ut wy) (nm co ok tx az) (az nm ut ca nv) (ut nv id wy co az) (id mt wy ut nv or wa) (wa or id) (or wa id nv ca) (nv or id ut az ca) (ca az nv or))
Note that all paths must start or end in Maine because it connects to only one other state.
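For reference, the search itself really is easy: a plain backtracking walk. Here's a sketch in Python (illustrated on a small cycle graph rather than the 48-state alist; adapting it to the S-expression data is just a matter of parsing):

```python
def hamiltonian_paths(graph, path):
    """Yield every Hamiltonian path in graph extending the given path."""
    if len(path) == len(graph):
        yield list(path)
        return
    for nxt in graph[path[-1]]:
        if nxt not in path:
            path.append(nxt)
            yield from hamiltonian_paths(graph, path)
            path.pop()

def all_hamiltonian_paths(graph):
    """Try every vertex as a starting point."""
    for start in graph:
        yield from hamiltonian_paths(graph, [start])
```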
On my brainfuck compiler project, I proposed pre-calculation as an optimization technique. The idea can work, but it has an issue that will always be unsolvable: how do you know that the pre-calculation will halt? This is called the halting problem and it has been proven impossible to solve.
The idea was that the compiler would run the brainfuck program up until the first input operation — if there even was one. It would record all output and the final state of the memory. Instead of compiling the code that was run, it would compile code that would print all of the output and set the memory to the final state.
I had mistakenly assumed that it would be possible to detect a non-halting program and avoid doing pre-calculation on it. I described how it would be done and left it at that. Recently, someone kindly sent me an email containing only 5 characters:
+[--]
This defeated my ill-conceived idea.
Because brainfuck is Turing-complete, it is actually impossible to determine in general whether or not an arbitrary brainfuck loop will halt. A computer can't do it. A human brain (a fancy computer) can't do it either. It cannot be done, at least not in this universe.
So, if implemented, this pre-calculation measure will always be flawed.
So you have some data from experimentation, or from a function that is difficult to solve.
Suppose you want to fit a polynomial curve to the data. Then you could interpolate values between the data points. Let p(x) be the polynomial. The equation for the polynomial we will fit to the data will look like this,
The a’s are the coefficients in our polynomial. We know x and we want to satisfy the condition,
which, when we want to solve it, will take the form,
where the a’s are our unknowns for which we are solving. Notice something? This is the linear system,
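Spelling out those equations (the originals appear to have been images that didn’t survive in this copy; this reconstruction follows the npoly code below, with coefficients a_1 through a_n for n data points):

```latex
p(x) = a_1 + a_2 x + a_3 x^2 + \cdots + a_n x^{n-1}

p(x_i) = y_i \quad \text{for each data point } (x_i, y_i)

\begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\
1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\
\vdots & & & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^{n-1}
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}

X \mathbf{a} = \mathbf{y}
```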
We just have to solve for the a vector to get our coefficients. I quickly wrote this GNU Octave code to try this out,
function a = npoly (x, y)
  X = repmat (x', 1, length(x));
  for i = 1:length(x)
    X(:,i) = X(:,i) .^ (i - 1);
  end
  a = X \ y';
end
This is just an extremely simple and slow version of Octave’s polyfit function, except for the order of the coefficients in the solution vector. I also wrote this function that takes the coefficient vector and a value and does the polynomial interpolation at that point (Octave’s polyval),
function v = psolve (x, a)
  v = zeros (size (x));
  for i = 1:length(a)
    v = v + a(i) * x.^(i-1);
  end
end
Here is an example of my polynomial interpolation function recognizing a parabola,
octave:88> x = 0:.5:3;
octave:89> y = x.^2;
octave:90> a = npoly(x, y)
a =
0.00000
-0.00000
1.00000
-0.00000
0.00000
-0.00000
0.00000
See how only the quadratic component is left because the zeros cancel out everything else? Here is an example with some added Gaussian noise, imitating data that might be pulled from a scientific experiment,
octave:142> x = 0:.5:3;
octave:143> y = x.^2 + randn(size(x))*0.5;
octave:144> a = npoly(x, y);
octave:145> plot(x, y, "b*");
octave:146> hold on
octave:147> x_test = 0:.05:3;
octave:148> y_test = psolve(x_test, a);
octave:149> plot (x_test, y_test, "r")
You can see that the order of our polynomial is too high for the data we are using. The main problem, however, is that the linear system is ill-conditioned. The condition number of the generated X matrix above is 151900, meaning small changes in x result in large changes in the solution. If we step out a bit you can see the polynomial quickly diverge from the given data,
So, I definitely wouldn’t use this for extrapolation.