<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Articles tagged interactive at null program</title>
  <link rel="alternate" type="text/html"
        href="https://nullprogram.com/tags/interactive/"/>
  <link rel="self" type="application/atom+xml"
        href="https://nullprogram.com/tags/interactive/feed/"/>
  <updated>2026-04-07T16:02:31Z</updated>
  <id>urn:uuid:82f38864-6463-4eac-8a86-3eca11fc4fa5</id>

  <author>
    <name>Christopher Wellons</name>
    <uri>https://nullprogram.com</uri>
    <email>wellons@nullprogram.com</email>
  </author>

  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>When Parallel: Pull, Don't Push</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/04/30/"/>
    <id>urn:uuid:ac12ef1d-299f-4edb-9eb1-5ed4dac1219c</id>
    <updated>2020-04-30T22:35:51Z</updated>
    <category term="optimization"/><category term="interactive"/><category term="javascript"/><category term="opengl"/><category term="media"/><category term="webgl"/><category term="c"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=23089729">on Hacker News</a>.</em></p>

<p>I’ve noticed a small pattern across a few of my projects where I had
vectorized and parallelized some code. The original algorithm had a
“push” approach, the optimized version instead took a “pull” approach.
In this article I’ll describe what I mean, though it’s mostly just so I
can show off some pretty videos, pictures, and demos.</p>

<!--more-->

<h3 id="sandpiles">Sandpiles</h3>

<p>A good place to start is the <a href="https://en.wikipedia.org/wiki/Abelian_sandpile_model">Abelian sandpile model</a>, which, like
many before me, completely <a href="https://xkcd.com/356/">captured</a> my attention for awhile.
It’s a cellular automaton where each cell is a pile of grains of sand —
a sandpile. At each step, any sandpile with more than four grains of
sand spill one grain into its four 4-connected neighbors, regardless of
the number of grains in those neighboring cell. Cells at the edge spill
their grains into oblivion, and those grains no longer exist.</p>

<p>With excess sand falling over the edge, the model eventually hits a
stable state where all piles have three or fewer grains. However, until
it reaches stability, all sorts of interesting patterns ripple though
the cellular automaton. In certain cases, the final pattern itself is
beautiful and interesting.</p>

<p>Numberphile has a great video describing how to <a href="https://www.youtube.com/watch?v=1MtEUErz7Gg">form a group over
recurrent configurations</a> (<a href="https://www.youtube.com/watch?v=hBdJB-BzudU">also</a>). In short, for any given grid
size, there’s a stable <em>identity</em> configuration that, when “added” to
any other element in the group will stabilize back to that element. The
identity configuration is a fractal itself, and has been a focus of
study on its own.</p>

<p>Computing the identity configuration is really just about running the
simulation to completion a couple times from certain starting
configurations. Here’s an animation of the process for computing the
64x64 identity configuration:</p>

<p><a href="https://nullprogram.com/video/?v=sandpiles-64"><img src="/img/identity-64-thumb.png" alt="" /></a></p>

<p>As a fractal, the larger the grid, the more self-similar patterns there
are to observe. There are lots of samples online, and the biggest I
could find was <a href="https://commons.wikimedia.org/wiki/File:Sandpile_group_identity_on_3000x3000_grid.png">this 3000x3000 on Wikimedia Commons</a>. But I wanted
to see one <em>that’s even bigger, damnit</em>! So, skipping to the end, I
eventually computed this 10000x10000 identity configuration:</p>

<p><a href="/img/identity-10000.png"><img src="/img/identity-10000-thumb.png" alt="" /></a></p>

<p>This took 10 days to compute using my optimized implementation:</p>

<p><a href="https://github.com/skeeto/scratch/blob/master/animation/sandpiles.c">https://github.com/skeeto/scratch/blob/master/animation/sandpiles.c</a></p>

<p>I picked an algorithm described <a href="https://codegolf.stackexchange.com/a/106990">in a code golf challenge</a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>f(ones(n)*6 - f(ones(n)*6))
</code></pre></div></div>

<p>Where <code class="language-plaintext highlighter-rouge">f()</code> is the function that runs the simulation to a stable state.</p>

<p>I used <a href="/blog/2015/07/10/">OpenMP to parallelize across cores, and SIMD to parallelize
within a thread</a>. Each thread operates on 32 sandpiles at a time.
To compute the identity sandpile, each sandpile only needs 3 bits of
state, so this could potentially be increased to 85 sandpiles at a time
on the same hardware. The output format is my old mainstay, Netpbm,
<a href="/blog/2017/11/03/">including the video output</a>.</p>

<h4 id="sandpile-push-and-pull">Sandpile push and pull</h4>

<p>So, what do I mean about pushing and pulling? The naive approach to
simulating sandpiles looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for each i in sandpiles {
    if input[i] &lt; 4 {
        output[i] = input[i]
    } else {
        output[i] = input[i] - 4
        for each j in neighbors {
            output[j] = output[j] + 1
        }
    }
}
</code></pre></div></div>

<p>As the algorithm examines each cell, it <em>pushes</em> results into
neighboring cells. If we’re using concurrency, that means multiple
threads of execution may be mutating the same cell, which requires
synchronization — locks, <a href="/blog/2014/09/02/">atomics</a>, etc. That much
synchronization is the death knell of performance. The threads will
spend all their time contending for the same resources, even if it’s
just false sharing.</p>

<p>The solution is to <em>pull</em> grains from neighbors:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for each i in sandpiles {
    if input[i] &lt; 4 {
        output[i] = input[i]
    } else {
        output[i] = input[i] - 4
    }
    for each j in neighbors {
        if input[j] &gt;= 4 {
            output[i] = output[i] + 1
        }
    }
}
</code></pre></div></div>

<p>Each thread only modifies one cell — the cell it’s in charge of updating
— so no synchronization is necessary. It’s shader-friendly and should
sound familiar if you’ve seen <a href="/blog/2014/06/10/">my WebGL implementation of Conway’s Game
of Life</a>. It’s essentially the same algorithm. If you chase down
the various Abelian sandpile references online, you’ll eventually come
across a 2017 paper by Cameron Fish about <a href="http://people.reed.edu/~davidp/homepage/students/fish.pdf">running sandpile simulations
on GPUs</a>. He cites my WebGL Game of Life article, bringing
everything full circle. We had spoken by email at the time, and he
<a href="https://people.reed.edu/~davidp/web_sandpiles/">shared his <strong>interactive simulation</strong> with me</a>.</p>

<p>Vectorizing this algorithm is straightforward: Load multiple piles at
once, one per SIMD channel, and use masks to implement the branches. In
my code I’ve also unrolled the loop. To avoid bounds checking in the
SIMD code, I pad the state data structure with zeros so that the edge
cells have static neighbors and are no longer special.</p>

<h3 id="webgl-fire">WebGL Fire</h3>

<p>Back in the old days, one of the <a href="http://fabiensanglard.net/doom_fire_psx/">cool graphics tricks was fire
animations</a>. It was so easy to implement on limited hardware. In
fact, the most obvious way to compute it was directly in the
framebuffer, such as in <a href="/blog/2014/12/09/">the VGA buffer</a>, with no outside state.</p>

<p>There’s a heat source at the bottom of the screen, and the algorithm
runs from bottom up, propagating that heat upwards randomly. Here’s the
algorithm using traditional screen coordinates (top-left corner origin):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>func rand(min, max) // random integer in [min, max]

for each x, y from bottom {
    buf[y-1][x+rand(-1, 1)] = buf[y][x] - rand(0, 1)
}
</code></pre></div></div>

<p>As a <em>push</em> algorithm it works fine with a single-thread, but
it doesn’t translate well to modern video hardware. So convert it to a
<em>pull</em> algorithm!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for each x, y {
    sx = x + rand(-1, 1)
    sy = y + rand(1, 2)
    output[y][x] = input[sy][sx] - rand(0, 1)
}
</code></pre></div></div>

<p>Cells pull the fire upward from the bottom. Though this time there’s a
catch: <em>This algorithm will have subtly different results.</em></p>

<ul>
  <li>
    <p>In the original, there’s a single state buffer and so a flame could
propagate upwards multiple times in a single pass. I’ve compensated
here by allowing a flames to propagate further at once.</p>
  </li>
  <li>
    <p>In the original, a flame only propagates to one other cell. In this
version, two cells might pull from the same flame, cloning it.</p>
  </li>
</ul>

<p>In the end it’s hard to tell the difference, so this works out.</p>

<p><a href="https://nullprogram.com/webgl-fire/"><img src="/img/fire-thumb.png" alt="" /></a></p>

<p><a href="https://github.com/skeeto/webgl-fire/">source code and instructions</a></p>

<p>There’s still potentially contention in that <code class="language-plaintext highlighter-rouge">rand()</code> function, but this
can be resolved <a href="https://www.shadertoy.com/view/WttXWX">with a hash function</a> that takes <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> as
inputs.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Two Chaotic Motion Demos</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2018/02/15/"/>
    <id>urn:uuid:5b76d549-b253-355b-391b-cfdc25d2b056</id>
    <updated>2018-02-15T04:18:07Z</updated>
    <category term="javascript"/><category term="interactive"/><category term="webgl"/><category term="opengl"/>
    <content type="html">
      <![CDATA[<p>I’ve put together two online, interactive, demonstrations of <a href="https://en.wikipedia.org/wiki/Chaos_theory">chaotic
motion</a>. One is 2D and the other is 3D, but both are rendered
using <a href="/blog/2013/06/10/">WebGL</a> — which, for me, is the most interesting part.
Both are governed by ordinary differential equations. Both are
integrated using the <a href="https://en.wikipedia.org/wiki/Runge–Kutta_methods">Runge–Kutta method</a>, specifically RK4.</p>

<p>Far more knowledgeable people have already written introductions for
chaos theory, so here’s just a quick summary. A chaotic system is
deterministic but highly sensitive to initial conditions. Tweaking a
single bit of the starting state of either of my demos will quickly
lead to two arbitrarily different results. Both demonstrations have
features that aim to show this in action.</p>

<p>This ain’t my first chaotic system rodeo. About eight years ago I made
<a href="/blog/2010/10/16/">water wheel Java applet</a>, and that was based on some Matlab code I
collaborated on some eleven years ago. I really hope you’re not equipped
to run a crusty old Java applet in 2018, though. (<strong>Update</strong>: <a href="https://github.com/skeeto/waterwheel">now
upgraded to HTML5 Canvas</a>.)</p>

<p>If you want to find either of these demos again in the future, you
don’t need to find this article first. They’re both listed in my
<a href="/toys/">Showcase page</a>, linked from the header of this site.</p>

<h3 id="double-pendulum">Double pendulum</h3>

<p>First up is the classic <a href="https://en.wikipedia.org/wiki/Double_pendulum">double pendulum</a>. This one’s more intuitive
than my other demo since it’s modeling a physical system you could
actually build and observe in the real world.</p>

<p><a href="/double-pendulum/"><img src="/img/screenshot/double-pendulum.png" alt="" /></a></p>

<p>Source: <a href="https://github.com/skeeto/double-pendulum">https://github.com/skeeto/double-pendulum</a></p>

<p>I lifted the differential equations straight from the Wikipedia article
(<code class="language-plaintext highlighter-rouge">derivative()</code> in my code). Same for the Runge–Kutta method (<code class="language-plaintext highlighter-rouge">rk4()</code> in
my code). It’s all pretty straightforward. RK4 may not have been the
best choice for this system since it seems to bleed off energy over
time. If you let my demo run over night, by morning there will obviously
be a lot less activity.</p>

<p>I’m not a fan of buttons and other fancy GUI widgets — neither
designing them nor using them myself — prefering more cryptic, but
easy-to-use keyboard-driven interfaces. (Hey, it works well for
<a href="https://mpv.io/">mpv</a> and <a href="http://www.mplayerhq.hu/design7/news.html">MPlayer</a>.) I haven’t bothered with a mobile
interface, so sorry if you’re reading on your phone. You’ll just have
to enjoy watching a single pendulum.</p>

<p>Here are the controls:</p>

<ul>
  <li><kbd>a</kbd>: add a new random pendulum</li>
  <li><kbd>c</kbd>: imperfectly clone an existing pendulum</li>
  <li><kbd>d</kbd>: delete the most recently added pendulum</li>
  <li><kbd>m</kbd>: toggle between WebGL and Canvas rendering</li>
  <li><kbd>SPACE</kbd>: pause the simulation (toggle)</li>
</ul>

<p>To witness chaos theory in action:</p>

<ol>
  <li>Start with a single pendulum (the default).</li>
  <li>Pause the simulation (<kbd>SPACE</kbd>).</li>
  <li>Make a dozen or so clones (press <kbd>c</kbd> <a href="https://www.youtube.com/watch?v=Uk0mJSTatbw">for awhile</a>).</li>
  <li>Unpause.</li>
</ol>

<p>At first it will appear as single pendulum, but they’re actually all
stacked up, each starting from slightly randomized initial conditions.
Within a minute you’ll witness the pendulums diverge, and after a minute
they’ll all be completely different. It’s pretty to watch them come
apart at first.</p>

<p>It might appear that the <kbd>m</kbd> key doesn’t actually do
anything. That’s because the HTML5 Canvas rendering — which is what I
actually implemented first — is <em>really</em> close to the WebGL rendering.
I’m really proud of this. There are just three noticeable differences.
First, there’s a rounded line cap in the Canvas rendering where the
pendulum is “attached.” Second, the tail line segments aren’t properly
connected in the Canvas rendering. The segments are stroked separately
in order to get that gradient effect along its path. Third, it’s a lot
slower, particularly when there are many pendulums to render.</p>

<p><img src="/img/screenshot/canvas-indicators.png" alt="" /></p>

<p>In WebGL the two “masses” are rendered using that <a href="/blog/2017/11/03/#dot-rendering">handy old circle
rasterization technique</a> on a quad. Either a <a href="/blog/2014/06/01/">triangle fan</a>
or pre-rendering the circle as a texture would probably have been a
better choices. The two bars are the same quad buffers, just squeezed
and rotated into place. Both were really simple to create. It’s the
tail that was tricky to render.</p>

<p>When I wrote the original Canvas renderer, I set the super convenient
<code class="language-plaintext highlighter-rouge">lineWidth</code> property to get a nice, thick tail. In my first cut at
rendering the tail I used <code class="language-plaintext highlighter-rouge">GL_LINE_STRIP</code> to draw a line primitive.
The problem with the line primitive is that an OpenGL implementation
is only required to support single pixel wide lines. If I wanted
wider, I’d have to generate the geometry myself. So I did.</p>

<p>Like before, I wasn’t about to dirty my hands manipulating a
graphite-filled wooden stick on a piece of paper to solve this
problem. No, I lifted the math from something I found on the internet
again. In this case it was <a href="https://forum.libcinder.org/topic/smooth-thick-lines-using-geometry-shader#23286000001269127">a forum post by paul.houx</a>, which
provides a few vector equations to compute a triangle strip from a
line strip. My own modification was to add a miter limit, which keeps
sharp turns under control. You can find my implementation in
<code class="language-plaintext highlighter-rouge">polyline()</code> in my code. Here’s a close-up with the skeleton rendered
on top in black:</p>

<p><img src="/img/screenshot/tail-mesh.png" alt="" /></p>

<p>For the first time I’m also using ECMAScript’s new <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals">template
literals</a> to store the shaders inside the JavaScript source. These
string literals can contain newlines, but, even cooler, I it does
string interpolation, meaning I can embed JavaScript variables
directly into the shader code:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">let</span> <span class="nx">massRadius</span> <span class="o">=</span> <span class="mf">0.12</span><span class="p">;</span>
<span class="kd">let</span> <span class="nx">vertexShader</span> <span class="o">=</span> <span class="s2">`
attribute vec2 a_point;
uniform   vec2 u_center;
varying   vec2 v_point;

void main() {
    v_point = a_point;
    gl_Position = vec4(a_point * </span><span class="p">${</span><span class="nx">massRadius</span><span class="p">}</span><span class="s2"> + u_center, 0, 1);
}`</span><span class="p">;</span>
</code></pre></div></div>

<h4 id="allocation-avoidance">Allocation avoidance</h4>

<p>If you’ve looked at my code you might have noticed something curious.
I’m using a lot of <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Destructuring_assignment">destructuring assignments</a>, which is another
relatively new addition to ECMAScript. This was part of a little
experiment.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">normalize</span><span class="p">(</span><span class="nx">v0</span><span class="p">,</span> <span class="nx">v1</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">let</span> <span class="nx">d</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">sqrt</span><span class="p">(</span><span class="nx">v0</span> <span class="o">*</span> <span class="nx">v0</span> <span class="o">+</span> <span class="nx">v1</span> <span class="o">*</span> <span class="nx">v1</span><span class="p">);</span>
    <span class="k">return</span> <span class="p">[</span><span class="nx">v0</span> <span class="o">/</span> <span class="nx">d</span><span class="p">,</span> <span class="nx">v1</span> <span class="o">/</span> <span class="nx">d</span><span class="p">];</span>
<span class="p">}</span>

<span class="cm">/* ... */</span>

<span class="kd">let</span> <span class="p">[</span><span class="nx">nx</span><span class="p">,</span> <span class="nx">ny</span><span class="p">]</span> <span class="o">=</span> <span class="nx">normalize</span><span class="p">(</span><span class="o">-</span><span class="nx">ly</span><span class="p">,</span> <span class="nx">lx</span><span class="p">);</span>
</code></pre></div></div>

<p>One of my goals for this project was <strong>zero heap allocations in the
main WebGL rendering loop</strong>. There are <a href="https://i.imgur.com/ceqSpHg.jpg">no garbage collector hiccups
if there’s no garbage to collect</a>. This sort of thing is trivial
in a language with manual memory management, such as C and C++. Just
having value semantics for aggregates would be sufficient.</p>

<p>But with JavaScript I don’t get to choose how my objects are allocated.
I either have to pre-allocate everything, including space for all the
intermediate values (e.g. an object pool). This would be clunky and
unconventional. Or I can structure and access my allocations in such a
way that the JIT compiler can eliminate them (via escape analysis,
scalar replacement, etc.).</p>

<p>In this case, I’m trusting that JavaScript implementations will
flatten these destructuring assignments so that the intermediate array
never actually exists. It’s like pretending the array has value
semantics. This seems to work as I expect with V8, but not so well
with SpiderMonkey (yet?), at least in Firefox 52 ESR.</p>

<h4 id="single-precision">Single precision</h4>

<p>I briefly considered using <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/fround"><code class="language-plaintext highlighter-rouge">Math.fround()</code></a> to convince
JavaScript to compute all the tail geometry in single precision. The
double pendulum system would remain double precision, but the geometry
doesn’t need all that precision. It’s all rounded to single precision
going out to the GPU anyway.</p>

<p>Normally when pulling values from a <code class="language-plaintext highlighter-rouge">Float32Array</code>, they’re cast to
double precision — JavaScript’s only numeric type — and all operations
are performed in double precision, even if the result is stored back
in a <code class="language-plaintext highlighter-rouge">Float32Array</code>. This is because the JIT compiler is required to
correctly perform all the <a href="https://possiblywrong.wordpress.com/2017/09/12/floating-point-agreement-between-matlab-and-c/">intermediate rounding</a>. To relax this
requirement, <a href="https://blog.mozilla.org/javascript/2013/11/07/efficient-float32-arithmetic-in-javascript/">surround each operation with a call to
<code class="language-plaintext highlighter-rouge">Math.fround()</code></a>. Since the result of doing each operation in
double precision with this rounding step in between is equivalent to
doing each operation in single precision, the JIT compiler can choose
to do the latter.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">let</span> <span class="nx">x</span> <span class="o">=</span> <span class="k">new</span> <span class="nb">Float32Array</span><span class="p">(</span><span class="nx">n</span><span class="p">);</span>
<span class="kd">let</span> <span class="nx">y</span> <span class="o">=</span> <span class="k">new</span> <span class="nb">Float32Array</span><span class="p">(</span><span class="nx">n</span><span class="p">);</span>
<span class="kd">let</span> <span class="nx">d</span> <span class="o">=</span> <span class="k">new</span> <span class="nb">Float32Array</span><span class="p">(</span><span class="nx">n</span><span class="p">);</span>
<span class="c1">// ...</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;</span> <span class="nx">n</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">let</span> <span class="nx">xprod</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">fround</span><span class="p">(</span><span class="nx">x</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="o">*</span> <span class="nx">x</span><span class="p">[</span><span class="nx">i</span><span class="p">]);</span>
    <span class="kd">let</span> <span class="nx">yprod</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">fround</span><span class="p">(</span><span class="nx">y</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="o">*</span> <span class="nx">y</span><span class="p">[</span><span class="nx">i</span><span class="p">]);</span>
    <span class="nx">d</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">sqrt</span><span class="p">(</span><span class="nb">Math</span><span class="p">.</span><span class="nx">fround</span><span class="p">(</span><span class="nx">xprod</span> <span class="o">+</span> <span class="nx">yprod</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I ultimately decided not to bother with this since it would
significantly obscures my code for what is probably a minuscule
performance gain (in this case). It’s also really difficult to tell if
I did it all correctly. So I figure this is better suited for
compilers that target JavaScript rather than something to do by hand.</p>

<h3 id="lorenz-system">Lorenz system</h3>

<p>The other demo is a <a href="https://en.wikipedia.org/wiki/Lorenz_system">Lorenz system</a> with its famous butterfly
pattern. I actually wrote this one a year and a half ago but never got
around to writing about it. You can tell it’s older because I’m still
using <code class="language-plaintext highlighter-rouge">var</code>.</p>

<p><a href="/lorenz-webgl/"><img src="/img/screenshot/lorenz-webgl.png" alt="" /></a></p>

<p>Source: <a href="https://github.com/skeeto/lorenz-webgl">https://github.com/skeeto/lorenz-webgl</a></p>

<p>Like before, the equations came straight from the Wikipedia article
(<code class="language-plaintext highlighter-rouge">Lorenz.lorenz()</code> in my code). They math is a lot simpler this time,
too.</p>

<p>This one’s a bit more user friendly with a side menu displaying all
your options. The keys are basically the same. This was completely by
accident, I swear. Here are the important ones:</p>

<ul>
  <li><kbd>a</kbd>: add a new random solution</li>
  <li><kbd>c</kbd>: clone a solution with a perturbation</li>
  <li><kbd>C</kbd>: remove all solutions</li>
  <li><kbd>SPACE</kbd>: toggle pause/unpause</li>
  <li>You can click, drag, and toss it to examine it in 3D</li>
</ul>

<p>Witnessing chaos theory in action is the same process as before: clear
it down to a single solution (<kbd>C</kbd> then <kbd>a</kbd>), then add
a bunch of randomized clones (<kbd>c</kbd>).</p>

<p>There is no Canvas renderer for this one. It’s pure WebGL. The tails are
drawn using <code class="language-plaintext highlighter-rouge">GL_LINE_STRIP</code>, but in this case it works fine that they’re
a single pixel wide. If heads are turned on, those are just <code class="language-plaintext highlighter-rouge">GL_POINT</code>.
The geometry is threadbare for this one.</p>

<p>There is one notable feature: <strong>The tails are stored exclusively in
GPU memory</strong>. Only the “head” is stored CPU-side. After it computes
the next step, it updates a single spot of the tail with
<code class="language-plaintext highlighter-rouge">glBufferSubData()</code>, and the VBO is actually a circular buffer. OpenGL
doesn’t directly support rendering from circular buffers, but it
<em>does</em> have element arrays. An element array is an additional buffer
of indices that tells OpenGL what order to use the elements in the
other buffers.</p>

<p>Naively would mean for a tail of 4 segments, I need 4 different
element arrays, one for each possible rotation:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array 0: 0 1 2 3
array 1: 1 2 3 0
array 2: 2 3 0 1
array 3: 3 0 1 2
</code></pre></div></div>

<p>With the knowledge that element arrays can start at an offset, and
with a little cleverness, you might notice these can all overlap in a
single, 7-element array:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0 1 2 3 0 1 2
</code></pre></div></div>

<p>Array 0 is at offset 0, array 1 is at offset 1, array 2 is at offset 2,
and array 3 is at offset 3. The tails in the Lorenz system are drawn
using <code class="language-plaintext highlighter-rouge">drawElements()</code> with exactly this sort of array.</p>

<p>Like before, I was very careful to produce zero heap allocations in the
main loop. The FPS counter generates some garbage in the DOM due to
reflow, but this goes away if you hide the help menu (<kbd>?</kbd>). This
was long enough ago that destructuring assignment wasn’t available, but
Lorenz system and rendering it were so simple that using pre-allocated
objects worked fine.</p>

<p>Beyond just the programming, I’ve gotten hours of entertainment
playing with each of these systems. This was also the first time I’ve
used WebGL in over a year, and this project was a reminder of just how
working with it is so pleasurable. <a href="https://www.khronos.org/registry/webgl/specs/1.0/">The specification</a> is
superbly written and serves perfectly as its own reference.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Autotetris Mode</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/10/19/"/>
    <id>urn:uuid:e76556be-ebeb-3f65-7041-bffbe2e19952</id>
    <updated>2014-10-19T21:45:53Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="interactive"/>
    <content type="html">
      <![CDATA[<p>For more than a decade now, Emacs has come with a built-in Tetris
clone, originally written by XEmacs’ Glynn Clements. Just run <code class="language-plaintext highlighter-rouge">M-x
tetris</code> any time you want to play. For anyone too busy to waste time
playing Tetris, earlier this year I wrote an autotetris-mode that will
play the Emacs game automatically.</p>

<ul>
  <li><a href="https://github.com/skeeto/autotetris-mode">https://github.com/skeeto/autotetris-mode</a></li>
</ul>

<p>Load the source file, <code class="language-plaintext highlighter-rouge">autotetris-mode.el</code>, then run <code class="language-plaintext highlighter-rouge">M-x autotetris</code>. It
will start the built-in Tetris game but make all the moves itself. It
works best when byte compiled.</p>

<p><img src="/img/diagram/tetris/screenshot.png" alt="" /></p>

<p>At the time I had read <a href="http://www.cs.cornell.edu/boom/1999sp/projects/tetris/">an article</a> and was interested in trying
my hand at my own Tetris AI. Like most things Emacs, the built-in
Tetris game is very hackable. It’s also pretty simple and easy to
understand. Rather than write my own I chose to build upon this one.</p>

<h3 id="heuristics">Heuristics</h3>

<p>It’s not a particularly strong AI. It doesn’t pay attention to the
next piece in the queue, it doesn’t know the game’s basic shapes, and
it doesn’t try to maximize the score by clearing multiple rows at
once. The goal is simply to keep running for as long as possible.
Still, it reliably survives until the game becomes so fast that the
AI, which is rate-limited like a human player, can no longer move
pieces into place in time. That’s good enough.</p>

<p>When a new piece appears at the top of the screen, the AI, in memory,
tries placing it in all possible positions and all possible
orientations. For each of these positions it runs a heuristic on the
resulting game state, summing five metrics. Each metric is scaled by a
hand-tuned weight to adjust its relative priority. Smaller is better,
so the position with the lowest score is selected.</p>
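<p>The selection step can be sketched in a few lines. This is an
illustrative JavaScript sketch, not the Emacs Lisp from
autotetris-mode, and the flat weights in the usage below are
hypothetical:</p>

```javascript
// Hypothetical sketch of the selection step (the real code is Emacs
// Lisp). Each candidate placement carries its five metric values.
function score(metrics, weights) {
    var sum = 0;
    for (var i = 0; i < metrics.length; i++)
        sum += metrics[i] * weights[i];  // weighted sum; smaller is better
    return sum;
}

function bestPlacement(placements, weights) {
    var best = null, bestScore = Infinity;
    placements.forEach(function(p) {
        var s = score(p.metrics, weights);
        if (s < bestScore) { bestScore = s; best = p; }
    });
    return best;
}
```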

<h4 id="number-of-holes">Number of Holes</h4>

<p><img src="/img/diagram/tetris/holes.png" alt="" /></p>

<p>A hole is any open space that has a solid block above it, even if that
hole is accessible without passing through a solid block. Count these
holes.</p>

<h4 id="maximum-height">Maximum Height</h4>

<p><img src="/img/diagram/tetris/height.png" alt="" /></p>

<p>Add the height of the tallest column. Column height includes any holes
in the column. The game ends when a column touches the top of the
screen (or something like that), so this should be kept in check.</p>

<h4 id="mean-height">Mean Height</h4>

<p><img src="/img/diagram/tetris/mean.png" alt="" /></p>

<p>Add the mean height of all columns. The higher this is, the closer we
are to losing the game. Since each row will have at least one hole,
this will be a similar measure to the hole count.</p>

<h4 id="height-disparity">Height Disparity</h4>

<p><img src="/img/diagram/tetris/disparity.png" alt="" /></p>

<p>Add the difference between the shortest column height and the tallest
column height. If this number is large it means we’re not making
effective use of the playing area. It also discourages the AI from
getting into that annoying situation we all remember: when you
<em>really</em> need a 4x1 piece that never seems to come. Those are the
brief moments when I truly believe the version I’m playing has to be
rigged.</p>

<h4 id="surface-roughness">Surface Roughness</h4>

<p><img src="/img/diagram/tetris/surface.png" alt="" /></p>

<p>Take the root mean square of the column heights. A rougher surface
leaves fewer options when placing pieces. This measure will be similar
to the disparity measurement.</p>
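<p>All five metrics fall out of the column heights. Here’s a hedged
JavaScript sketch (the real code is Emacs Lisp operating on
tetris-mode’s buffer-local state); it assumes each column is an array
of booleans ordered top to bottom:</p>

```javascript
// Illustrative versions of the five heuristics. A column is an array
// of booleans from top to bottom; true means a filled cell.
function columnHeight(col) {
    for (var i = 0; i < col.length; i++)
        if (col[i]) return col.length - i;  // height from topmost filled cell
    return 0;
}

function heights(cols) { return cols.map(columnHeight); }

function countHoles(cols) {
    var holes = 0;
    cols.forEach(function(col) {
        var seenBlock = false;
        col.forEach(function(cell) {
            if (cell) seenBlock = true;
            else if (seenBlock) holes++;  // empty cell with a block above it
        });
    });
    return holes;
}

function maxHeight(cols) { return Math.max.apply(null, heights(cols)); }

function meanHeight(cols) {
    var h = heights(cols);
    return h.reduce(function(a, b) { return a + b; }) / h.length;
}

function disparity(cols) {
    var h = heights(cols);
    return Math.max.apply(null, h) - Math.min.apply(null, h);
}

function roughness(cols) {
    // Root mean square of the column heights.
    var h = heights(cols);
    var sq = h.reduce(function(a, x) { return a + x * x; }, 0);
    return Math.sqrt(sq / h.length);
}
```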

<h3 id="emacs-specific-details">Emacs-specific Details</h3>

<p>With a position selected, the AI sends player inputs at a limited rate
to the game itself, moving the piece into place. This is done by
calling <code class="language-plaintext highlighter-rouge">tetris-move-right</code>, <code class="language-plaintext highlighter-rouge">tetris-move-left</code>, and
<code class="language-plaintext highlighter-rouge">tetris-rotate-next</code>, which, in the normal game, are bound to the
arrow keys.</p>

<p>The built-in tetris-mode isn’t quite designed for this kind of
extension, so it needs a little bit of help. I defined two pieces of
advice to create hooks. These hooks alert my AI to two specific events
in the game: the game start and a fresh, new piece.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defadvice</span> <span class="nv">tetris-new-shape</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">autotetris-new-shape-hook</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">run-hooks</span> <span class="ss">'autotetris-new-shape-hook</span><span class="p">))</span>

<span class="p">(</span><span class="nv">defadvice</span> <span class="nv">tetris-start-game</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">autotetris-start-game-hook</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">run-hooks</span> <span class="ss">'autotetris-start-game-hook</span><span class="p">))</span>
</code></pre></div></div>

<p>I talked before about <a href="/blog/2014/10/12/">the problems with global state</a>.
Fortunately, tetris-mode doesn’t store any game state in global
variables. It stores everything in buffer-local variables, which can
be exploited for use in the AI. To perform the “in memory” heuristic
checks, it creates a copy of the game state and manipulates the copy.
The copy is made by way of <code class="language-plaintext highlighter-rouge">clone-buffer</code> on the <code class="language-plaintext highlighter-rouge">*Tetris*</code> buffer.
The tetris-mode functions all work equally well on the clone, so I
can use the existing game rules to properly place the next piece in
each available position. The game’s own rules take care of clearing
rows and checking for collisions for me. I wrote an
<code class="language-plaintext highlighter-rouge">autotetris-save-excursion</code> function to handle the messy details.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">autotetris-save-excursion</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">body</span><span class="p">)</span>
  <span class="s">"Restore tetris game state after BODY completes."</span>
  <span class="p">(</span><span class="k">declare</span> <span class="p">(</span><span class="nv">indent</span> <span class="nb">defun</span><span class="p">))</span>
  <span class="o">`</span><span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">tetris-buffer-name</span>
     <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">autotetris-saved</span> <span class="p">(</span><span class="nv">clone-buffer</span> <span class="s">"*Tetris-saved*"</span><span class="p">)))</span>
       <span class="p">(</span><span class="k">unwind-protect</span>
           <span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">autotetris-saved</span>
             <span class="p">(</span><span class="nv">kill-local-variable</span> <span class="ss">'kill-buffer-hook</span><span class="p">)</span>
             <span class="o">,@</span><span class="nv">body</span><span class="p">)</span>
         <span class="p">(</span><span class="nv">kill-buffer</span> <span class="nv">autotetris-saved</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">kill-buffer-hook</code> variable is also cloned, but I don’t want
tetris-mode to respond to the clone being killed, so I clear out the
hook.</p>

<p>That’s basically all there is to it! Watching it, it feels like
it’s making dumb mistakes by not placing pieces in optimal positions,
but it recovers well from these situations almost every time, so it
must know what it’s doing. Currently it’s a better player than me,
which is <a href="/blog/2011/08/24/">my rule-of-thumb</a> for calling
an AI successful.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>A GPU Approach to Particle Physics</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/06/29/"/>
    <id>urn:uuid:2d2ab14c-18c6-3968-d9b1-5243e7d0b2f1</id>
    <updated>2014-06-29T03:23:42Z</updated>
    <category term="webgl"/><category term="media"/><category term="interactive"/><category term="gpgpu"/><category term="javascript"/><category term="opengl"/>
    <content type="html">
      <![CDATA[<p>The next project in my <a href="/tags/gpgpu/">GPGPU series</a> is a particle physics
engine that computes the entire physics simulation on the GPU.
Particles are influenced by gravity and will bounce off scene
geometry. This WebGL demo uses a shader feature not strictly required
by the OpenGL ES 2.0 specification, so it may not work on some
platforms, especially mobile devices. It will be discussed later in
the article.</p>

<ul>
  <li><a href="https://skeeto.github.io/webgl-particles/">https://skeeto.github.io/webgl-particles/</a> (<a href="https://github.com/skeeto/webgl-particles">source</a>)</li>
</ul>

<p>It’s interactive. The mouse cursor is a circular obstacle that the
particles bounce off of, and clicking will place a permanent obstacle
in the simulation. You can paint and draw structures through which the
particles will flow.</p>

<p>Here’s an HTML5 video of the demo in action, which, out of
necessity, is recorded at 60 frames per second and a high bitrate, so
it’s pretty big. Video codecs don’t handle all these full-screen
particles very well, and lower framerates don’t capture the effect
properly. I also added some appropriate sound that you won’t hear in
the actual demo.</p>

<video width="500" height="375" controls="" poster="/img/particles/poster.png" preload="none">
  <source src="https://nullprogram.s3.amazonaws.com/particles/particles.webm" type="video/webm" />
  <source src="https://nullprogram.s3.amazonaws.com/particles/particles.mp4" type="video/mp4" />
  <img src="/img/particles/poster.png" width="500" height="375" />
</video>

<p>On a modern GPU, it can simulate <em>and</em> draw over 4 million particles
at 60 frames per second. Keep in mind that this is a JavaScript
application, I haven’t really spent time optimizing the shaders, and
it’s living within the constraints of WebGL rather than something more
suitable for general computation, like OpenCL or at least desktop
OpenGL.</p>

<h3 id="encoding-particle-state-as-color">Encoding Particle State as Color</h3>

<p>Just as with the <a href="/blog/2014/06/10/">Game of Life</a> and <a href="/blog/2014/06/22/">path finding</a>
projects, simulation state is stored in pairs of textures and the
majority of the work is done by a fragment shader mapped between them
pixel-to-pixel. I won’t repeat myself with the details of setting this
up, so refer to the Game of Life article if you need to see how it
works.</p>

<p>For this simulation, there are four of these textures instead of
two: a pair of position textures and a pair of velocity textures. Why
two pairs instead of one? A texture has four channels, so each of the
four components (x, y, dx, dy) could be packed into its own color
channel. That seems like the simplest solution.</p>

<p><img src="/img/particles/pack-tight.png" alt="" /></p>

<p>The problem with this scheme is the lack of precision. With the
R8G8B8A8 internal texture format, each channel is one byte. That’s 256
total possible values. The display area is 800 by 600 pixels, so not
even every position on the display would be possible. Fortunately, two
bytes, for a total of 65,536 values, is plenty for our purposes.</p>

<p><img src="/img/particles/position-pack.png" alt="" />
<img src="/img/particles/velocity-pack.png" alt="" /></p>

<p>The next problem is how to encode values across these two channels. It
needs to cover negative values (negative velocity) and it should try
to take full advantage of dynamic range, i.e. try to spread usage
across all of those 65,536 values.</p>

<p>To encode a value, multiply the value by a scalar to stretch it over
the encoding’s dynamic range. The scalar is selected so that the
required highest values (the dimensions of the display) are the
highest values of the encoding.</p>

<p>Next, add half the dynamic range to the scaled value. This converts
all negative values into positive values with 0 representing the
lowest value. This representation is called <a href="http://en.wikipedia.org/wiki/Signed_number_representations#Excess-K">Excess-K</a>. The
downside to this is that clearing the texture (<code class="language-plaintext highlighter-rouge">glClearColor</code>) with
transparent black no longer sets the decoded values to 0.</p>

<p>Finally, treat each channel as a digit of a base-256 number. The
OpenGL ES 2.0 shader language has no bitwise operators, so this is
done with plain old division and modulus. I made an encoder and
decoder in both JavaScript and GLSL. JavaScript needs it to write the
initial values and, for debugging purposes, so that it can read back
particle positions.</p>

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">vec2</span> <span class="nf">encode</span><span class="p">(</span><span class="kt">float</span> <span class="n">value</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">value</span> <span class="o">=</span> <span class="n">value</span> <span class="o">*</span> <span class="n">scale</span> <span class="o">+</span> <span class="n">OFFSET</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">x</span> <span class="o">=</span> <span class="n">mod</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="n">BASE</span><span class="p">);</span>
    <span class="kt">float</span> <span class="n">y</span> <span class="o">=</span> <span class="n">floor</span><span class="p">(</span><span class="n">value</span> <span class="o">/</span> <span class="n">BASE</span><span class="p">);</span>
    <span class="k">return</span> <span class="kt">vec2</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="o">/</span> <span class="n">BASE</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">float</span> <span class="nf">decode</span><span class="p">(</span><span class="kt">vec2</span> <span class="n">channels</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">dot</span><span class="p">(</span><span class="n">channels</span><span class="p">,</span> <span class="kt">vec2</span><span class="p">(</span><span class="n">BASE</span><span class="p">,</span> <span class="n">BASE</span> <span class="o">*</span> <span class="n">BASE</span><span class="p">))</span> <span class="o">-</span> <span class="n">OFFSET</span><span class="p">)</span> <span class="o">/</span> <span class="n">scale</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And JavaScript. Unlike normalized GLSL values above (0.0-1.0), this
produces one-byte integers (0-255) for packing into typed arrays.</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">encode</span><span class="p">(</span><span class="nx">value</span><span class="p">,</span> <span class="nx">scale</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">var</span> <span class="nx">b</span> <span class="o">=</span> <span class="nx">Particles</span><span class="p">.</span><span class="nx">BASE</span><span class="p">;</span>
    <span class="nx">value</span> <span class="o">=</span> <span class="nx">value</span> <span class="o">*</span> <span class="nx">scale</span> <span class="o">+</span> <span class="nx">b</span> <span class="o">*</span> <span class="nx">b</span> <span class="o">/</span> <span class="mi">2</span><span class="p">;</span>
    <span class="kd">var</span> <span class="nx">pair</span> <span class="o">=</span> <span class="p">[</span>
        <span class="nb">Math</span><span class="p">.</span><span class="nx">floor</span><span class="p">((</span><span class="nx">value</span> <span class="o">%</span> <span class="nx">b</span><span class="p">)</span> <span class="o">/</span> <span class="nx">b</span> <span class="o">*</span> <span class="mi">255</span><span class="p">),</span>
        <span class="nb">Math</span><span class="p">.</span><span class="nx">floor</span><span class="p">(</span><span class="nb">Math</span><span class="p">.</span><span class="nx">floor</span><span class="p">(</span><span class="nx">value</span> <span class="o">/</span> <span class="nx">b</span><span class="p">)</span> <span class="o">/</span> <span class="nx">b</span> <span class="o">*</span> <span class="mi">255</span><span class="p">)</span>
    <span class="p">];</span>
    <span class="k">return</span> <span class="nx">pair</span><span class="p">;</span>
<span class="p">}</span>

<span class="kd">function</span> <span class="nx">decode</span><span class="p">(</span><span class="nx">pair</span><span class="p">,</span> <span class="nx">scale</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">var</span> <span class="nx">b</span> <span class="o">=</span> <span class="nx">Particles</span><span class="p">.</span><span class="nx">BASE</span><span class="p">;</span>
    <span class="k">return</span> <span class="p">(((</span><span class="nx">pair</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">/</span> <span class="mi">255</span><span class="p">)</span> <span class="o">*</span> <span class="nx">b</span> <span class="o">+</span>
             <span class="p">(</span><span class="nx">pair</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">/</span> <span class="mi">255</span><span class="p">)</span> <span class="o">*</span> <span class="nx">b</span> <span class="o">*</span> <span class="nx">b</span><span class="p">)</span> <span class="o">-</span> <span class="nx">b</span> <span class="o">*</span> <span class="nx">b</span> <span class="o">/</span> <span class="mi">2</span><span class="p">)</span> <span class="o">/</span> <span class="nx">scale</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
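<p>A quick round-trip shows the quantization error inherent in the
scheme. The scale of 40 below is a hypothetical value, chosen so
values up to roughly ±800 fit in the encoding’s range; the demo
derives its own scale from the display size:</p>

```javascript
// Round-trip of the encoder/decoder above. BASE matches the article's
// byte-per-digit scheme; the scale of 40 is an illustrative assumption.
var Particles = { BASE: 256 };

function encode(value, scale) {
    var b = Particles.BASE;
    value = value * scale + b * b / 2;  // shift into Excess-K representation
    return [
        Math.floor((value % b) / b * 255),           // low-order digit
        Math.floor(Math.floor(value / b) / b * 255)  // high-order digit
    ];
}

function decode(pair, scale) {
    var b = Particles.BASE;
    return (((pair[0] / 255) * b +
             (pair[1] / 255) * b * b) - b * b / 2) / scale;
}

var scale = 40;  // hypothetical; the demo computes this itself
var pair = encode(123.4, scale);  // two bytes, each 0-255
var out = decode(pair, scale);    // near 123.4, within quantization error
```

<p>The small round-trip error comes from quantizing each digit to a
byte, which is exactly the precision trade-off discussed above.</p>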

<p>The fragment shader that updates each particle samples the position
and velocity textures at that particle’s “index”, decodes their
values, operates on them, then encodes them back into a color for
writing to the output texture. Since I’m using WebGL, which lacks
multiple rendering targets (despite having support for <code class="language-plaintext highlighter-rouge">gl_FragData</code>),
the fragment shader can only output one color. Position is updated in
one pass and velocity in another as two separate draws. The buffers
are not swapped until <em>after</em> both passes are done, so the velocity
shader (intentionally) doesn’t use the updated position values.</p>

<p>There’s a limit to the maximum texture size, typically 8,192 or 4,096,
so rather than lay the particles out in a one-dimensional texture, the
texture is kept square. Particles are indexed by two-dimensional
coordinates.</p>
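<p>Sizing that square texture is a one-liner. A sketch, with function
names of my own invention:</p>

```javascript
// Pick a square texture side long enough to hold n particles, then
// index particle i by two-dimensional coordinates. Names are hypothetical.
function stateSize(n) {
    return Math.ceil(Math.sqrt(n));
}

function particleIndex(i, side) {
    return { x: i % side, y: Math.floor(i / side) };
}
```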

<p>It’s pretty interesting to see the position or velocity textures drawn
directly to the screen rather than the normal display. It’s another
domain through which to view the simulation, and it even helped me
identify some issues that were otherwise hard to see. The output is a
shimmering array of color, but with definite patterns, revealing a lot
about the entropy (or lack thereof) of the system. I’d share a video
of it, but it would be even more impractical to encode than the normal
display. Here are screenshots instead: position, then velocity. The
alpha component is not captured here.</p>

<p><img src="/img/particles/position.png" alt="" />
<img src="/img/particles/velocity.png" alt="" /></p>

<h3 id="entropy-conservation">Entropy Conservation</h3>

<p>One of the biggest challenges with running a simulation like this on a
GPU is the lack of random values. There’s no <code class="language-plaintext highlighter-rouge">rand()</code> function in the
shader language, so the whole thing is deterministic by default. All
entropy comes from the initial texture state filled by the CPU. When
particles clump up and match state, perhaps from flowing together over
an obstacle, it can be difficult to work them back apart since the
simulation handles them identically.</p>

<p>To mitigate this problem, the first rule is to conserve entropy
whenever possible. When a particle falls out of the bottom of the
display, it’s “reset” by moving it back to the top. If this is done by
setting the particle’s Y value to 0, then information is destroyed.
This must be avoided! Particles below the bottom edge of the display
tend to have slightly different Y values, despite exiting during the
same iteration. Instead of resetting to 0, a constant value is added:
the height of the display. The Y values remain different, so these
particles are more likely to follow different routes when bumping into
obstacles.</p>

<p>The next technique I used is to supply a single fresh random value
via a uniform on each iteration. This value is added to the position and
velocity of reset particles. The same value is used for all particles
for that particular iteration, so this doesn’t help with overlapping
particles, but it does help to break apart “streams”. These are
clearly-visible lines of particles all following the same path. Each
exits the bottom of the display on a different iteration, so the
random value separates them slightly. Ultimately this stirs in a few
bits of fresh entropy into the simulation on each iteration.</p>
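<p>Putting those reset rules together, here’s a hedged JavaScript
sketch. The demo does this work in GLSL, and <code>height</code>,
<code>rand</code>, and <code>indexScale</code> are illustrative
parameters rather than the demo’s exact values:</p>

```javascript
// Entropy-conserving reset for a particle that fell off the bottom.
// p carries a position (x, y) and its 2D texture index (ix, iy).
function resetParticle(p, height, rand, indexScale) {
    p.y += height;  // add the display height instead of zeroing y
    // Per-iteration random value, plus the scaled index signed by the
    // random value, to tease exactly-overlapping particles apart.
    var s = rand < 0 ? -1 : 1;
    p.x += rand + s * p.ix * indexScale;
    p.y += rand + s * p.iy * indexScale;
    return p;
}
```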

<p>Alternatively, a texture containing random values could be supplied to
the shader. The CPU would have to frequently fill and upload the
texture, plus there’s the issue of choosing where to sample the
texture, itself requiring a random value.</p>

<p>Finally, to deal with particles that have exactly overlapped, the
particle’s unique two-dimensional index is scaled and added to the
position and velocity when resetting, teasing them apart. The random
value’s sign is multiplied by the index to avoid bias in any
particular direction.</p>

<p>To see all this in action in the demo, make a big bowl to capture all
the particles, getting them to flow into a single point. This removes
all entropy from the system. Now clear the obstacles. They’ll all fall
down in a single, tight clump. It will still be somewhat clumped when
resetting at the top, but you’ll see them spraying apart a little bit
(particle indexes being added). These will exit the bottom at slightly
different times, so the random value plays its part to work them apart
even more. After a few rounds, the particles should be pretty evenly
spread again.</p>

<p>The last source of entropy is your mouse. When you move it through the
scene you disturb particles and introduce some noise to the
simulation.</p>

<h3 id="textures-as-vertex-attribute-buffers">Textures as Vertex Attribute Buffers</h3>

<p>This project idea occurred to me while reading the <a href="http://www.khronos.org/files/opengles_shading_language.pdf">OpenGL ES shader
language specification</a> (PDF). I’d been wanting to do a particle
system, but I was stuck on the problem of how to draw the particles. The
texture data representing positions needs to somehow be fed back into
the pipeline as vertices. Normally a <a href="http://www.opengl.org/wiki/Buffer_Texture">buffer texture</a> — a texture
backed by an array buffer — or a <a href="http://www.opengl.org/wiki/Pixel_Buffer_Object">pixel buffer object</a> —
asynchronous texture data copying — might be used for this, but WebGL
has none of these features. Pulling texture data off the GPU and putting
it all back on as an array buffer on each frame is out of the
question.</p>

<p>However, I came up with a cool technique that’s better than both those
anyway. The shader function <code class="language-plaintext highlighter-rouge">texture2D</code> is used to sample a pixel in a
texture. Normally this is used by the fragment shader as part of the
process of computing a color for a pixel. But the shader language
specification mentions that <code class="language-plaintext highlighter-rouge">texture2D</code> is available in vertex
shaders, too. That’s when it hit me. <strong>The vertex shader itself can
perform the conversion from texture to vertices.</strong></p>

<p>It works by passing the previously-mentioned two-dimensional particle
indexes as the vertex attributes, using them to look up particle
positions from within the vertex shader. The shader would run in
<code class="language-plaintext highlighter-rouge">GL_POINTS</code> mode, emitting point sprites. Here’s the abridged version,</p>

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">attribute</span> <span class="kt">vec2</span> <span class="n">index</span><span class="p">;</span>

<span class="k">uniform</span> <span class="kt">sampler2D</span> <span class="n">positions</span><span class="p">;</span>
<span class="k">uniform</span> <span class="kt">vec2</span> <span class="n">statesize</span><span class="p">;</span>
<span class="k">uniform</span> <span class="kt">vec2</span> <span class="n">worldsize</span><span class="p">;</span>
<span class="k">uniform</span> <span class="kt">float</span> <span class="n">size</span><span class="p">;</span>

<span class="c1">// float decode(vec2) { ...</span>

<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="kt">vec4</span> <span class="n">psample</span> <span class="o">=</span> <span class="n">texture2D</span><span class="p">(</span><span class="n">positions</span><span class="p">,</span> <span class="n">index</span> <span class="o">/</span> <span class="n">statesize</span><span class="p">);</span>
    <span class="kt">vec2</span> <span class="n">p</span> <span class="o">=</span> <span class="kt">vec2</span><span class="p">(</span><span class="n">decode</span><span class="p">(</span><span class="n">psample</span><span class="p">.</span><span class="n">rg</span><span class="p">),</span> <span class="n">decode</span><span class="p">(</span><span class="n">psample</span><span class="p">.</span><span class="n">ba</span><span class="p">));</span>
    <span class="nb">gl_Position</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="n">p</span> <span class="o">/</span> <span class="n">worldsize</span> <span class="o">*</span> <span class="mi">2</span><span class="p">.</span><span class="mi">0</span> <span class="o">-</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
    <span class="nb">gl_PointSize</span> <span class="o">=</span> <span class="n">size</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The real version also samples the velocity since it modulates the
color (slow moving particles are lighter than fast moving particles).</p>

<p>However, there’s a catch: implementations are allowed to limit the
number of vertex shader texture bindings to 0
(<code class="language-plaintext highlighter-rouge">GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS</code>). So <em>technically</em> vertex shaders
must always support <code class="language-plaintext highlighter-rouge">texture2D</code>, but they’re not required to support
actually having textures. It’s sort of like food service on an
airplane that doesn’t carry passengers. These platforms don’t support
this technique. So far I’ve only had this problem on some mobile
devices.</p>
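<p>Detecting this at runtime is a one-line query against a real WebGL
constant (the helper name is my own):</p>

```javascript
// MAX_VERTEX_TEXTURE_IMAGE_UNITS may legally be 0, in which case
// texture2D in a vertex shader has no textures to sample.
function supportsVertexTextures(gl) {
    return gl.getParameter(gl.MAX_VERTEX_TEXTURE_IMAGE_UNITS) > 0;
}
```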

<p>Outside of the lack of support by some platforms, this allows every
part of the simulation to stay on the GPU and paves the way for a pure
GPU particle system.</p>

<h3 id="obstacles">Obstacles</h3>

<p>An important observation is that particles do not interact with each
other. This is not an n-body simulation. They do, however, interact
with the rest of the world: they bounce intuitively off those static
circles. This environment is represented by another texture, one
that’s not updated during normal iteration. I call this the <em>obstacle</em>
texture.</p>

<p>The colors on the obstacle texture are surface normals. That is, each
pixel has a direction to it, a flow directing particles in some
direction. Empty space has a special normal value of (0, 0). This is
not normalized (doesn’t have a length of 1), so it’s an out-of-band
value that has no effect on particles.</p>

<p><img src="/img/particles/obstacle.png" alt="" /></p>

<p>(I didn’t realize until I was done how much this looks like the
Greendale Community College flag.)</p>

<p>A particle checks for a collision simply by sampling the obstacle
texture. If it finds a normal at its location, it changes its velocity
using the shader function <code class="language-plaintext highlighter-rouge">reflect</code>. This function is normally used
for reflecting light in a 3D scene, but it works equally well for
slow-moving particles. The effect is that particles bounce off the
circle in a natural way.</p>
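<p>The reflection formula itself is tiny, and it degrades gracefully
with the out-of-band normal. A JavaScript sketch of what the shader
does with GLSL’s <code>reflect</code>:</p>

```javascript
// r = v - 2 * dot(v, n) * n, the same formula as GLSL's reflect().
// With the out-of-band normal (0, 0), dot() is 0 and v passes through.
function reflect(v, n) {
    var d = v[0] * n[0] + v[1] * n[1];
    return [v[0] - 2 * d * n[0], v[1] - 2 * d * n[1]];
}
```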

<p>Sometimes particles end up on/in an obstacle with a low or zero
velocity. To dislodge these they’re given a little nudge in the
direction of the normal, pushing them away from the obstacle. You’ll
see this on slopes where slow particles jiggle their way down to
freedom like jumping beans.</p>

<p>To make the obstacle texture user-friendly, the actual geometry is
maintained on the CPU side of things in JavaScript. It keeps a list of
these circles and, on updates, redraws the obstacle texture from this
list. This happens, for example, every time you move your mouse on the
screen, providing a moving obstacle. The texture provides
shader-friendly access to the geometry. Two representations for two
purposes.</p>

<p>When I started writing this part of the program, I envisioned that
shapes other than circles could be placed, too. For example, solid
rectangles: the normals would look something like this.</p>

<p><img src="/img/particles/rectangle.png" alt="" /></p>

<p>So far these are unimplemented.</p>

<h4 id="future-ideas">Future Ideas</h4>

<p>I didn’t try it yet, but I wonder if particles could interact with
each other by also drawing themselves onto the obstacle texture. Two
nearby particles would bounce off each other. Perhaps <a href="/blog/2013/06/26/">the entire
liquid demo</a> could run on the GPU like this. If I’m imagining
it correctly, particles would gain volume and obstacles forming bowl
shapes would fill up rather than concentrate particles into a single
point.</p>

<p>I think there’s still some more to explore with this project.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Feedback Applet Ported to WebGL</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/06/21/"/>
    <id>urn:uuid:1bcbcaaa-35b8-34f8-b114-34a2116882ef</id>
    <updated>2014-06-21T02:49:57Z</updated>
    <category term="webgl"/><category term="javascript"/><category term="media"/><category term="interactive"/><category term="opengl"/>
    <content type="html">
      <![CDATA[<p>The biggest flaw with so many OpenGL tutorials is trying to teach two
complicated topics at once: the OpenGL API and 3D graphics. These are
only loosely related and do not need to be learned simultaneously.
It’s far more valuable to <a href="http://www.skorks.com/2010/04/on-the-value-of-fundamentals-in-software-development/">focus on the fundamentals</a>, which can
only happen when handled separately. With the programmable pipeline,
OpenGL is useful for a lot more than 3D graphics. There are many
non-3D directions that tutorials can take.</p>

<p>I think that’s why I’ve been enjoying my journey through WebGL so
much. Except for <a href="https://skeeto.github.io/sphere-js/">my sphere demo</a>, which was only barely 3D,
none of <a href="/toys/">my projects</a> have been what would typically be
considered 3D graphics. Instead, each new project has introduced me to
some new aspect of OpenGL, accidentally playing out like a great
tutorial. I started out drawing points and lines, then took a dive
<a href="https://skeeto.github.io/perlin-noise/">into non-trivial fragment shaders</a>, then <a href="/blog/2013/06/26/">textures and
framebuffers</a>, then the <a href="/blog/2014/06/01/">depth buffer</a>, then <a href="/blog/2014/06/10/">general
computation</a> with fragment shaders.</p>

<p>The next project introduced me to <em>alpha blending</em>. <strong>I ported <a href="/blog/2011/05/01/">my old
feedback applet</a> to WebGL!</strong></p>

<ul>
  <li><a href="https://skeeto.github.io/Feedback/webgl/">https://skeeto.github.io/Feedback/webgl/</a>
(<a href="http://github.com/skeeto/Feedback">source</a>)</li>
</ul>

<p>Since finishing the port I’ve already spent a couple of hours just
playing with it. It’s mesmerizing. Here’s a video demonstration in
case WebGL doesn’t work for you yet. I’m manually driving it to show
off the different things it can do.</p>

<video width="500" height="500" controls="">
  <source src="https://nullprogram.s3.amazonaws.com/feedback/feedback.webm" type="video/webm" />
  <source src="https://nullprogram.s3.amazonaws.com/feedback/feedback.mp4" type="video/mp4" />
  <img src="https://nullprogram.s3.amazonaws.com/feedback/feedback-poster.png" width="500" height="500" />
</video>

<h3 id="drawing-a-frame">Drawing a Frame</h3>

<p>On my laptop, the original Java version plods along at about 6 frames
per second. That’s because it does all of the compositing on the CPU.
Each frame it has to blend over 1.2 million color components. This is
exactly the sort of thing the GPU is built to do. The WebGL version
does the full 60 frames per second (i.e. requestAnimationFrame)
without breaking a sweat. The CPU only computes a couple of 3x3 affine
transformation matrices per frame: virtually nothing.</p>

<p>Similar to my <a href="/blog/2014/06/10/">WebGL Game of Life</a>, there’s a texture stored on the
GPU that holds almost all the system state. It’s the same size as the
display. To draw the next frame, this texture is drawn to the display
directly, then transformed (rotated and scaled down slightly), and
drawn again to the display. This is the “feedback” part and it’s where
blending kicks in. It’s the core component of the whole project.</p>

<p>Next, some fresh shapes are drawn to the display (i.e. the circle for
the mouse cursor) and the entire thing is captured back onto the state
texture with <code class="language-plaintext highlighter-rouge">glCopyTexImage2D</code>, to be used for the next frame. It’s
important that <code class="language-plaintext highlighter-rouge">glCopyTexImage2D</code> is called before returning to the
JavaScript top-level (back to the event loop), because the screen data
will no longer be available at that point, even if it’s still visible
on the screen.</p>

<h4 id="alpha-blending">Alpha Blending</h4>

<p>They say a picture is worth a thousand words, and that’s literally
true with the <a href="http://www.andersriggelsen.dk/glblendfunc.php">Visual glBlendFunc + glBlendEquation Tool</a>. A
few minutes playing with that tool tells you pretty much everything
you need to know.</p>

<p>While you <em>could</em> potentially perform blending yourself in a fragment
shader with multiple draw calls, it’s much better (and faster) to
configure OpenGL to do it. There are two functions to set it up:
<code class="language-plaintext highlighter-rouge">glBlendFunc</code> and <code class="language-plaintext highlighter-rouge">glBlendEquation</code>. There are also “separate”
versions of both for handling the color and alpha channels
separately, but I don’t need that for this project.</p>

<p>The enumeration passed to <code class="language-plaintext highlighter-rouge">glBlendEquation</code> decides how the colors are
combined. In WebGL our options are <code class="language-plaintext highlighter-rouge">GL_FUNC_ADD</code> (a + b),
<code class="language-plaintext highlighter-rouge">GL_FUNC_SUBTRACT</code> (a - b), and <code class="language-plaintext highlighter-rouge">GL_FUNC_REVERSE_SUBTRACT</code> (b - a). In
regular OpenGL there’s also <code class="language-plaintext highlighter-rouge">GL_MIN</code> (min(a, b)) and <code class="language-plaintext highlighter-rouge">GL_MAX</code> (max(a,
b)).</p>

<p>The function <code class="language-plaintext highlighter-rouge">glBlendFunc</code> takes two enumerations, choosing how
the alpha channels are applied to the source and destination colors
before the blend equation (above) is applied. The alpha channel could
be ignored and the color used directly (<code class="language-plaintext highlighter-rouge">GL_ONE</code>) or discarded
(<code class="language-plaintext highlighter-rouge">GL_ZERO</code>). The alpha channel could be multiplied in directly
(<code class="language-plaintext highlighter-rouge">GL_SRC_ALPHA</code>, <code class="language-plaintext highlighter-rouge">GL_DST_ALPHA</code>), or inverted first
(<code class="language-plaintext highlighter-rouge">GL_ONE_MINUS_SRC_ALPHA</code>). In WebGL there are 72
possible combinations.</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">gl</span><span class="p">.</span><span class="nx">enable</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">BLEND</span><span class="p">);</span>
<span class="nx">gl</span><span class="p">.</span><span class="nx">blendEquation</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">FUNC_ADD</span><span class="p">);</span>
<span class="nx">gl</span><span class="p">.</span><span class="nx">blendFunc</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">SRC_ALPHA</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">SRC_ALPHA</span><span class="p">);</span>
</code></pre></div></div>

<p>In this project I’m using <code class="language-plaintext highlighter-rouge">GL_FUNC_ADD</code> and <code class="language-plaintext highlighter-rouge">GL_SRC_ALPHA</code> for both
source and destination. The alpha value put out by the fragment shader
is the experimentally-determined, magical value of 0.62. A little
higher and the feedback tends to blend towards bright white really
fast. A little lower and it blends away to nothing really fast. It’s a
numerical instability that has the interesting side effect of making
the demo <strong>behave <em>slightly</em> differently depending on the floating
point precision of the GPU running it</strong>!</p>
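<p>The instability is easy to model outside of WebGL. Here’s a crude
one-pixel sketch of my own (it ignores the transform and treats source
and destination as the same value, so it’s not the real shader, just
the blend arithmetic): with <code class="language-plaintext highlighter-rouge">GL_FUNC_ADD</code> and <code class="language-plaintext highlighter-rouge">GL_SRC_ALPHA</code> for both
factors, each blended pixel is src·a + dst·a, clamped to the
displayable range.</p>

```javascript
// Crude one-pixel model of the feedback blend: FUNC_ADD with
// SRC_ALPHA as both blend factors, so out = src*a + dst*a, clamped.
function blendStep(v, alpha) {
  return Math.min(1, Math.max(0, v * alpha + v * alpha));
}

// Iterate from a mid-gray pixel for a number of frames.
function iterate(alpha, frames) {
  let v = 0.5;
  for (let i = 0; i < frames; i++) {
    v = blendStep(v, alpha);
  }
  return v;
}

console.log(iterate(0.7, 50)); // saturates at 1 (white)
console.log(iterate(0.3, 50)); // decays toward 0 (black)
```

<p>In this toy model the tipping point sits at an alpha of 0.5; in the
real demo the scaling and rotation spread each pixel’s contribution
around, which is presumably why the balance point lands at 0.62
instead.</p>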

<h3 id="saving-a-screenshot">Saving a Screenshot</h3>

<p>The HTML5 canvas object that provides the WebGL context has a
<code class="language-plaintext highlighter-rouge">toDataURL()</code> method for grabbing the canvas contents as a friendly
base64-encoded PNG image. Unfortunately this doesn’t work with WebGL
unless the <code class="language-plaintext highlighter-rouge">preserveDrawingBuffer</code> option is set, which can introduce
performance issues. Without this option, the browser is free to throw
away the drawing buffer before the next JavaScript turn, making the
pixel information inaccessible.</p>

<p>By coincidence there’s a really convenient workaround for this
project. Remember that state texture? That’s exactly what we want to
save. I can attach it to a framebuffer and use <code class="language-plaintext highlighter-rouge">glReadPixels</code> just
like I did in WebGL Game of Life to grab the simulation state. The pixel
data is then drawn to a background canvas (<em>without</em> using WebGL) and
<code class="language-plaintext highlighter-rouge">toDataURL()</code> is used on that canvas to get a PNG image. I slap this
on a link with <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#attr-download">the new download attribute</a> and call it done.</p>

<h3 id="anti-aliasing">Anti-aliasing</h3>

<p>At the time of this writing, support for automatic anti-aliasing in
WebGL is sparse at best. I’ve never seen it working anywhere yet, in
any browser on any platform. <code class="language-plaintext highlighter-rouge">GL_SMOOTH</code> isn’t available and the
anti-aliasing context creation option doesn’t do anything on any of my
computers. Fortunately I was able to work around this <a href="http://rubendv.be/graphics/opengl/2014/03/25/drawing-antialiased-circles-in-opengl.html">using a cool
<code class="language-plaintext highlighter-rouge">smoothstep</code> trick</a>.</p>

<p>The article I linked explains it better than I could, but here’s the
gist of it. This shader draws a circle in a quad, but leads to jagged,
sharp edges.</p>

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">uniform</span> <span class="kt">vec4</span> <span class="n">color</span><span class="p">;</span>
<span class="k">varying</span> <span class="kt">vec3</span> <span class="n">coord</span><span class="p">;</span>  <span class="c1">// object space</span>

<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">distance</span><span class="p">(</span><span class="n">coord</span><span class="p">.</span><span class="n">xy</span><span class="p">,</span> <span class="kt">vec2</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">))</span> <span class="o">&lt;</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="nb">gl_FragColor</span> <span class="o">=</span> <span class="n">color</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="nb">gl_FragColor</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><img src="/img/feedback/hard.png" alt="" /></p>

<p>The improved version uses <code class="language-plaintext highlighter-rouge">smoothstep</code> to fade from inside the circle
to outside the circle. Not only does it look nicer on the screen, I
think it looks nicer as code, too. Unfortunately WebGL has no <code class="language-plaintext highlighter-rouge">fwidth</code>
function as explained in the article, so the delta is hardcoded.</p>

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">uniform</span> <span class="kt">vec4</span> <span class="n">color</span><span class="p">;</span>
<span class="k">varying</span> <span class="kt">vec3</span> <span class="n">coord</span><span class="p">;</span>

<span class="k">const</span> <span class="kt">vec4</span> <span class="n">outside</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">float</span> <span class="n">delta</span> <span class="o">=</span> <span class="mi">0</span><span class="p">.</span><span class="mi">1</span><span class="p">;</span>

<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="kt">float</span> <span class="n">dist</span> <span class="o">=</span> <span class="n">distance</span><span class="p">(</span><span class="n">coord</span><span class="p">.</span><span class="n">xy</span><span class="p">,</span> <span class="kt">vec2</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">));</span>
    <span class="kt">float</span> <span class="n">a</span> <span class="o">=</span> <span class="n">smoothstep</span><span class="p">(</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span> <span class="o">-</span> <span class="n">delta</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="n">dist</span><span class="p">);</span>
    <span class="nb">gl_FragColor</span> <span class="o">=</span> <span class="n">mix</span><span class="p">(</span><span class="n">color</span><span class="p">,</span> <span class="n">outside</span><span class="p">,</span> <span class="n">a</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p><img src="/img/feedback/smooth.png" alt="" /></p>
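<p>For intuition about what the shader is doing, here’s GLSL’s
<code class="language-plaintext highlighter-rouge">smoothstep</code> mimicked in plain JavaScript, following its standard
definition (a sketch for illustration, not code from the demo):</p>

```javascript
// smoothstep per the GLSL definition: clamp the interpolant to [0, 1],
// then apply cubic Hermite interpolation.
function smoothstep(edge0, edge1, x) {
  const t = Math.min(1, Math.max(0, (x - edge0) / (edge1 - edge0)));
  return t * t * (3 - 2 * t);
}

// With delta = 0.1, alpha ramps from 0 to 1 across the circle's rim.
const delta = 0.1;
console.log(smoothstep(1.0 - delta, 1.0, 0.5));  // 0: well inside, solid color
console.log(smoothstep(1.0 - delta, 1.0, 0.95)); // 0.5: on the rim, half blended
console.log(smoothstep(1.0 - delta, 1.0, 1.1));  // 1: outside, background
```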

<h3 id="matrix-uniforms">Matrix Uniforms</h3>

<p>Up until this point I had avoided matrix uniforms. I was doing
transformations individually within the shader. However, as transforms
get more complicated, it’s much better to express the transform as a
matrix and let the shader language handle matrix multiplication
implicitly. Rather than pass half a dozen uniforms describing the
transform, you pass a single matrix that has the full range of motion.</p>

<p>My <a href="https://github.com/skeeto/igloojs">Igloo WebGL library</a> originally had a vector library that
provided GLSL-style vectors, including full swizzling. My long term
goal was to extend this to support GLSL-style matrices. However,
writing a matrix library from scratch was turning out to be <em>far</em> more
work than I expected. Plus it’s reinventing the wheel.</p>

<p>So, instead, I dropped my vector library — I completely deleted it —
and decided to use <a href="http://glmatrix.net/">glMatrix</a>, a <em>really</em> solid
WebGL-friendly matrix library. Highly recommended! It doesn’t
introduce any new types, it just provides functions for operating on
JavaScript typed arrays, the same arrays that get passed directly to
WebGL functions. This composes perfectly with Igloo without making it
a formal dependency.</p>

<p>Here’s my function for creating the mat3 uniform that transforms both
the main texture as well as the individual shape sprites. This use of
glMatrix looks a lot like <a href="http://docs.oracle.com/javase/7/docs/api/java/awt/geom/AffineTransform.html">java.awt.geom.AffineTransform</a>, does it
not? That’s one of my favorite parts of Java 2D, and <a href="/blog/2013/06/16/">I’ve been
missing it</a>.</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* Translate, scale, and rotate. */</span>
<span class="nx">Feedback</span><span class="p">.</span><span class="nx">affine</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">tx</span><span class="p">,</span> <span class="nx">ty</span><span class="p">,</span> <span class="nx">sx</span><span class="p">,</span> <span class="nx">sy</span><span class="p">,</span> <span class="nx">a</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">var</span> <span class="nx">m</span> <span class="o">=</span> <span class="nx">mat3</span><span class="p">.</span><span class="nx">create</span><span class="p">();</span>
    <span class="nx">mat3</span><span class="p">.</span><span class="nx">translate</span><span class="p">(</span><span class="nx">m</span><span class="p">,</span> <span class="nx">m</span><span class="p">,</span> <span class="p">[</span><span class="nx">tx</span><span class="p">,</span> <span class="nx">ty</span><span class="p">]);</span>
    <span class="nx">mat3</span><span class="p">.</span><span class="nx">rotate</span><span class="p">(</span><span class="nx">m</span><span class="p">,</span> <span class="nx">m</span><span class="p">,</span> <span class="nx">a</span><span class="p">);</span>
    <span class="nx">mat3</span><span class="p">.</span><span class="nx">scale</span><span class="p">(</span><span class="nx">m</span><span class="p">,</span> <span class="nx">m</span><span class="p">,</span> <span class="p">[</span><span class="nx">sx</span><span class="p">,</span> <span class="nx">sy</span><span class="p">]);</span>
    <span class="k">return</span> <span class="nx">m</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The return value is just a plain Float32Array that I can pass to
<code class="language-plaintext highlighter-rouge">glUniformMatrix3fv</code>. It becomes the <code class="language-plaintext highlighter-rouge">placement</code> uniform in the
shader.</p>
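<p>To check the order of operations (translate, then rotate, then
scale, applied right-to-left to the vertex), the same matrix can be
built without glMatrix. This is my own dependency-free reconstruction
for illustration, not part of the demo:</p>

```javascript
// Build a column-major 3x3 affine matrix equal to the composition
// translate * rotate * scale, mirroring the glMatrix calls above.
function affine(tx, ty, sx, sy, a) {
  const c = Math.cos(a), s = Math.sin(a);
  return [sx * c,  sx * s, 0,   // column 0
          -sy * s, sy * c, 0,   // column 1
          tx,      ty,     1];  // column 2
}

// Apply the matrix to a 2D point (implicit homogeneous w = 1), the
// same math the vertex shader performs with placement * vec3(quad, 1).
function apply(m, x, y) {
  return [m[0] * x + m[3] * y + m[6],
          m[1] * x + m[4] * y + m[7]];
}

console.log(apply(affine(1, 2, 1, 1, 0), 0, 0)); // [1, 2]: pure translation
console.log(apply(affine(0, 0, 2, 3, 0), 1, 1)); // [2, 3]: pure scale
```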

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">attribute</span> <span class="kt">vec2</span> <span class="n">quad</span><span class="p">;</span>
<span class="k">uniform</span> <span class="kt">mat3</span> <span class="n">placement</span><span class="p">;</span>
<span class="k">varying</span> <span class="kt">vec3</span> <span class="n">coord</span><span class="p">;</span>

<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">coord</span> <span class="o">=</span> <span class="kt">vec3</span><span class="p">(</span><span class="n">quad</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="kt">vec2</span> <span class="n">position</span> <span class="o">=</span> <span class="p">(</span><span class="n">placement</span> <span class="o">*</span> <span class="kt">vec3</span><span class="p">(</span><span class="n">quad</span><span class="p">,</span> <span class="mi">1</span><span class="p">)).</span><span class="n">xy</span><span class="p">;</span>
    <span class="nb">gl_Position</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="n">position</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>To move to 3D graphics from here, I would just need to step up to a
mat4 and operate on 3D coordinates instead of 2D. glMatrix would still
do the heavy lifting on the CPU side. If this was part of an OpenGL
tutorial series, perhaps that’s how it would transition to the next
stage.</p>

<h3 id="conclusion">Conclusion</h3>

<p>I’m really happy with how this one turned out. The only way it’s
distinguishable from the original applet is that it runs faster. In
preparation for this project, I made a big pile of improvements to
Igloo, bringing it up to speed with my current WebGL knowledge. This
will greatly increase the speed at which I can code up and experiment
with future projects. WebGL + <a href="/blog/2012/10/31/">Skewer</a> + Igloo has really
become a powerful platform for rapid prototyping with OpenGL.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>A GPU Approach to Conway's Game of Life</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/06/10/"/>
    <id>urn:uuid:205b43ce-faa8-33c8-db27-173ddad64229</id>
    <updated>2014-06-10T06:29:48Z</updated>
    <category term="webgl"/><category term="javascript"/><category term="interactive"/><category term="gpgpu"/><category term="opengl"/>
    <content type="html">
      <![CDATA[<p class="abstract">
Update: In the next article, I <a href="/blog/2014/06/22/">extend this
program to solving mazes</a>. The demo has also been <a href="http://rykap.com/graphics/skew/2016/01/23/game-of-life.html">
ported to the Skew programming language</a>.
</p>

<p><a href="http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life">Conway’s Game of Life</a> is <a href="/blog/2014/06/01/">another well-matched
workload</a> for GPUs. Here’s the actual WebGL demo if you want
to check it out before continuing.</p>

<ul>
  <li><a href="http://skeeto.github.io/webgl-game-of-life/">https://skeeto.github.io/webgl-game-of-life/</a>
(<a href="http://github.com/skeeto/webgl-game-of-life/">source</a>)</li>
</ul>

<p>To quickly summarize the rules:</p>

<ul>
  <li>The universe is a two-dimensional grid of 8-connected square cells.</li>
  <li>A cell is either dead or alive.</li>
  <li>A dead cell with exactly three living neighbors comes to life.</li>
  <li>A live cell with fewer than two living neighbors dies from underpopulation.</li>
  <li>A live cell with more than three living neighbors dies from overpopulation.</li>
</ul>
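<p>In code form, the rules above collapse to a tiny pure function (a
reference sketch, not the shader itself):</p>

```javascript
// Next state for one cell: alive is 0 or 1, neighbors counts the
// living cells among its 8 neighbors.
function rule(alive, neighbors) {
  if (alive) {
    return (neighbors === 2 || neighbors === 3) ? 1 : 0; // survival
  }
  return neighbors === 3 ? 1 : 0; // birth
}

console.log(rule(0, 3)); // 1: birth
console.log(rule(1, 1)); // 0: underpopulation
console.log(rule(1, 4)); // 0: overpopulation
```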

<p><img src="/img/gol/gol.gif" alt="" /></p>

<p>These simple cellular automata rules lead to surprisingly complex,
organic patterns. Cells are updated in parallel, so it’s generally
implemented using two separate buffers. This makes it a perfect
candidate for an OpenGL fragment shader.</p>

<h3 id="preparing-the-textures">Preparing the Textures</h3>

<p>The entire simulation state will be stored in a single, 2D texture in
GPU memory. Each pixel of the texture represents one Life cell. The
texture will have the internal format GL_RGBA. That is, each pixel
will have a red, green, blue, and alpha channel. This texture is not
drawn directly to the screen, so how exactly these channels are used
is mostly unimportant. It’s merely a simulation data structure. This
is because I’m using <a href="/blog/2013/06/10/">the OpenGL programmable pipeline for general
computation</a>. I’m calling this the “front” texture.</p>

<p>Four multi-bit (actual width is up to the GPU) channels seem
excessive considering that all I <em>really</em> need is a single bit of
state for each cell. However, due to <a href="http://www.opengl.org/wiki/Framebuffer_Object#Framebuffer_Completeness">framebuffer completion
rules</a>, in order to draw onto this texture it <em>must</em> be GL_RGBA.
I could pack more than one cell into one texture pixel, but this would
reduce parallelism: the shader will run once per pixel, not once per
cell.</p>

<p>Because cells are updated in parallel, this texture can’t be modified
in-place. It would overwrite important state. In order to do any real
work I need a second texture to store the update. This is the “back”
texture. After the update, this back texture will hold the current
simulation state, so the names of the front and back texture are
swapped. The front texture always holds the current state, with the
back texture acting as a workspace.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">GOL</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">swap</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
    <span class="kd">var</span> <span class="nx">tmp</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">textures</span><span class="p">.</span><span class="nx">front</span><span class="p">;</span>
    <span class="k">this</span><span class="p">.</span><span class="nx">textures</span><span class="p">.</span><span class="nx">front</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">textures</span><span class="p">.</span><span class="nx">back</span><span class="p">;</span>
    <span class="k">this</span><span class="p">.</span><span class="nx">textures</span><span class="p">.</span><span class="nx">back</span> <span class="o">=</span> <span class="nx">tmp</span><span class="p">;</span>
    <span class="k">return</span> <span class="k">this</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Here’s how a texture is created and prepared. It’s wrapped in a
function/method because I’ll need two identical textures, making two
separate calls to this function. All of these settings are required
for framebuffer completion (explained later).</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">GOL</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">texture</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
    <span class="kd">var</span> <span class="nx">gl</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">gl</span><span class="p">;</span>
    <span class="kd">var</span> <span class="nx">tex</span> <span class="o">=</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">createTexture</span><span class="p">();</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">bindTexture</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">TEXTURE_2D</span><span class="p">,</span> <span class="nx">tex</span><span class="p">);</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">texParameteri</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">TEXTURE_2D</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">TEXTURE_WRAP_S</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">REPEAT</span><span class="p">);</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">texParameteri</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">TEXTURE_2D</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">TEXTURE_WRAP_T</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">REPEAT</span><span class="p">);</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">texParameteri</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">TEXTURE_2D</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">TEXTURE_MIN_FILTER</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">NEAREST</span><span class="p">);</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">texParameteri</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">TEXTURE_2D</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">TEXTURE_MAG_FILTER</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">NEAREST</span><span class="p">);</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">texImage2D</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">TEXTURE_2D</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">RGBA</span><span class="p">,</span>
                  <span class="k">this</span><span class="p">.</span><span class="nx">statesize</span><span class="p">.</span><span class="nx">x</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">statesize</span><span class="p">.</span><span class="nx">y</span><span class="p">,</span>
                  <span class="mi">0</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">RGBA</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">UNSIGNED_BYTE</span><span class="p">,</span> <span class="kc">null</span><span class="p">);</span>
    <span class="k">return</span> <span class="nx">tex</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>A texture wrap of <code class="language-plaintext highlighter-rouge">GL_REPEAT</code> means the simulation will be
automatically <a href="http://en.wikipedia.org/wiki/Wraparound_(video_games)">torus-shaped</a>. The interpolation is
<code class="language-plaintext highlighter-rouge">GL_NEAREST</code>, because I don’t want to interpolate between cell states
at all. The final OpenGL call initializes the texture size
(<code class="language-plaintext highlighter-rouge">this.statesize</code>). This size is different than the size of the
display because, again, this is <em>actually</em> a simulation data structure
for my purposes.</p>

<p>The <code class="language-plaintext highlighter-rouge">null</code> at the end would normally be texture data. I don’t need to
supply any data at this point, so this is left blank. Normally this
would leave the texture content in an undefined state, but for
security purposes, WebGL will automatically ensure that it’s zeroed.
Otherwise there’s a chance that sensitive data might leak from another
WebGL instance on another page or, worse, from another process using
OpenGL. I’ll make a similar call again later with <code class="language-plaintext highlighter-rouge">glTexSubImage2D()</code>
to fill the texture with initial random state.</p>

<p>In OpenGL ES, and therefore WebGL, wrapped (<code class="language-plaintext highlighter-rouge">GL_REPEAT</code>) texture
dimensions must be powers of two, i.e. 512x512, 256x1024, etc. Since I
want to exploit the built-in texture wrapping, I’ve decided to
constrain my simulation state size to powers of two. If I manually did
the wrapping in the fragment shader, I could make the simulation state
any size I wanted.</p>

<h3 id="framebuffers">Framebuffers</h3>

<p>A framebuffer is the target of the current <code class="language-plaintext highlighter-rouge">glClear()</code>,
<code class="language-plaintext highlighter-rouge">glDrawArrays()</code>, or <code class="language-plaintext highlighter-rouge">glDrawElements()</code>. The user’s display is the
<em>default</em> framebuffer. New framebuffers can be created and used as
drawing targets in place of the default framebuffer. This is how
things are drawn off-screen without affecting the display.</p>

<p>A framebuffer by itself is nothing but an empty frame. It needs a
canvas. Other resources are attached in order to make use of it. For
the simulation I want to draw onto the back buffer, so I attach this
to a framebuffer. If this framebuffer is bound at the time of the draw
call, the output goes onto the texture. This is really powerful
because <strong>this texture can be used as an input for another draw
command</strong>, which is exactly what I’ll be doing later.</p>

<p>Here’s what making a single step of the simulation looks like.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">GOL</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">step</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
    <span class="kd">var</span> <span class="nx">gl</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">gl</span><span class="p">;</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">bindFramebuffer</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">FRAMEBUFFER</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">framebuffers</span><span class="p">.</span><span class="nx">step</span><span class="p">);</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">framebufferTexture2D</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">FRAMEBUFFER</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">COLOR_ATTACHMENT0</span><span class="p">,</span>
                            <span class="nx">gl</span><span class="p">.</span><span class="nx">TEXTURE_2D</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">textures</span><span class="p">.</span><span class="nx">back</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">viewport</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">statesize</span><span class="p">.</span><span class="nx">x</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">statesize</span><span class="p">.</span><span class="nx">y</span><span class="p">);</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">bindTexture</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">TEXTURE_2D</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">textures</span><span class="p">.</span><span class="nx">front</span><span class="p">);</span>
    <span class="k">this</span><span class="p">.</span><span class="nx">programs</span><span class="p">.</span><span class="nx">gol</span><span class="p">.</span><span class="nx">use</span><span class="p">()</span>
        <span class="p">.</span><span class="nx">attrib</span><span class="p">(</span><span class="dl">'</span><span class="s1">quad</span><span class="dl">'</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">buffers</span><span class="p">.</span><span class="nx">quad</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
        <span class="p">.</span><span class="nx">uniform</span><span class="p">(</span><span class="dl">'</span><span class="s1">state</span><span class="dl">'</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="kc">true</span><span class="p">)</span>
        <span class="p">.</span><span class="nx">uniform</span><span class="p">(</span><span class="dl">'</span><span class="s1">scale</span><span class="dl">'</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">statesize</span><span class="p">)</span>
        <span class="p">.</span><span class="nx">draw</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">TRIANGLE_STRIP</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
    <span class="k">this</span><span class="p">.</span><span class="nx">swap</span><span class="p">();</span>
    <span class="k">return</span> <span class="k">this</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>First, bind the custom framebuffer as the current framebuffer with
<code class="language-plaintext highlighter-rouge">glBindFramebuffer()</code>. This framebuffer was previously created with
<code class="language-plaintext highlighter-rouge">glCreateFramebuffer()</code> and required no initial configuration. The
configuration is entirely done here, where the back texture is
attached to the current framebuffer. This replaces any texture that
might currently be attached to this spot — like the front texture
from the previous iteration. Finally, the size of the drawing area is
locked to the size of the simulation state with <code class="language-plaintext highlighter-rouge">glViewport()</code>.</p>

<p><a href="https://github.com/skeeto/igloojs">Using Igloo again</a> to keep the call concise, a fullscreen quad
is rendered so that the fragment shader runs <em>exactly</em> once for each
cell. That <code class="language-plaintext highlighter-rouge">state</code> uniform is the front texture, bound as
<code class="language-plaintext highlighter-rouge">GL_TEXTURE0</code>.</p>

<p>With the drawing complete, the buffers are swapped. Since every pixel
was drawn, there’s no need to ever use <code class="language-plaintext highlighter-rouge">glClear()</code>.</p>
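<p>The <code class="language-plaintext highlighter-rouge">swap()</code> method isn’t shown above, but a minimal sketch only needs to exchange the two texture references. This is hypothetical, with a stub constructor standing in for the real one so the idea is self-contained:</p>

```javascript
// Hypothetical sketch: after a step, the freshly written "back"
// texture becomes the new "front" (input) texture. Strings stand in
// for the real WebGL texture handles.
function GOL() {
    this.textures = { front: "front-tex", back: "back-tex" };
}

GOL.prototype.swap = function() {
    var tmp = this.textures.front;
    this.textures.front = this.textures.back;
    this.textures.back = tmp;
    return this;
};
```

<p>No pixel data moves at all; only the JavaScript-side references change roles.</p>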

<h3 id="the-game-of-life-fragment-shader">The Game of Life Fragment Shader</h3>

<p>The simulation rules are coded entirely in the fragment shader. After
initialization, JavaScript’s only job is to make the appropriate
<code class="language-plaintext highlighter-rouge">glDrawArrays()</code> call over and over. To run different cellular automata,
all I would need to do is modify the fragment shader and generate an
appropriate initial state for it.</p>

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">uniform</span> <span class="kt">sampler2D</span> <span class="n">state</span><span class="p">;</span>
<span class="k">uniform</span> <span class="kt">vec2</span> <span class="n">scale</span><span class="p">;</span>

<span class="kt">int</span> <span class="nf">get</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">y</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="kt">int</span><span class="p">(</span><span class="n">texture2D</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="p">(</span><span class="nb">gl_FragCoord</span><span class="p">.</span><span class="n">xy</span> <span class="o">+</span> <span class="kt">vec2</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">))</span> <span class="o">/</span> <span class="n">scale</span><span class="p">).</span><span class="n">r</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">sum</span> <span class="o">=</span> <span class="n">get</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">+</span>
              <span class="n">get</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span>  <span class="mi">0</span><span class="p">)</span> <span class="o">+</span>
              <span class="n">get</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span>  <span class="mi">1</span><span class="p">)</span> <span class="o">+</span>
              <span class="n">get</span><span class="p">(</span> <span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">+</span>
              <span class="n">get</span><span class="p">(</span> <span class="mi">0</span><span class="p">,</span>  <span class="mi">1</span><span class="p">)</span> <span class="o">+</span>
              <span class="n">get</span><span class="p">(</span> <span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">+</span>
              <span class="n">get</span><span class="p">(</span> <span class="mi">1</span><span class="p">,</span>  <span class="mi">0</span><span class="p">)</span> <span class="o">+</span>
              <span class="n">get</span><span class="p">(</span> <span class="mi">1</span><span class="p">,</span>  <span class="mi">1</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">sum</span> <span class="o">==</span> <span class="mi">3</span><span class="p">)</span> <span class="p">{</span>
        <span class="nb">gl_FragColor</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">sum</span> <span class="o">==</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">float</span> <span class="n">current</span> <span class="o">=</span> <span class="kt">float</span><span class="p">(</span><span class="n">get</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">));</span>
        <span class="nb">gl_FragColor</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="n">current</span><span class="p">,</span> <span class="n">current</span><span class="p">,</span> <span class="n">current</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="nb">gl_FragColor</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">get(int, int)</code> function returns the value of the cell at (x, y),
0 or 1. For the sake of simplicity, the fragment shader outputs solid
white or black, but sampling just one channel (red) is good enough to
know the state of the cell. I’ve learned that loops and arrays are
troublesome in GLSL, so I’ve manually unrolled the
neighbor check. Cellular automata that have more complex state could
make use of the other channels and perhaps even exploit alpha channel
blending in some special way.</p>

<p>Otherwise, this is just a straightforward encoding of the rules.</p>
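<p>As a sanity check, the same rules are easy to state outside the shader. Here’s the update rule for a single cell in plain JavaScript (illustrative only, not part of the demo):</p>

```javascript
// Reference encoding of the rules the shader implements, for one cell:
// exactly 3 live neighbors means the cell is alive, exactly 2 means it
// keeps its current state, anything else means it dies.
function golRule(current, neighbors) {
    if (neighbors === 3) return 1;
    if (neighbors === 2) return current;
    return 0;
}
```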

<h3 id="displaying-the-state">Displaying the State</h3>

<p>What good is the simulation if the user doesn’t see anything? So far
all of the draw calls have been done on a custom framebuffer. Next
I’ll render the simulation state to the default framebuffer.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">GOL</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">draw</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
    <span class="kd">var</span> <span class="nx">gl</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">gl</span><span class="p">;</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">bindFramebuffer</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">FRAMEBUFFER</span><span class="p">,</span> <span class="kc">null</span><span class="p">);</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">viewport</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">viewsize</span><span class="p">.</span><span class="nx">x</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">viewsize</span><span class="p">.</span><span class="nx">y</span><span class="p">);</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">bindTexture</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">TEXTURE_2D</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">textures</span><span class="p">.</span><span class="nx">front</span><span class="p">);</span>
    <span class="k">this</span><span class="p">.</span><span class="nx">programs</span><span class="p">.</span><span class="nx">copy</span><span class="p">.</span><span class="nx">use</span><span class="p">()</span>
        <span class="p">.</span><span class="nx">attrib</span><span class="p">(</span><span class="dl">'</span><span class="s1">quad</span><span class="dl">'</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">buffers</span><span class="p">.</span><span class="nx">quad</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
        <span class="p">.</span><span class="nx">uniform</span><span class="p">(</span><span class="dl">'</span><span class="s1">state</span><span class="dl">'</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="kc">true</span><span class="p">)</span>
        <span class="p">.</span><span class="nx">uniform</span><span class="p">(</span><span class="dl">'</span><span class="s1">scale</span><span class="dl">'</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">viewsize</span><span class="p">)</span>
        <span class="p">.</span><span class="nx">draw</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">TRIANGLE_STRIP</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
    <span class="k">return</span> <span class="k">this</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>First, bind the default framebuffer as the current buffer. There’s no
actual handle for the default framebuffer, so using <code class="language-plaintext highlighter-rouge">null</code> sets it to
the default. Next, set the viewport to the size of the display. Then
use the “copy” program to copy the state to the default framebuffer
where the user will see it. One pixel per cell is <em>far</em> too small, so
it will be scaled as a consequence of <code class="language-plaintext highlighter-rouge">this.viewsize</code> being four times
larger.</p>

<p>Here’s what the “copy” fragment shader looks like. It’s so simple
because I’m storing the simulation state in black and white. If the
state was in a different format than the display format, this shader
would need to perform the translation.</p>

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">uniform</span> <span class="kt">sampler2D</span> <span class="n">state</span><span class="p">;</span>
<span class="k">uniform</span> <span class="kt">vec2</span> <span class="n">scale</span><span class="p">;</span>

<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="nb">gl_FragColor</span> <span class="o">=</span> <span class="n">texture2D</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="nb">gl_FragCoord</span><span class="p">.</span><span class="n">xy</span> <span class="o">/</span> <span class="n">scale</span><span class="p">);</span>
<span class="p">}</span>

</code></pre></div></div>

<p>Since I’m scaling up by four in each dimension — i.e. 16 pixels
per cell — this fragment shader is run 16 times per simulation cell.
Because I used <code class="language-plaintext highlighter-rouge">GL_NEAREST</code> on the texture there’s no funny business going on here.
If I had used <code class="language-plaintext highlighter-rouge">GL_LINEAR</code>, it would look blurry.</p>

<p>You might notice I’m passing in a <code class="language-plaintext highlighter-rouge">scale</code> uniform and using
<code class="language-plaintext highlighter-rouge">gl_FragCoord</code>. The <code class="language-plaintext highlighter-rouge">gl_FragCoord</code> variable is in window-relative
coordinates, but when I sample a texture I need unit coordinates:
values between 0 and 1. To get this, I divide <code class="language-plaintext highlighter-rouge">gl_FragCoord</code> by the
size of the viewport. Alternatively I could pass the coordinates as a
varying from the vertex shader, automatically interpolated between the
quad vertices.</p>
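<p>The normalization itself is just a per-axis division. In plain JavaScript terms (illustrative only):</p>

```javascript
// Map a window-relative fragment coordinate to unit texture
// coordinates by dividing by the viewport size, just as the shader
// computes gl_FragCoord.xy / scale.
function toUnit(fragX, fragY, width, height) {
    return { s: fragX / width, t: fragY / height };
}
```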

<p>An important thing to notice is that <strong>the simulation state never
leaves the GPU</strong>. It’s updated there and it’s drawn there. The CPU is
operating the simulation like the strings on a marionette — <em>from a
thousand feet up in the air</em>.</p>

<h3 id="user-interaction">User Interaction</h3>

<p>What good is a Game of Life simulation if you can’t poke at it? If all
of the state is on the GPU, how can I modify it? This is where
<code class="language-plaintext highlighter-rouge">glTexSubImage2D()</code> comes in. As its name implies, it’s used to set
the values of some portion of a texture. I want to write a <code class="language-plaintext highlighter-rouge">poke()</code>
method that uses this OpenGL function to set a single cell.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">GOL</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">poke</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">x</span><span class="p">,</span> <span class="nx">y</span><span class="p">,</span> <span class="nx">value</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">var</span> <span class="nx">gl</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">gl</span><span class="p">,</span>
        <span class="nx">v</span> <span class="o">=</span> <span class="nx">value</span> <span class="o">*</span> <span class="mi">255</span><span class="p">;</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">bindTexture</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">TEXTURE_2D</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">textures</span><span class="p">.</span><span class="nx">front</span><span class="p">);</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">texSubImage2D</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">TEXTURE_2D</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">x</span><span class="p">,</span> <span class="nx">y</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span>
                     <span class="nx">gl</span><span class="p">.</span><span class="nx">RGBA</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">UNSIGNED_BYTE</span><span class="p">,</span>
                     <span class="k">new</span> <span class="nb">Uint8Array</span><span class="p">([</span><span class="nx">v</span><span class="p">,</span> <span class="nx">v</span><span class="p">,</span> <span class="nx">v</span><span class="p">,</span> <span class="mi">255</span><span class="p">]));</span>
    <span class="k">return</span> <span class="k">this</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Bind the front texture, set the region at (x, y) of size 1x1 (a single
pixel) to a very specific RGBA value. There’s nothing else to it. If
you click on the simulation in my demo, it will call this poke method.
This method could also be used to initialize the entire simulation
with random values, though it wouldn’t be very efficient doing it one
pixel at a time.</p>
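<p>To fill the whole grid at once, the same idea extends naturally: pack the entire state into one RGBA byte array and upload it with a single call covering the full texture. A hypothetical helper for building that array:</p>

```javascript
// Pack an array of cell values (0 or 1) into the RGBA byte layout the
// state textures use: white (255,255,255,255) for alive, black with
// full alpha for dead.
function cellsToRGBA(cells) {
    var rgba = new Uint8Array(cells.length * 4);
    for (var i = 0; i < cells.length; i++) {
        var v = cells[i] ? 255 : 0;
        rgba[i * 4 + 0] = v;
        rgba[i * 4 + 1] = v;
        rgba[i * 4 + 2] = v;
        rgba[i * 4 + 3] = 255;
    }
    return rgba;
}
```

<p>The resulting array would be handed to a single full-size <code class="language-plaintext highlighter-rouge">glTexSubImage2D()</code> call instead of one call per pixel.</p>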

<h3 id="getting-the-state">Getting the State</h3>

<p>What if you wanted to read the simulation state into CPU memory,
perhaps to store for reloading later? So far I can set the state and
step the simulation, but there’s been no way to get at the data.
Unfortunately WebGL gives me no direct access to texture data; there’s
nothing like the inverse of <code class="language-plaintext highlighter-rouge">glTexSubImage2D()</code>. Here are a few options:</p>

<ul>
  <li>
    <p>Call <code class="language-plaintext highlighter-rouge">toDataURL()</code> on the canvas. This would grab the rendering of
the simulation, which would need to be translated back into
simulation state. Sounds messy.</p>
  </li>
  <li>
    <p>Take a screenshot. Basically the same idea, but even messier.</p>
  </li>
  <li>
    <p>Use <code class="language-plaintext highlighter-rouge">glReadPixels()</code> on a framebuffer. The texture can be attached to
a framebuffer, then read through the framebuffer. This is the right
solution.</p>
  </li>
</ul>

<p>I’m reusing the “step” framebuffer for this, since these textures
are already intended to be its attachments.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">GOL</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="kd">get</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
    <span class="kd">var</span> <span class="nx">gl</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">gl</span><span class="p">,</span> <span class="nx">w</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">statesize</span><span class="p">.</span><span class="nx">x</span><span class="p">,</span> <span class="nx">h</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">statesize</span><span class="p">.</span><span class="nx">y</span><span class="p">;</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">bindFramebuffer</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">FRAMEBUFFER</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">framebuffers</span><span class="p">.</span><span class="nx">step</span><span class="p">);</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">framebufferTexture2D</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">FRAMEBUFFER</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">COLOR_ATTACHMENT0</span><span class="p">,</span>
                            <span class="nx">gl</span><span class="p">.</span><span class="nx">TEXTURE_2D</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">textures</span><span class="p">.</span><span class="nx">front</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="kd">var</span> <span class="nx">rgba</span> <span class="o">=</span> <span class="k">new</span> <span class="nb">Uint8Array</span><span class="p">(</span><span class="nx">w</span> <span class="o">*</span> <span class="nx">h</span> <span class="o">*</span> <span class="mi">4</span><span class="p">);</span>
    <span class="nx">gl</span><span class="p">.</span><span class="nx">readPixels</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">w</span><span class="p">,</span> <span class="nx">h</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">RGBA</span><span class="p">,</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">UNSIGNED_BYTE</span><span class="p">,</span> <span class="nx">rgba</span><span class="p">);</span>
    <span class="k">return</span> <span class="nx">rgba</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Voilà! This <code class="language-plaintext highlighter-rouge">rgba</code> array can be passed directly back to
<code class="language-plaintext highlighter-rouge">glTexSubImage2D()</code> as a perfect snapshot of the simulation state.</p>
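<p>Decoding such a snapshot back into cell values only requires reading one channel per pixel, mirroring what the shader’s <code class="language-plaintext highlighter-rouge">get()</code> does when it samples only red. A hypothetical decoder:</p>

```javascript
// Unpack an RGBA snapshot into an array of 0/1 cell values by
// thresholding the red channel of each pixel.
function rgbaToCells(rgba) {
    var cells = new Array(rgba.length / 4);
    for (var i = 0; i < cells.length; i++) {
        cells[i] = rgba[i * 4] > 127 ? 1 : 0;
    }
    return cells;
}
```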

<h3 id="conclusion">Conclusion</h3>

<p>This project turned out to be far simpler than I anticipated, so much
so that I was able to get the simulation running within an evening’s
effort. I learned a whole lot more about WebGL in the process, enough
for me to revisit <a href="/blog/2013/06/26/">my WebGL liquid simulation</a>. It uses a
similar texture-drawing technique, which I really fumbled through that
first time. I dramatically cleaned it up, making it fast enough to run
smoothly on my mobile devices.</p>

<p>Also, this Game of Life implementation is <em>blazing</em> fast. If rendering
is skipped, <strong>it can run a 2048x2048 Game of Life at over 18,000
iterations per second!</strong> However, this isn’t terribly useful because
it hits its steady state well before that first second has passed.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>A GPU Approach to Voronoi Diagrams</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/06/01/"/>
    <id>urn:uuid:97759105-8995-34d3-c914-a84eb7eb762c</id>
    <updated>2014-06-01T21:53:48Z</updated>
    <category term="webgl"/><category term="media"/><category term="video"/><category term="math"/><category term="interactive"/><category term="gpgpu"/><category term="opengl"/>
    <content type="html">
      <![CDATA[<p>I recently got an itch to play around with <a href="http://en.wikipedia.org/wiki/Voronoi_diagram">Voronoi diagrams</a>.
A Voronoi diagram divides a space into regions, each composed of the
points closest to one of a set of seed points. There are a couple of
well-known algorithms for computing a Voronoi diagram: Bowyer-Watson
and Fortune’s. Both are complicated and difficult to implement.</p>

<p>However, if we’re interested only in <em>rendering</em> a Voronoi diagram as
a bitmap, there’s a trivial brute force algorithm. For every pixel of
output, determine the closest seed vertex and color that pixel
appropriately. It’s slow, especially as the number of seed vertices
goes up, but it works perfectly and it’s dead simple!</p>
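<p>For concreteness, here’s that brute force approach in plain JavaScript (illustrative only; the demo itself does this work on the GPU):</p>

```javascript
// Brute force Voronoi: for every pixel, find the index of the nearest
// seed point. A renderer would then look up one color per index.
function voronoiIndices(width, height, seeds) {
    var out = new Array(width * height);
    for (var y = 0; y < height; y++) {
        for (var x = 0; x < width; x++) {
            var best = 0, bestDist = Infinity;
            for (var i = 0; i < seeds.length; i++) {
                var dx = seeds[i].x - x, dy = seeds[i].y - y;
                var d = dx * dx + dy * dy; // squared distance suffices
                if (d < bestDist) { bestDist = d; best = i; }
            }
            out[y * width + x] = best;
        }
    }
    return out;
}
```

<p>Note the triple loop: the cost is proportional to pixels times seeds, which is exactly why it gets slow as the seed count grows.</p>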

<p>Does this strategy seem familiar? It sure sounds a lot like an OpenGL
<em>fragment shader</em>! With a shader, I can push the workload off to the
GPU, which is intended for this sort of work. Here’s basically what it
looks like.</p>

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* voronoi.frag */</span>
<span class="k">uniform</span> <span class="kt">vec2</span> <span class="n">seeds</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
<span class="k">uniform</span> <span class="kt">vec3</span> <span class="n">colors</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>

<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="kt">float</span> <span class="n">dist</span> <span class="o">=</span> <span class="n">distance</span><span class="p">(</span><span class="n">seeds</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="nb">gl_FragCoord</span><span class="p">.</span><span class="n">xy</span><span class="p">);</span>
    <span class="kt">vec3</span> <span class="n">color</span> <span class="o">=</span> <span class="n">colors</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">32</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">float</span> <span class="n">current</span> <span class="o">=</span> <span class="n">distance</span><span class="p">(</span><span class="n">seeds</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="nb">gl_FragCoord</span><span class="p">.</span><span class="n">xy</span><span class="p">);</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">current</span> <span class="o">&lt;</span> <span class="n">dist</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">color</span> <span class="o">=</span> <span class="n">colors</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
            <span class="n">dist</span> <span class="o">=</span> <span class="n">current</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="nb">gl_FragColor</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="n">color</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>If you have a WebGL-enabled browser, you can see the results for
yourself here. Now, as I’ll explain below, what you see here isn’t
really this shader, but the result looks identical. There are two
different WebGL implementations included, but only the smarter one is
active. (There’s also a really slow HTML5 canvas fallback.)</p>

<ul>
  <li><a href="http://skeeto.github.io/voronoi-toy/">https://skeeto.github.io/voronoi-toy/</a>
(<a href="http://github.com/skeeto/voronoi-toy">source</a>)</li>
</ul>

<p>You can click and drag points around the diagram with your mouse. You
can add and remove points with left and right clicks. And if you press
the “a” key, the seed points will go for a random walk, animating the
whole diagram. Here’s a (HTML5) video showing it off.</p>

<video width="500" height="280" controls="" preload="metadata">
  <source src="https://nullprogram.s3.amazonaws.com/voronoi/voronoi.webm" type="video/webm" />
  <source src="https://nullprogram.s3.amazonaws.com/voronoi/voronoi.mp4" type="video/mp4" />
</video>

<p>Unfortunately, there are some serious problems with this approach. It
has to do with passing seed information as uniforms.</p>

<ol>
  <li>
    <p><strong>The number of seed vertices is hardcoded.</strong> The shader language
requires uniform arrays to have known lengths at compile-time. If I
want to increase the number of seed vertices, I need to generate,
compile, and link a new shader to replace it. My implementation
actually does this. The number is replaced with a <code class="language-plaintext highlighter-rouge">%%MAX%%</code>
template that I fill in using a regular expression before sending
the program off to the GPU.</p>
  </li>
  <li>
    <p><strong>The number of available uniform bindings is very constrained</strong>,
even on high-end GPUs: <code class="language-plaintext highlighter-rouge">GL_MAX_FRAGMENT_UNIFORM_VECTORS</code>. This
value is allowed to be as small as 16! A typical value on high-end
graphics cards is a mere 221. Each array element counts as a
binding, so our shader may be limited to as few as 8 seed vertices.
Even on nice GPUs, we’re absolutely limited to 110 seed vertices.
An alternative approach might be passing seed and color information
as a texture, but I didn’t try this.</p>
  </li>
  <li>
    <p><strong>There’s no way to bail out of the loop early</strong>, at least with
OpenGL ES 2.0 (WebGL) shaders. We can’t <code class="language-plaintext highlighter-rouge">break</code> or do any sort of
branching on the loop variable. Even if we only have 4 seed
vertices, we still have to compare against the full count. The GPU
has plenty of time available, so this wouldn’t be a big issue,
except that we need to skip over the “unused” seeds somehow. They
need to be given unreasonable position values. Infinity would be an
unreasonable value (infinitely far away), but GLSL floats aren’t
guaranteed to be able to represent infinity. We can’t even know
what the maximum floating-point value might be. If we pick
something too large, we get an overflow garbage value, such as 0
(!!!) in my experiments.</p>
  </li>
</ol>
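
<p>The template substitution from the first item can be sketched like
this. The <code class="language-plaintext highlighter-rouge">%%MAX%%</code> marker is the one described
above, but the function name and shader snippet here are illustrative,
not the toy’s actual code.</p>

```js
// Fill in the uniform array length before compiling the shader, since
// GLSL requires it to be known at compile-time. Illustrative sketch.
function fillMax(shaderSource, max) {
    return shaderSource.replace(/%%MAX%%/g, String(max));
}

var template = 'uniform vec2 seeds[%%MAX%%];';
var source = fillMax(template, 32);  // 'uniform vec2 seeds[32];'
```
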

<p>Because of these limitations, this is not a very good way of going
about computing Voronoi diagrams on a GPU. Fortunately there’s a
<em>much</em> much better approach!</p>

<h3 id="a-smarter-approach">A Smarter Approach</h3>

<p>With the above implemented, I was playing around with the fragment
shader, going beyond solid colors. For example, I changed the
shade/color based on distance from the seed vertex. One result was
this “blood cell” image, produced by changing just a couple of lines
in the shader.</p>

<p><a href="https://nullprogram.s3.amazonaws.com/voronoi/blood.png">
  <img src="https://nullprogram.s3.amazonaws.com/voronoi/blood.png" width="500" height="312" />
</a></p>

<p>That’s when it hit me! Render each seed as a cone pointed towards the
camera in an orthographic projection, coloring each cone according to
the seed’s color. The Voronoi diagram would work itself out
<em>automatically</em> in the depth buffer. That is, rather than do all this
distance comparison in the shader, let OpenGL do its normal job of
figuring out the scene geometry.</p>

<p>Here’s a video (<a href="https://nullprogram.s3.amazonaws.com/voronoi/voronoi-cones.gif">GIF</a>) I made that demonstrates what I mean.</p>

<video width="500" height="500" controls="" preload="metadata">
  <source src="https://nullprogram.s3.amazonaws.com/voronoi/voronoi-cones.webm" type="video/webm" />
  <source src="https://nullprogram.s3.amazonaws.com/voronoi/voronoi-cones.mp4" type="video/mp4" />
  <img src="https://nullprogram.s3.amazonaws.com/voronoi/voronoi-cones.gif" width="500" height="500" />
</video>

<p>Not only is this much faster, it’s also far simpler! Rather than being
limited to a hundred or so seed vertices, this version could literally
do millions of them, limited only by the available memory for
attribute buffers.</p>

<h4 id="the-resolution-catch">The Resolution Catch</h4>

<p>There’s a catch, though. There’s no way to perfectly represent a cone
in OpenGL. (And if there was, we’d be back at the brute force approach
as above anyway.) The cone must be built out of primitive triangles,
sort of like pizza slices, using <code class="language-plaintext highlighter-rouge">GL_TRIANGLE_FAN</code> mode. Here’s a cone
made of 16 triangles.</p>

<p><img src="https://nullprogram.s3.amazonaws.com/voronoi/triangle-fan.png" alt="" /></p>
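
<p>Generating such a fan is straightforward: one apex vertex facing the
camera plus a closed ring of rim vertices behind it. This is a sketch
of the idea with a hypothetical helper, not the toy’s actual code.</p>

```js
// Build a unit cone as a GL_TRIANGLE_FAN: apex at depth 0 facing the
// camera, rim at depth 1. n triangles need n + 2 vertices (the first
// rim vertex is repeated to close the fan). Illustrative sketch only.
function coneVertices(n) {
    var verts = [0, 0, 0];           // apex (x, y, z)
    for (var i = 0; i <= n; i++) {   // rim, closing back on itself
        var a = 2 * Math.PI * i / n;
        verts.push(Math.cos(a), Math.sin(a), 1);
    }
    return new Float32Array(verts);
}

var cone = coneVertices(64);  // 64 triangles -> 66 vertices
```
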

<p>Unlike the previous brute force approach, this is an <em>approximation</em>
of the Voronoi diagram. The more triangles, the better the
approximation, converging on the precision of the initial brute force
approach. I found that for this project, about 64 triangles was
indistinguishable from brute force.</p>

<p><img src="https://nullprogram.s3.amazonaws.com/voronoi/resolution.gif" width="500" height="500" /></p>

<h4 id="instancing-to-the-rescue">Instancing to the Rescue</h4>

<p>At this point things are looking pretty good. On my desktop, I can
maintain 60 frames-per-second for up to about 500 seed vertices moving
around randomly (“a”). After this, it becomes <em>draw-bound</em> because
each seed vertex requires a separate glDrawArrays() call to OpenGL.
The workaround for this is an OpenGL extension called instancing. The
<a href="http://blog.tojicode.com/2013/07/webgl-instancing-with.html">WebGL extension for instancing</a> is <code class="language-plaintext highlighter-rouge">ANGLE_instanced_arrays</code>.</p>

<p>The cone model was already sent to the GPU during initialization, so,
without instancing, the draw loop only has to bind the uniforms and
call draw for each seed. This code uses my <a href="https://github.com/skeeto/igloojs">Igloo WebGL
library</a> to simplify the API.</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">var</span> <span class="nx">cone</span> <span class="o">=</span> <span class="nx">programs</span><span class="p">.</span><span class="nx">cone</span><span class="p">.</span><span class="nx">use</span><span class="p">()</span>
        <span class="p">.</span><span class="nx">attrib</span><span class="p">(</span><span class="dl">'</span><span class="s1">cone</span><span class="dl">'</span><span class="p">,</span> <span class="nx">buffers</span><span class="p">.</span><span class="nx">cone</span><span class="p">,</span> <span class="mi">3</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;</span> <span class="nx">seeds</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
    <span class="nx">cone</span><span class="p">.</span><span class="nx">uniform</span><span class="p">(</span><span class="dl">'</span><span class="s1">color</span><span class="dl">'</span><span class="p">,</span> <span class="nx">seeds</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">color</span><span class="p">)</span>
        <span class="p">.</span><span class="nx">uniform</span><span class="p">(</span><span class="dl">'</span><span class="s1">position</span><span class="dl">'</span><span class="p">,</span> <span class="nx">seeds</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">position</span><span class="p">)</span>
        <span class="p">.</span><span class="nx">draw</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">TRIANGLE_FAN</span><span class="p">,</span> <span class="mi">66</span><span class="p">);</span>  <span class="c1">// 64 triangles == 66 verts</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It’s driving this pair of shaders.</p>

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* cone.vert */</span>
<span class="k">attribute</span> <span class="kt">vec3</span> <span class="n">cone</span><span class="p">;</span>
<span class="k">uniform</span> <span class="kt">vec2</span> <span class="n">position</span><span class="p">;</span>

<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="nb">gl_Position</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="n">cone</span><span class="p">.</span><span class="n">xy</span> <span class="o">+</span> <span class="n">position</span><span class="p">,</span> <span class="n">cone</span><span class="p">.</span><span class="n">z</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* cone.frag */</span>
<span class="k">uniform</span> <span class="kt">vec3</span> <span class="n">color</span><span class="p">;</span>

<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="nb">gl_FragColor</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="n">color</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Instancing works by adjusting how attributes are stepped. Normally the
vertex shader runs once per element, but instead we can ask that some
attributes step once per <em>instance</em>, or even once per multiple
instances. Uniforms are then converted to vertex attribs and the
“loop” runs implicitly on the GPU. The instanced glDrawArrays() call
takes one additional argument: the number of instances to draw.</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">ext</span> <span class="o">=</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">getExtension</span><span class="p">(</span><span class="dl">"</span><span class="s2">ANGLE_instanced_arrays</span><span class="dl">"</span><span class="p">);</span> <span class="c1">// only once</span>

<span class="nx">programs</span><span class="p">.</span><span class="nx">cone</span><span class="p">.</span><span class="nx">use</span><span class="p">()</span>
    <span class="p">.</span><span class="nx">attrib</span><span class="p">(</span><span class="dl">'</span><span class="s1">cone</span><span class="dl">'</span><span class="p">,</span> <span class="nx">buffers</span><span class="p">.</span><span class="nx">cone</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
    <span class="p">.</span><span class="nx">attrib</span><span class="p">(</span><span class="dl">'</span><span class="s1">position</span><span class="dl">'</span><span class="p">,</span> <span class="nx">buffers</span><span class="p">.</span><span class="nx">positions</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
    <span class="p">.</span><span class="nx">attrib</span><span class="p">(</span><span class="dl">'</span><span class="s1">color</span><span class="dl">'</span><span class="p">,</span> <span class="nx">buffers</span><span class="p">.</span><span class="nx">colors</span><span class="p">,</span> <span class="mi">3</span><span class="p">);</span>
<span class="cm">/* Tell OpenGL these iterate once (1) per instance. */</span>
<span class="nx">ext</span><span class="p">.</span><span class="nx">vertexAttribDivisorANGLE</span><span class="p">(</span><span class="nx">cone</span><span class="p">.</span><span class="nx">vars</span><span class="p">[</span><span class="dl">'</span><span class="s1">position</span><span class="dl">'</span><span class="p">],</span> <span class="mi">1</span><span class="p">);</span>
<span class="nx">ext</span><span class="p">.</span><span class="nx">vertexAttribDivisorANGLE</span><span class="p">(</span><span class="nx">cone</span><span class="p">.</span><span class="nx">vars</span><span class="p">[</span><span class="dl">'</span><span class="s1">color</span><span class="dl">'</span><span class="p">],</span> <span class="mi">1</span><span class="p">);</span>
<span class="nx">ext</span><span class="p">.</span><span class="nx">drawArraysInstancedANGLE</span><span class="p">(</span><span class="nx">gl</span><span class="p">.</span><span class="nx">TRIANGLE_FAN</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">66</span><span class="p">,</span> <span class="nx">seeds</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span>
</code></pre></div></div>

<p>The ugly ANGLE names are because this is an extension, not part of
WebGL itself. As such, my program falls back to using multiple draw
calls when the extension is not available. It’s only there for a speed
boost.</p>

<p>Here are the new shaders. Notice the uniforms are gone.</p>

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* cone-instanced.vert */</span>
<span class="k">attribute</span> <span class="kt">vec3</span> <span class="n">cone</span><span class="p">;</span>
<span class="k">attribute</span> <span class="kt">vec2</span> <span class="n">position</span><span class="p">;</span>
<span class="k">attribute</span> <span class="kt">vec3</span> <span class="n">color</span><span class="p">;</span>

<span class="k">varying</span> <span class="kt">vec3</span> <span class="n">vcolor</span><span class="p">;</span>

<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">vcolor</span> <span class="o">=</span> <span class="n">color</span><span class="p">;</span>
    <span class="nb">gl_Position</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="n">cone</span><span class="p">.</span><span class="n">xy</span> <span class="o">+</span> <span class="n">position</span><span class="p">,</span> <span class="n">cone</span><span class="p">.</span><span class="n">z</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* cone-instanced.frag */</span>
<span class="k">varying</span> <span class="kt">vec3</span> <span class="n">vcolor</span><span class="p">;</span>

<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="nb">gl_FragColor</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="n">vcolor</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>On the same machine, the instancing version can do a few thousand seed
vertices (an order of magnitude more) at 60 frames-per-second, after
which it becomes bandwidth saturated. This is because, for the
animation, every vertex position is updated on the GPU on each frame.
At this point it’s overcrowded anyway, so there’s no need to support
more.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Liquid Simulation in WebGL</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/06/26/"/>
    <id>urn:uuid:a0e42262-19a8-3208-4a5b-b70485f5ae8a</id>
    <updated>2013-06-26T00:00:00Z</updated>
    <category term="javascript"/><category term="interactive"/><category term="webgl"/><category term="opengl"/>
    <content type="html">
      <![CDATA[<p>Over a year ago I implemented
<a href="/blog/2012/02/03/">a liquid simulation using Box2D and Java 2D</a>. It’s a neat
trick that involves simulating a bunch of balls in a container,
blurring the rendering of this simulation, and finally thresholding
the blurred rendering. Due to <a href="/blog/2013/06/10/">my recent affection for WebGL</a>,
this week I ported the whole thing to JavaScript and WebGL.</p>

<ul>
  <li><a href="/fun-liquid/webgl/">nullprogram.com/fun-liquid/webgl/</a></li>
</ul>

<p>Unlike the previous Java version, blurring and thresholding are
performed entirely on the GPU. It <em>should</em> therefore be less CPU
intensive and a lot more GPU intensive. Assuming a decent GPU, it will
run at a (fixed) 60 FPS, as opposed to the mere 30 FPS I could squeeze
out of the old version. Other than this, the JavaScript version should
look pretty much identical to the Java version.</p>

<h3 id="box2d-performance">Box2D performance</h3>

<p>I ran into a few complications while porting. The first was the
performance of Box2D. I started out by using <a href="http://code.google.com/p/box2dweb/">box2dweb</a>,
which is a port of Box2DFlash, which is itself a port of Box2D. Even
on V8, the performance was poor enough that I couldn’t simulate enough
balls to achieve the liquid effect. The original JBox2D version
handles 400 balls just fine while this one was struggling to do about
40.</p>

<p><a href="http://www.50ply.com/">Brian</a> suggested I try out
<a href="https://github.com/kripken/box2d.js/">the Box2D emscripten port</a>. Rather than manually port Box2D
to JavaScript, emscripten compiles the original C++ to JavaScript via
LLVM, being so direct as to even maintain its own heap. The benefit is
much better performance, but the cost is a difficult API. Interacting
with emscripten-compiled code can be rather cumbersome, and this
emscripten port isn’t yet fully worked out. For example, creating a
PolygonShape object involves allocating an array on the emscripten
heap and manipulating a pointer-like thing. And when you screw up, the
error messages are completely unhelpful.</p>

<p>Moving to this other version of Box2D allowed me to increase the
number of balls to about 150, which is just enough to pull off the
effect. I’m still a bit surprised how slow this is. The computational
complexity is something like O(n^2), so 150 is a long way behind 400.
I may revisit this in the future to try to get better
performance by crafting my own very specialized physics engine from
scratch.</p>

<h3 id="webgl-complexity">WebGL complexity</h3>

<p>Before I even got into writing the WebGL component of this, I
implemented a 2D canvas display, without any blurring, just for
getting Box2D tuned. If you visit the demonstration page without a
WebGL-capable browser you’ll see this plain canvas display.</p>

<p>Getting WebGL to do the same thing was very simple. I used <code class="language-plaintext highlighter-rouge">GL_POINTS</code>
to draw the balls just like I had done with <a href="/sphere-js/">the sphere demo</a>.
To do blurring I would need to render this first stage onto an
intermediate framebuffer, then using this framebuffer as an input
texture I would blur and threshold this into the default framebuffer.</p>

<p>This actually took me a while to work out, much longer than I had
anticipated. To prepare this intermediate framebuffer you must first
create and configure a texture. Then create and configure a
<em>render</em>buffer to fill in as the depth buffer. Then finally create the
framebuffer and attach both of these to it. Skip any step and all you
get are some vague WebGL warnings. (With regular OpenGL it’s worse,
since you get no automatic warnings at all.)</p>

<p>WebGL textures must have power-of-two dimensions (at least for the
filtering and wrapping modes I rely on). However,
my final output does not. Carefully rendering onto a texture with a
different aspect ratio and properly sampling the results back off
introduces an intermediate coordinate system which mucks things up a
bit. It took me some time to wrap my head around it to work everything
out.</p>
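
<p>The bookkeeping amounts to rounding the output size up to a power of
two for the texture, then scaling sample coordinates by the ratio of
the two sizes. A minimal sketch with hypothetical names and an example
output size, not the demo’s actual code:</p>

```js
// Round a dimension up to the next power of two and compute what
// fraction of the texture the real output occupies, which is the
// scale applied when sampling the texture back. Illustrative sketch.
function nextPowerOfTwo(n) {
    var p = 1;
    while (p < n) p *= 2;
    return p;
}

var width = 800, height = 600;           // final output size (example)
var texWidth = nextPowerOfTwo(width);    // 1024
var texHeight = nextPowerOfTwo(height);  // 1024
var scaleX = width / texWidth;           // only this fraction of the
var scaleY = height / texHeight;         // texture holds real pixels
```
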

<p>Finally, once I was nearly done, my fancy new shader was consistently
causing OpenGL to crash, taking my browser down with it. I had to
switch to a different computer to continue developing.</p>

<h4 id="the-gpu-as-a-bottleneck">The GPU as a bottleneck</h4>

<p>For the second time since I’ve picked up WebGL I have overestimated
graphics cards’ performance capabilities. It turns out my CPU is
faster at convolving a 25x25 kernel — the size of the convolution
kernel in the Java version — than any GPU that I have access to. If I
reduce the size of the kernel the GPU gets its edge back. The only way
to come close to 25x25 on the GPU is to cut some corners. I finally
settled on a 19x19 kernel, which seems to work just about as well
without being horribly slow. I may revisit this in the future so that
lower-end GPUs can run this at 60 FPS as well.</p>
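
<p>The article doesn’t show the kernel itself; a typical choice for this
kind of blur is a normalized Gaussian, which can be built like this.
This is a hedged sketch of the general technique, and the demo’s actual
kernel may differ.</p>

```js
// Build a normalized 1D Gaussian kernel of odd width, a common choice
// for blur convolution. Illustrative sketch; sigma is a free parameter.
function gaussianKernel(width, sigma) {
    var half = (width - 1) / 2;
    var kernel = [];
    var sum = 0;
    for (var i = -half; i <= half; i++) {
        var w = Math.exp(-(i * i) / (2 * sigma * sigma));
        kernel.push(w);
        sum += w;
    }
    return kernel.map(function(w) { return w / sum; });  // sums to 1
}

var k = gaussianKernel(19, 3);  // 19 taps, matching the shader's size
```

A Gaussian is also separable: two 19-tap passes (38 taps per pixel)
produce the same result as one 19x19 pass (361 taps per pixel), which
is one way to cut corners on slower GPUs.
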

<h3 id="conclusion">Conclusion</h3>

<p>I’m really happy with the results, and writing this has been a good
exercise in OpenGL. I completely met one of my original goals: to look
practically identical to the original Java version. I <em>mostly</em> met my
second, performance goal. On my nice desktop computer it runs more
than twice as fast, but, unfortunately, it’s very slow on my tablet. If I
revisit this project in the future, the purpose will be to optimize
for these lower-end, mobile devices.</p>

<p>This project has also been a useful testbed for a low-level WebGL
wrapper library I’m working on called <a href="https://github.com/skeeto/igloojs">Igloo</a>, which I’ll cover
in a future post.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Long Live WebGL</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/06/10/"/>
    <id>urn:uuid:75a9dce9-79f1-388e-f5f9-578cbb5b8800</id>
    <updated>2013-06-10T00:00:00Z</updated>
    <category term="javascript"/><category term="interactive"/><category term="web"/><category term="webgl"/><category term="opengl"/>
    <content type="html">
      <![CDATA[<p>On several occasions over the last few years I’ve tried to get into
OpenGL programming. I’d sink an afternoon into attempting to learn it,
only to get frustrated and quit without learning much. There’s a lot
of outdated and downright poor information out there, and a beginner
can’t tell the good from the bad. I tried using OpenGL from C++, then
Java (<a href="http://www.lwjgl.org/">lwjgl</a>), then finally JavaScript (<a href="http://en.wikipedia.org/wiki/WebGL">WebGL</a>). This
last one is what finally stuck, unlocking a new world of projects for
me. It’s been very empowering!</p>

<p>I’ll explain why WebGL is what finally made OpenGL click for me.</p>

<h3 id="old-vs-new">Old vs. New</h3>

<p>I may get a few details wrong, but here’s the gist of it.</p>

<p>Currently there are basically two ways to use OpenGL: the old way
(<em>compatibility profile</em>, fixed-function pipeline) and the new way
(<em>core profile</em>, programmable pipeline). The new API came about
because of a specific new capability that graphics cards gained years
after the original OpenGL specification was written. That is, modern
graphics cards are fully programmable. Programs can be compiled with
the GPU hardware as the target, allowing them to run directly on the
graphics card. The new API is oriented around running these programs
on the graphics card.</p>

<p>Before the programmable pipeline, graphics cards had a fixed set of
functionality for rendering 3D graphics. You tell it what
functionality you want to use, then hand it data little bits at a
time. Any functionality not provided by the GPU had to be done on the
CPU. The CPU ends of doing a lot of the work that would be better
suited for a GPU, in addition to spoon-feeding data to the GPU during
rendering.</p>

<p>With the programmable pipeline, you start by sending a program, called
a <em>shader</em>, to the GPU. At the application’s run-time, the graphics
driver takes care of compiling this shader, which is written in the
OpenGL Shading Language (GLSL). When it comes time to render a frame,
you prepare all the shader’s inputs in memory buffers on the GPU, then
issue a <em>draw</em> command to the GPU. The program output goes into
another buffer, probably to be treated as pixels for the screen. On
its own, the GPU processes the inputs in parallel <em>much</em> faster than a
CPU could ever do sequentially.</p>

<p>A <em>very</em> important detail to notice here is that, at a high level,
<strong>this process is almost orthogonal to the concept of rendering
graphics</strong>. The inputs to a shader are arbitrary data. The final
output is arbitrary data. The process is structured so that it’s
easily used to render graphics, but it’s not strictly required. It can
be used to perform arbitrary computations.</p>

<p>This paradigm shift in GPU architecture is the biggest barrier to
learning OpenGL. The apparent surface area of the API is doubled in
size because it includes the irrelevant, outdated parts. Sure, the
recent versions of OpenGL eschew the fixed-function API (3.1+), but
all of that mess still shows up when browsing and searching
documentation. Worse, <strong>there are still many tutorials that teach the
outdated API</strong>. In fact, as of this writing the first Google result
for “opengl tutorial” turns up one of these outdated tutorials.</p>

<h3 id="opengl-es-and-webgl">OpenGL ES and WebGL</h3>

<p>OpenGL for Embedded Systems (<a href="http://en.wikipedia.org/wiki/OpenGL_ES">OpenGL ES</a>) is a subset of OpenGL
specifically designed for devices like smartphones and tablet
computers. The OpenGL ES 2.0 specification removes the old
fixed-function APIs. What’s significant about this is that WebGL is
based on OpenGL ES 2.0. If the context of a discussion is WebGL, you’re
guaranteed to not be talking about an outdated API. This indicator has
been a really handy way to filter out a lot of bad information.</p>

<p>In fact, I think <strong>the <a href="http://www.khronos.org/registry/webgl/specs/1.0/">WebGL specification</a> is probably the
best documentation root for exploring OpenGL</strong>. None of the outdated
functions are listed, most of the descriptions are written in plain
English, and they all link out to the official documentation if
clarification or elaboration is needed. As I was learning WebGL it was
easy to jump around this document to find what I needed.</p>

<p>This is also a reason to completely avoid spending time learning the
fixed-function pipeline. It’s incompatible with WebGL and many modern
platforms. Learning it would be about as useful as learning Latin when
your goal is to communicate with people from other parts of the world.</p>

<h3 id="the-fundamentals">The Fundamentals</h3>

<p>Now that WebGL allowed me to focus on the relevant parts of OpenGL, I
was able to put effort into figuring out the important stuff that
the tutorials skip over. You see, even the tutorials that are using
the right pipeline still do a poor job. They skip over the
fundamentals and dive right into 3D graphics. This is a mistake.</p>

<p>I’m a firm believer that
<a href="http://www.skorks.com/2010/04/on-the-value-of-fundamentals-in-software-development/">mastery lies in having a solid grip on the fundamentals</a>.
The programmable pipeline has little built-in support for 3D graphics.
This is because <strong>OpenGL is at its essence <a href="http://www.html5rocks.com/en/tutorials/webgl/webgl_fundamentals/">a 2D API</a></strong>. The
vertex shader accepts <em>something</em> as input and it produces 2D vertices
in device coordinates (-1 to 1) as output. Projecting this <em>something</em>
to 2D is functionality you have to do yourself, because OpenGL won’t
be doing it for you. Realizing this one fact was what <em>really</em> made
everything click for me.</p>

<p><img src="/img/diagram/device-coordinates.png" alt="" /></p>
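
<p>To make “bring your own projection” concrete, here is what mapping a
3D point into those -1 to 1 device coordinates can look like: a
minimal perspective projection sketch, not code from any of the demos
below.</p>

```js
// Minimal perspective projection: map a 3D point into 2D device
// coordinates, the only thing a vertex shader must ultimately produce.
// fov, aspect, and distance are illustrative parameters.
function project(point, fov, aspect, distance) {
    var z = point[2] + distance;          // push scene in front of camera
    var f = 1 / Math.tan(fov / 2);        // focal length from field of view
    return [f * point[0] / (aspect * z),  // x in device coordinates
            f * point[1] / z];            // y in device coordinates
}

var p = project([0, 0, 0], Math.PI / 2, 1, 2);  // a centered point
```
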

<p>Many of the tutorials try to handwave this part. “Just use this
library and this boilerplate so you can ignore this part,” they say,
quickly moving on to spinning a cube. This is sort of like using an
IDE for programming and having no idea how a build system works. This
works if you’re in a hurry to accomplish a specific task, but it’s no
way to achieve mastery.</p>

<p>What’s more, for me the step being skipped <em>is perhaps the most
interesting part of it all</em>! For example, after getting a handle on
how things worked — without copy-pasting any boilerplate around — I
ported <a href="/blog/2012/06/03/">my OpenCL 3D perlin noise generator</a> to GLSL.</p>

<ul>
  <li><a href="/perlin-noise/">/perlin-noise/</a>
(<a href="https://github.com/skeeto/perlin-noise/tree/master/webgl">source</a>)</li>
</ul>

<p><img src="/img/noise/octave-perlin2d.png" alt="" /></p>

<p>Instead of saving off each frame as an image, this just displays it in
real-time. The CPU’s <em>only</em> job is to ask the GPU to render a new
frame at a regular interval. Other than this, it’s entirely idle. All
the computation is being done by the GPU, and at speeds far greater
than a CPU could achieve.</p>

<p>Side note: you may notice some patterns in the noise. This is because,
as of this writing, I’m still working out decent random number
generation in the fragment shader.</p>

<p>If your computer is struggling to display that page it’s because the
WebGL context is demanding more from your GPU than it can deliver. All
this GPU power is being put to use for something other than 3D
graphics! I think that’s far more interesting than a spinning 3D cube.</p>

<h3 id="spinning-3d-sphere">Spinning 3D Sphere</h3>

<p>However, speaking of 3D cubes, this sort of thing was actually my very
first WebGL project. To demonstrate the
<a href="/blog/2012/02/08/">biased-random-point-on-a-sphere</a> thing to a co-worker (outside
of work), I wrote a 3D HTML5 canvas plotter. I didn’t know WebGL yet.</p>

<ul>
  <li><a href="/sphere-js/?webgl=false">HTML5 Canvas 2D version</a>
(<a href="https://github.com/skeeto/sphere-js">source</a>) (ignore the warning)</li>
</ul>

<p>On a typical computer this can only handle about 4,000 points before
the framerate drops. In my effort to finally learn WebGL, I ported the
display to WebGL and GLSL. Remember that you have to bring your own 3D
projection to OpenGL? Since I had already worked all of that out for
the 2D canvas, this was just a straightforward port to GLSL. Except
for the colored axes, this looks identical to the 2D canvas version.</p>

<ul>
  <li><a href="/sphere-js/">WebGL version</a>
(a red warning means it’s not working right!)</li>
</ul>

<p><img src="/img/screenshot/sphere-js.png" alt="" /></p>

<p>This version can literally handle <em>millions</em> of points without
breaking a sweat. The difference is dramatic. Here’s 100,000 points in
each (any more points and it’s just a black sphere).</p>

<ul>
  <li><a href="/sphere-js/?n=100000">WebGL 100,000 points</a></li>
  <li><a href="/sphere-js/?n=100000&amp;webgl=false">Canvas 100,000 points</a></li>
</ul>

<h3 id="a-friendly-api">A Friendly API</h3>

<p>WebGL still has three major advantages over other OpenGL bindings, all of
which make it a real joy to use.</p>

<h4 id="length-parameters">Length Parameters</h4>

<p>In C/C++ world, where the OpenGL specification lies, any function that
accepts an arbitrary-length buffer must also have a parameter for the
buffer’s size. Due to this, these functions tend to have a lot of
parameters! So in addition to OpenGL’s existing clunkiness there are
these length arguments to worry about.</p>

<p>Not so in WebGL! Since JavaScript’s typed arrays carry their own
lengths, this parameter completely disappears. This is also an
advantage of Java’s lwjgl.</p>

<h4 id="resource-management">Resource Management</h4>

<p>Any time a shader, program, buffer, etc. is created, resources are
claimed on the GPU. Long running programs need to manage these
properly, destroying them before losing the handle on them. Otherwise
it’s a GPU leak.</p>

<p>WebGL ties GPU resource management to JavaScript’s garbage collector.
If a buffer is created and then let go, the GPU’s associated resources
will be freed at the same time as the wrapper object in JavaScript.
This can still be done explicitly if tight management is needed, but
the GC fallback is there if it’s not done.</p>

<p>Because this is untrusted code interacting with the GPU, this part is
essential for security reasons. JavaScript programs can’t leak GPU
resources, even intentionally.</p>

<p>Unlike the buffer length advantage, lwjgl does not do this. You still
need to manage GPU resources manually in Java, just as you would C.</p>

<h4 id="live-interaction">Live Interaction</h4>

<p>Perhaps most significantly of all, I can
<a href="https://github.com/skeeto/skewer-mode">drive WebGL interactively with Skewer</a>. If I expose shader
initialization properly, I can even update the shaders while the
display is running. Before WebGL, live OpenGL interaction was something
that could only be achieved with the Common Lisp OpenGL bindings (as
far as I know).</p>

<p>It’s <em>really</em> cool to be able to manipulate an OpenGL context from
Emacs.</p>

<h3 id="the-future">The Future</h3>

<p>I’m expecting to do a lot more with WebGL in the future. I’m <em>really</em>
keeping my eye out for an opportunity to combine it with
<a href="/blog/2013/01/26/">distributed web computing</a>, but using the GPU instead of the
CPU. If I find a problem that fits this infrastructure well, this
system may be the first of its kind: visit a web page and let it use
your GPU to help solve some distributed computing problem!</p>

]]>
    </content>
  </entry>
  <entry>
    <title>JavaScript Fantasy Name Generator</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/03/27/"/>
    <id>urn:uuid:8c5e22c4-f826-3b81-3f55-4ea60882620e</id>
    <updated>2013-03-27T00:00:00Z</updated>
    <category term="javascript"/><category term="game"/><category term="interactive"/>
    <content type="html">
      <![CDATA[<p>Also in preparation for <a href="/blog/2013/03/17/">my 7-day roguelike</a> I
rewrote the <a href="/blog/2009/01/04/">RingWorks fantasy name generator</a>
<a href="https://github.com/skeeto/fantasyname">in JavaScript</a>. It’s my third implementation of this generator
and this one is also the most mature <em>by far</em>.</p>

<p>Try it out by <a href="/fantasyname/">playing with the demo</a> (<a href="https://github.com/skeeto/fantasyname">GitHub</a>).</p>

<p>The first implementation was written in Perl. It worked by
interpreting the template string each time a name was to be generated.
This was incredibly slow, partly because of the needless re-parsing,
but mostly because the parser library I used had really poor
performance. It’s literally <em>millions</em> of times slower than this new
JavaScript implementation.</p>

<p>The <a href="/blog/2009/07/03/">second implementation</a> I did in Emacs Lisp. I didn’t
actually write a parser. Instead, an s-expression is walked and
interpreted for each name generation. Much faster, but I missed having
the template syntax.</p>

<p>The JavaScript implementation has a template <em>compiler</em>. There are
five primitive name generator prototypes — including strings
themselves, because anything with a toString() method can be a name
generator — which the compiler composes into a composite generator
following the template. The neatest part is that it’s an optimizing
compiler, using the smallest composition of generators possible. If a
template can only emit one possible pattern, the compiler will try to
return a string of exactly the one possible output.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typeof</span> <span class="nx">NameGen</span><span class="p">.</span><span class="nx">compile</span><span class="p">(</span><span class="dl">'</span><span class="s1">(foo) (bar)</span><span class="dl">'</span><span class="p">);</span>
<span class="c1">// =&gt; "string"</span>
</code></pre></div></div>

<p>Here’s the example usage I have in the documentation. On my junk
laptop it can generate a million names for this template in just under
a second.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">var</span> <span class="nx">generator</span> <span class="o">=</span> <span class="nx">NameGen</span><span class="p">.</span><span class="nx">compile</span><span class="p">(</span><span class="dl">"</span><span class="s2">sV'i</span><span class="dl">"</span><span class="p">);</span>
<span class="nx">generator</span><span class="p">.</span><span class="nx">toString</span><span class="p">();</span>  <span class="c1">// =&gt; "entheu'loaf"</span>
<span class="nx">generator</span><span class="p">.</span><span class="nx">toString</span><span class="p">();</span>  <span class="c1">// =&gt; "honi'munch"</span>
</code></pre></div></div>

<p>However, in this case there aren’t actually that many possible
outputs. How do I know? You can ask the generator about what it can
generate. Generators know quite a bit about themselves!</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">generator</span><span class="p">.</span><span class="nx">combinations</span><span class="p">();</span>
<span class="c1">// =&gt; 118910</span>

<span class="kd">var</span> <span class="nx">foobar</span> <span class="o">=</span> <span class="nx">NameGen</span><span class="p">.</span><span class="nx">compile</span><span class="p">(</span><span class="dl">'</span><span class="s1">(foo|bar)</span><span class="dl">'</span><span class="p">);</span>
<span class="nx">foobar</span><span class="p">.</span><span class="nx">combinations</span><span class="p">();</span>
<span class="c1">// =&gt; 2</span>
<span class="nx">foobar</span><span class="p">.</span><span class="nx">enumerate</span><span class="p">();</span> <span class="c1">// List all possible outputs.</span>
<span class="c1">// =&gt; ["foo", "bar"]</span>
</code></pre></div></div>
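<p>The real compiler lives in the repository, but the composition idea can be sketched with two hypothetical generator types, a literal and a random choice, each knowing its own combination count. This is an illustration of the concept, not NameGen’s actual code:</p>

```javascript
// Every generator has toString() and combinations(); plain strings
// count as trivial generators with exactly one combination.
function Literal(s) { this.s = s; }
Literal.prototype.toString = function() { return this.s; };
Literal.prototype.combinations = function() { return 1; };

function Choice(options) { this.options = options; }
Choice.prototype.toString = function() {
    const i = Math.floor(Math.random() * this.options.length);
    return String(this.options[i]);
};
Choice.prototype.combinations = function() {
    // Sum, because exactly one option is emitted per generation.
    return this.options.reduce(
        (n, o) => n + (o.combinations ? o.combinations() : 1), 0);
};

// An "optimizing" compile: a choice with a single option collapses
// to a plain literal, the smallest composition possible.
function compile(options) {
    return options.length === 1 ? new Literal(String(options[0]))
                                : new Choice(options);
}

console.log(compile(['foo']).toString());            // "foo"
console.log(compile(['foo', 'bar']).combinations()); // 2
```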

<p>After some experience using it in Disc RL I found that it would be
<em>really</em> useful to be able to mark parts of the output to be capitalized.
Without this, capitalization is awkwardly separate metadata. So I
extended the original syntax to do this. Prefix anything with an
exclamation point and it gets capitalized in the output.</p>

<p>For example, here’s a template I find amusing. There are 5,113,130
possible output names.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>!BsV (the) !i
</code></pre></div></div>

<p>Here are some of the interesting output names.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Quisey the Dork
Cunda the Snark
Strisia the Numb
Pustie the Dolt
Blhatau the Clown
</code></pre></div></div>

<p>Mostly as an exercise, I also added tilde syntax, which reverses the
component that follows it. So <code class="language-plaintext highlighter-rouge">~(foobar)</code> will always emit <code class="language-plaintext highlighter-rouge">raboof</code>. I
don’t think this is particularly useful but having it opens the door
for other similar syntax extensions.</p>

<p>If you’re making a procedurally generated game in JavaScript, consider
using this library for name generation!</p>

]]>
    </content>
  </entry>
  <entry>
    <title>7DRL 2013 Complete</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/03/17/"/>
    <id>urn:uuid:bb22fefb-c0c9-3a83-88fe-70179a7975f8</id>
    <updated>2013-03-17T00:00:00Z</updated>
    <category term="javascript"/><category term="interactive"/><category term="game"/>
    <content type="html">
      <![CDATA[<p>As I mentioned previously, I participated in this year’s Seven Day
Roguelike. It was my first time doing so. I managed to complete my
roguelike within the allotted time period, though lacking many
features I had originally planned. It’s an HTML5 game run entirely
within the browser. You can play the final version here,</p>

<ul>
  <li><a href="/disc-rl/">Disc RL</a></li>
</ul>

<p><a href="/img/screenshot/disc-rl.png"><img src="/img/screenshot/disc-rl-thumb.png" alt="" /></a></p>

<h3 id="what-went-right">What Went Right</h3>

<h4 id="display">Display</h4>

<p>The first thing I did was create a functioning graphical display. The
goal was to get it into a playable state as soon as possible so that I
could try ideas out as I implemented them. This was especially true
because I was doing <a href="/blog/2012/10/31/">live development</a>, adding
features to the game as it was running.</p>

<p>The display is made up of 225 (15x15) absolutely positioned <code class="language-plaintext highlighter-rouge">div</code>
elements. The content of each div is determined by its dynamically-set
CSS classes. Generally, the type of map tile is one class (background)
and the type of monster in that tile is another class (foreground). If
the game had items, these would also be displayed using classes.</p>
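<p>A sketch of that class assignment (the names here are hypothetical, not Disc RL’s actual code): a tile’s class list is simply its map type plus whatever occupies it.</p>

```javascript
// Build the CSS class string for one display div: the map tile type
// is the background class, the monster (if any) the foreground class.
function tileClasses(tile) {
    const classes = [tile.type];
    if (tile.monster) classes.push(tile.monster);
    return classes.join(' ');
}

console.log(tileClasses({type: 'floor', monster: 'brute'})); // "floor brute"
console.log(tileClasses({type: 'wall'}));                    // "wall"
```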

<p>While I could use <code class="language-plaintext highlighter-rouge">background-image</code> to fill these <code class="language-plaintext highlighter-rouge">div</code>s with images,
I decided when I started I would use no images whatsoever. Everything
in the game would be drawn using CSS.</p>

<p>After the display was working, I was able to spend the rest of the time
working entirely in game coordinates, completely forgetting all about
screen coordinates. This made everything else so much simpler.</p>

<h4 id="saving">Saving</h4>

<p>Early in the week I got <a href="/blog/2013/03/11/">my save system working</a>.
After ditching a library that wasn’t working, it only took me about 45
minutes to do this from scratch. Plugging my main data structure into
it <em>just worked</em>. I did end up accidentally violating my assumption
about non-circularity. When adding multiple dungeon levels, these
levels would refer to each other, leading to circularity. I got around
that with a small hack of referring to other dungeons by name, an
indirect reference.</p>
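<p>The indirection trick is easy to demonstrate: a level stores its neighbor’s name rather than a reference to the neighbor, and the name is resolved after loading. A minimal sketch:</p>

```javascript
// Levels refer to each other by name, so JSON.stringify never
// encounters a cycle even though the levels are mutually linked.
const levels = {
    'depth-1': {name: 'depth-1', down: 'depth-2'},
    'depth-2': {name: 'depth-2', up: 'depth-1'},
};

const saved = JSON.stringify(levels);          // no circularity error
const loaded = JSON.parse(saved);
const below = loaded[loaded['depth-1'].down];  // resolve the name
console.log(below.name); // "depth-2"
```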

<p>From what I’ve seen of other HTML5 roguelikes, saving state seems to
be a unique feature of my roguelike. I don’t see anyone else doing it.</p>

<h4 id="combat">Combat</h4>

<p>I think the final combat mechanics are fairly interesting. It’s all
based on <em>identity discs</em> and <em>corruption</em> (see the help in the game
for more information). There are two kinds of attacks that any
creature in the game can do: melee (bash with your disc) and ranged
(throw your disc). I created three different AIs to make use of these,
bringing in four different monster classes. Note, I consider these
game spoilers.</p>

<ul>
  <li>
    <p><strong>Simple</strong>: Middle-of-the-road on abilities, these guys run up
and try to hit you. No ranged attacks. The strategy is to attack
them with ranged as they close in, then melee them down. They’re
easy and these are the monsters you see in the beginning of the
game.</p>
  </li>
  <li>
    <p><strong>Brute</strong>: These guys have high health and damage (high
strength), but are slow moving (low dexterity). They try to run up
to you and bash you (same AI as “simple” monsters). The strategy
for them is to “kite” them, keeping your distance and hitting them
with ranged attacks, especially when they’re standing on
corruption.</p>
  </li>
  <li>
    <p><strong>Archer</strong>: Archers are the opposite of brutes: low health, high
ranged damage, and high speed. They chase you down and perform
ranged attacks no matter what. The strategy for dealing with them
is to break line-of-sight and wait. They’ll run up and around the
corner where you can melee attack them. Since they’ll continue to
use ranged attacks this leaves them wide open for melee
attacks. This is due to a mechanic that monsters, including the
player, can’t block attacks with their disc for one turn after they
throw it.</p>
  </li>
  <li>
    <p><strong>Skirmisher</strong>: This is a hybrid of brutes and archers and are
the most difficult. They have high dexterity, sometimes also high
strength, and use the appropriate attack for the situation. At
range they use ranged attacks, they’ll try to run away from you if
you get close, and if you do manage to get up close they’ll switch
to melee. Dealing with these guys depends a lot on the terrain
around you. Remember to take advantage of corruption when dealing
with them.</p>
  </li>
</ul>

<p>The eight final, identical bosses of the game have a slightly custom
AI, making them sort of like another class on their own. They’re the
“T” monsters in the screenshot above. I won’t describe it here because
I still want there to be some sort of secret. :-)</p>

<h4 id="corruption">Corruption</h4>

<p>Corruption was actually something I came up with late in the week, and
I’m happy with how it turned out. It makes for more interesting combat
tactics (see above) and I think it really adds some flavor.
Occasionally when you move over corrupted (green) tiles, you will
notice the game’s interface being scrambled for a turn.</p>

<h4 id="rotjs">rot.js</h4>

<p>I ended up using <a href="http://ondras.github.com/rot.js/hp/">rot.js</a> to handle field-of-view calculations,
path finding, and dungeon generation. These are all time consuming to
write and there really is no reason to implement your own for the
first two. I would have liked to do my own dungeon generation, but
rot.js was just so convenient for this that I decided to skip it. The
downside is that my dungeons will look like the dungeons from other
games.</p>

<p>Path finding was critical not only for monsters but also automatic
exploration. Even though it’s quirky, I’m really happy with how it
turned out. Personally, one of the most tiring parts of some
roguelikes is just manually navigating around empty complex
corridors. Good gameplay is all about a long series of interesting
decisions. Choosing right or left when exploring a map is generally
not an interesting decision, because there’s really no differentiating
them. Auto-exploration is a useful way to quickly get the player to
the next interesting decision. In my game, you generally only need to
press a directional navigation key when you’re engaged in combat.</p>

<h4 id="help-screen">Help Screen</h4>

<p>I’m really happy with how my overlay screens turned out, especially
the keyboard styling. I’m talking about the help screen, control
screen, and end-game screen. Since this is the <em>very</em> first thing
presented to the user, I felt it was important to invest time into
it. First impressions are everything.</p>

<h3 id="what-went-wrong">What Went Wrong</h3>

<p>The game is smaller than I originally planned. Monsters have unused
stats and slots on them not displayed by the interface. A look at the
code will reveal a lot of openings I left for functionality not
actually present. I originally intended for 10 dungeon levels, but
lacking a variety of monster AIs, which are time consuming to write, I
ended up with 6 dungeon levels.</p>

<h4 id="user-interfaces">User Interfaces</h4>

<p>My original healing mechanic was going to be health potions (under a
different, thematic name), with no heal-over-time. As I was nearing
the end of the week I still hadn’t implemented items, so this got
scrapped for a heal-on-level mechanic and an easing of the leveling
curve. Everything was in place for implementing items, and therefore
healing potions, except for the user interface. This was a common
situation: the UI being the hardest part of any new feature. Writing
an inventory management interface was going to take too much time, so
I dropped it.</p>

<p>A spell mechanic of some kind was also dropped for lack of time to
implement an interface for it. Towards the end I did squeeze in
ranged attacks, but only because real combat mechanics demanded
them.</p>

<p>There’s no race (Program, User, etc) and class (melee, ranged, etc.)
selection, not even character name selection. This is really just
another user interface thing.</p>

<p>There are no stores/merchants because these are probably the hardest
to implement interfaces of all!</p>

<h4 id="game-balance">Game Balance</h4>

<p>I’m also not entirely satisfied with the balance. The early game is
too easy and the late game is probably too hard. The difficulty ramps
up quickly in the middle. Fortunately this is probably better than the
reverse: the early game is too hard and the end game is too
easy. Early difficulty won’t be what’s scaring off anyone trying out
the game — instead, that would be boredom! Generally if you find a
level too difficult, you need to retreat to the previous level and
grind out a few more levels. This turns out to be not very
interesting, so there needs to be less of it.</p>

<p>Fixing this would take a lot more play-testing — also very
time-consuming. At the end of the week I probably spent around six
hours just playing legitimately (i.e. not cheating), and having fun
doing so, but that still wasn’t enough. The very end of the game with
the final bosses is quite challenging, so to test that part in a
reasonable time-frame I had to cheat a bit.</p>

<h4 id="code-namespacing">Code Namespacing</h4>

<p>My namespaces are messy. This was the largest freestanding (i.e. not
AngularJS) JavaScript application I’ve done so far, and it’s the first
one where I could really start to appreciate tighter namespace
management. This led to more coupling between different systems than
was strictly necessary.</p>

<p>I still want to avoid the classical module pattern of wrapping
everything in a big closure. That’s incompatible with Skewer for one,
and it would have also been incompatible with my prototype-preserving
storage system. I just need to be more disciplined in containing
sprawl.</p>

<p>However, in the end none of this really mattered one bit. No one is
maintaining this code and no one will ever read it except me. At the
end of the week it’s <em>much</em> better to have sloppy code and a working
game than a clean codebase and only half of a game.</p>

<h4 id="css-animations">CSS Animations</h4>

<p>Along with my no-images philosophy, I was intending to exploit CSS
animations to make the map look really interesting. I wanted the
glowing walls to pulsate with energy. Unfortunately, adding and
removing classes sometimes causes these animations to reset if reflow
is allowed to occur. The exact behavior is browser-dependent. All the
individual tile animations would get out of sync and everything would
look terrible. There is intentionally little control over this
behavior, for optimization purposes, so I couldn’t fix it.</p>

<h3 id="next-year">Next Year?</h3>

<p>Will I participate again next year? Maybe. I’m really happy with the
outcome this year, but I’m afraid doing the same thing again next year
will feel tedious. But maybe I’ll change my mind after taking a year
off from this! I wasn’t intending on participating this year, until
<a href="http://50ply.com/">Brian</a> twisted my arm into it. See, peer pressure isn’t always
a bad thing!</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Flu Trends Timeline</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/01/13/"/>
    <id>urn:uuid:795da82b-feaf-39a2-cd67-527e21cc77d8</id>
    <updated>2013-01-13T00:00:00Z</updated>
    <category term="javascript"/><category term="interactive"/>
    <content type="html">
      <![CDATA[<p>This past week I came across this CSV-formatted data from Google. It’s
search engine trends for searches about the flu for different parts of
the US.</p>

<ul>
  <li><a href="http://www.google.org/flutrends/us/data.txt">http://www.google.org/flutrends/us/data.txt</a></li>
</ul>

<p>I thought it would be interesting to display this data on a map with a
slider along the bottom to select the date. Here’s the result of
spending two hours doing just that. I’m really happy with how it
turned out, and, further, I picked up a few new tricks from the
process.</p>

<ul>
  <li><a href="/flu-trends-timeline">Flu Trends Timeline</a> (<a href="https://github.com/skeeto/flu-trends-timeline">GitHub</a>)</li>
</ul>

<p><a href="/flu-trends-timeline"><img src="/img/screenshot/flu-thumb.png" alt="" /></a></p>

<p>You probably noticed there was a spinner when you first opened the
page. This is because it’s asynchronously fetching the latest
<code class="language-plaintext highlighter-rouge">data.txt</code> from Google. However, since it’s a
<a href="http://en.wikipedia.org/wiki/Same_origin_policy">cross-origin</a> request, and I don’t
<a href="http://en.wikipedia.org/wiki/Cross-origin_resource_sharing">control the server headers</a> (static hosting), it’s using
<a href="http://anyorigin.com/">AnyOrigin.com</a> to translate it into a <a href="http://en.wikipedia.org/wiki/JSONP">JSONP</a>
request. That’s a really handy service!</p>

<p>To parse the CSV format, I’m using <a href="http://code.google.com/p/jquery-csv/">jquery-csv</a>. I
wouldn’t mention it except that it has a really cool feature I haven’t
seen in any other CSV parser: instead of reading the text into a
two-dimensional array — which would need to be “parsed” further — it
can read in each row as a object, using the CSV header line as the
properties. This is the <code class="language-plaintext highlighter-rouge">toObjects()</code> function. It makes it feel like
reading straightforward JSON. For example,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>name,color,weight
apple,red,1.2
banana,yellow,1.6
orange,orange,0.9
</code></pre></div></div>

<p>Will be parsed into this JavaScript structure,</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[{</span><span class="na">name</span><span class="p">:</span><span class="dl">"</span><span class="s2">apple</span><span class="dl">"</span><span class="p">,</span>  <span class="na">color</span><span class="p">:</span><span class="dl">"</span><span class="s2">red</span><span class="dl">"</span><span class="p">,</span>    <span class="na">weight</span><span class="p">:</span><span class="dl">"</span><span class="s2">1.2</span><span class="dl">"</span><span class="p">},</span>
 <span class="p">{</span><span class="na">name</span><span class="p">:</span><span class="dl">"</span><span class="s2">banana</span><span class="dl">"</span><span class="p">,</span> <span class="na">color</span><span class="p">:</span><span class="dl">"</span><span class="s2">yellow</span><span class="dl">"</span><span class="p">,</span> <span class="na">weight</span><span class="p">:</span><span class="dl">"</span><span class="s2">1.6</span><span class="dl">"</span><span class="p">},</span>
 <span class="p">{</span><span class="na">name</span><span class="p">:</span><span class="dl">"</span><span class="s2">orange</span><span class="dl">"</span><span class="p">,</span> <span class="na">color</span><span class="p">:</span><span class="dl">"</span><span class="s2">orange</span><span class="dl">"</span><span class="p">,</span> <span class="na">weight</span><span class="p">:</span><span class="dl">"</span><span class="s2">0.9</span><span class="dl">"</span><span class="p">}]</span>
</code></pre></div></div>

<p>With the flu data, it means each returned object is a single date
snapshot, just what I need. The only data-massaging I had to do was
mapping over each object to convert the date string into a proper Date
object.</p>
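<p>The idea behind toObjects() can be sketched in a few lines, along with the date massaging. This is only an illustration assuming simple, unquoted fields, not jquery-csv’s actual implementation:</p>

```javascript
// Naive CSV-to-objects conversion: the header row supplies the
// property names for each subsequent row.
function toObjects(csv) {
    const [header, ...rows] =
        csv.trim().split('\n').map(line => line.split(','));
    return rows.map(row =>
        Object.fromEntries(header.map((key, i) => [key, row[i]])));
}

const data = toObjects('Date,United States\n2013-01-06,4.13');
// Convert the date string into a proper Date object.
const parsed = data.map(d => Object.assign({}, d, {Date: new Date(d.Date)}));
console.log(parsed[0]['United States']);     // "4.13"
console.log(parsed[0].Date instanceof Date); // true
```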

<p>Using two neat tricks I’ve got the latest data parsed into my desired
data structure. Next up is displaying a map. At first I wasn’t sure
how to do this cleanly, but then I remembered an old DailyProgrammer
problem: <a href="http://redd.it/yj38u">#89, Coloring the United States of America</a>. SVG maps
tend to contain metadata describing what shape is what. In this case,
each state shape’s <code class="language-plaintext highlighter-rouge">id</code> attribute has the two-letter state code. Even
more, SVG plays very, very well with JavaScript. It can be manipulated
as part of the DOM, using the same API, including jQuery. It also uses
CSS for styling.</p>

<p>The tricky part is actually accessing the SVG’s document root. To do
this, it can’t be included as an <code class="language-plaintext highlighter-rouge">img</code> tag. Otherwise it’s an opaque
raster image as far as JavaScript is concerned. It either needs to be
embedded into the HTML — a dirty mix of languages that should be
avoided — or accessed through an asynchronous request. Accessing
remote XML was the original purpose of asynchronous browser requests,
after all (i.e. the poorly-named <strong>XML</strong>HttpRequest object). I can host
this SVG from my own server, so this isn’t an issue like the CSV data.</p>
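<p>With the state shapes accessible through the DOM, the coloring itself is just a function from flu intensity to a CSS fill value, applied to each state’s element. A hypothetical version of that mapping (the thresholds and colors in the actual project may differ):</p>

```javascript
// Map a normalized flu intensity onto a green-to-red ramp; with the
// SVG inlined in the DOM this could be applied per state, e.g.
// $('#' + stateCode).css('fill', fluColor(value, max)).
function fluColor(intensity, max) {
    const t = Math.min(intensity / max, 1); // clamp to [0, 1]
    const red = Math.round(255 * t);
    return 'rgb(' + red + ',' + (255 - red) + ',0)';
}

console.log(fluColor(0, 10));  // "rgb(0,255,0)"
console.log(fluColor(10, 10)); // "rgb(255,0,0)"
```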

<p>HTML doesn’t have a slider input, unfortunately, so for the slider I’m
using the <a href="http://jqueryui.com/slider/">jQuery UI Slider</a>. I’m not terribly impressed
with it but it gets the job done. Even before I had the slider
connected, I could change the display date on the fly from Emacs using
<a href="/blog/2012/10/31/">Skewer</a>.</p>

<p>Contrary to my initial expectations, this project was surprisingly
<em>very</em> well suited for HTML and JavaScript. Being able to manipulate
SVG on the fly is really powerful and I doubt there’s an easier
platform on which to do it than the browser.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Cartoon Liquid Simulation</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/02/03/"/>
    <id>urn:uuid:3819c303-f785-3d90-7c85-af2ca32b7ee4</id>
    <updated>2012-02-03T00:00:00Z</updated>
    <category term="interactive"/><category term="java"/><category term="math"/><category term="media"/><category term="video"/>
    <content type="html">
      <![CDATA[<p><strong>Update June 2013</strong>: This program has been <a href="/blog/2012/02/03/">ported to WebGL</a>!!!</p>

<p>The other day I came across this neat visual trick:
<a href="http://www.patrickmatte.com/stuff/physicsLiquid/">How to simulate liquid</a> (Flash). It’s a really simple way to
simulate some natural-looking liquid.</p>

<ul>
  <li>Perform a physics simulation of a number of circular particles.</li>
  <li>Render this simulation in high contrast.</li>
  <li>Gaussian blur the rendering.</li>
  <li>Threshold the blur.</li>
</ul>
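<p>The blur-then-threshold step is easy to reproduce on a one-dimensional strip of “pixels.” In this sketch (kernel and cutoff are arbitrary), two high-contrast blobs bleed into the gap between them when blurred, and thresholding merges them into one connected region, which is exactly the liquid effect:</p>

```javascript
// Box blur stands in for the Gaussian: each output pixel is the
// average of its neighborhood, clipped at the strip's edges.
function blur1d(pixels, radius) {
    return pixels.map((_, i) => {
        let sum = 0, count = 0;
        for (let j = i - radius; j <= i + radius; j++) {
            if (j >= 0 && j < pixels.length) { sum += pixels[j]; count++; }
        }
        return sum / count;
    });
}

const strip = [0, 1, 1, 0, 1, 1, 0, 0];  // two separate blobs
const merged = blur1d(strip, 1).map(p => p >= 0.5 ? 1 : 0);
console.log(merged.join('')); // "11111100": the gap has filled in
```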

<p><img src="/img/liquid/liquid-thumb.png" alt="" /></p>

<p>I <a href="https://github.com/skeeto/fun-liquid">made my own version</a> in Java, using <a href="http://jbox2d.org/">JBox2D</a> for the
physics simulation.</p>

<ul>
  <li><a href="https://github.com/skeeto/fun-liquid">https://github.com/skeeto/fun-liquid</a></li>
</ul>

<p>For those of you who don’t want to run a Java applet, here’s a video
demonstration. Gravity is reversed every few seconds, causing the
liquid to slosh up and down over and over. The two triangles on the
sides help mix things up a bit. The video flips through the different
components of the animation.</p>

<video src="https://s3.amazonaws.com/nullprogram/liquid/liquid-overview.webm" poster="https://s3.amazonaws.com/nullprogram/liquid/liquid-poster.png" controls="controls" width="250" height="350">
</video>

<p>It’s not a perfect liquid simulation. The surface never settles down,
so the liquid is lumpy, like curdled milk. There’s also a lack of
cohesion, since JBox2D doesn’t provide cohesion directly. However, I
think I could implement cohesion on my own by writing a custom
contact.</p>

<p>JBox2D is a really nice, easy-to-use 2D physics library. I only had to
read the first two chapters of the <a href="http://box2d.org/">Box2D</a> manual. Everything
else can be figured out through the JBox2D Javadocs. It’s also
available from the Maven repository, which is the reason I initially
selected it. My only complaint so far is that the API doesn’t really
follow best practice, but that’s probably because it follows the Box2D
C++ API so closely.</p>

<p>I’m excited about JBox2D and I plan on using it again for some future
project ideas. Maybe even a game.</p>

<p>The most computationally intensive part of the process <em>isn’t</em> the
physics. That’s really quite cheap. It’s actually blurring, by far.
Blurring involves <a href="/blog/2008/02/22/">convolving a kernel</a> over the image —
O(n^2) time. The graphics card would be ideal for that step, probably
eliminating it as a bottleneck, but it’s unavailable to pure Java. I
could have <a href="/blog/2011/11/06/">pulled in lwjgl</a>, but I wanted to keep it simple,
so that it could be turned into a safe applet.</p>

<p>As a result, it may not run smoothly on computers that are more than a
couple of years old. I’ve been trying to come up with a cheaper
alternative, such as rendering a transparent halo around each ball,
but haven’t found anything yet. Even with that fix, thresholding would
probably be the next bottleneck — something else the graphics card
would be really good at.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Feedback Loop Applet</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2011/05/01/"/>
    <id>urn:uuid:953638ec-fa1e-36a4-397e-2aeb435aebbd</id>
    <updated>2011-05-01T00:00:00Z</updated>
    <category term="java"/><category term="interactive"/>
    <content type="html">
      <![CDATA[<p><em>Update June 2014</em>: This <a href="/blog/2014/06/21/">was ported to WebGL</a> and greatly
improved.</p>

<ul>
  <li><a href="https://github.com/skeeto/Feedback.git">https://github.com/skeeto/Feedback.git</a></li>
</ul>

<p>I was watching the BBC’s <em>The Secret Life of Chaos</em>, which is a very
interesting documentary about chaos, <a href="/blog/2007/10/01/">fractals</a>, and emergent
behavior. There is <a href="magnet:?xt=urn:btih:80e59413ca2b46e74f4a7572366a4a7de9b3e096">a part where emergent behavior is demonstrated using
a video camera feedback loop</a> (35 minutes in). A camera, pointed
at a projector screen, has its output projected onto that same screen.
A match is lit, moved around a bit, and removed from the camera’s
vision. At the center of the camera’s focus a pattern dances around
unpredictably for a while, as the output is fed back into
itself.</p>

<p>That’s the key to fractals and emergent behavior right there: a feedback
loop. Apply some simple rules to a feedback loop and you might have a
fractal on your hands. More examples,</p>

<ul>
  <li><a href="http://www.youtube.com/watch?v=Jj9pbs-jjis">How to make fractals without a computer</a></li>
  <li><a href="http://www.youtube.com/watch?v=xzJVbmqcj7k">video feedback experiment 4</a></li>
</ul>

<p>I was inspired to simulate this with software (and I <a href="http://devrand.org/view/emergentFeedback">passed that on to
Gavin too</a>). Take an image, rescale it, rotate it, compose it
with itself, repeat.</p>
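<p>The loop is just a 2D transform applied over and over. Following a single point through repeated scale-and-rotate steps shows why features spiral inward toward the center; the parameters here are arbitrary:</p>

```javascript
// One feedback iteration: rotate about the center, then scale down.
function step(point, scale, angle) {
    const [x, y] = point;
    const c = Math.cos(angle), s = Math.sin(angle);
    return [scale * (x * c - y * s), scale * (x * s + y * c)];
}

let p = [1, 0];
for (let i = 0; i < 50; i++) p = step(p, 0.9, 0.3);
// The radius shrinks by a factor of 0.9 each step, so after 50
// iterations the point has spiraled in very close to the fixed
// point at the center, where fine detail accumulates.
console.log(Math.hypot(p[0], p[1]) < 0.01); // true
```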

<p>Here are some images I created with it.</p>

<p><img src="/img/feedback/dense-tunnel.png" alt="" />
<img src="/img/feedback/single-spiral.png" alt="" />
<img src="/img/feedback/sun2.png" alt="" />
<img src="/img/feedback/jagged-spiral.png" alt="" /></p>

<p>You can interact with the mouse — like the lit match. And that really is
a feedback loop. In this video, you can see the mouse’s hop travel down
through the iterations.</p>

<video src="/vid/feedback/hop.ogv" controls="controls" width="640" height="640">
</video>

<p>Unfortunately I wasn’t able to achieve emergent behavior. The
image operators too aggressively blur out fine details way down in the
center. But that’s fine: it turned out more visually appealing than I
expected!</p>

<p>Here’s a gallery of image captures from the applet. To achieve some of
the effects try adjusting the rotation angle (press r or R) and scale
factor (press g or G) while running the app. A couple of these were made
using an experimental fractal-like image operator that can only be
turned on in the source code.</p>

<p class="grid"><a href="/img/feedback/fractal.png"><img src="/img/feedback/fractal-thumb.png" alt="" /></a>
<a href="/img/feedback/gimpy.png"><img src="/img/feedback/gimpy-thumb.png" alt="" /></a>
<a href="/img/feedback/green-spots.png"><img src="/img/feedback/green-spots-thumb.png" alt="" /></a>
<a href="/img/feedback/halo.png"><img src="/img/feedback/halo-thumb.png" alt="" /></a>
<a href="/img/feedback/orange-star.png"><img src="/img/feedback/orange-star-thumb.png" alt="" /></a>
<a href="/img/feedback/pink-spiral.png"><img src="/img/feedback/pink-spiral-thumb.png" alt="" /></a>
<a href="/img/feedback/spin-blur.png"><img src="/img/feedback/spin-blur-thumb.png" alt="" /></a>
<a href="/img/feedback/spin-spiral.png"><img src="/img/feedback/spin-spiral-thumb.png" alt="" /></a>
<a href="/img/feedback/spiral.png"><img src="/img/feedback/spiral-thumb.png" alt="" /></a>
<a href="/img/feedback/star.png"><img src="/img/feedback/star-thumb.png" alt="" /></a>
<a href="/img/feedback/sun.png"><img src="/img/feedback/sun-thumb.png" alt="" /></a>
<a href="/img/feedback/tunnel.png"><img src="/img/feedback/tunnel-thumb.png" alt="" /></a>
<a href="/img/feedback/gear.png"><img src="/img/feedback/gear-thumb.png" alt="" /></a></p>

]]>
    </content>
  </entry>
  <entry>
    <title>Sudoku Applet</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/10/20/"/>
    <id>urn:uuid:8451de1d-f0c4-3001-2e5a-5720aaa17ed4</id>
    <updated>2010-10-20T00:00:00Z</updated>
    <category term="java"/><category term="interactive"/><category term="game"/>
    <content type="html">
      <![CDATA[<!-- 20 October 2010 -->
<p>
<img src="/img/sudoku/small.png" alt="" class="right"/>

Over the last two evenings I created this, a Sudoku Java applet.
</p>
<pre>
git clone <a href="https://github.com/skeeto/Sudoku">git://github.com/skeeto/Sudoku.git</a>
</pre>
<p>

The hardest part was creating and implementing the algorithm for
generating new Sudokus. The first step to writing any Sudoku generator
is to <a href="/blog/2008/07/20/">write a Sudoku solver</a>. Use it to
solve an empty board in a random order, then back off while
maintaining a single solution. For a proper Sudoku, this has to be
done symmetrically leaving no more than 32 givens.
</p>
<p>
I didn't work out a great way to determine the difficulty of a
particular puzzle. The proper way would probably be to solve it a few
different ways and measure the number of steps taken. Right now I'm
controlling difficulty by adjusting the number of givens: 24 (hard),
28 (medium), and 32 (easy). Harder puzzles take longer to generate
because the search-space is less dense, due to the strict constraints.
</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Java Applets Demo Page</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/10/18/"/>
    <id>urn:uuid:0caa6106-7605-3b67-59c2-215e415f062f</id>
    <updated>2010-10-18T00:00:00Z</updated>
    <category term="java"/><category term="interactive"/><category term="game"/>
    <content type="html">
      <![CDATA[<!-- 18 October 2010 -->
<p class="abstract">
  Update February 2013: Java Applets are a dead technology and using a
  Java web browser plugin is simply much too insecure. I strongly
  recommend against it. This applet page has since been moved to a
  more generally-named URL.
</p>

<p>
Because the two projects I announced over the weekend can be used as
Java applets, I went back and upgraded <a href="/blog/2007/10/08/">
two older</a> <a href="/blog/2009/12/13/">projects</a> so that they
could also behave as interactive applets. Now that I have several
applets to show off, I've collected them on a "Java Applet
Demos" page.
</p>
<p class="center">
  <a href="/toys/">
    <b>Java Applet Demos</b><br/>
    <img src="/img/applets/applets.png" alt=""/>
  </a>
</p>
<p>
I intend to expand this in the near future with improvements
(Michael is unhappy with my integration function on the water wheel,
for one) and new applets <a href="/blog/2010/10/15/">while I'm on a
roll</a>. Swing is my oyster.
</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Lorenz Chaotic Water Wheel Applet</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/10/16/"/>
    <id>urn:uuid:dee2a554-b81f-3d55-85dd-e0bf9a220e51</id>
    <updated>2010-10-16T00:00:00Z</updated>
    <category term="java"/><category term="interactive"/>
    <content type="html">
      <![CDATA[<p><strong>Update February 2018</strong>: This project <a href="https://github.com/skeeto/waterwheel">has gotten a huge facelift</a>.</p>

<p>Today’s project is a Java applet that simulates and displays a
<a href="http://en.wikipedia.org/wiki/Edward_Norton_Lorenz">Lorenz</a> water wheel. Freely-swinging buckets with small holes
in the bottom are arranged around the outside of a loose wheel. Water
flows into the bucket at the top of the wheel, filling it up. As it gets
heavier it pulls down on the wheel, spinning it.</p>

<p><img src="/img/wheel/waterwheel.png" alt="" /></p>

<p>Source: <a href="https://github.com/skeeto/ChaosWheel">https://github.com/skeeto/ChaosWheel</a></p>

<p>You can click with your mouse to adjust the simulation. If you run it
standalone (<code class="language-plaintext highlighter-rouge">java -jar</code>) it will allow you to give it the number of
buckets you want to use as a command-line argument. It’s based on some
Octave/Matlab code written by a friend of mine, Michael Abraham. Those
environments are so slow, though, that they couldn’t do it in real time
like this.</p>

<p>This simulation is <a href="http://en.wikipedia.org/wiki/Chaos_theory">chaotic</a>: even though the behavior of the
system is deterministic it is <em>highly</em> sensitive to initial conditions.
The animation you see above is unique: no one saw this particular
variation before, nor will anyone again. If you refresh the page you’ll
be given a new wheel with unique initial conditions (well, one out of
the 2^128 possible starting conditions, since this is running inside of
a digital computer).</p>
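<p>That sensitivity is easy to demonstrate numerically. The chaotic
water wheel is famous for reducing to the Lorenz equations, so here’s a
sketch, and only a sketch (plain Euler integration with the classic
parameters, none of Michael’s code), that perturbs one coordinate by a
billionth and watches the trajectories diverge:</p>

```java
public class LorenzSensitivity {
    // Classic Lorenz system: dx/dt = s(y-x), dy/dt = x(r-z)-y, dz/dt = xy-bz.
    static double[] step(double[] p, double dt) {
        double s = 10, r = 28, b = 8.0 / 3.0;
        return new double[] {
            p[0] + dt * (s * (p[1] - p[0])),
            p[1] + dt * (p[0] * (r - p[2]) - p[1]),
            p[2] + dt * (p[0] * p[1] - b * p[2]),
        };
    }

    // Distance between two trajectories that start eps apart in z.
    static double separationAfter(int steps, double eps) {
        double[] a = { 1, 1, 1 };
        double[] b = { 1, 1, 1 + eps };
        for (int i = 0; i < steps; i++) {
            a = step(a, 0.01);
            b = step(b, 0.01);
        }
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    public static void main(String[] args) {
        // 4000 steps at dt = 0.01 is 40 simulated time units.
        System.out.println(separationAfter(4000, 1e-9));
    }
}
```

<p>After those 40 time units the two trajectories are typically as far
apart as any two random points on the attractor, despite agreeing to
nine decimal places at the start.</p>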

<p><img src="/img/wheel/butterfly.png" alt="" /></p>

<p>On the state space plot you should be able to see the state orbiting two
<a href="http://en.wikipedia.org/wiki/Lorenz_attractor">attractors</a>. It’s the classic butterfly image that gives the
phenomenon its name. The state space plot will look smoother with more
buckets, becoming perfectly smooth with an infinite number of buckets. Your
mouse cannot possibly survive that much clicking, so don’t try it.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Game of Life in Java</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/12/13/"/>
    <id>urn:uuid:dfc591b0-65ed-39d6-80b2-57a8991c2a0d</id>
    <updated>2009-12-13T00:00:00Z</updated>
    <category term="java"/><category term="interactive"/>
    <content type="html">
      <![CDATA[<!-- 13 December 2009 -->
<p>
Sources:
</p>
<pre>
git clone <a href="https://github.com/skeeto/GameOfLife">git://github.com/skeeto/GameOfLife.git</a>
</pre>
<p>
<img src="/img/gol/game-of-life.png" width="250"
     alt="" title="Screenshot of this thing" class="right"/>
</p>

<p>
Since I recently got back into Java, I threw together this
little Game of Life implementation. It looks a lot like my <a
href="/blog/2007/10/08/">maze generator/solver</a> on the inside,
reflecting the way I think about these things. Gavin wrote a <a
href="http://devrand.org/show_item.html?item=98&amp;page=Project">
competing version of the game</a> in <a
href="http://processing.org/">Processing</a> which we were partially
discussing the other night, so I made my own.
</p>
<p>
The individual cells are actually objects themselves, so you could
inherit the abstract Cell class and drop in your own rules. I bet you
could even write a Cell that does the <a href="/blog/2007/11/06/">
iterated prisoner's dilemma cellular automata</a>. The Cell objects
are wired together into a sort of mesh network. Instead of growing,
the grid wraps around at the edges.
</p>
<p>
It takes up to four arguments right now, with three types of cells,
<code>basic</code>, implementing the basic Conway's Game of Life,
<code>growth</code>, which is a cell that matures over time, and
<code>random</code> which mixes both types together (seen in the
screenshot). The arguments work as follows,
</p>
<pre>
java -jar GoL.jar [&lt;cell type&gt; [&lt;width&gt; &lt;height&gt; [&lt;cell pixel width&gt;]]]
</pre>
<p>
I may look into extending this to do some things beyond simple
cellular automata.
</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Java Animated Maze Generator and Solver</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2007/10/08/"/>
    <id>urn:uuid:a0616bd6-6e00-38c5-c84c-12335f388d88</id>
    <updated>2007-10-08T00:00:00Z</updated>
    <category term="java"/><category term="interactive"/>
    <content type="html">
      <![CDATA[<p><img src="/img/screenshot/maze-thumb.png" alt="" /></p>

<ul>
  <li><a href="https://github.com/skeeto/RunMaze">https://github.com/skeeto/RunMaze</a></li>
</ul>

<p>In preparation for showing off the maze-navigating robot I made last
week, I give you this program I wrote last year.</p>

<p>When I started to learn Java, I wrote this little program for
practice. It features a GUI and multi-threading. It generates a maze
and then slowly solves the maze so you can watch it go. I wrote this
with the GNU project’s Java implementation <span style="text-decoration: line-through;">(never used the official Sun
one). The Makefile is therefore set up for <code>gcj</code> and
<code>gij</code>. The Makefile can build the byte-compiled
<code>.jar</code> file as well as a natively compiled version.</span>
I have updated it to use Ant, and I also use Sun’s OpenJDK these days.</p>

<p>The maze generation algorithm is what I believe to be called the
“straw man” algorithm: the maze starts as a matrix of individual
cells. Break down the walls between two cells that aren’t already
connected and move into it. Choose another wall at random and
repeat. If there are no walls left to break down, go back a
step. Lather, rinse, repeat. If you are back in the starting cell with
no more walls to break down, you are done.</p>
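<p>A minimal sketch of that carve-and-retreat loop, my own illustration
rather than RunMaze’s source. Each cell holds a 4-bit wall mask, and the
recursion itself serves as the “go back a step” stack:</p>

```java
import java.util.*;

public class MazeGen {
    // walls[y][x] is a 4-bit mask: 1=N, 2=E, 4=S, 8=W, all up initially.
    static int[][] generate(int w, int h, Random rng) {
        int[][] walls = new int[h][w];
        for (int[] row : walls) Arrays.fill(row, 0xF);
        carve(0, 0, w, h, walls, new boolean[h][w], rng);
        return walls;
    }

    static void carve(int x, int y, int w, int h,
                      int[][] walls, boolean[][] seen, Random rng) {
        seen[y][x] = true;
        // {dx, dy, our wall bit, neighbor's opposite wall bit}
        int[][] dirs = {{0,-1,1,4}, {1,0,2,8}, {0,1,4,1}, {-1,0,8,2}};
        Collections.shuffle(Arrays.asList(dirs), rng);
        for (int[] d : dirs) {
            int nx = x + d[0], ny = y + d[1];
            if (nx >= 0 && nx < w && ny >= 0 && ny < h && !seen[ny][nx]) {
                walls[y][x] &= ~d[2];            // break down our wall...
                walls[ny][nx] &= ~d[3];          // ...and the neighbor's side
                carve(nx, ny, w, h, walls, seen, rng);  // move into it
            }
        }   // no unvisited neighbors left: return = "go back a step"
    }

    public static void main(String[] args) {
        int[][] m = generate(10, 10, new Random());
        System.out.println(m.length + "x" + m[0].length + " maze generated");
    }
}
```

<p>Deep recursion limits the maze size this way; an explicit stack would
lift that limit, but the recursive form matches the description
above.</p>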

<p>Solving the maze is done in a similar way to generation. Go forward,
right, or left if you can. If not, go back to the previous cell. This
is the same algorithm I employed in the maze-navigating robot I built,
which has kept me pretty busy. I will post pictures of it later.</p>

<p>You can specify the maze height, width, and cell pixel size on the
command line when you run it. This runs with the defaults,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java -jar RunMaze.jar
</code></pre></div></div>

<p>Or if you grabbed the source,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ant run
</code></pre></div></div>

<p>And, for a 100 by 100 maze with tiny cells,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java -jar RunMaze.jar 100 100 5
</code></pre></div></div>

<p>where,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java -jar RunMaze.jar &lt;width&gt; &lt;height&gt; &lt;cell-size&gt;
</code></pre></div></div>

<p>So, why learn Java? I still don’t like Java, but I had missed out on
an interesting research opportunity because I had zero Java
experience. I probably wouldn’t use it on my own for anything but
practice, as my own projects don’t really need to be super-portable.
Java is ugly, bulky, and slow, but I don’t want to miss any more
opportunities. Right after I was turned down because of my lack of
experience, I bought a Java textbook and read it cover to cover on
vacation. More importantly, I wrote simple little Java programs (like
the one presented here) as I learned core concepts.</p>

<p><em>Update 2009-07-07: I use Java all the time at work now. There is
really no escaping it. Except maybe with something like
<a href="http://clojure.org/">Clojure</a> …</em></p>
]]>
    </content>
  </entry>

</feed>
