<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Articles tagged reddit at null program</title>
  <link rel="alternate" type="text/html"
        href="https://nullprogram.com/tags/reddit/"/>
  <link rel="self" type="application/atom+xml"
        href="https://nullprogram.com/tags/reddit/feed/"/>
  <updated>2026-04-09T13:25:45Z</updated>
  <id>urn:uuid:1cbcfbba-f648-40f5-92fa-de92dd9f262b</id>

  <author>
    <name>Christopher Wellons</name>
    <uri>https://nullprogram.com</uri>
    <email>wellons@nullprogram.com</email>
  </author>

  <entry>
    <title>A Showerthoughts Fortune File</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/12/01/"/>
    <id>urn:uuid:0a266c4d-a224-3399-a851-848f71b47dc3</id>
    <updated>2016-12-01T23:58:15Z</updated>
    <category term="reddit"/><category term="linux"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>I have created a <a href="https://en.wikipedia.org/wiki/Fortune_(Unix)"><code class="language-plaintext highlighter-rouge">fortune</code> file</a> for the all-time top 10,000
<a href="https://old.reddit.com/r/Showerthoughts/">/r/Showerthoughts</a> posts, as of October 2016. As a word of
warning: Many of these entries are adult humor and may not be
appropriate for your work computer. These fortunes would be
categorized as “offensive” (<code class="language-plaintext highlighter-rouge">fortune -o</code>).</p>

<p>Download: <a href="https://skeeto.s3.amazonaws.com/share/showerthoughts" class="download">showerthoughts</a> (1.3 MB)</p>

<p>The copyright status of this file rests with each of its thousands
of authors. Since it’s not practical to contact many of these authors,
some of whom may no longer be alive, it’s obviously never going to be
under an open source license (Creative Commons, etc.). Moreover, some
quotes are probably from comedians and such, rather than by the
redditor who made the post. I distribute it only for fun.</p>

<h3 id="installation">Installation</h3>

<p>To install this into your <code class="language-plaintext highlighter-rouge">fortune</code> database, first process it with
<code class="language-plaintext highlighter-rouge">strfile</code> to create a random-access index, showerthoughts.dat, then
copy both files into the directory with the rest.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ strfile showerthoughts
"showerthoughts.dat" created
There were 10000 strings
Longest string: 343 bytes
Shortest string: 39 bytes

$ cp showerthoughts* /usr/share/games/fortunes/
</code></pre></div></div>

<p>Alternatively, <code class="language-plaintext highlighter-rouge">fortune</code> can be told to use this file directly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ fortune showerthoughts
Not once in my life have I stepped into somebody's house and
thought, "I sure hope I get an apology for 'the mess'."
        ―AndItsDeepToo, Aug 2016
</code></pre></div></div>

<p>If you didn’t already know, <code class="language-plaintext highlighter-rouge">fortune</code> is an old unix utility that
displays a random quotation from a quotation database — a digital
<em>fortune cookie</em>. I use it as an interactive login shell greeting on
my <a href="http://www.hardkernel.com/main/products/prdt_info.php">ODROID-C2</a> server:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if </span><span class="nb">shopt</span> <span class="nt">-q</span> login_shell<span class="p">;</span> <span class="k">then
    </span>fortune ~/.fortunes
<span class="k">fi</span>
</code></pre></div></div>

<h3 id="how-was-it-made">How was it made?</h3>

<p>Fortunately I didn’t have to do something crazy like scrape reddit for
weeks on end. Instead, I downloaded <a href="http://files.pushshift.io/reddit/">the pushshift.io submission
archives</a>, which is currently around 70 GB compressed. Each file
contains one month’s worth of JSON data, one object per submission,
one submission per line, all compressed with bzip2.</p>

<p>Unlike so many other datasets, especially those made up of arbitrary
inputs from millions of people, the /r/Showerthoughts posts are
surprisingly clean and require virtually no touching up. It’s some
really fantastic data.</p>

<p>A nice feature of bzip2 is that concatenating compressed files also
concatenates the uncompressed files. Additionally, it’s easy to
parallelize bzip2 compression and decompression, which gives it <a href="/blog/2009/03/16/">an
edge over xz</a>. I strongly recommend using <a href="http://lbzip2.org/">lbzip2</a> to
decompress this data, should you want to process it yourself.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cat </span>RS_<span class="k">*</span>.bz2 | lbunzip2 <span class="o">&gt;</span> everything.json
</code></pre></div></div>

<p><a href="https://stedolan.github.io/jq/">jq</a> is my favorite command line tool for processing JSON (and
<a href="/blog/2016/09/15/">rendering fractals</a>). To filter all the /r/Showerthoughts posts,
it’s a simple <code class="language-plaintext highlighter-rouge">select</code> expression. Just mind the capitalization of the
subreddit’s name. The <code class="language-plaintext highlighter-rouge">-c</code> tells <code class="language-plaintext highlighter-rouge">jq</code> to keep the output to one object per line.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cat </span>RS_<span class="k">*</span>.bz2 | <span class="se">\</span>
    lbunzip2 | <span class="se">\</span>
    jq <span class="nt">-c</span> <span class="s1">'select(.subreddit == "Showerthoughts")'</span> <span class="se">\</span>
    <span class="o">&gt;</span> showerthoughts.json
</code></pre></div></div>

<p>However, you’ll quickly find that jq is the bottleneck, parsing all
that JSON, and lbzip2 won’t be able to keep your cores busy. So
I throw <code class="language-plaintext highlighter-rouge">grep</code> in front to dramatically decrease the workload for
<code class="language-plaintext highlighter-rouge">jq</code>.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cat</span> <span class="k">*</span>.bz2 | <span class="se">\</span>
    lbunzip2 | <span class="se">\</span>
    <span class="nb">grep</span> <span class="nt">-a</span> Showerthoughts | <span class="se">\</span>
    jq <span class="nt">-c</span> <span class="s1">'select(.subreddit == "Showerthoughts")'</span> <span class="se">\</span>
    <span class="o">&gt;</span> showerthoughts.json
</code></pre></div></div>

<p>This will let some extra things through, but it’s a superset. The <code class="language-plaintext highlighter-rouge">-a</code>
option is necessary because the data contains some null bytes. Without
it, <code class="language-plaintext highlighter-rouge">grep</code> switches into binary mode and breaks everything. This is
incredibly frustrating when you’ve already waited half an hour for
results.</p>
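<p>The binary-mode behavior is easy to demonstrate with a synthetic file containing a null byte (not the reddit data itself):</p>

```shell
#!/bin/sh
# A null byte makes GNU grep treat the input as binary: instead of the
# matching line it prints only a binary-match notice. The -a option
# forces text mode so the line itself comes through.
printf 'Showerthoughts\000more\n' > sample.txt
grep Showerthoughts sample.txt || true   # prints a binary-match notice
grep -a -c Showerthoughts sample.txt     # counts the match normally: 1
```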

<p>To reduce the workload further down the pipeline, I take
advantage of the fact that only four fields will be needed: <code class="language-plaintext highlighter-rouge">title</code>,
<code class="language-plaintext highlighter-rouge">score</code>, <code class="language-plaintext highlighter-rouge">author</code>, and <code class="language-plaintext highlighter-rouge">created_utc</code>. The rest can — and should, for
efficiency’s sake — be thrown away where it’s cheap to do so.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cat</span> <span class="k">*</span>.bz2 | <span class="se">\</span>
    lbunzip2 | <span class="se">\</span>
    <span class="nb">grep</span> <span class="nt">-a</span> Showerthoughts | <span class="se">\</span>
    jq <span class="nt">-c</span> <span class="s1">'select(.subreddit == "Showerthoughts") |
               {title, score, author, created_utc}'</span> <span class="se">\</span>
    <span class="o">&gt;</span> showerthoughts.json
</code></pre></div></div>

<p>This gathers all 1,199,499 submissions into a 185 MB JSON file (as of
this writing). Most of these submissions are terrible, so the next
step is narrowing it to the small set of good submissions and putting
them into the <code class="language-plaintext highlighter-rouge">fortune</code> database format.</p>

<p><strong>It turns out reddit already has a method for finding the best
submissions: a voting system.</strong> Just pick the highest scoring posts.
Through experimentation I arrived at 10,000 as the magic cut-off
number. After this the quality really starts to drop off. Over time
this should probably be scaled up with the total number of
submissions.</p>

<p>I did both steps at the same time using a bit of Emacs Lisp, which is
particularly well-suited to the task:</p>

<ul>
  <li><a href="https://github.com/skeeto/showerthoughts">https://github.com/skeeto/showerthoughts</a></li>
</ul>

<p>This Elisp program reads one JSON object at a time and sticks each
into an AVL tree sorted by score (descending), then timestamp
(ascending), then title (ascending). The AVL tree is limited to 10,000
items, with the lowest items being dropped. This was a lot faster than
the more obvious approach: collecting everything into a big list,
sorting it, and keeping the top 10,000 items.</p>
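<p>For comparison, the obvious approach is a one-liner in shell. This is a hypothetical sketch on toy data; the real input would be the score field pulled out of showerthoughts.json:</p>

```shell
#!/bin/sh
# Hypothetical sketch of the "obvious approach": sort everything by
# score, descending, then keep the top N. On toy data this is fine,
# but on ~1.2 million records a bounded structure like the AVL tree
# avoids sorting the entire dataset.
N=2
printf '%s\n' \
    '10 low-scoring thought' \
    '9001 top thought' \
    '500 middling thought' |
    sort -k1,1nr | head -n "$N"
```

This prints the two highest-scored lines, top score first.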

<h4 id="formatting">Formatting</h4>

<p>The most complicated part is actually paragraph wrapping the
submissions. Most are too long for a single line, and letting the
terminal hard wrap them is visually unpleasing. The submissions are
encoded in UTF-8, some with characters beyond simple ASCII. Proper
wrapping requires not just Unicode awareness, but also some degree of
Unicode <em>rendering</em>. The algorithm needs to recognize grapheme
clusters and know the size of the rendered text. This is not so
trivial! Most paragraph wrapping tools and libraries get this wrong,
some counting width by bytes, others counting width by codepoints.</p>

<p>Emacs’ <code class="language-plaintext highlighter-rouge">M-x fill-paragraph</code> knows how to do all these things — only
for a monospace font, which is all I needed — and I decided to
leverage it when generating the <code class="language-plaintext highlighter-rouge">fortune</code> file. Here’s an example that
paragraph-wraps a string:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">string-fill-paragraph</span> <span class="p">(</span><span class="nv">s</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="nv">insert</span> <span class="nv">s</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">fill-paragraph</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)))</span>
</code></pre></div></div>

<p>For the file format, items are delimited by a <code class="language-plaintext highlighter-rouge">%</code> on a line by itself.
I put the wrapped content, followed by a <a href="http://www.fileformat.info/info/unicode/char/2015/index.htm">quotation dash</a>, the
author, and the date. A surprising number of these submissions have
date-sensitive content (“on this day X years ago”), so I found it was
important to include a date.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>April Fool's Day is the one day of the year when people critically
evaluate news articles before accepting them as true.
        ―kellenbrent, Apr 2015
%
Of all the bodily functions that could be contagious, thank god
it's the yawn.
        ―MKLV, Aug 2015
%
</code></pre></div></div>
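<p>Generating the rest of an entry is straightforward. Here is a rough shell sketch, not the actual Elisp; note that <code class="language-plaintext highlighter-rouge">fold</code> counts bytes rather than graphemes, so it mishandles exactly the Unicode cases described above:</p>

```shell
#!/bin/sh
# Rough sketch: emit one fortune entry from a title, author, and Unix
# timestamp. fold(1) wraps by bytes, so this only approximates the
# Unicode-aware wrapping done in Emacs.
format_entry() {
    title=$1 author=$2 created_utc=$3
    when=$(date -u -d "@$created_utc" '+%b %Y')   # GNU date
    printf '%s\n' "$title" | fold -s -w 70
    # \342\200\225 is UTF-8 for the U+2015 quotation dash
    printf '        \342\200\225%s, %s\n%%\n' "$author" "$when"
}

format_entry 'Example thought.' nobody 1470009600
```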

<p>There’s the potential that a submission itself could end with a lone
<code class="language-plaintext highlighter-rouge">%</code> and, with a bit of bad luck, it happens to wrap that onto its own
line. Fortunately this hasn’t happened yet. But, now that I’ve
advertised it, someone could make such a submission, popular enough
for the top 10,000, with the intent to personally trip me up in a
future update. I accept this, though it’s unlikely, and it would be
fairly easy to work around if it happened.</p>

<p>The <code class="language-plaintext highlighter-rouge">strfile</code> program looks for the <code class="language-plaintext highlighter-rouge">%</code> delimiters and fills out a
table of file offsets. The header of the <code class="language-plaintext highlighter-rouge">.dat</code> file indicates the
number of strings along with some other metadata. What follows is a table
of 32-bit file offsets.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="p">{</span>
    <span class="kt">uint32_t</span> <span class="n">str_version</span><span class="p">;</span>  <span class="cm">/* version number */</span>
    <span class="kt">uint32_t</span> <span class="n">str_numstr</span><span class="p">;</span>   <span class="cm">/* # of strings in the file */</span>
    <span class="kt">uint32_t</span> <span class="n">str_longlen</span><span class="p">;</span>  <span class="cm">/* length of longest string */</span>
    <span class="kt">uint32_t</span> <span class="n">str_shortlen</span><span class="p">;</span> <span class="cm">/* shortest string length */</span>
    <span class="kt">uint32_t</span> <span class="n">str_flags</span><span class="p">;</span>    <span class="cm">/* bit field for flags */</span>
    <span class="kt">char</span> <span class="n">str_delim</span><span class="p">;</span>        <span class="cm">/* delimiting character */</span>
<span class="p">};</span>
</code></pre></div></div>
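<p>The header is written in network byte order (big-endian). A quick way to peek at it is <code class="language-plaintext highlighter-rouge">od</code>. Here is a sketch that decodes the first two fields from a synthesized header, since the exact values depend on your <code class="language-plaintext highlighter-rouge">strfile</code> version:</p>

```shell
#!/bin/sh
# Decode the first two 32-bit big-endian fields (str_version,
# str_numstr) of a strfile .dat header. The header bytes here are
# synthesized with printf rather than produced by strfile itself.
printf '\000\000\000\002\000\000\047\020' > header.dat  # version 2, 10000 strings
od -An --endian=big -tu4 header.dat   # GNU od; decodes to 2 and 10000
```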

<p>Note that the table doesn’t necessarily need to list the strings in
the same order as they appear in the original file. In fact, recent
versions of <code class="language-plaintext highlighter-rouge">strfile</code> can sort the strings by sorting the table, all
without touching the original file. Though none of this is important to
<code class="language-plaintext highlighter-rouge">fortune</code>.</p>

<p>Now that you know how it all works, you can build your own <code class="language-plaintext highlighter-rouge">fortune</code>
file from your own inputs!</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Emacs Lisp Reddit API Wrapper</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/12/16/"/>
    <id>urn:uuid:3362934d-9762-3f58-e05c-4d8b28175367</id>
    <updated>2013-12-16T23:27:23Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="reddit"/><category term="web"/>
    <content type="html">
      <![CDATA[<p>A couple of months ago I wrote an Emacs Lisp wrapper for the
<a href="http://old.reddit.com/dev/api">reddit API</a>. I didn’t put it in MELPA,
not yet anyway. If anyone is finding it useful I’ll see about getting
that done. My intention was to give it some exercise and testing,
locking down the API, before putting it out there for people to use. You can
find it here,</p>

<ul>
  <li><a href="https://github.com/skeeto/emacs-reddit-api">https://github.com/skeeto/emacs-reddit-api</a></li>
</ul>

<p>Except for logging in, the library is agnostic about the actual API
endpoints themselves. It just knows how to translate between Elisp and
the reddit API protocol. This makes the library dead simple to use. I
had considered supporting <a href="http://blog.jenkster.com/2013/10/an-oauth2-in-emacs-example.html">OAuth2 authentication</a> rather than
password authentication, but reddit’s OAuth2 support is pretty rough
around the edges.</p>

<h3 id="library-usage">Library Usage</h3>

<p>The reddit API has two kinds of endpoints, GET and POST, so there are
really only three functions to concern yourself with.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">reddit-login</code></li>
  <li><code class="language-plaintext highlighter-rouge">reddit-get</code></li>
  <li><code class="language-plaintext highlighter-rouge">reddit-post</code></li>
</ul>

<p>And one variable,</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">reddit-session</code></li>
</ul>

<p>The <code class="language-plaintext highlighter-rouge">reddit-login</code> function is really just a special case of
<code class="language-plaintext highlighter-rouge">reddit-post</code>. It returns a session value (cookie/modhash tuple) that
is used by the other two functions for authenticating the user. Like
almost all Elisp data structures (more so than in perhaps <em>any</em>
other popular programming language), it can be serialized with the
printer and reader, allowing a reddit session to be maintained across
Emacs sessions.</p>

<p>The return value of <code class="language-plaintext highlighter-rouge">reddit-login</code> generally doesn’t need to be
captured. It automatically sets the dynamic variable <code class="language-plaintext highlighter-rouge">reddit-session</code>,
which is what the other functions access for authentication. This can
be bound with <code class="language-plaintext highlighter-rouge">let</code> to other session values in order to switch between
different users.</p>

<p>Both <code class="language-plaintext highlighter-rouge">reddit-get</code> and <code class="language-plaintext highlighter-rouge">reddit-post</code> take an endpoint name and a list
of key-value pairs in the form of a property list (plist). (The
<code class="language-plaintext highlighter-rouge">api-type</code> key is automatically supplied.) They each return the JSON
response from the server in association list (alist) form. The actual
shape of this data matches the response from reddit, which,
unfortunately, is inconsistent and unspecified, so writing any sort of
program to operate on the API requires lots of trial and error. If the
API responded with an error, these functions signal a <code class="language-plaintext highlighter-rouge">reddit-error</code>.</p>

<p>Typical usage looks like so. Notice that values need not be only
strings; they just need to print to something reasonable.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Login first</span>
<span class="p">(</span><span class="nv">reddit-login</span> <span class="s">"your-username"</span> <span class="s">"your-password"</span><span class="p">)</span>

<span class="c1">;; Subscribe to a subreddit</span>
<span class="p">(</span><span class="nv">reddit-post</span> <span class="s">"/api/subscribe"</span> <span class="o">'</span><span class="p">(</span><span class="ss">:sr</span> <span class="s">"t5_2s49f"</span> <span class="ss">:action</span> <span class="nv">sub</span><span class="p">))</span>

<span class="c1">;; Post a comment</span>
<span class="p">(</span><span class="nv">reddit-post</span> <span class="s">"/api/comment/"</span> <span class="o">'</span><span class="p">(</span><span class="ss">:text</span> <span class="s">"Hello world."</span> <span class="ss">:thing_id</span> <span class="s">"t1_cd3ar7y"</span><span class="p">))</span>
</code></pre></div></div>

<p>For plists keys I considered automatically converting between dashes
and underscores so that the keywords could have Lisp-style names. But
the reddit API is inconsistent, using both, so there’s no correct way
to do this.</p>

<p>To further refine the API it might be worth defining a function for
each of the reddit endpoints, forming a facade for the wrapper
library, hiding away the plist arguments and complicated responses.
That would eliminate the trial and error of using the API.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">reddit-api-comment</span> <span class="p">(</span><span class="nv">parent</span> <span class="nv">comment</span><span class="p">)</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">null</span> <span class="nv">reddit-session</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">error</span> <span class="s">"Not logged in."</span><span class="p">)</span>
    <span class="c1">;; TODO: reduce the return value into a thing/struct</span>
    <span class="p">(</span><span class="nv">reddit-post</span> <span class="s">"/api/comment/"</span> <span class="p">(</span><span class="nb">list</span> <span class="ss">:thing_id</span> <span class="nv">parent</span> <span class="ss">:text</span> <span class="nv">comment</span><span class="p">))))</span>
</code></pre></div></div>

<p>Furthermore there could be defstructs for comments, posts, subreddits,
etc. so that the “thing” ID stuff is hidden away. This is basically
what was already done for sessions out of necessity. I might add these
structs and functions someday, but I don’t currently have a need for
them.</p>

<p>It would be neat to use this API to create an interface to reddit from
within Emacs. I imagine it might look like one of the Emacs mail
clients, or <a href="/blog/2013/09/04/">like Elfeed</a>. Almost everything, including
viewing image posts within Emacs, should be possible.</p>

<h3 id="background">Background</h3>

<p>For the last 3.5 years I’ve been a moderator of <a href="http://old.reddit.com/r/civ">/r/civ</a>,
<a href="http://old.reddit.com/r/civ/comments/clxj4/lets_tidy_rciv_up_a_bit/">starting back when it had about 100 subscribers</a>. As of this
writing it’s just short of 60k subscribers and we’re now up to 9
moderators.</p>

<p>A few months ago we decided to institute a self-post-only Sunday. All
day Sunday, midnight to midnight Eastern time, only self-posts are
allowed in the subreddit. One of the other moderators was turning this
on and off manually, so I offered to write a bot to do the job. There
<a href="https://github.com/reddit/reddit/wiki/API-Wrappers">weren’t any Lisp wrappers yet</a> (though raw4j could be used
with Clojure), so I decided to write one.</p>

<p>As mentioned before, the reddit API leaves <em>a lot</em> to be desired. It
randomly returns errors, so a correct program needs to be prepared to
retry requests after a short delay, depending on the error. My
particular annoyance is that the <code class="language-plaintext highlighter-rouge">/api/site_admin</code> endpoint requires
that most of its keys are supplied, and it’s not documented which ones
are required. Even worse, there’s no single endpoint to get all of the
required values, the key names between endpoints are inconsistent, and
even the values themselves can’t be returned as-is, requiring
<a href="http://old.reddit.com/r/bugs/comments/1t162o/">massaging/fixing before returning them back to the API</a>.</p>

<p>I hope other people find this library useful!</p>

]]>
    </content>
  </entry>
  <entry>
    <title>My Grading Process</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/10/13/"/>
    <id>urn:uuid:8c5aecd0-c7f0-314d-d4a8-88014a195e08</id>
    <updated>2013-10-13T02:56:31Z</updated>
    <category term="java"/><category term="rant"/><category term="reddit"/>
    <content type="html">
      <![CDATA[<p>My GitHub activity, including this blog, has really slowed down for
the past month because I’ve spent a lot of free time grading homework
for a <a href="http://apps.ep.jhu.edu/courses/605/707">design patterns class</a>, taught by a colleague at the
<a href="http://engineering.jhu.edu/">Whiting School of Engineering</a>. Conveniently for me, all of my
interaction with the students is through e-mail. It’s been a great
exercise of <a href="/blog/2013/09/03/">my new e-mail setup</a>, which itself has definitely
made this job easier. It’s kept me very organized through the whole
process.</p>

<p><img src="/img/screenshot/github-dropoff.png" alt="" /></p>

<p>Each assignment involves applying two or three design patterns to a
crude (in my opinion) XML parsing library. Students are given a
tarball containing the source code for the library, in both Java and
C++. They pick a language, modify the code to use the specified
patterns, zip/archive up the result, and e-mail me their
zipfile/tarball.</p>

<p>It took me the first couple of weeks to work out an efficient grading
workflow, and, at this point, I can accurately work my way through
most new homework submissions rapidly. On my end I already know the
original code base. All I really care about is the student’s changes.
In software development this sort of thing is expressed as a <em>diff</em>,
preferably in the <a href="http://en.wikipedia.org/wiki/Diff#Unified_format"><em>unified diff</em></a> format. This is called a
<em>patch</em>. It describes precisely what was added and removed, and
provides a bit of context around each change. The context greatly
increases the readability of the patch and, as a bonus, allows it to
be applied to a slightly different source. Here’s a part of a patch
recently submitted to Elfeed:</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh">diff --git a/tests/elfeed-tests.el b/tests/elfeed-tests.el
index 31d5ad2..fbb78dd 100644
</span><span class="gd">--- a/tests/elfeed-tests.el
</span><span class="gi">+++ b/tests/elfeed-tests.el
</span><span class="p">@@ -144,15 +144,15 @@</span>
   (with-temp-buffer
     (insert elfeed-test-rss)
     (goto-char (point-min))
<span class="gd">-    (should (eq (elfeed-feed-type (xml-parse-region)) :rss)))
</span><span class="gi">+    (should (eq (elfeed-feed-type (elfeed-xml-parse-region)) :rss)))
</span>   (with-temp-buffer
     (insert elfeed-test-atom)
     (goto-char (point-min))
<span class="gd">-    (should (eq (elfeed-feed-type (xml-parse-region)) :atom)))
</span><span class="gi">+    (should (eq (elfeed-feed-type (elfeed-xml-parse-region)) :atom)))
</span>   (with-temp-buffer
     (insert elfeed-test-rss1.0)
     (goto-char (point-min))
<span class="gd">-    (should (eq (elfeed-feed-type (xml-parse-region)) :rss1.0))))
</span><span class="gi">+    (should (eq (elfeed-feed-type (elfeed-xml-parse-region)) :rss1.0))))
</span>
 (ert-deftest elfeed-entries-from-x ()
   (with-elfeed-test
</code></pre></div></div>

<p>I’d <em>really</em> prefer to receive patches like this as homework
submissions but this is probably too sophisticated for most students.
Instead, the first thing I do is create a patch for them from their
submission. Most students work off of their previous submission, so I
just run <code class="language-plaintext highlighter-rouge">diff</code> between their last submission and the current one.
While I’ve got a lot of the rest of the process automated with
scripts, I unfortunately cannot script patch generation. Each
student’s submission follows a unique format for that particular
student and some students are not even consistent between their own
assignments. About half the students also include generated files
alongside the source so I need to clean this up too. Generating the
patch is by far the messiest part of the whole process.</p>
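<p>Once a submission is unpacked next to the previous one, the patch itself is a single command. A minimal sketch with hypothetical file names:</p>

```shell
#!/bin/sh
# Generate a unified diff between a student's previous and current
# submissions: -u for unified format, -r to recurse into directories,
# -N to include files that appear in only one tree.
mkdir -p prev curr
echo 'class Parser {}'                > prev/Parser.java
echo 'class Parser { /* changed */ }' > curr/Parser.java
diff -urN prev curr > student.patch || true  # diff exits 1 when trees differ
cat student.patch
```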

<p>I grade almost entirely from the patch. 100% correct submissions are
usually only a few hundred lines of patch and I can spot all of the
required parts within a few minutes. Very easy. It’s the incorrect
submissions that consume most of my time. I have to figure out what
they’re doing, determine what they <em>meant</em> to do, and distill that
down into discrete discussion items along with point losses. In either
case I’ll also add some of my own opinions on their choice of style,
though this has no effect on the final grade.</p>

<p>For each student’s submission, I commit to a private Git repository
the raw, submitted archive file, the generated patch, and a grade
report written in Markdown. After the due date and once all the
submitted assignments are graded, I reply to each student with their
grade report. On a few occasions there’s been a back and forth
clarification dialog that has resulted in the student getting a higher
score. (That’s a hint to any students who happen to read this!)</p>
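<p>The bookkeeping boils down to a few Git commands per submission (a sketch with hypothetical paths and file names):</p>

```shell
#!/bin/sh
# Commit one student's raw archive, generated patch, and Markdown grade
# report together in a private repository. Paths are hypothetical.
git init -q grading
cd grading
git config user.name  Grader
git config user.email grader@example.com
mkdir -p student1/hw3
: > student1/hw3/submission.zip   # raw archive as submitted
: > student1/hw3/hw3.patch        # generated unified diff
echo '# Grade: 95/100' > student1/hw3/report.md
git add student1/hw3
git commit -qm 'student1: homework 3'
```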

<p>Even ignoring the time it takes to generate a patch, there are still
disadvantages to not having students submit patches. One is the size:
about 60% of my current e-mail storage, which goes all the way back to
2006, comes from this class alone, all within the past month. It’s been a
lot of bulky attachments. I’ll delete all of the attachments once the
semester is over.</p>

<p>Another is that the students are unaware of how many changes they
make. Some of these patches contain a significant number of trivial
changes — breaking long lines in the original source, changing
whitespace within lines, etc. If students focused on crafting a tidy
patch they might try to avoid including these types of changes in
their submissions. I like to imagine this process being similar to
submitting a patch to an open source project. Patches should describe
a concise set of changes, and messy patches are rejected outright. The
Git staging area is all about crafting clean patches like this.</p>

<p>If there was something else I could change it would be to severely
clean up the original code base. When compiler warnings are turned on,
compiling it emits a giant list of warnings. The students are already
starting at an unnecessary disadvantage, missing out on a very
valuable feature: because of all the existing noise they can’t
effectively use compiler warnings themselves. Any new warnings would
be lost in the noise. This has also led to many of those
trivial/unrelated changes: some students are spending time fixing the
warnings.</p>

<p>I want to go a lot further than warnings, though. I’d make sure the
original code base had absolutely no issues listed by <a href="http://pmd.sourceforge.net/">PMD</a>,
<a href="http://findbugs.sourceforge.net/">FindBugs</a>, or <a href="http://checkstyle.sourceforge.net/">Checkstyle</a> (for the Java
version, that is). Then I could use all of these static analysis tools
on students’ submissions to quickly spot issues. It’s as simple as
<a href="https://github.com/skeeto/sample-java-project/blob/master/build.xml">using my starter build configuration</a>. In fact, I’ve used
these tools a number of times in the past to perform detailed code
reviews for free (<a href="http://old.reddit.com/r/javahelp/comments/1inzs7/_/cb6ojr2">1</a>, <a href="http://old.reddit.com/r/reviewmycode/comments/1a2fty/_/c8tpme2">2</a>, <a href="http://old.reddit.com/r/javahelp/comments/1balsp/_/c958num">3</a>). Providing an
extensive code analysis for each student for each assignment would
become a realistic goal.</p>

<p>I’ve expressed all these ideas to the class’s instructor, my
colleague, so maybe some things will change in future semesters. If
I’m offered the opportunity again — assuming I didn’t screw this
semester up already — I’m still unsure whether I would accept. It’s
a lot of work for, optimistically, what amounts to
the same pay rate I received as an engineering intern in college. This
first experience at grading has been very educational, making me
appreciate those who graded my own sloppy assignments in college, and
that’s provided value beyond the monetary compensation. Next time
around wouldn’t be as educational, so my time could probably be better
spent on other activities, even if it’s writing open source software
for free.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Web Distributed Computing Revisited</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/01/26/"/>
    <id>urn:uuid:ab83f362-cc7f-308f-309b-5f3af5ae9be9</id>
    <updated>2013-01-26T00:00:00Z</updated>
    <category term="javascript"/><category term="web"/><category term="lisp"/><category term="reddit"/>
    <content type="html">
      <![CDATA[<p>Four years ago I investigated the idea of using
<a href="/blog/2009/06/09/">browsers as nodes for distributed computing</a>. I concluded
that due to the platform’s constraints there were few problems that it
was suited to solve. However, the situation has since changed quite a
bit! In fact, this weekend I made practical use of web browsers across
a number of geographically separated computers to solve a
computational problem.</p>

<h3 id="what-changed">What changed?</h3>

<p><a href="http://en.wikipedia.org/wiki/Web_worker">Web workers</a> came into existence, not just as a specification
but as implementations across all the major browsers. They allow
JavaScript to run in an isolated, dedicated background thread. This
eliminates the <code class="language-plaintext highlighter-rouge">setTimeout()</code> requirement from before, which not only
caused a performance penalty but really hampered running any sort of
lively interface alongside the computation. The interface and
computation were competing for time on the same thread.</p>

<p>The worker isn’t <em>entirely</em> isolated; otherwise it would be useless
for anything but wasting resources. Through message events, it can pass
<a href="https://developer.mozilla.org/en-US/docs/DOM/The_structured_clone_algorithm">structured clones</a> to and from the main thread running in the
page. Other than this, it has no access to the DOM or other data on
the page.</p>

<p>The interface is a bit unfriendly to <a href="/blog/2012/10/31/">live development</a>, but
it’s manageable. It’s invoked by passing the URL of a script to the
constructor. This script is the code that runs in the dedicated thread.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">var</span> <span class="nx">worker</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">Worker</span><span class="p">(</span><span class="dl">'</span><span class="s1">script/worker.js</span><span class="dl">'</span><span class="p">);</span>
</code></pre></div></div>

<p>The sort of interface that would have been more convenient for live
interaction would be something like what is found on most
multi-threaded platforms: a thread constructor that accepts a function
as an argument.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* This doesn't work! */</span>
<span class="kd">var</span> <span class="nx">worker</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">Worker</span><span class="p">(</span><span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// ...</span>
<span class="p">});</span>
</code></pre></div></div>

<p>I completely understand why this isn’t the case. The worker thread
needs to be totally isolated and the above example is insufficient.
I’m passing a closure to the constructor, which means I would be
sharing bindings, and therefore data, with the worker thread. This
interface could be faked using a <a href="http://en.wikipedia.org/wiki/Data_URI_scheme">data URI</a> and taking
advantage of the fact that most browsers return function source code
from <code class="language-plaintext highlighter-rouge">toString()</code>.</p>
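
<p>For illustration, here is roughly how that fake could look. The helper
name <code class="language-plaintext highlighter-rouge">workerURI</code> is my own invention, and it assumes the browser accepts a
<code class="language-plaintext highlighter-rouge">data:</code> URI in the <code class="language-plaintext highlighter-rouge">Worker</code> constructor:</p>

```javascript
// Sketch of the data URI trick (workerURI is a hypothetical helper).
// It relies on Function.prototype.toString() returning usable source,
// and the function must not close over anything outside itself.
function workerURI(f) {
    var source = '(' + f.toString() + ')();';
    return 'data:application/javascript,' + encodeURIComponent(source);
}

// In a browser: new Worker(workerURI(function() { postMessage(42); }));
```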

<p><s>Another difficulty is libraries. Ignoring the stupid idea of
passing code through the event API and evaling it, that single URL
must contain <em>all</em> the source code the worker will use as one
script. This means if you want to use any libraries you’ll need to
concatenate them with your script. That complicates things slightly,
but I imagine many people will be minifying their worker JavaScript
anyway.</s></p>

<p>Libraries can be loaded by the worker with the <code class="language-plaintext highlighter-rouge">importScripts()</code>
function, so not everything needs to be packed into one
script. Furthermore, workers can make HTTP requests with
XMLHttpRequest, so data doesn’t need to be embedded either. Note
that it’s probably worth making these requests synchronously (third
argument <code class="language-plaintext highlighter-rouge">false</code>), because blocking isn’t an issue in workers.</p>

<p>The other big change was the effect Google Chrome, especially its V8
JavaScript engine, had on the browser market. Browser JavaScript is
probably about two orders of magnitude faster than it was when I wrote
my previous post. It’s
<a href="http://youtu.be/UJPdhx5zTaw">incredible what the V8 team has accomplished</a>. Carefully written
JavaScript on V8 can beat out most other languages in performance.</p>

<p>Finally, I also now have much, much better knowledge of JavaScript
than I did four years ago. I’m not fumbling around like I was before.</p>

<h3 id="applying-these-changes">Applying these Changes</h3>

<p><a href="http://redd.it/178vsz">This weekend’s Daily Programmer challenge</a> was to find a “key” —
a permutation of the alphabet — that when applied to a small
dictionary results in the maximum number of words with their letters
in alphabetical order. That’s a keyspace of 26!, or
403,291,461,126,605,635,584,000,000.</p>
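
<p>As a sanity check on that figure, the keyspace size is easy to recompute
with arbitrary-precision integers (BigInt here, a feature much newer than
this post):</p>

```javascript
// Compute 26! with BigInt to confirm the size of the keyspace.
function factorial(n) {
    let result = 1n;
    for (let i = 2n; i <= n; i++) {
        result *= i;
    }
    return result;
}

console.log(factorial(26n).toString());
// 403291461126605635584000000
```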

<p>When I’m developing, I use both a laptop and a desktop simultaneously,
and I really wanted to put them both to work searching that huge space
for good solutions. Initially I was going to accomplish this by
writing my program in Clojure and running it on each machine. But what
about involving my wife’s computer, too? I wasn’t going to bother her
with setting up an environment to run my stuff. Writing it in
JavaScript as a web application would be the way to go. To coordinate
this work I’d use <a href="/blog/2012/08/20/">simple-httpd</a>. And so it was born,</p>

<ul>
  <li><a href="https://github.com/skeeto/key-collab">https://github.com/skeeto/key-collab</a></li>
</ul>

<p>Here’s what it looks like in action. Each tab open consumes one CPU
core, allowing users to control their commitment by choosing how many
tabs to keep open. All of those numbers update about twice per second,
so users can get a concrete idea of what’s going on. I think it’s fun
to watch.</p>

<p><a href="/img/screenshot/key-collab.png"><img src="/img/screenshot/key-collab-thumb.png" alt="" /></a></p>

<p>(I’m obviously a fan of blues and greens on my web pages. I don’t know why.)</p>

<p>I posted the server’s URL on reddit in the challenge thread, so
various reddit users from around the world joined in on the
computation.</p>

<h3 id="strict-mode">Strict Mode</h3>

<p>I had an accidental discovery with <a href="https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Functions_and_function_scope/Strict_mode">strict mode</a> and
Chrome. I’ve always figured using strict mode had an effect on the
performance of code, but had no idea how much. From the beginning, I
had intended to use it in my worker script. Being isolated already,
there are absolutely no downsides.</p>
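
<p>One concrete example of those promises: in strict mode, assigning to an
undeclared variable throws a <code class="language-plaintext highlighter-rouge">ReferenceError</code> instead of silently creating
a global, removing a whole class of dynamic behavior the engine would
otherwise have to accommodate:</p>

```javascript
'use strict';

// Strict mode turns a silent global leak into a ReferenceError.
let caught = null;
try {
    undeclaredVariable = 1;  // never declared anywhere
} catch (e) {
    caught = e;
}
console.log(caught instanceof ReferenceError);  // true
```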

<p>However, while I was developing and experimenting I accidentally
turned it off and left it off. It was left turned off for a short time
in the version I distributed to the clients, so I got to see how
things were going without it. When I noticed the mistake and
uncommented the <code class="language-plaintext highlighter-rouge">"use strict"</code> line, <strong>I saw a 6-fold speed boost in
Chrome</strong>. Wow! Just making those few promises to Chrome allowed it to
make some massive performance optimizations.</p>

<p>With Chrome moving at full speed, it was able to inspect 560 keys per
second on <a href="http://www.50ply.com/">Brian’s</a> laptop. I was getting about 300 keys per
second on my own (less-capable) computers. I haven’t been able to get
anything close to these speeds in any other language/platform (but I
didn’t try in C yet).</p>

<p>Furthermore, I got a noticeable speed boost in Chrome by using proper
object oriented programming, versus a loose collection of functions
and ad-hoc structures. I think it’s because it made me construct my
data structures consistently, allowing V8’s hidden classes to work
their magic. It also probably helped the compiler predict type
information. I’ll need to investigate this further.</p>

<p>Use strict mode whenever possible, folks!</p>

<h3 id="what-made-this-problem-work">What made this problem work?</h3>

<p>Having web workers available was a big help. However, this problem met
the original constraints fairly well.</p>

<ul>
  <li>
    <p>It was <strong>low bandwidth</strong>. No special per-client instructions were
required. The client only needed to report back a 26-character
string.</p>
  </li>
  <li>
    <p>There was <strong>no state</strong> to worry about. The original version of my
script tried keys at random. The later version used a hill-climbing
algorithm, so there was <em>some</em> state but it was only needed for a
few seconds at a time. It wasn’t worth holding onto.</p>
  </li>
</ul>
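
<p>For a concrete picture of the work each tab was doing: the fitness
function amounts to counting the words whose letters appear in
nondecreasing order under the key. This is a sketch of the idea, not the
actual key-collab code:</p>

```javascript
// Count dictionary words whose letters come out in nondecreasing order
// under the key, a 26-letter permutation of the alphabet.
// (Illustrative sketch only, not the code from the project.)
function score(key, words) {
    var rank = {};
    for (var i = 0; i < key.length; i++) rank[key[i]] = i;
    var count = 0;
    for (var j = 0; j < words.length; j++) {
        var word = words[j], ordered = true;
        for (var k = 1; k < word.length; k++) {
            if (rank[word[k]] < rank[word[k - 1]]) {
                ordered = false;
                break;
            }
        }
        if (ordered) count++;
    }
    return count;
}
```

<p>Under the identity key <code class="language-plaintext highlighter-rouge">abcdefghijklmnopqrstuvwxyz</code>, for example,
“ace” counts but “cab” does not.</p>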

<p>This project was a lot of fun so I hope I get another opportunity to
do it again in the future, hopefully with a lot more nodes
participating.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Moving to Openbox</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/06/25/"/>
    <id>urn:uuid:8e15a68e-5ad4-356b-7d5b-5c854e4c5302</id>
    <updated>2012-06-25T00:00:00Z</updated>
    <category term="rant"/><category term="git"/><category term="debian"/><category term="reddit"/>
    <content type="html">
      <![CDATA[<p>With <a href="/blog/2012/06/23/">my dotfiles repository established</a> I now
have a common configuration and environment for Bash, Git, Emacs
(separate repository), and even Firefox! This wouldn’t normally be
possible because Firefox doesn’t have tidy dotfiles by default, but
the wonderful <a href="/blog/2009/04/03/">Pentadactyl</a> made it possible. My
script sets up keybindings, bookmark keywords, and quickmarks so that
my browser feels identical across all my computers. Now that it’s easy
to add tweaks, I’m sure I’ll be putting more in there in the future.</p>

<p>However, one major application remained and I was really itching to
capture its configuration too, since even my web browser is part of
the experience. I could drop my dotfiles into a new computer within
minutes and be ready to start hacking, except for my desktop
environment. This was still a tedious, manual step, plagued by the
configuration propagation issue. I wouldn’t get too fancy with
keybindings since I couldn’t rely on them being everywhere.</p>

<p>The problem was I was using KDE at the time and KDE’s configuration
isn’t really version-friendly. Some of it is binary, making it
unmergeable; it doesn’t play well between different versions; and it’s
unclear what needs to be captured and what can be ignored.</p>

<p>I wasn’t exactly a <em>happy</em> KDE user and really felt no attachment to
it. I had only been using it a few months. I’ve used a number of
desktops since 2004, the main ones being Xfce (couple years), IceWM
(couple years), xmonad (8 months), and Gnome 2 (the rest of the
time). Gnome 2 was my fallback, the familiar environment where I could
feel at home and secure — that is, until Gnome 3 / Unity. The coming
of Gnome 3 marked the death of Gnome 2. It became harder and harder to
obtain version 2 and I lost my fallback.</p>

<p>I gave Gnome 3 and Unity each a couple of weeks but I just couldn’t
stand them. Unremovable mouse hotspots, all new alt-tab behavior,
regular crashing (after restoring old alt-tab behavior), and extreme
unconfigurability even with a third-party tweak tool. I jumped for KDE
4, hoping to establish a comfortable fallback for myself.</p>

<p>KDE is pretty and configurable enough for me to get work done. There’s
a lot of bloat (“activities” and widgets), but I can safely ignore
it. The areas where it’s lacking didn’t bother me much, like the
inability/non-triviality of custom application launchers.</p>

<p>My short time with Gnome 3 and now with KDE 4 did herald a new, good
change to my habits: keyboard application launching. I got used to
using the application menu to type my application name and launch
it. I <em>did</em> use dmenu during my xmonad trial, but I didn’t quite make
a habit out of it. It was also on a slower computer, slow enough for
dmenu to be a problem. For years I was just launching things from a
terminal. However, the Gnome and KDE menus both have a big common
annoyance. If you want to add a custom item, you need to write a
special desktop file and save it to the right location. Bleh! dmenu
works right off your <code class="language-plaintext highlighter-rouge">PATH</code> — the way it <em>should</em> work — so no
special work needed.</p>

<p>Gnome 2 <em>has</em> been revived with a fork called MATE, but with the lack
of a modern application launcher, I’m now too spoiled to be
interested. Plus I wanted to find a suitable environment that I could
integrate with my dotfiles repository.</p>

<p>After being a little embarrassed at
<a href="http://www.terminally-incoherent.com/blog/2012/05/18/show-me-your-desktop-4/">Luke’s latest <em>Show Me Your Desktop</em></a>
(what kind of self-respecting Linux geek uses a heavyweight desktop?!)
I shopped around for a clean desktop environment with a configuration
that would version properly. Perhaps I might find that perfect desktop
environment I’ve been looking for all these years, if it even
exists. It wasn’t too long before I ended up with Openbox. I’m pleased
to report that I’m exceptionally happy with it.</p>

<p>Its configuration is two XML files and a shell script. The XML can be
generated by a GUI configuration editor and/or edited by hand. The GUI
was nice for quickly seeing what Openbox could do when I first logged
into it, so I <em>did</em> use it once and found it useful. The configuration
is very flexible too! I created keyboard bindings to slosh windows
around the screen, resize them, move them across desktops, maximize in
only one direction, change focus in a direction, and launch specific
applications (for example super-n launches a new terminal
window). It’s like the perfect combination of tiling and stacking
window managers. Not only is it more configurable than KDE, but it’s
done cleanly.</p>

<p>Openbox is pretty close to the perfect environment I want. There are
still some annoying little bugs, mostly related to window positioning,
but they’ve mostly been fixed. The problem is that they haven’t made
an official release for a year and a half, so these fixes aren’t yet
available. I might normally think to myself, “Why haven’t I been using
Openbox for years?” but I know better than that. Versions of Openbox
from just two years ago, like the one in Debian Squeeze (the current
stable), <em>aren’t very good</em>. So I haven’t actually been missing out on
anything. This is something really new.</p>

<p>I’m not using a desktop environment on top of Openbox, so there are no
panels or any of the normal stuff. This is perfectly fine for me; I
have better things to spend that real estate on. I <em>am</em> using a window
composite manager called <code class="language-plaintext highlighter-rouge">xcompmgr</code> to make things pretty through
proper transparency and subtle drop shadows. Without panels, there
were a couple problems to deal with. I was used to my desktop
environment performing removable drive mounting and wireless network
management for me, so I needed to find standalone applications to do
the job.</p>

<p>Removable filesystems can be mounted the old-fashioned way, where I
create a mount point, find the device name, then mount the device on
the mount point as root. This is annoying and unacceptable after
experiencing automounting for years. I found two applications to do
this: Thunar, Xfce’s file manager; and <code class="language-plaintext highlighter-rouge">pmount</code>, a somewhat-buggy
command-line tool.</p>

<p>I chose Wicd to do network management. It has both a GTK client and an
ncurses client, so I can easily manage my wireless network
connectivity with and without a graphical environment — something I
could have used for years now (goodbye <code class="language-plaintext highlighter-rouge">iwconfig</code>)! Unfortunately Wicd
is rigidly inflexible, allowing only one network interface to be up at
a time. This is a problem when I want to be on both a wired and
wireless network at the same time. For example, sometimes I use my
laptop as a gateway between a wired and wireless network. In these
cases I need to shut down Wicd and go back to manual networking for
awhile.</p>

<p>The next issue was wallpapers. I’ve always liked having
<a href="http://reddit.com/r/EarthPorn">natural landscape wallpapers</a>. So far,
I could move onto a new computer and have everything functionally
working, but I’d have a blank gray background. KDE 4 got me used to
slideshow wallpaper, changing the landscape image to a new one every
10-ish minutes. For a few years now, I’ve made a habit of creating a
<code class="language-plaintext highlighter-rouge">.wallpapers</code> directory in my home directory and dumping interesting
wallpapers in there as I come across them. When picking a new
wallpaper, or telling KDE where to look for random wallpapers, I’d
grab one from there. I’ve decided to continue this with my dotfiles
repository.</p>

<p>I wrote a shell script that uses <code class="language-plaintext highlighter-rouge">feh</code> to randomly set the root
(wallpaper) image every 10 minutes. It gets installed in <code class="language-plaintext highlighter-rouge">.wallpapers</code>
from the dotfiles repository. Openbox runs this script in the
background when it starts. I don’t actually store the hundreds of
images in my repository. There’s a <code class="language-plaintext highlighter-rouge">fetch.sh</code> that grabs them all from
Amazon S3 automatically. This is just another small step I take after
running the dotfiles install script. Any new images I throw in
<code class="language-plaintext highlighter-rouge">.wallpapers</code> get put into the rotation, but only for that computer.</p>

<p>I’ve now got all this encoded into my configuration files and checked
into my dotfiles repository. It’s <em>incredibly</em> satisfying to have this
in common across each of my computers and to have it instantly
available on any new installs. I’m that much closer to having <em>the</em>
ideal (and ultimately unattainable) computing experience!</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Making Your Own GIF Image Macros</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/04/10/"/>
    <id>urn:uuid:dc4ca81c-6c35-33f6-58c5-a77a645f3fbf</id>
    <updated>2012-04-10T00:00:00Z</updated>
    <category term="media"/><category term="video"/><category term="tutorial"/><category term="reddit"/>
    <content type="html">
      <![CDATA[<p>This tutorial is very similar to my <a href="/blog/2011/11/28/">video editing tutorial</a>.
That’s because the process is the same up until the encoding stage,
where I encode to GIF rather than WebM.</p>

<p>So you want to make your own animated GIFs from a video clip? Well,
it’s a pretty easy process that can be done almost entirely from the
command line. I’m going to show you how to turn the clip into a GIF
and add an image macro overlay. Like this,</p>

<p><img src="https://s3.amazonaws.com/nullprogram/calvin/calvin-macro.gif" alt="" /></p>

<p>The key tool here is going to be Gifsicle, a very excellent
command-line tool for creating and manipulating GIF images. So, the
full list of tools is,</p>

<ul>
  <li><a href="http://www.mplayerhq.hu/">MPlayer</a></li>
  <li><a href="http://www.imagemagick.org/">ImageMagick</a></li>
  <li><a href="http://www.gimp.org/">GIMP</a></li>
  <li><a href="http://www.lcdf.org/gifsicle/">Gifsicle</a></li>
</ul>

<p>Here’s the source video for the tutorial. It’s an awkward video my
wife took of our confused cats, Calvin and Rocc.</p>

<video src="https://s3.amazonaws.com/nullprogram/calvin/calvin-dummy.webm" width="480" height="360" controls="controls">
</video>

<p>My goal is to cut after Calvin looks at the camera, before he looks
away. From roughly 3 seconds to 23 seconds. I’ll have mplayer give me
the frames as JPEG images.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mplayer -vo jpeg -ss 3 -endpos 23 -benchmark calvin-dummy.webm
</code></pre></div></div>

<p>This tells mplayer to output JPEG frames between 3 and 23 seconds,
doing it as fast as it can (<code class="language-plaintext highlighter-rouge">-benchmark</code>). This output almost 800
images. Next I look through the frames and delete the extra images at
the beginning and end that I don’t want to keep. I’m also going to
throw away the even-numbered frames, since GIFs can’t have such a high
framerate in practice.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rm *[02468].jpg
</code></pre></div></div>

<p>There’s also dead space around the cats in the image that I want to
crop. Looking at one of the frames in GIMP, I’ve determined this is a
450 by 340 box, with the top-left corner at (136, 70). We’ll need
this information for ImageMagick.</p>

<p>Gifsicle only knows how to work with GIFs, so we need to batch convert
these frames with ImageMagick’s <code class="language-plaintext highlighter-rouge">convert</code>. This is where we need the
crop dimensions from above, which is given in ImageMagick’s notation.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ls *.jpg | xargs -I{} -P4 \
    convert {} -crop 450x340+136+70 +repage -resize 300 {}.gif
</code></pre></div></div>

<p>This will do four images at a time in parallel. The <code class="language-plaintext highlighter-rouge">+repage</code> is
necessary because ImageMagick keeps track of the original image
“canvas”, and it will simply drop the section of the image we don’t
want rather than completely crop it away. The repage forces it to
resize the canvas as well. I’m also scaling it down slightly to save
on the final file size.</p>

<p>We have our GIF frames, so we’re almost there! Next, we ask Gifsicle
to compile an animated GIF.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gifsicle --loop --delay 5 --dither --colors 32 -O2 *.gif &gt; ../out.gif
</code></pre></div></div>

<p>I’ve found that using 32 colors and dithering the image gives very
nice results at a reasonable file size. Dithering adds noise to the
image to remove the banding that occurs with small color palettes.
I’ve also instructed it to optimize the GIF as fully as it can
(<code class="language-plaintext highlighter-rouge">-O2</code>). If you’re just experimenting and want Gifsicle to go faster,
turning off dithering goes a long way, followed by disabling
optimization.</p>

<p>The delay of 5 is measured in hundredths of a second per frame, close
to the 15-ish frames per second we want, since we cut half the frames
from a 30 frames-per-second source video. We also want to loop
indefinitely.</p>

<p><img src="https://s3.amazonaws.com/nullprogram/calvin/calvin-dummy.gif" alt="" /></p>

<p>The result is this 6.7 MB GIF. A little large, but good enough. It’s
basically what I was going for. Next we add some macro text.</p>

<p>In GIMP, make a new image with the same dimensions of the GIF frames,
with a transparent background.</p>

<p><img src="/img/gif-tutorial/blank.png" alt="" /></p>

<p>Add your macro text in white, in the Impact Condensed font.</p>

<p><img src="/img/gif-tutorial/text1.png" alt="" /></p>

<p>Right click the text layer and select “Alpha to Selection,” then under
Select, grow the selection by a few pixels — 3 in this case.</p>

<p><img src="/img/gif-tutorial/text2.png" alt="" /></p>

<p>Select the background layer and fill the selection with black, giving
a black border to the text.</p>

<p><img src="/img/gif-tutorial/text3.png" alt="" /></p>

<p>Save this image as text.png, for our text overlay.</p>

<p><img src="/img/gif-tutorial/text.png" alt="" /></p>

<p>Time to go back and redo the frames, overlaying the text this
time. This is called compositing and ImageMagick can do it without
breaking a sweat. To composite two images is simple.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>convert base.png top.png -composite out.png
</code></pre></div></div>

<p>List the image to go on top, then use the <code class="language-plaintext highlighter-rouge">-composite</code> flag, and it’s
placed over top of the base image. In my case, I actually don’t want
the text to appear until Calvin, the orange cat, faces the camera.
This happens quite conveniently at just about frame 500, so I’m only
going to redo those frames.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ls 000005*.jpg | xargs -I{} -P4 \
    convert {} -crop 450x340+136+70 +repage \
               -resize 300 text.png -composite {}.gif
</code></pre></div></div>

<p>Run Gifsicle again and this 6.2 MB image is the result. The text
overlay compresses better, so it’s a tiny bit smaller.</p>

<p><img src="https://s3.amazonaws.com/nullprogram/calvin/calvin-macro.gif" alt="" /></p>

<p>Now it’s time to <a href="http://old.reddit.com/r/funny/comments/s481d/">post it on reddit</a> and
<a href="http://old.reddit.com/r/lolcats/comments/s47qa/">reap that tasty, tasty karma</a>.
(<a href="http://imgur.com/2WhBf">Over 400,000 views!</a>)</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Rumor Simulation</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/03/09/"/>
    <id>urn:uuid:9fee2022-d273-34d6-0970-546b5e875460</id>
    <updated>2012-03-09T00:00:00Z</updated>
    <category term="java"/><category term="math"/><category term="media"/><category term="video"/><category term="reddit"/>
    <content type="html">
      <![CDATA[<p>A couple months ago someone posted
<a href="http://old.reddit.com/r/javahelp/comments/ngvp4/">an interesting programming homework problem</a> on reddit,
asking for help. Help had already been provided before I got there,
but I thought the problem was an interesting one.</p>

<blockquote>
  <p>Write a program that simulates the spreading of a rumor among a group
of people. At any given time, each person in the group is in one of
three categories:</p>

  <ul>
    <li>IGNORANT - the person has not yet heard the rumor</li>
    <li>SPREADER - the person has heard the rumor and is eager to spread it</li>
    <li>STIFLER - the person has heard the rumor but considers it old news
and will not spread it</li>
  </ul>

  <p>At the very beginning, there is one spreader; everyone else is
ignorant. Then people begin to encounter each other.</p>

  <p>So the encounters go like this:</p>

  <ul>
    <li>If a SPREADER and an IGNORANT meet, IGNORANT becomes a SPREADER.</li>
    <li>If a SPREADER and a STIFLER meet, the SPREADER becomes a STIFLER.</li>
    <li>If a SPREADER and a SPREADER meet, they both become STIFLERS.</li>
    <li>In all other encounters nothing changes.</li>
  </ul>

  <p>Your program should simulate this by repeatedly selecting two people
randomly and having them “meet.”</p>

  <p>There are three questions we want to answer:</p>

  <ul>
    <li>Will everyone eventually hear the rumor, or will it die out before
everyone hears it?</li>
    <li>If it does die out, what percentage of the population hears it?</li>
    <li>How long does it take? i.e. How many encounters occur before the
rumor dies out?</li>
  </ul>
</blockquote>
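
<p>The rules above are compact enough to sketch directly. This is only an
illustrative JavaScript version, not code from my project:</p>

```javascript
// Illustrative sketch of the rumor model: people meet uniformly at
// random until no spreaders remain. (Not the code from the project.)
function simulate(n) {
    var IGNORANT = 0, SPREADER = 1, STIFLER = 2;
    var people = new Array(n).fill(IGNORANT);
    people[0] = SPREADER;  // one spreader at the very beginning
    var spreaders = 1, meetups = 0;
    while (spreaders > 0) {
        var a = Math.floor(Math.random() * n);
        var b = Math.floor(Math.random() * n);
        if (a === b) continue;  // a person cannot meet themselves
        meetups++;
        if (people[a] === SPREADER && people[b] === IGNORANT) {
            people[b] = SPREADER; spreaders++;
        } else if (people[b] === SPREADER && people[a] === IGNORANT) {
            people[a] = SPREADER; spreaders++;
        } else if (people[a] === SPREADER && people[b] === SPREADER) {
            people[a] = people[b] = STIFLER; spreaders -= 2;
        } else if (people[a] === SPREADER && people[b] === STIFLER) {
            people[a] = STIFLER; spreaders--;
        } else if (people[b] === SPREADER && people[a] === STIFLER) {
            people[b] = STIFLER; spreaders--;
        }  // in all other encounters nothing changes
    }
    var heard = people.filter(function (p) { return p !== IGNORANT; });
    return { n: n, meetups: meetups, knowing: heard.length / n };
}

console.log(simulate(10000));  // knowing lands near 0.8
```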

<p>I wrote a very thorough version to <a href="/blog/2011/11/28/">produce videos</a> of the
simulation in action.</p>

<ul>
  <li><a href="https://github.com/skeeto/rumor-sim">https://github.com/skeeto/rumor-sim</a></li>
</ul>

<p>It accepts some command line arguments, so you don’t need to edit any
code just to try out some simple things.</p>

<p>And here are a couple of videos. Each individual is a cell in a 2D
grid. IGNORANT is black, SPREADER is red, and STIFLER is white. Note
that this is <em>not</em> a cellular automaton, because cell neighborhood does
not come into play.</p>

<video src="https://s3.amazonaws.com/nullprogram/rumor/rumor-small.webm" controls="controls" width="400" height="250">
</video>

<video src="https://s3.amazonaws.com/nullprogram/rumor/rumor.webm" controls="controls" width="400" height="400">
</video>

<p>Here are the statistics for ten different rumors.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Rumor(n=10000, meetups=132380, knowing=0.789)
Rumor(n=10000, meetups=123944, knowing=0.7911)
Rumor(n=10000, meetups=117459, knowing=0.7985)
Rumor(n=10000, meetups=127063, knowing=0.79)
Rumor(n=10000, meetups=124116, knowing=0.8025)
Rumor(n=10000, meetups=115903, knowing=0.7952)
Rumor(n=10000, meetups=137222, knowing=0.7927)
Rumor(n=10000, meetups=134354, knowing=0.797)
Rumor(n=10000, meetups=113887, knowing=0.8025)
Rumor(n=10000, meetups=139534, knowing=0.7938)
</code></pre></div></div>

<p>Except for very small populations, the simulation always terminates
very close to 80% rumor coverage. I don’t understand (yet) why this
is, but I find it very interesting.</p>
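<p>
As an aside: this is a known result for rumor models of this family
(the Daley–Kendall and Maki–Thompson models), where the limiting
coverage θ satisfies θ = 1 − e<sup>−2θ</sup>, whose nonzero root is
roughly 0.797 — in line with the runs above. A quick numeric check,
via fixed-point iteration (a sketch, assuming that analysis applies
here):
</p>

```python
import math

# Fixed-point iteration on theta = 1 - exp(-2*theta).
# The map is a contraction near the root, so iteration converges.
theta = 0.5
for _ in range(100):
    theta = 1 - math.exp(-2 * theta)
print(round(theta, 4))  # ~0.7968
```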

]]>
    </content>
  </entry>
  <entry>
    <title>Common Lisp Quick Reference</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/02/06/"/>
    <id>urn:uuid:d74ba1a7-25bb-37f4-7b2e-46d2d56bfe36</id>
    <updated>2010-02-06T00:00:00Z</updated>
    <category term="lisp"/><category term="link"/><category term="reddit"/>
    <content type="html">
      <![CDATA[<!-- 6 February 2010 -->
<p>
I found this <a href="http://clqr.berlios.de/">Common Lisp Quick
Reference</a> the other day
from <a href="http://old.reddit.com/r/lisp/">r/lisp</a>, and I think
it's <i>fantastic</i>. It's a comprehensive, libre booklet of the
symbols defined by the Common Lisp ANSI standard. Very slick!
</p>
<p>
The main version is meant to be printed out and nested with a vertical
fold, and it works quite well. If I ever get a chance to use Common
Lisp at work (a man can dream), probably at a location without
Internet access, this could come in handy. So I printed out one for
myself:
</p>
<p class="center">
<a href="/img/clqr/front.jpg"><img src="/img/clqr/front-thumb.jpg" alt=""/></a>
<a href="/img/clqr/open.jpg"><img src="/img/clqr/open-thumb.jpg" alt=""/></a>
</p>
]]>
    </content>
  </entry>
</feed>
