fortune file for the all-time top 10,000 /r/Showerthoughts posts, as of October 2016. As a word of warning: Many of these entries are adult humor and may not be appropriate for your work computer. These fortunes would be categorized as “offensive” (fortune -o).
Download: showerthoughts (1.3 MB)
The copyright status of this file is subject to each of its thousands of authors. Since it’s not possible to contact many of these authors — some may no longer even be alive — it’s obviously never going to be under an open source license (Creative Commons, etc.). What’s more, some quotes are probably from comedians and such, rather than from the redditor who made the post. I distribute it only for fun.
To install this into your fortune database, first process it with strfile to create a random-access index, showerthoughts.dat, then copy both files to the directory with the rest.
$ strfile showerthoughts
"showerthoughts.dat" created
There were 10000 strings
Longest string: 343 bytes
Shortest string: 39 bytes
$ cp showerthoughts* /usr/share/games/fortunes/
Alternatively, fortune can be told to use this file directly:
$ fortune showerthoughts
Not once in my life have I stepped into somebody's house and
thought, "I sure hope I get an apology for 'the mess'."
―AndItsDeepToo, Aug 2016
If you didn’t already know, fortune is an old Unix utility that displays a random quotation from a quotation database — a digital fortune cookie. I use it as an interactive login shell greeting on my ODROID-C2 server:
if shopt -q login_shell; then
    fortune ~/.fortunes
fi
Fortunately I didn’t have to do something crazy like scrape reddit for weeks on end. Instead, I downloaded the pushshift.io submission archives, currently around 70 GB compressed. Each file contains one month’s worth of JSON data, one object per submission, one submission per line, all compressed with bzip2.
Unlike so many other datasets, especially ones made up of arbitrary inputs from millions of people, the format of the /r/Showerthoughts posts is surprisingly clean and requires virtually no touching up. It’s some really fantastic data.
A nice feature of bzip2 is that the concatenation of compressed files is itself a valid compressed file, decompressing to the concatenation of the originals. Additionally, it’s easy to parallelize bzip2 compression and decompression, which gives it an edge over xz. I strongly recommend using lbzip2 to decompress this data, should you want to process it yourself.
cat RS_*.bz2 | lbunzip2 > everything.json
jq is my favorite command line tool for processing JSON (and rendering fractals). To filter all the /r/Showerthoughts posts, it’s a simple select expression. Just mind the capitalization of the subreddit’s name. The -c tells jq to keep it one per line.
cat RS_*.bz2 | \
lbunzip2 | \
jq -c 'select(.subreddit == "Showerthoughts")' \
> showerthoughts.json
However, you’ll quickly find that jq is the bottleneck, parsing all that JSON. Your cores won’t be exploited by lbzip2 as they should be. So I throw grep in front to dramatically decrease the workload for jq.
cat *.bz2 | \
lbunzip2 | \
grep -a Showerthoughts | \
jq -c 'select(.subreddit == "Showerthoughts")' \
> showerthoughts.json
This will let some extra things through, but it’s a superset. The -a option is necessary because the data contains some null bytes. Without it, grep switches into binary mode and breaks everything. This is incredibly frustrating when you’ve already waited half an hour for results.
To further reduce the workload down the pipeline, I take advantage of the fact that only four fields will be needed: title, score, author, and created_utc. The rest can — and should, for efficiency’s sake — be thrown away where it’s cheap to do so.
cat *.bz2 | \
lbunzip2 | \
grep -a Showerthoughts | \
jq -c 'select(.subreddit == "Showerthoughts") |
{title, score, author, created_utc}' \
> showerthoughts.json
This gathers all 1,199,499 submissions into a 185 MB JSON file (as of this writing). Most of these submissions are terrible, so the next step is narrowing it to the small set of good submissions and putting them into the fortune database format.
It turns out reddit already has a method for finding the best submissions: a voting system. Just pick the highest scoring posts. Through experimentation I arrived at 10,000 as the magic cut-off number. After this the quality really starts to drop off. Over time this should probably be scaled up with the total number of submissions.
I did both steps at the same time using a bit of Emacs Lisp, which is particularly well-suited to the task.
This Elisp program reads one JSON object at a time and sticks each into an AVL tree sorted by score (descending), then timestamp (ascending), then title (ascending). The AVL tree is limited to 10,000 items, with the lowest items being dropped. This was a lot faster than the more obvious approach: collecting everything into a big list, sorting it, and keeping the top 10,000 items.
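For comparison, that naive approach can be sketched with jq alone. This is my own one-liner, not the program I actually used — and it’s slow, since -s slurps the entire 185 MB file into memory (the tie-breaking is also simplified):

jq -cs 'sort_by(.score) | reverse | .[0:10000] | .[]' \
    showerthoughts.json > top10000.json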
The most complicated part is actually paragraph wrapping the submissions. Most are too long for a single line, and letting the terminal hard wrap them is visually unpleasing. The submissions are encoded in UTF-8, some with characters beyond simple ASCII. Proper wrapping requires not just Unicode awareness, but also some degree of Unicode rendering. The algorithm needs to recognize grapheme clusters and know the size of the rendered text. This is not so trivial! Most paragraph wrapping tools and libraries get this wrong, some counting width by bytes, others counting width by codepoints.
Emacs’ M-x fill-paragraph knows how to do all these things — only for a monospace font, which is all I needed — and I decided to leverage it when generating the fortune file. Here’s an example that paragraph-wraps a string:
(defun string-fill-paragraph (s)
  (with-temp-buffer
    (insert s)
    (fill-paragraph)
    (buffer-string)))
For the file format, items are delimited by a % on a line by itself. I put the wrapped content, followed by a quotation dash, the author, and the date. A surprising number of these submissions have date-sensitive content (“on this day X years ago”), so I found it was important to include a date.
April Fool's Day is the one day of the year when people critically
evaluate news articles before accepting them as true.
―kellenbrent, Apr 2015
%
Of all the bodily functions that could be contagious, thank god
it's the yawn.
―MKLV, Aug 2015
%
There’s the potential that a submission itself could end with a lone % and, with a bit of bad luck, it happens to wrap that onto its own line. Fortunately this hasn’t happened yet. But, now that I’ve advertised it, someone could make such a submission, popular enough for the top 10,000, with the intent to personally trip me up in a future update. I accept this, though it’s unlikely, and it would be fairly easy to work around if it happened.
The strfile program looks for the % delimiters and fills out a table of file offsets. The header of the .dat file indicates the number of strings along with some other metadata. What follows is a table of 32-bit file offsets.
struct {
    uint32_t str_version;   /* version number */
    uint32_t str_numstr;    /* # of strings in the file */
    uint32_t str_longlen;   /* length of longest string */
    uint32_t str_shortlen;  /* shortest string length */
    uint32_t str_flags;     /* bit field for flags */
    char     str_delim;     /* delimiting character */
}
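The fields are written in network byte order. Assuming a GNU od is available, you can eyeball the first five fields directly — the second value should read 10000, matching the strfile output above:

$ od -A d -t d4 --endian=big -N 20 showerthoughts.dat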
Note that the table doesn’t necessarily list the strings in the same order as they appear in the original file. In fact, recent versions of strfile can sort the strings by sorting the table, all without touching the original file. Though none of this is important to fortune.
Now that you know how it all works, you can build your own fortune file from your own inputs!
Except for logging in, the library is agnostic about the actual API endpoints themselves. It just knows how to translate between Elisp and the reddit API protocol. This makes the library dead simple to use. I had considered supporting OAuth2 authentication rather than password authentication, but reddit’s OAuth2 support is pretty rough around the edges.
The reddit API has two kinds of endpoints, GET and POST, so there are really only three functions to concern yourself with.
- reddit-login
- reddit-get
- reddit-post

And one variable:

- reddit-session
The reddit-login function is really just a special case of reddit-post. It returns a session value (a cookie/modhash tuple) that is used by the other two functions for authenticating the user. Like almost all Elisp data structures — something you get automatically, probably more so than in any other popular programming language — it can be serialized with the printer and reader, allowing a reddit session to be maintained across Emacs sessions.
The return value of reddit-login generally doesn’t need to be captured. It automatically sets the dynamic variable reddit-session, which is what the other functions access for authentication. This can be bound with let to other session values in order to switch between different users.
Both reddit-get and reddit-post take an endpoint name and a list of key-value pairs in the form of a property list (plist). (The api-type key is automatically supplied.) They each return the JSON response from the server in association list (alist) form. The actual shape of this data matches the response from reddit, which, unfortunately, is inconsistent and unspecified, so writing any sort of program to operate on the API requires lots of trial and error. If the API responds with an error, these functions signal a reddit-error.
Typical usage looks like so. Notice that values need not be only strings; they just need to print to something reasonable.
;; Login first
(reddit-login "your-username" "your-password")
;; Subscribe to a subreddit
(reddit-post "/api/subscribe" '(:sr "t5_2s49f" :action sub))
;; Post a comment
(reddit-post "/api/comment/" '(:text "Hello world." :thing_id "t1_cd3ar7y"))
For plist keys I considered automatically converting between dashes and underscores so that the keywords could have Lisp-style names. But the reddit API is inconsistent, using both, so there’s no correct way to do this.
To further refine the API it might be worth defining a function for each of the reddit endpoints, forming a facade for the wrapper library, hiding away the plist arguments and complicated responses. That would eliminate the trial and error of using the API.
(defun reddit-api-comment (parent comment)
  (if (null reddit-session)
      (error "Not logged in.")
    ;; TODO: reduce the return value into a thing/struct
    (reddit-post "/api/comment/" (list :thing_id parent :text comment))))
Furthermore there could be defstructs for comments, posts, subreddits, etc. so that the “thing” ID stuff is hidden away. This is basically what was already done for sessions out of necessity. I might add these structs and functions someday but I don’t currently have a need for it.
It would be neat to use this API to create an interface to reddit from within Emacs. I imagine it might look like one of the Emacs mail clients, or like Elfeed. Almost everything, including viewing image posts within Emacs, should be possible.
For the last 3.5 years I’ve been a moderator of /r/civ, starting back when it had about 100 subscribers. As of this writing it’s just short of 60k subscribers and we’re now up to 9 moderators.
A few months ago we decided to institute a self-post-only Sunday. All day Sunday, midnight to midnight Eastern time, only self-posts are allowed in the subreddit. One of the other moderators was turning this on and off manually, so I offered to write a bot to do the job. There weren’t any Lisp wrappers yet (though raw4j could be used with Clojure), so I decided to write one.
As mentioned before, the reddit API leaves a lot to be desired. It randomly returns errors, so a correct program needs to be prepared to retry requests after a short delay, depending on the error. My particular annoyance is that the /api/site_admin endpoint requires that most of its keys be supplied, and it’s not documented which ones are required. Even worse, there’s no single endpoint to get all of the required values, the key names between endpoints are inconsistent, and even the values themselves can’t be returned as-is, requiring massaging before being sent back to the API.
I hope other people find this library useful!
Each assignment involves applying two or three design patterns to a crude (in my opinion) XML parsing library. Students are given a tarball containing the source code for the library, in both Java and C++. They pick a language, modify the code to use the specified patterns, zip/archive up the result, and e-mail me their zipfile/tarball.
It took me the first couple of weeks to work out an efficient grading workflow, and, at this point, I can accurately work my way through most new homework submissions rapidly. On my end I already know the original code base. All I really care about is the student’s changes. In software development this sort of thing is expressed as a diff, preferably in the unified diff format. This is called a patch. It describes precisely what was added and removed, and provides a bit of context around each change. The context greatly increases the readability of the patch and, as a bonus, allows it to be applied to a slightly different source. Here’s part of a patch recently submitted to Elfeed:
diff --git a/tests/elfeed-tests.el b/tests/elfeed-tests.el
index 31d5ad2..fbb78dd 100644
--- a/tests/elfeed-tests.el
+++ b/tests/elfeed-tests.el
@@ -144,15 +144,15 @@
(with-temp-buffer
(insert elfeed-test-rss)
(goto-char (point-min))
- (should (eq (elfeed-feed-type (xml-parse-region)) :rss)))
+ (should (eq (elfeed-feed-type (elfeed-xml-parse-region)) :rss)))
(with-temp-buffer
(insert elfeed-test-atom)
(goto-char (point-min))
- (should (eq (elfeed-feed-type (xml-parse-region)) :atom)))
+ (should (eq (elfeed-feed-type (elfeed-xml-parse-region)) :atom)))
(with-temp-buffer
(insert elfeed-test-rss1.0)
(goto-char (point-min))
- (should (eq (elfeed-feed-type (xml-parse-region)) :rss1.0))))
+ (should (eq (elfeed-feed-type (elfeed-xml-parse-region)) :rss1.0))))
(ert-deftest elfeed-entries-from-x ()
(with-elfeed-test
I’d really prefer to receive patches like this as homework submissions, but this is probably too sophisticated for most students. Instead, the first thing I do is create a patch for them from their submission. Most students work off of their previous submission, so I just run diff between their last submission and the current one.
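The directory names here are hypothetical, but the command itself is just a recursive unified diff:

$ diff -ruN hw3-previous/ hw3-current/ > student.patch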
While I’ve got a lot of the rest of the process automated with scripts, I unfortunately cannot script patch generation. Each student’s submission follows a unique format for that particular student, and some students are not even consistent between their own assignments. About half the students also include generated files alongside the source, so I need to clean this up too. Generating the patch is by far the messiest part of the whole process.
I grade almost entirely from the patch. 100% correct submissions are usually only a few hundred lines of patch and I can spot all of the required parts within a few minutes. Very easy. It’s the incorrect submissions that consume most of my time. I have to figure out what they’re doing, determine what they meant to do, and distill that down into discrete discussion items along with point losses. In either case I’ll also add some of my own opinions on their choice of style, though this has no effect on the final grade.
For each student’s submission, I commit to a private Git repository the raw, submitted archive file, the generated patch, and a grade report written in Markdown. After the due date and once all the submitted assignments are graded, I reply to each student with their grade report. On a few occasions there’s been a back and forth clarification dialog that has resulted in the student getting a higher score. (That’s a hint to any students who happen to read this!)
Even ignoring the time it takes to generate a patch, there are still disadvantages to not having students submit patches. One is the size: about 60% of my current e-mail storage, which goes all the way back to 2006, is from this class alone over the past month. It’s been a lot of bulky attachments. I’ll delete all of the attachments once the semester is over.
Another is that the students are unaware of the amount of changes they make. Some of these patches contain a significant number of trivial changes — breaking long lines in the original source, changing whitespace within lines, etc. If students focused on crafting a tidy patch they might try to avoid including these types of changes in their submissions. I like to imagine this process being similar to submitting a patch to an open source project. Patches should describe a concise set of changes, and messy patches are rejected outright. The Git staging area is all about crafting clean patches like this.
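For example — generic Git usage, nothing specific to this class — interactively staging hunks keeps that churn out of the final patch:

$ git add -p                       # pick only the hunks that belong
$ git diff --cached > clean.patch  # contains only the staged changes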
If there were something else I could change, it would be to severely clean up the original code base. When compiler warnings are turned on, compiling it emits a giant list of warnings. The students are already starting at an unnecessary disadvantage, missing out on a very valuable feature: because of all the existing noise they can’t effectively use compiler warnings themselves. Any new warnings would be lost in the noise. This has also led to many of those trivial/unrelated changes: some students are spending time fixing the warnings.
I want to go a lot further than warnings, though. I’d make sure the original code base had absolutely no issues listed by PMD, FindBugs, or Checkstyle (for the Java version, that is). Then I could use all of these static analysis tools on students’ submissions to quickly spot issues. It’s as simple as using my starter build configuration. In fact, I’ve used these tools a number of times in the past to perform detailed code reviews for free (1, 2, 3). Providing an extensive code analysis for each student for each assignment would become a realistic goal.
I’ve expressed all these ideas to the class’s instructor, my colleague, so maybe some things will change in future semesters. If I’m offered the opportunity again — assuming I didn’t screw this semester up already — I’m still unsure if I would want to grade a class again. It’s a lot of work for, optimistically, what amounts to the same pay rate I received as an engineering intern in college. This first experience at grading has been very educational, making me appreciate those who graded my own sloppy assignments in college, and that’s provided value beyond the monetary compensation. Next time around wouldn’t be as educational, so my time could probably be better spent on other activities, even if it’s writing open source software for free.
Web workers came into existence, not just as a specification but as an implementation across all the major browsers. They allow JavaScript to be run in an isolated, dedicated background thread. This eliminates the setTimeout() requirement from before, which not only caused a performance penalty but really hampered running any sort of lively interface alongside the computation. The interface and computation were competing for time on the same thread.
The worker isn’t entirely isolated; otherwise it would be useless for anything but wasting resources. Through pubsub-style message events, it can pass structured clones to and from the main thread running in the page. Other than this, it has no access to the DOM or other data on the page.
The interface is a bit unfriendly to live development, but it’s manageable. It’s invoked by passing the URL of a script to the constructor. This script is the code that runs in the dedicated thread.
var worker = new Worker('script/worker.js');
The sort of interface that would have been more convenient for live interaction would be something like what is found on most multi-threaded platforms: a thread constructor that accepts a function as an argument.
/* This doesn't work! */
var worker = new Worker(function() {
// ...
});
I completely understand why this isn’t the case. The worker thread needs to be totally isolated and the above example is insufficient. I’m passing a closure to the constructor, which means I would be sharing bindings, and therefore data, with the worker thread. This interface could be faked using a data URI and taking advantage of the fact that most browsers return function source code from toString().
Libraries can be loaded by the worker with the importScripts() function, so not everything needs to be packed into one script. Furthermore, workers can make HTTP requests with XMLHttpRequest, so that data don’t need to be embedded either. Note that it’s probably worth making these requests synchronously (third argument false), because blocking isn’t an issue in workers.
The other big change was the effect Google Chrome, especially its V8 JavaScript engine, had on the browser market. Browser JavaScript is probably about two orders of magnitude faster than it was when I wrote my previous post. It’s incredible what the V8 team has accomplished. If written carefully, V8 JavaScript performance can beat out most other languages.
Finally, I also now have much, much better knowledge of JavaScript than I did four years ago. I’m not fumbling around like I was before.
This weekend’s Daily Programmer challenge was to find a “key” — a permutation of the alphabet — that when applied to a small dictionary results in the maximum number of words with their letters in alphabetical order. That’s a keyspace of 26!, or 403,291,461,126,605,635,584,000,000.
When I’m developing, I use both a laptop and a desktop simultaneously, and I really wanted to put them both to work searching that huge space for good solutions. Initially I was going to accomplish this by writing my program in Clojure and running it on each machine. But what about involving my wife’s computer, too? I wasn’t going to bother her with setting up an environment to run my stuff. Writing it in JavaScript as a web application would be the way to go. To coordinate this work I’d use simple-httpd. And so it was born,
Here’s what it looks like in action. Each tab open consumes one CPU core, allowing users to control their commitment by choosing how many tabs to keep open. All of those numbers update about twice per second, so users can get a concrete idea of what’s going on. I think it’s fun to watch.
(I’m obviously a fan of blues and greens on my web pages. I don’t know why.)
I posted the server’s URL on reddit in the challenge thread, so various reddit users from around the world joined in on the computation.
I had an accidental discovery with strict mode and Chrome. I’ve always figured using strict mode had an effect on the performance of code, but had no idea how much. From the beginning, I had intended to use it in my worker script. Being isolated already, there are absolutely no downsides.
However, while I was developing and experimenting I accidentally turned it off and left it off. It was left turned off for a short time in the version I distributed to the clients, so I got to see how things were going without it. When I noticed the mistake and uncommented the "use strict" line, I saw a 6-fold speed boost in Chrome. Wow! Just making those few promises to Chrome allowed it to make some massive performance optimizations.
With Chrome moving at full speed, it was able to inspect 560 keys per second on Brian’s laptop. I was getting about 300 keys per second on my own (less-capable) computers. I haven’t been able to get anything close to these speeds in any other language/platform (but I didn’t try in C yet).
Furthermore, I got a noticeable speed boost in Chrome by using proper object oriented programming, versus a loose collection of functions and ad-hoc structures. I think it’s because it made me construct my data structures consistently, allowing V8’s hidden classes to work their magic. It also probably helped the compiler predict type information. I’ll need to investigate this further.
Use strict mode whenever possible, folks!
Having web workers available was a big help. However, this problem met the original constraints fairly well.
It was low bandwidth. No special per-client instructions were required. The client only needed to report back a 26-character string.
There was no state to worry about. The original version of my script tried keys at random. The later version used a hill-climbing algorithm, so there was some state but it was only needed for a few seconds at a time. It wasn’t worth holding onto.
This project was a lot of fun so I hope I get another opportunity to do it again in the future, hopefully with a lot more nodes participating.
However, one major application remained and I was really itching to capture its configuration too, since even my web browser is part of the experience. I could drop my dotfiles into a new computer within minutes and be ready to start hacking, except for my desktop environment. This was still a tedious, manual step, plagued by the configuration propagation issue. I wouldn’t want to get too fancy with keybindings since I couldn’t rely on them being everywhere.
The problem was that I was using KDE at the time and KDE’s configuration isn’t really version-friendly. Some of it is binary, making it unmergeable; it doesn’t play well between different versions; and it’s unclear what needs to be captured and what can be ignored.
I wasn’t exactly a happy KDE user and really felt no attachment to it. I had only been using it a few months. I’ve used a number of desktops since 2004, the main ones being Xfce (couple years), IceWM (couple years), xmonad (8 months), and Gnome 2 (the rest of the time). Gnome 2 was my fallback, the familiar environment where I could feel at home and secure — that is, until Gnome 3 / Unity. The coming of Gnome 3 marked the death of Gnome 2. It became harder and harder to obtain version 2 and I lost my fallback.
I gave Gnome 3 and Unity each a couple of weeks but I just couldn’t stand them. Unremovable mouse hotspots, all new alt-tab behavior, regular crashing (after restoring old alt-tab behavior), and extreme unconfigurability even with a third-party tweak tool. I jumped for KDE 4, hoping to establish a comfortable fallback for myself.
KDE is pretty and configurable enough for me to get work done. There’s a lot of bloat (“activities” and widgets), but I can safely ignore it. The areas where it’s lacking didn’t bother me much, like the inability/non-triviality of custom application launchers.
My short time with Gnome 3 and now with KDE 4 did herald a new, good change to my habits: keyboard application launching. I got used to using the application menu to type my application name and launch it. I did use dmenu during my xmonad trial, but I didn’t quite make a habit out of it. It was also on a slower computer, slow enough for dmenu to be a problem. For years I was just launching things from a terminal. However, the Gnome and KDE menus both have a big common annoyance. If you want to add a custom item, you need to write a special desktop file and save it to the right location. Bleh! dmenu works right off your PATH — the way it should work — so no special work is needed.
Gnome 2 has been revived with a fork called MATE, but with the lack of a modern application launcher, I’m now too spoiled to be interested. Plus I wanted to find a suitable environment that I could integrate with my dotfiles repository.
After being a little embarrassed at Luke’s latest Show Me Your Desktop (what kind of self-respecting Linux geek uses a heavyweight desktop?!) I shopped around for a clean desktop environment with a configuration that would version properly. Perhaps I might find that perfect desktop environment I’ve been looking for all these years, if it even exists. It wasn’t too long before I ended up in Openbox. I’m pleased to report that I’m exceptionally happy with it.
Its configuration is two XML files and a shell script. The XML can be generated by a GUI configuration editor and/or edited by hand. The GUI was nice for quickly seeing what Openbox could do when I first logged into it, so I did use it once and find it useful. The configuration is very flexible too! I created keyboard bindings to slosh windows around the screen, resize them, move them across desktops, maximize in only one direction, change focus in a direction, and launch specific applications (for example super-n launches a new terminal window). It’s like the perfect combination of tiling and stacking window managers. Not only is it more configurable than KDE, but it’s done cleanly.
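As a taste of the configuration — a hypothetical binding, not a copy of my own rc.xml — a super-n terminal launcher goes in the <keyboard> section and looks something like this:

<keybind key="W-n">
  <action name="Execute">
    <command>xterm</command>
  </action>
</keybind>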
Openbox is pretty close to the perfect environment I want. There are still some annoying little bugs, mostly related to window positioning, but they’ve mostly been fixed. The problem is that they haven’t made an official release for a year and a half, so these fixes aren’t yet available. I might normally think to myself, “Why haven’t I been using Openbox for years?” but I know better than that. Versions of Openbox from just two years ago, like the one in Debian Squeeze (the current stable), aren’t very good. So I haven’t actually been missing out on anything. This is something really new.
I’m not using a desktop environment on top of Openbox, so there are no panels or any of the normal stuff. This is perfectly fine for me; I have better things to spend that real estate on. I am using a window composite manager called xcompmgr to make things pretty through proper transparency and subtle drop shadows.
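Launching it is a single line in Openbox’s autostart script; these particular flags are just an example:

xcompmgr -c -f &   # client-side drop shadows, fade windows in and out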
Without panels, there were a couple of problems to deal with. I was used to my desktop environment performing removable drive mounting and wireless network management for me, so I needed to find standalone applications to do the job.
Removable filesystems can be mounted the old fashioned way, where I create a mount point, find the device name, then mount the device on the mount point as root. This is annoying and unacceptable after experiencing automounting for years. I found two applications to do this: Thunar, Xfce’s file manager; and pmount, a somewhat-buggy command-line tool.
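pmount needs neither root nor an fstab entry. For a hypothetical flash drive at /dev/sdb1:

$ pmount /dev/sdb1 usb   # mounts at /media/usb
$ pumount /dev/sdb1      # unmount when done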
I chose Wicd to do network management. It has both a GTK client and an ncurses client, so I can easily manage my wireless network connectivity with and without a graphical environment — something I could have used for years now (goodbye iwconfig)! Unfortunately Wicd is rigidly inflexible, allowing only one network interface to be up at a time. This is a problem when I want to be on both a wired and wireless network at the same time. For example, sometimes I use my laptop as a gateway between a wired and wireless network. In these cases I need to shut down Wicd and go back to manual networking for a while.
The next issue was wallpapers. I’ve always liked having natural landscape wallpapers. So far, I could move onto a new computer and have everything functionally working, but I’d have a blank gray background. KDE 4 got me used to slideshow wallpaper, changing the landscape image to a new one every 10-ish minutes. For a few years now, I’ve made a habit of creating a .wallpapers directory in my home directory and dumping interesting wallpapers in there as I come across them. When picking a new wallpaper, or telling KDE where to look for random wallpapers, I’d grab one from there. I’ve decided to continue this with my dotfiles repository.
I wrote a shell script that uses feh to randomly set the root (wallpaper) image every 10 minutes. It gets installed in .wallpapers from the dotfiles repository. Openbox runs this script in the background when it starts. I don’t actually store the hundreds of images in my repository. There’s a fetch.sh that grabs them all from Amazon S3 automatically. This is just another small step I take after running the dotfiles install script. Any new images I throw in .wallpapers get put into the rotation, but only for that computer.
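The core of such a script is tiny. Here’s a minimal sketch — not my exact script:

#!/bin/sh
# Set a random wallpaper from ~/.wallpapers every 10 minutes
while true; do
    feh --bg-fill --randomize ~/.wallpapers/*
    sleep 600
done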
I’ve now got all this encoded into my configuration files and checked into my dotfiles repository. It’s incredibly satisfying to have this in common across each of my computers and to have it instantly available on any new installs. I’m that much closer to having the ideal (and ultimately unattainable) computing experience!
So you want to make your own animated GIFs from a video clip? Well, it’s a pretty easy process that can be done almost entirely from the command line. I’m going to show you how to turn the clip into a GIF and add an image macro overlay. Like this,
The key tool here is going to be Gifsicle, a very excellent command-line tool for creating and manipulating GIF images. So, the full list of tools is,
Here’s the source video for the tutorial. It’s an awkward video my wife took of our confused cats, Calvin and Rocc.
My goal is to cut after Calvin looks at the camera, before he looks away. From roughly 3 seconds to 23 seconds. I’ll have mplayer give me the frames as JPEG images.
mplayer -vo jpeg -ss 3 -endpos 23 -benchmark calvin-dummy.webm
This tells mplayer to output JPEG frames between 3 and 23 seconds, doing it as fast as it can (-benchmark). This output almost 800 images. Next I look through the frames and delete the extra images at the beginning and end that I don’t want to keep. I’m also going to throw away the even numbered frames, since GIFs can’t have such a high framerate in practice.
rm *[0,2,4,6,8].jpg
There’s also dead space around the cats in the image that I want to crop. Looking at one of the frames in GIMP, I’ve determined this is a 450 by 340 box, with the top-left corner at (136, 70). We’ll need this information for ImageMagick.
Gifsicle only knows how to work with GIFs, so we need to batch convert these frames with ImageMagick’s convert. This is where we need the crop dimensions from above, given in ImageMagick’s notation.
ls *.jpg | xargs -I{} -P4 \
convert {} -crop 450x340+136+70 +repage -resize 300 {}.gif
This will do four images at a time in parallel. The +repage is necessary because ImageMagick keeps track of the original image “canvas”, and it will simply drop the section of the image we don’t want rather than completely crop it away. The repage forces it to resize the canvas as well. I’m also scaling it down slightly to save on the final file size.
We have our GIF frames, so we’re almost there! Next, we ask Gifsicle to compile an animated GIF.
gifsicle --loop --delay 5 --dither --colors 32 -O2 *.gif > ../out.gif
I’ve found that using 32 colors and dithering the image gives very nice results at a reasonable file size. Dithering adds noise to the image to remove the banding that occurs with small color palettes. I’ve also instructed it to optimize the GIF as fully as it can (-O2). If you’re just experimenting and want Gifsicle to go faster, turning off dithering goes a long way, followed by disabling optimization.
The delay of 5 (hundredths of a second per frame) gives us the 15-ish frames-per-second we want — since we cut half the frames from a 30 frames-per-second source video. We also want to loop indefinitely.
The result is this 6.7 MB GIF. A little large, but good enough. It’s basically what I was going for. Next we add some macro text.
In GIMP, make a new image with the same dimensions of the GIF frames, with a transparent background.
Add your macro text in white, in the Impact Condensed font.
Right click the text layer and select “Alpha to Selection,” then under Select, grow the selection by a few pixels — 3 in this case.
Select the background layer and fill the selection with black, giving a black border to the text.
Save this image as text.png, for our text overlay.
Time to go back and redo the frames, overlaying the text this time. This is called compositing and ImageMagick can do it without breaking a sweat. To composite two images is simple.
convert base.png top.png -composite out.png
List the image to go on top, then use the -composite flag, and it’s placed over top of the base image. In my case, I actually don’t want the text to appear until Calvin, the orange cat, faces the camera. This happens quite conveniently at just about frame 500, so I’m only going to redo those frames.
ls 000005*.jpg | xargs -I{} -P4 \
convert {} -crop 450x340+136+70 +repage \
-resize 300 text.png -composite {}.gif
Run Gifsicle again and this 6.2 MB image is the result. The text overlay compresses better, so it’s a tiny bit smaller.
Now it’s time to post it on reddit and reap that tasty, tasty karma. (Over 400,000 views!)
Write a program that simulates the spreading of a rumor among a group of people. At any given time, each person in the group is in one of three categories:
- IGNORANT - the person has not yet heard the rumor
- SPREADER - the person has heard the rumor and is eager to spread it
- STIFLER - the person has heard the rumor but considers it old news and will not spread it
At the very beginning, there is one spreader; everyone else is ignorant. Then people begin to encounter each other.
So the encounters go like this:
- If a SPREADER and an IGNORANT meet, IGNORANT becomes a SPREADER.
- If a SPREADER and a STIFLER meet, the SPREADER becomes a STIFLER.
- If a SPREADER and a SPREADER meet, they both become STIFLERS.
- In all other encounters nothing changes.
Your program should simulate this by repeatedly selecting two people randomly and having them “meet.”
There are three questions we want to answer:
- Will everyone eventually hear the rumor, or will it die out before everyone hears it?
- If it does die out, what percentage of the population hears it?
- How long does it take? i.e. How many encounters occur before the rumor dies out?
I wrote a very thorough version to produce videos of the simulation in action.
It accepts some command line arguments, so you don’t need to edit any code just to try out some simple things.
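The full program isn’t reproduced here, but the core loop is simple. Here’s a minimal sketch in bash — my own illustration, not the program that made the videos, and far too slow for very large populations:

#!/bin/bash
# Rumor simulation sketch: 0 = IGNORANT, 1 = SPREADER, 2 = STIFLER
n=${1:-1000}
declare -a s
for ((i = 0; i < n; i++)); do s[i]=0; done
s[0]=1        # one spreader at the very beginning
spreaders=1
meetups=0
while ((spreaders > 0)); do
    a=$((RANDOM % n))   # modulo bias is acceptable for a sketch
    b=$((RANDOM % n))
    ((a == b)) && continue
    ((meetups++))
    # If exactly one of the pair is a spreader, make it $a
    if ((s[b] == 1 && s[a] != 1)); then t=$a; a=$b; b=$t; fi
    if ((s[a] == 1)); then
        case ${s[b]} in
            0) s[b]=1; ((spreaders++)) ;;             # ignorant becomes a spreader
            1) s[a]=2; s[b]=2; ((spreaders -= 2)) ;;  # spreaders stifle each other
            2) s[a]=2; ((spreaders--)) ;;             # stifler stifles the spreader
        esac
    fi
done
heard=0
for ((i = 0; i < n; i++)); do
    if ((s[i] != 0)); then ((heard++)); fi
done
echo "n=$n meetups=$meetups knowing=$(echo "scale=4; $heard / $n" | bc)"

Run it with the population size as an argument, e.g. ./rumor.sh 1000.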
And here are a couple of videos. Each individual is a cell in a 2D grid. IGNORANT is black, SPREADER is red, and STIFLER is white. Note that this is not a cellular automaton, because cell neighborship does not come into play.
Here are the statistics for ten different rumors.
Rumor(n=10000, meetups=132380, knowing=0.789)
Rumor(n=10000, meetups=123944, knowing=0.7911)
Rumor(n=10000, meetups=117459, knowing=0.7985)
Rumor(n=10000, meetups=127063, knowing=0.79)
Rumor(n=10000, meetups=124116, knowing=0.8025)
Rumor(n=10000, meetups=115903, knowing=0.7952)
Rumor(n=10000, meetups=137222, knowing=0.7927)
Rumor(n=10000, meetups=134354, knowing=0.797)
Rumor(n=10000, meetups=113887, knowing=0.8025)
Rumor(n=10000, meetups=139534, knowing=0.7938)
Except for very small populations, the simulation always terminates very close to 80% rumor coverage. I don’t understand (yet) why this is, but I find it very interesting.
]]>I found this Common Lisp Quick Reference the other day from r/lisp, and I think it's fantastic. It's a comprehensive, libre booklet of the symbols defined by the Common Lisp ANSI standard. Very slick!
The main version is meant to be printed out and nested with a vertical fold, and it works quite well. If I ever get a chance to use Common Lisp at work (a man can dream), probably at a location without Internet access, this could come in handy. So I printed out one for myself,