Recently I’ve been experimenting with freestanding C programs on
Windows. Freestanding refers to programs that don’t link, either
statically or dynamically, against a standard library (i.e. libc).
This is typical for operating systems and similar, bare metal
situations. Normally a C compiler can make assumptions about the
semantics of functions provided by the C standard library. For
example, the compiler will likely replace a call to a small,
fixed-size memmove() with move instructions. Since a freestanding
program would supply its own, it may have different semantics.
My usual go-to for C/C++ on Windows is Mingw-w64, which has
greatly suited my needs the past couple of years. It’s packaged on
Debian, and, when combined with Wine, allows me to fully develop
Windows applications on Linux. Being GCC, it’s also great for
cross-platform development since it’s essentially the same compiler as
the other platforms. The primary difference is the interface to the
operating system (POSIX vs. Win32).
However, it has one glaring flaw inherited from MinGW: it links
against msvcrt.dll, an ancient version of the Microsoft C runtime
library that currently ships with Windows. Besides being dated and
quirky, it’s not an official part of Windows and never has
been, despite its inclusion with every release since Windows 95.
Mingw-w64 doesn’t have a C library of its own, instead patching over
some of the flaws of msvcrt.dll and linking against it.
Since so much depends on msvcrt.dll despite its unofficial nature,
it’s unlikely Microsoft will ever drop it from future releases of
Windows. However, if strict correctness is a concern, we must ask
Mingw-w64 not to link against it. An alternative would be
PlibC, though the LGPL licensing is unfortunate. Another is
Cygwin, which is a very complete POSIX environment, but it is heavy
and requires shipping its large runtime DLL alongside your program.
Sometimes I’d prefer to be more direct: skip the C library altogether
and talk directly to the operating system. On Windows that’s the Win32
API. Ultimately I want a tiny, standalone .exe that links against
nothing but system DLLs like kernel32.dll.
Linux vs. Windows
The most important benefit of a standard library like libc is a
portable, uniform interface to the host system. So long as the
standard library suits its needs, the same program can run anywhere.
Without it, the program needs a separate implementation for each
operating system it supports.
On Linux, operating system requests at the lowest level are made
directly via system calls. This requires a bit of assembly language
for each supported architecture (int 0x80 on x86, syscall on
x86_64, swi on ARM, etc.). The POSIX functions of the various Linux
libc implementations are built on top of this mechanism.
For example, here’s a function for a 1-argument system call on x86_64.
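Something along these lines would do it (a sketch using GCC inline
assembly; the function name is just illustrative):

long
syscall1(long n, long arg)
{
    /* The system call number goes in rax, the first argument in rdi;
       the kernel clobbers rcx and r11. The result comes back in rax. */
    long result;
    __asm__ volatile (
        "syscall"
        : "=a"(result)
        : "a"(n), "D"(arg)
        : "rcx", "r11", "memory"
    );
    return result;
}

/* For example, exit(1) is system call 60 on x86_64: syscall1(60, 1); */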
The situation is simpler on Windows. Its low level system calls are
undocumented and unstable, changing across even minor updates. The
formal, stable interface is through the exported functions in
kernel32.dll. In fact, kernel32.dll is essentially a standard library
on its own (making the term “freestanding” in this case dubious). It
includes functions usually found only in user-space, like string
manipulation, formatted output, font handling, and heap management
(similar to malloc()). It’s not POSIX, but it has analogs to much of
the same functionality.
The standard entry for a C program is main(). However, this is not
the application’s true entry. The entry is in the C library, which
does some initialization before calling your main(). When main()
returns, it performs cleanup and exits. Without a C library, programs
don’t start at main().
On Linux the default entry is the symbol _start. Its prototype
would look like so:
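Roughly speaking (the exact declaration isn’t reproduced here, so
treat this as a sketch):

/* _start takes no arguments: argc, argv, and envp are instead found
   on the stack. It must never return. */
void _start(void);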
Returning from this function leads to a segmentation fault, so it’s up
to your application to perform the exit system call rather than return.
On Windows, the entry depends on the type of application. The two
relevant subsystems today are the console and windows subsystems.
The former is for console applications (duh). These programs may still
create windows and such, but must always have a controlling console.
The latter is primarily for programs that don’t run in a console,
though they can still create an associated console if they like. In
Mingw-w64, give -mconsole (default) or -mwindows to the linker to
choose the subsystem.
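The corresponding prototypes look roughly like this; the symbol names
below follow the usual CRT naming convention, which is my assumption
about what the original listing showed:

int WINAPI mainCRTStartup(void);    /* console subsystem entry */
int WINAPI WinMainCRTStartup(void); /* windows subsystem entry */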
Unlike Linux’s _start, Windows programs can safely return from these
functions, similar to main(), hence the int return. The WINAPI
macro means the function may have a special calling convention,
depending on the platform.
On any system, you can choose a different entry symbol or address
using the --entry option to the GNU linker.
One problem I’ve run into is Mingw-w64 generating code that calls
__chkstk_ms() from libgcc. I believe this is a long-standing bug,
since -ffreestanding should prevent these sorts of helper functions
from being used. The workaround I’ve found is to disable stack probing.
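For reference, a build might be invoked something like this (a sketch,
not necessarily the exact flags used for the example below):

$ x86_64-w64-mingw32-gcc -std=c99 -Wall -Wextra \
    -ffreestanding -nostdlib -s \
    -o example.exe example.c -lkernel32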
Notice I manually linked against kernel32.dll. The stripped final
result is only 4kB, mostly PE padding. There are techniques to trim
this down even further, but for a substantial program it
wouldn’t make a significant difference.
From here you could create a GUI by linking against user32.dll and
gdi32.dll (both also part of Win32) and calling the appropriate
functions. I already ported my OpenGL demo to a freestanding
.exe, dropping GLFW and directly using Win32 and WGL. It’s much less
portable, but the final .exe is only 4kB, down from the original 104kB
(static linking against GLFW).
It’s been two years since I last wrote about Elfeed, my
Atom/RSS feed reader for Emacs. I’ve used it every single
day since, and I continue to maintain it with help from the community.
So far 18 people besides me have contributed commits. Over the last
couple of years it’s accumulated some new features, some more obvious
than others.
Every time I mark a new release, I update the ChangeLog at the top of
elfeed.el which lists what’s new. Since it’s easy to overlook many of
the newer useful features, I thought I’d list the more important ones
here.
Custom Entry Colors
You can now customize entry faces through elfeed-search-face-alist.
This variable maps tags to faces. An entry inherits the face of any
tag it carries. Previously “unread” was a special tag that got a bold
face, but this is now implemented as nothing more than an initial
entry in the alist.
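For example, something like this colors video entries differently (the
tag and face names are just for illustration):

(defface my-elfeed-video-face
  '((t :foreground "#4a90d9"))
  "Face for video entries in the Elfeed search listing.")

(add-to-list 'elfeed-search-face-alist '(video my-elfeed-video-face))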
I’ve been using it to mark different kinds of content (videos,
podcasts, comics) with different colors.
You can specify the starting tags for entries from particular feeds
directly in the feed listing. This has been a feature for a while now,
but it’s not something you’d want to miss. It started out as a feature
in my personal configuration that eventually migrated into Elfeed
For example, your elfeed-feeds may initially look like this,
especially if you imported from OPML.
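Something like this, just a flat list of URLs (placeholders here):

(setq elfeed-feeds
      '("http://example.com/feed/"
        "http://example.org/atom.xml"
        "http://example.net/rss"))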
If you wanted certain tags applied to entries from each, you would
need to putz around with elfeed-make-tagger. For the most common
case — apply certain tags to all entries from a URL — it’s much
simpler to specify the information as part of the listing itself:
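Each entry becomes a list whose head is the URL and whose tail is the
tags to apply (the tags here are illustrative):

(setq elfeed-feeds
      '(("http://example.com/feed/" comic)
        ("http://example.org/atom.xml" blog emacs)
        ("http://example.net/rss" blog)))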
Today I only use custom tagger functions in my own configuration to
filter within a couple of particularly noisy feeds.
Metadata is more for Elfeed extensions (e.g. elfeed-org)
than regular users. You can attach arbitrary, readable
metadata to any Elfeed object (entry, feed). This metadata is
automatically stored in the database. It’s a plist.
Metadata is accessed entirely through one setf-able function:
elfeed-meta. For example, you might want to track when you’ve read
something, not just that you’ve read it. You could use this to
selectively update certain feeds or just to evaluate your own habits.
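For example, a made-up :my/read-time key could record exactly when you
read an entry (the key name is my own invention):

;; Given an entry object `entry', store a timestamp, then read it back:
(setf (elfeed-meta entry :my/read-time) (float-time))
(elfeed-meta entry :my/read-time)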
Two things motivated this feature. First, without a plist, if I added
more properties in the future, I would need to change the database
format to support them. I modified the database format to add
metadata, requiring an upgrade function to quietly upgrade older
databases as they were loaded. I’d really like to avoid this in the
future.
Second, I wanted to make it easy for extension authors to store their
own data. I still imagine an extension someday to update feeds
intelligently based on their history. For example, the database
doesn’t track when the feed was last fetched, just the date of the
most recent entry (if any). A smart-update extension could use
metadata to tag feeds with this information.
Elfeed itself already uses two metadata keys: :failures on feeds and
:title on both. :failures counts the total number of times
fetching that feed resulted in an error. You could use this to get a
listing of troublesome feeds like so:
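Here’s a sketch of such a listing (not necessarily the original
snippet), using cl-lib:

(require 'cl-lib)

;; Collect (URL . failure-count) pairs for feeds that have ever failed:
(cl-loop for url in (elfeed-feed-list)
         for feed = (elfeed-db-get-feed url)
         for failures = (elfeed-meta feed :failures)
         when (and failures (> failures 0))
         collect (cons url failures))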
The :title property allows for a custom title for both feeds and
entries in the search buffer listing, assuming you’re using the
default function (see below). It overrides the title provided by the
feed itself. This is different than elfeed-entry-title and
elfeed-feed-title, which are kept in sync with feed content. Metadata
is not kept in sync with the feed itself.
You can invert filter components by prefixing them with !. For
example, say you’re looking at all my posts from the past 6 months:
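Assuming my posts carry a tag such as nullprogram, the filter would
look something like this:

@6-months-ago +nullprogram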
But say you’re tired of me and decide you want to see every entry from
the past 6 months excluding my posts.
Normally you limit the number of results by date, but you can now
limit the result by count using #n. For example, to see my most
recent 12 posts regardless of date:
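Again assuming a nullprogram tag:

+nullprogram #12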
This is used internally in the live filter to limit the number of
results to the height of the screen. If you noticed that live
filtering has been much more responsive in the last few months, this is
why.
Elfeed properly integrates with Emacs’ bookmarks (thanks to
groks). You can bookmark the current filter with M-x
bookmark-set (C-x r m). By default, Emacs will persist bookmarks
between sessions. To revisit a filter in the future, M-x
bookmark-jump (C-x r b).
Since this requires no configuration, this may serve as an easy
replacement for manually building “view” toggles — filters bound to
certain keys — which I know many users have done, including me.
If you’ve updated very recently, you probably noticed Elfeed got a
brand new header. Previously it faked a header by writing to the first
line of the buffer. This is because somehow I had no idea Emacs had
official support for buffer headers (despite notmuch using them all
along).
The new header includes additional information, such as the current
filter, the number of unread entries, the total number of entries, and
the number of unique feeds currently in view. You’ll see this as
<unread>/<total>:<feeds> in the middle of the header.
As of this writing, the new header has not been made part of a formal
release. So if you’re only tracking stable releases, you won’t see
this for a while longer.
As you already know, in the search buffer listing you can press G to
update your feeds. But did you know it takes a prefix argument?
Run as C-u G, it only updates feeds with entries currently listed in
the search buffer.
As of this writing, this is another feature not yet in a formal
release. I’d been wanting something like this for a while but couldn’t
think of a reasonable interface. Directly prompting the user for feeds
is neither elegant nor composable. However, groks suggested the
prefix argument, which composes perfectly with Elfeed’s existing
interface.
In addition to custom faces, there are a number of ways to customize
the search buffer listing:
Choose the sort order with elfeed-sort-order.
Set a custom date format with elfeed-search-date-format.
Adjust field widths with elfeed-search-*-width.
Or override everything with elfeed-search-print-entry-function.
Gergely Nagy has been throwing lots of commits at me over the last
couple of weeks to open up lots of Elfeed’s behavior to customization,
so there are more to come.
Thank You, Emacs Community
Apologies for any features I missed or anyone I forgot to mention
who’s made contributions. The above comes from my ChangeLogs, the
commit log, the GitHub issue listing, and my own memory, so I’m likely
to have forgotten some things. A couple of these features I had
forgotten about myself!
To use it, you’ll need Poppler’s pdftotext command line
program — used to build an index of the PDF — and a copy of the
complete Volume 2 of Intel’s instruction set manual. There’s only one
command to worry about: M-x x86-lookup.
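If you want it a keystroke away, bind it to a key, for example (the key
choice is only a suggestion):

(global-set-key (kbd "C-h x") #'x86-lookup)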
Minimize documentation friction
This package should be familiar to anyone who’s used
javadoc-lookup, one of my older packages. It has a common
underlying itch: the context switch to read API documentation while
coding should have as little friction as possible, otherwise I’m
discouraged from doing it. In an ideal world I wouldn’t ever need to
check documentation because it’s already in my head. By visiting
documentation frequently with ease, it’s going to become familiar that
much faster and I’ll be reaching for it less and less, approaching the
ideal.
I picked up x86 assembly about a year ago and for the first few
months I struggled to find a good online reference for the instruction
set. There are little scraps here and there, but not much of
substance. The big exception is Félix Cloutier’s reference,
which is an amazingly well-done HTML conversion of Intel’s PDF
manuals. Unfortunately I could never get it working locally to
generate my own. There’s also the X86 Opcode and Instruction
Reference, but it’s more for machines than humans.
Besides, I often work without an Internet connection, so offline
documentation is absolutely essential. (You hear that Microsoft? Not
only do I avoid coding against Win32 because it’s badly designed, but
even more so because you don’t offer offline documentation anymore!
The friction to reference your API documentation is enormous.)
I avoided the official x86 documentation for a while, thinking it would
be too opaque, at least until I became more accustomed to the
instruction set. But really, it’s not bad! With a handle on the
basics, I would encourage anyone to dive into either Intel’s or AMD’s
manuals. The reason there’s not much online in HTML form is
because these manuals are nearly everything you need.
I chose Intel’s manuals for x86-lookup because I’m more familiar with
them, they’re more popular, (slightly) easier to parse, offered as a
single PDF, and more complete. The regular expression for
finding instructions is tuned for Intel’s manual and it won’t work
with AMD’s manuals.
For a couple months prior to writing x86-lookup, I had a couple of
scratch functions to very roughly accomplish the same thing. The
tipping point for formalizing it was that last month I wrote my own
x86 assembler. A single mnemonic often has a dozen or more different
opcodes depending on the instruction’s operands, and there are often
several ways to encode the same operation. I was frequently looking up
opcodes, and navigating the PDF quickly became a real chore. I only
needed about 80 different opcodes, so I was just adding them to the
assembler’s internal table manually as needed.
How does it work?
Say you want to look up the instruction RDRAND.
Initially Emacs has no idea what page this is on, so the first step is
to build an index mapping mnemonics to pages. x86-lookup runs the
pdftotext command line program on the PDF and loads the result into a
temporary buffer.
The killer feature of pdftotext is that it emits FORM FEED (U+000C)
characters between pages. Think of these as page breaks. By counting
form feed characters, x86-lookup can track the page for any part of
the document. In fact, Emacs is already set up to do this with its
forward-page and backward-page commands. So to build the index,
x86-lookup steps forward page-by-page looking for mnemonics, keeping
note of the page. Since this process typically takes about 10 seconds,
the index is cached in a file (see x86-lookup-cache-directory) for
future use. It only needs to happen once for a particular manual on a
particular machine.
The mnemonic listing is slightly incomplete, so x86-lookup expands
certain mnemonics into the familiar set. For example, all the
conditional jumps are listed under “Jcc,” but this is probably not
what you’d expect to look up. I compared x86-lookup’s mnemonic listing
against NASM/nasm-mode’s mnemonics to ensure everything was accounted
for. Both packages benefited from this process.
Once the index is built, pdftotext is no longer needed. If you’re
desperate and don’t have this program available, you can borrow the
index file from another computer. But you’re on your own for figuring
that out.
So to look up RDRAND, x86-lookup checks the index for the page number
and invokes a PDF reader on that page. This is where not all PDF
readers are created equal. There’s no convention for opening a PDF to
a particular page and each PDF reader differs. Some don’t even support
it. To deal with this, x86-lookup has a function specialized for
different PDF readers. Similar to browse-url-browser-function,
x86-lookup has x86-lookup-browse-pdf-function.
By default it tries to open the PDF for viewing within Emacs (did you
know Emacs is a PDF viewer?), falling back on other options if the
feature is unavailable. I welcome pull requests for any PDF readers
not yet supported by x86-lookup. Perhaps this functionality deserves
its own package.
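For the curious, a custom browse function receives the PDF path and a
page number, so wiring in an external viewer looks roughly like this
sketch (the viewer and its flags are illustrative):

(setq x86-lookup-browse-pdf-function
      (lambda (pdf page)
        ;; Hypothetical: hand the PDF and page off to an external viewer.
        (start-process "pdf-viewer" nil
                       "zathura" "-P" (number-to-string page) pdf)))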
That’s it! It’s a simple feature that has already saved me a lot of
time. If you’re ever programming in x86 assembly, give x86-lookup a
try.
Emacs comes with a wonderful arbitrary-precision computer algebra
system called calc. I’ve discussed it previously and
continue to use it on a daily basis. That’s right, people, Emacs can
do calculus. Like everything Emacs, it’s programmable and extensible
from Emacs Lisp. In this article, I’m going to implement the RSA
public-key cryptosystem in Emacs Lisp using calc.
If you want to dive right in first, here’s the repository:
This is only a toy implementation and not really intended for serious
cryptographic work. It’s also far too slow when using keys of a secure
length.
Evaluation with calc
The calc package is particularly useful when considering Emacs’
limited integer type. Emacs uses a tagged integer scheme where
integers are embedded within pointers. It’s a lot faster than the
alternative (individually-allocated integer objects), but it means
they’re always a few bits short of the platform’s native integer type.
calc has a large API, but the user-friendly porcelain for it is the
under-documented calc-eval function. It evaluates an expression
string with format-like argument substitutions ($n).
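A quick taste (my own examples, not necessarily the originals):

(calc-eval "2^128")
;; => "340282366920938463463374607431768211456"

(calc-eval "$1 * $2" nil "123456789012345678901234567890" 2)
;; => "246913578024691357802469135780"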
Notice it returns strings, which is one of the ways calc represents
arbitrary precision numbers. For arguments, it accepts regular Elisp
numbers and strings just like this function returns. The implicit
radix is 10. To explicitly set the radix, prefix the number with the
radix and #. This is the same as in the user interface of calc. For
(calc-eval "16#deadbeef") ;; => "3735928559"
The second argument (optional) to calc-eval adjusts its behavior.
Given nil, it simply evaluates the string and returns the result.
The manual documents the different options, but the only other
relevant option for RSA is the symbol pred, which asks it to return
a boolean “predicate” result.
(calc-eval "$1 < $2" 'pred "4000" "5000") ;; => t
RSA is founded on the difficulty of factoring large composites with
large factors. Generating an RSA keypair starts with generating two
prime numbers, p and q, and using these primes to compute two
mathematically related composite numbers.
calc has a function calc-next-prime for finding the next prime
number following any arbitrary number. It uses a probabilistic
method — Fermat’s primality test
— to efficiently test large integers. It increments the input until
it finds a result that passes enough iterations of the primality test.
Unfortunately calc’s random function is based on Emacs’ random
function, which is entirely unsuitable for cryptography. In the real
implementation I read n bits from /dev/urandom to generate an n-bit
integer.
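Setting the entropy source aside, prime generation with calc might be
sketched like this (the real implementation differs):

(defun rsa-generate-prime (bits)
  "Return a random prime of roughly BITS bits as a calc string.
Sketch only: calc's random() is not cryptographically secure."
  (calc-eval "nextprime(random(2^$1))" nil bits))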
From here the code just follows along from the Wikipedia article.
After generating the primes p and q, two composites are computed,
n = p * q and i = (p - 1) * (q - 1). Lacking any reason to do
otherwise, I chose 65,537 for the public exponent e.
The function rsa--inverse is just a straight Emacs Lisp + calc
implementation of the extended Euclidean algorithm from the Wikipedia
article pseudocode, computing d ≡ e^-1 (mod i). It’s not much
use sharing it here, so take a look at the repository if you’re
curious.
(defun rsa-generate-keypair (bits)
  "Generate a fresh RSA keypair plist of BITS length."
  (let* ((p (rsa-generate-prime (+ 1 (/ bits 2))))
         (q (rsa-generate-prime (+ 1 (/ bits 2))))
         (n (calc-eval "$1 * $2" nil p q))
         (i (calc-eval "($1 - 1) * ($2 - 1)" nil p q))
         (e (calc-eval "2^16+1"))
         (d (rsa--inverse e i)))
    `(:public (:n ,n :e ,e) :private (:n ,n :d ,d))))
The public key is n and e and the private key is n and d. From
here we can compute and verify cryptographic signatures.
To compute signature s of an integer m (where m < n), compute
s ≡ m^d (mod n). I chose the right-to-left binary method, again
straight from the Wikipedia pseudocode (lazy!). I’ll share this
one since it’s short. The backslash denotes integer division.
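It goes something like this (a sketch; the name and exact details may
differ from the repository):

(defun rsa--mod-pow (base exponent modulus)
  "Compute BASE^EXPONENT mod MODULUS by the right-to-left binary method."
  (let ((result 1))
    (setf base (calc-eval "$1 % $2" nil base modulus))
    (while (calc-eval "$1 > 0" 'pred exponent)
      (when (calc-eval "$1 % 2 = 1" 'pred exponent)
        (setf result (calc-eval "($1 * $2) % $3" nil result base modulus)))
      (setf exponent (calc-eval "$1 \\ 2" nil exponent))
      (setf base (calc-eval "($1 * $2) % $3" nil base base modulus)))
    result))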
Verifying the signature is the same process, but with the public key’s
e: m ≡ s^e (mod n). If the signature is valid, m will be
recovered. In theory, only someone who knows d can feasibly compute
s from m. If n is small enough to factor, revealing
p and q, then d can be feasibly recomputed from the public key.
So mind your Ps and Qs.
So that leaves one problem: generally users want to sign strings and
files and such, not integers. A hash function is used to reduce an
arbitrary quantity of data into an integer suitable for signing. Emacs
comes with a bunch of them, accessible through secure-hash. It
hashes strings and buffers.
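Turning a digest into something calc understands is just a matter of
prefixing the hex string with its radix. A sketch (the function name is
mine):

(defun rsa--hash-integer (object)
  "Reduce OBJECT (a string or buffer) to an integer via SHA-224."
  (calc-eval (concat "16#" (secure-hash 'sha224 object))))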
Each of these operations took less than a second. For larger,
secure-length keys, this implementation is painfully slow. For
example, generating a 2048-bit key takes my laptop about half an hour,
and computing a signature with that key (any size message) takes about
a minute. That’s probably a little too slow for, say, signing ELPA
packages.
One of the great advantages of dependency analysis is parallelization.
Modern processors reorder instructions whose results don’t affect each
other. Compilers reorder expressions and statements to improve
throughput. Build systems know which outputs are inputs for other
targets and can choose any arbitrary build order within that
constraint. This article involves the last case.
The build system I use most often is GNU Make, either directly or
indirectly (Autoconf, CMake). It’s far from perfect, but it does what
I need. I almost always invoke it from within Emacs rather than in a
terminal. In fact, I do it so often that I’ve wrapped Emacs’ compile
command for rapid invocation.
I recently helped a co-worker set this up for himself, so it had
me thinking about the problem again. The situation in my
config is much more complicated than it needs to be, so I’ll
share a simplified version instead.
First bring in the usual goodies (we’re going to be making closures):
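Roughly like this; the default values below are my guesses rather than
the originals:

;; -*- lexical-binding: t; -*-
(require 'cl-lib)

(defvar quick-compile-command "make -k"
  "Build command prefix handed to `compile'.")

(defvar quick-compile-build-file "Makefile"
  "File that marks the root of a build.")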
Then a couple of interactive functions to set these on the fly. It’s
not strictly necessary, but I like giving each a key binding. I also
like having a history available via read-string, so I can switch
between a couple of different options with ease.
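They might look something like this sketch:

(defun quick-compile-set-command (command)
  "Set the build command, with history courtesy of `read-string'."
  (interactive
   (list (read-string "Compile command: " quick-compile-command)))
  (setf quick-compile-command command))

(defun quick-compile-set-build-file (file)
  "Set the build file to look for (e.g. Makefile)."
  (interactive
   (list (read-string "Build file: " quick-compile-build-file)))
  (setf quick-compile-build-file file))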
Now finally to the good part. Below, quick-compile is a
non-interactive function that returns an interactive closure ready to
be bound to any key I desire. It takes an optional target. This means
I don’t use the above quick-compile-set-command to choose a target,
only for setting other options. That will make more sense in a moment.
(cl-defun quick-compile (&optional (target ""))
  "Return an interaction function that runs `compile' for TARGET."
  (lambda ()
    (interactive)
    (save-buffer) ; so I don't get asked
    (let ((default-directory
           (locate-dominating-file default-directory quick-compile-build-file)))
      (if default-directory
          (compile (concat quick-compile-command " " target))
        (error "Cannot find %s" quick-compile-build-file)))))
It traverses up (down?) the directory hierarchy towards root looking
for a Makefile — or whatever is set for quick-compile-build-file
— then invokes the build system there. I don’t believe in recursive
Makefiles.
So how do I put this to use? I clobber some key bindings I don’t
otherwise care about. A better choice might be the F-keys, but my
muscle memory is already committed elsewhere.
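For example (the keys below are stand-ins for whatever you’re willing
to clobber):

(global-set-key (kbd "C-x c") (quick-compile))          ; default target
(global-set-key (kbd "C-x C") (quick-compile "clean"))
(global-set-key (kbd "C-x t") (quick-compile "test"))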
Each of those invokes a different target without second guessing me.
Let me tell you, having “clean” at the tip of my fingers is wonderful.
An extension common to many different make programs is -j, which
asks make to build targets in parallel where possible. These days
where multi-core machines are the norm, you nearly always want to use
this option, ideally set to the number of logical processor cores on
your system. It’s a huge time-saver.
My recent revelation was that my default build command could be
better: make -k is minimal. It should at least include -j, but
choosing an argument (number of processor cores) is a problem. Today I
use different machines with 2, 4, or 8 cores, so most of the time any
given number will be wrong. I could use a per-system configuration,
but I’d rather not. Unfortunately GNU Make will not automatically
detect the number of cores. That leaves the matter up to Emacs Lisp.
Emacs doesn’t currently have a built-in function that returns the
number of processor cores. I’ll need to reach into the operating
system to figure it out. My usual development environments are Linux,
Windows, and OpenBSD, so my solution should work on each. I’ve ranked
them by order of importance.
Number of cores on Linux
Linux has the /proc virtual filesystem in the fashion of Plan 9,
allowing different aspects of the system to be explored through the
standard filesystem API. The relevant file here is /proc/cpuinfo,
listing useful information about each of the system’s processors. To
get the number of processors, count the number of processor entries in
this file. I’ve wrapped it in if-file-exists so that it returns
nil on other operating systems instead of throwing an error.
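A sketch of that counting, with a plain file-exists-p check standing in
for the wrapper mentioned above (the function name is illustrative):

(defun my-cpu-count-linux ()
  "Count processor entries in /proc/cpuinfo, or nil if it doesn't exist."
  (when (file-exists-p "/proc/cpuinfo")
    (with-temp-buffer
      (insert-file-contents "/proc/cpuinfo")
      (let ((count 0))
        (while (re-search-forward "^processor" nil t)
          (setq count (1+ count)))
        count))))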
When I was first researching how to do this on Windows, I thought I
would need to invoke the wmic command line program and hope the
output could be parsed the same way on different versions of the
operating system and tool. However, it turns out the solution for
Windows is trivial. The environment variable NUMBER_OF_PROCESSORS
gives every process the answer for free. Being an environment
variable, it will need to be parsed.
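In Emacs Lisp that parsing is a one-liner:

;; Returns nil when the variable isn't set (i.e. not on Windows).
(let ((n (getenv "NUMBER_OF_PROCESSORS")))
  (and n (string-to-number n)))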
My wife and I are huge webcomic fans. The web is the medium that the
comic strip industry needed badly for decades, and, with Patreon and
such today, we’re now living in a golden age of comics. As of this
writing, I currently follow … let’s see … 39 different web comics.
My first exposure to comics was in my childhood when I got my hands on
Bill Watterson’s Something Under the Bed Is Drooling (Calvin and
Hobbes). This gave me very high expectations of the Sunday comics
section of the newspaper when I’d read it at my grandmother’s house.
Those hopes were shattered as I discovered just how awful nationally
syndicated comic strips are: mostly watered down, lowest common
denominator stuff like Garfield, Family Circus, Cathy, B.C., etc.
During Calvin and Hobbes’s original run, Bill Watterson wrote about
his struggles with the newspapers and the Universal Press Syndicate,
one of the organizations responsible for this mess. Newspapers and the
Syndicate pushed for smaller frames and shorter comics. Authors were
required to plan around newspapers removing frames for layout
purposes. Many newspapers would drop comics that didn’t meet stringent
content limitations — a line that even Calvin and Hobbes crossed on
occasion. Authors had little control over how their work was presented.
Those days are over. Today’s authors can cheaply host their comics on
the web — webcomics — with full control over content, layout,
and schedule. If they even try to monetize at all, it’s generally
through advertising, merchandising, or reader donations. Some do it
all in their free time, while for others it’s part or even full time
employment. The number of regular readers of a single webcomic can be
just a handful of people, or up to millions. The role of the
middleman is somewhere between diminished and non-existent. This is
great, because newspapers would never publish the vast majority of
the comics I read every day.
I’ve been fortunate to meet a couple of my favorite webcomic authors.
Here’s a picture of my wife posing with Anthony Clark of Nedroid
Picture Diary at the Small Press Expo.
I’ve also met Philippa Rice of My Cardboard Life. (Sorry,
no picture for this one, since taking pictures with people isn’t
really my thing.)
Over the years I’ve seen webcomic authors blunder with the web as a
technology. In my experience it’s been disproportionate, with mistakes
made more often by them than the bloggers I follow. I suspect that
this is because blogs I follow tend to be computing related and so
their authors have high proficiency in computing. The same is not
necessarily true of the webcomics I follow.
Tips for web authors
Since I want to see this medium continue to thrive, and to do so in a
way friendly to my own preferences, I’d like to share some tips to
avoid common mistakes. Some of these apply more broadly than just
webcomics.
If you’re using a host designed for webcomics or similar, such as
Tumblr, a lot of this stuff will be correct by default without any
additional work on your part. However, you should still be aware of
common problems because you may unwittingly go out of your way to break
things.
URLs are forever
Every time you publish on the web, your content is accessible through
some specific URL: that sequence of characters that starts with
“http”. Each individual comic should be accessible through a unique,
unchanging URL. That last adjective is critically important. That
URL should point to the same comic for as long as possible — ideally
until the heat death of the universe. This will be affected by
problems such as your host going down, but the impact should only be
temporary and short. A URL is a promise.
People will be using this URL to share your comics with others.
They’ll make posts on other websites linking to your comic. They’ll
e-mail those URLs to friends and family. Once you’ve published, you no
longer control how that URL is used.
On several occasions I’ve seen authors break all their URLs after
revamping their site. For example, previously the URL contained
the date but the new URL is only the domain and the title. That breaks
thousands of links all over the Internet. Visitors using those old
links will be welcomed with an ugly “404 Not Found” — or worse, as
I’ve seen more than once, a “200 OK” blank page. These are missed
opportunities for new readers.
If you really must change your URLs, the next best thing is to use
an HTTP “301 Moved Permanently” and redirect to the new URL. This will
leave all those old links intact and encourage new links to use the
new address. If you don’t know how this works, ask your local computer
geek about it.
You should also avoid having multiple URLs for the same content
without a redirect. Search engines will punish you for it and it’s
confusing for users. Pick one URL as the canonical URL for a comic,
and if you’ve published any other URLs (short URLs, etc.), use the
previously mentioned “301 Moved Permanently” to redirect to the
canonical URL.
Your main page probably lists all your comics starting from the most
recent. This is a good design and doesn’t violate anything I
previously said. That’s not the URL for any particular comic but for
the main page, which also serves as the list of recent comics. I
strongly recommend that the comics on the main page are also
hyperlinks to their specific URL. Users naturally expect to find the
comic’s URL by clicking on the comic’s image.
Have an Atom or RSS feed
Comics without feeds are much less of a problem than they used to
be, but the problem still comes up on occasion. If you need to pick
between Atom and RSS, I recommend Atom, but, honestly, it’s only
important that you have a valid feed with a date. You don’t even need
to put the comic in the feed itself (possibly costing you ad revenue),
just a link to the comic’s URL is fine. Its main purpose is to say,
“hey, there’s a new comic up!”
You may not use Atom/RSS yourself, but your readers will appreciate
it. Many of us don’t use centralized services like Facebook, Twitter,
or Google+, and want to follow your work without signing up for a
third-party service. Atom/RSS is the widely-accepted decentralized
method for syndication on the web.
Web feeds are really easy; it’s just an XML file on your website that
lists the most recent content. A validator can help you
ensure you’ve done it correctly.
Pick a good, catchy title
One of the biggest barriers to sharing a comic is a lack of title. For
example, if a reader is going to post your comic on reddit, they need
to enter the comic’s URL and its title. If the comic doesn’t have a
title, then this person will need to make one up. There are two
problems with this.
Coming up with a title is work. Work discourages sharing. The reason
you publish your comic is probably because you want lots of people
to see it. If this is true, you want sharing to be as easy as
possible.
You really don’t want readers choosing titles for you, especially
while they’re impatiently trying to share your work. If the comic is
shared in multiple places, it will end up with a different
reader-made title at each.
At minimum your title should appear in the <title> element of the
page so that it shows up in the browser tab and browser’s window
title. The title of the individual comic should come before the title
of the whole website, since that shows up better in search engines.
The title should also appear somewhere near the top of the page for easy
clipboard copying, though it may be worth leaving out depending on the
style of your comic.
A page without a <title> element looks amateur, so don’t do that!
Think of the future and include dates
This is one of those things that’s important anywhere on the web and
is often violated by blog articles as well. Far too much content is
published without a date. Dates put your comic in context, especially
if it’s about something topical. It also helps users navigate your
content through time.
Putting the date in the URL is sufficient — even preferred — if
you don’t want to display it on the page proper. Your Atom/RSS feed should
always have the comic’s date. I personally benefit from a date-time
precision down to the publication hour. Some comics/articles are
always published as “midnight” even when posted in the afternoon,
which has the jarring effect of inserting them in time before a bunch of
things I’ve already read.
How do I contact you?
When I notice one of the previous problems, particularly when they
arise in comics I’m already following, I’d like to inform you of the
problem. Or perhaps I want to compliment you on a well-made comic and
you don’t have a comments section. I can only do this if you include
some sort of contact information. An e-mail address, even in an
anti-spam image form, is preferable but not strictly required.
Take advantage of the medium and go big
Comics published in newspapers are really tiny because newspaper
editors want to cram a bunch of them onto a couple of pages. You’re
not operating under these limitations, so fight the urge to copy that
familiar format. Your canvas is practically infinite, so make big,
colorful webcomics. The only limit is your readers’ screen resolution.
A final thanks
Thanks for all the work you do, webcomic authors. You regularly create
all this awesome stuff for free. If you’re a webcomic author and you
need help with any of the information above, don’t hesitate to contact
me. After all, I don’t hesitate to bug you when something’s not right!
I recently ran into a problem where long-running program
output was trapped in a C FILE buffer. The program had been running
for two days straight printing its results, but the last few kilobytes
of output were missing. It wouldn’t output these last bytes until the
program completed its day-long (or worse!) cleanup operation and
exited. This is easy to fix — and, honestly, the cleanup step was
unnecessary anyway — but I didn’t want to start all over and wait
two more days to recompute the result.
Here’s a minimal example of the situation. The first loop represents
the long-running computation and the infinite loop represents a
cleanup job that will never complete.
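A reconstruction of that example might look like this (the loop bounds
are arbitrary):

#include <stdio.h>

int main(void)
{
    /* "Long-running computation" that prints its results. */
    for (int i = 0; i < 1000; i++) {
        printf("result %d", i);
        putchar('\n');
    }
    /* "Cleanup" that never completes. */
    for (;;)
        ;
}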
Both printf and putchar are C library functions and are usually
buffered in some way. That is, each call to these functions doesn’t
necessarily send data out of the program. This is in contrast to the
POSIX functions read and write, which are unbuffered system calls.
Since system calls are relatively expensive, buffered input and output
is used to change a large number of system calls on small buffers into
a single system call on a single large buffer.
Typically, stdout is line-buffered if connected to a terminal. When
the program completes a line of output, the user probably wants to see
it immediately. So, if you compile the example program and run it at
your terminal you will probably see the output before the program
hangs on the infinite loop.
However, when stdout is connected to a file or pipe, it’s generally
buffered to something like 4kB. For this program, the output will
remain empty no matter how long you wait. It’s trapped in a FILE
buffer in process memory.
$ ./a.out > output.txt
The primary way to fix this is to use the fflush function, to force
the buffer empty before starting a long, non-output operation.
Unfortunately for me I didn’t think of this two days earlier.
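In the example above it’s a single line between the two loops:

fflush(stdout);  /* push any buffered output out before the long cleanup */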
Debugger to the Rescue
Fortunately there is a way to interrupt a running program and
manipulate its state: a debugger. First, find the process ID of the
running program (the one writing to output.txt above).
$ pgrep a.out
12934
Now attach GDB, which will pause the program’s execution.
$ gdb ./a.out
Reading symbols from ./a.out...(no debugging symbols found)...done.
gdb> attach 12934
Attaching to program: /tmp/a.out, process 12934
... snip ...
0x0000000000400598 in main ()
From here I could examine the stdout FILE struct and try to extract
the buffer contents by hand. However, the easiest thing to do is
perform the call I forgot in the first place: fflush(stdout).
gdb> call fflush(stdout)
$1 = 0
Detaching from program: /tmp/a.out, process 12934
The program is still running, but the output has been recovered.
As I said, in my case the cleanup operation was entirely unnecessary,
so it would be safe to just kill the program at this point. It was
taking a really long time to tear down a humongous data structure (on
the order of 50GB) one little node at a time with free. Obviously,
the memory would be freed much more quickly by the OS when the program
exits.
Freeing memory in the program was only to satisfy Valgrind,
since it’s so incredibly useful for debugging. Not freeing the data
structure would hide actual memory leaks in Valgrind’s final report.
For the real “production” run, I should have disabled cleanup.
Last month I assembled and edited Shamus Young’s Twenty-Sided
Tale, originally a series of 84 blog articles, into an e-book.
The book is 75,000 words — about the average length of a novel —
recording the complete story of one of Shamus’ Dungeons and Dragons
campaigns. Since he’s shared the e-book on his blog, I’m now
free to pull back the curtain on this little project.
To build the book yourself, you will only need make and pandoc.
Why did I want this?
Ever since I got a tablet a couple years ago, I’ve
completely switched over to e-books. Prior to the tablet, if there was
an e-book I wanted to read, I’d have to read from a computer monitor
while sitting at a desk. Anyone who’s tried it can tell you it’s not a
comfortable way to read for long periods, so I only reserved the
effort for e-book-only books that were really worth it. However,
once comfortable with the tablet, I gave away nearly all my paper
books from my bookshelves at home. The remaining use of paper books is
because either an e-book version isn’t reasonably available or the
book is very graphical, not suited to read/view on a screen (full
image astronomy books, Calvin and Hobbes collections).
As far as formats go, I prefer PDF and ePub, depending on the contents
of the book. Technical books fare better as PDFs due to elaborate
typesetting used for diagrams and code samples. For prose-oriented
content, particularly fiction, ePub is the better format due to its
flexibility and looseness. Twenty-Sided Tale falls in this latter
category. The reader gets to decide the font, size, color, contrast,
and word wrapping. I kept the ePub’s CSS to a bare minimum so as not to
get in the reader’s way. Unfortunately I’ve found that most ePub
readers are awful at rendering content, so while technically you could
do the same fancy typesetting with ePub, it rarely works out well.
To start, I spent about 8 hours with Emacs manually converting each
article into Markdown and concatenating them into a single document.
The ePub is generated from the Markdown using the Pandoc
“universal document converter.” The markup includes some HTML, because
Markdown alone, even Pandoc’s flavor, isn’t expressive enough for the
typesetting needs of this particular book. This means it can only
reasonably be transformed into HTML-based formats.
Pandoc isn’t good enough for some kinds of publishing, but it
was sufficient here. The one feature I really wished it had was
support for tagging arbitrary document elements with CSS classes
(images, paragraphs, blockquotes, etc.), effectively extending
Markdown’s syntax. Currently only headings support extra attributes.
Such a feature would have allowed me to bypass all use of HTML, and
the classes could maybe have been re-used in other output formats, too.
Once I got the book in a comfortable format, I spent another 1.5 weeks
combing through the book fixing up punctuation, spelling, grammar,
and, in some cases, wording. It was my first time editing a book —
fiction in particular — and in many cases I wasn’t sure of the
correct way to punctuate and capitalize some particular expression. Is
“Foreman” capitalized when talking about a particular foreman? What
about “Queen?” How are quoted questions punctuated when the sentence
continues beyond the quotes? As an official source on the matter, I
consulted the Chicago Manual of Style. The first edition is free
online. It’s from 1906, but style really hasn’t changed too
much over the past century!
The original articles were written over a period of three years.
Understandably, Shamus forgot how some of the story’s proper names
were spelled over this time period. There wasn’t a wiki to check. Some
proper names had two, three, or even four different spellings.
Sometimes I picked the most common usage, sometimes the first usage,
and sometimes I had to read the article’s comments written by the
game’s players to see how they spelled their own proper names.
I also sunk time into a stylesheet for a straight HTML version of the
book, with the images embedded within the HTML document itself. This
will be one of the two outputs if you build the book yourself.
A Process to Improve
Now I’ve got a tidy, standalone e-book version of one of my favorite
online stories. When I want to re-read it again in the future, it will
be as comfortable as reading any other novel.
This has been a wonderful research project into a new domain (for me):
writing and editing, style, and today’s tooling for writing
and editing. As a software developer, the latter overlaps my expertise
and is particularly fascinating. A note to entrepreneurs: There’s
massive room for improvement in this area. Compared to software
development, the processes in place today for professional writing and
editing are, by my estimate, about 20 years behind. It’s a place where
Microsoft Word is still the industry standard. Few authors and editors
are using source control or leveraging the powerful tools available
for creating and manipulating their writing.
Unfortunately it’s not so much a technical problem as it is a
social/educational one. The tools mostly exist in one form or another,
but they’re not being put to use. Even if an author or editor learns
or builds a more powerful set of tools, they must still interoperate
with people who do not. Looking at it optimistically, this is a
potential door into the industry for myself: a computer whiz editor
who doesn’t require Word-formatted manuscripts; who can make the
computer reliably and quickly perform the tedious work. Or maybe that
idea only works in fiction.