<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Articles tagged rant at null program</title>
  <link rel="alternate" type="text/html"
        href="https://nullprogram.com/tags/rant/"/>
  <link rel="self" type="application/atom+xml"
        href="https://nullprogram.com/tags/rant/feed/"/>
  <updated>2026-04-09T13:25:45Z</updated>
  <id>urn:uuid:e5fdd2c0-380e-4ce5-8730-bd443601b65f</id>

  <author>
    <name>Christopher Wellons</name>
    <uri>https://nullprogram.com</uri>
    <email>wellons@nullprogram.com</email>
  </author>

  <entry>
    <title>Everything I've learned so far about running local LLMs</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2024/11/10/"/>
    <id>urn:uuid:975c2748-2c8f-4bb8-a108-b2be68a10fc5</id>
    <updated>2024-11-10T05:05:20Z</updated>
    <category term="ai"/><category term="rant"/><category term="tutorial"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=42100560">on Hacker News</a>.</em></p>

<p>Over the past month I’ve been exploring the rapidly evolving world of
Large Language Models (LLM). It’s now accessible enough to run a LLM on a
Raspberry Pi smarter than the original ChatGPT (November 2022). A modest
desktop or laptop supports even smarter AI. It’s also private, offline,
unlimited, and registration-free. The technology is improving at breakneck
speed, and information is outdated in a matter of months. This article
snapshots my practical, hands-on knowledge and experiences — information I
wish I had when starting. Keep in mind that I’m a LLM layman, I have no
novel insights to share, and it’s likely I’ve misunderstood certain
aspects. In a year this article will mostly be a historical footnote,
which is simultaneously exciting and scary.</p>

<!--more-->

<p>In case you’ve been living under a rock — as an under-the-rock inhabitant
myself, welcome! — LLMs are neural networks that underwent a breakthrough
in 2022 when trained for conversational “chat.” Through it, users converse
with a wickedly creative artificial intelligence indistinguishable from a
human, which smashes the Turing test.
Interacting with one for the first time is unsettling, a feeling which
will last for days. When you bought your most recent home computer, you
probably did not expect to have a meaningful conversation with it.</p>

<p>I’ve found this experience reminiscent of the desktop computing revolution
of the 1990s, where your newly purchased computer seemed obsolete by the
time you got it home from the store. There are new developments each week,
and as a rule I ignore almost any information more than a year old. The
best way to keep up has been <a href="https://old.reddit.com/r/LocalLLaMA">r/LocalLLaMa</a>. Everything is hyped to the
stratosphere, so take claims with a grain of salt.</p>

<p>I’m wary of vendor lock-in, having experienced the rug pulled out from
under me by services shutting down, changing, or otherwise dropping my use
case. I want the option to continue, even if it means changing providers.
So for a couple of years I’d ignored LLMs. The “closed” models, accessible
only as a service, have the classic lock-in problem, including <a href="https://arxiv.org/pdf/2307.09009">silent
degradation</a>. That changed when I learned I can run models close
to the state-of-the-art on my own hardware — the exact opposite of vendor
lock-in.</p>

<p>This article is about running LLMs, not fine-tuning, and definitely not
training. It’s also only about <em>text</em>, and not vision, voice, or other
“multimodal” capabilities, which aren’t nearly so useful to me personally.</p>

<p>To run a LLM on your own hardware you need <strong>software</strong> and a <strong>model</strong>.</p>

<h3 id="the-software">The software</h3>

<p>I’ve exclusively used the <em>astounding</em> <a href="https://github.com/ggerganov/llama.cpp">llama.cpp</a>. Other options exist,
but for basic CPU inference — that is, generating tokens using a CPU
rather than a GPU — llama.cpp requires nothing beyond a C++ toolchain. In
particular, no Python fiddling that plagues much of the ecosystem. On
Windows it will be a 5MB <code class="language-plaintext highlighter-rouge">llama-server.exe</code> with no runtime dependencies.
From just two files, EXE and GGUF (model), both designed to <a href="https://justine.lol/mmap/">load via
memory map</a>, you could likely still run the same LLM 25 years from
now, in exactly the same way, out-of-the-box on some future Windows OS.</p>
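<p>Building from source is nearly as simple. A sketch of the CPU-only CMake
build (consult the official build documentation for your platform; the
exact flags may have changed since writing):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git clone https://github.com/ggerganov/llama.cpp
$ cmake -S llama.cpp -B build
$ cmake --build build --config Release
</code></pre></div></div>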

<p>Full disclosure: I’m biased because <a href="https://github.com/ggerganov/llama.cpp/blob/ec450d3b/docs/build.md">the official Windows build process is
w64devkit</a>. What can I say? These folks have good taste! That being
said, you should only do CPU inference if GPU inference is impractical. It
works reasonably up to ~10B parameter models on a desktop or laptop, but
it’s slower. My primary use case is not built with w64devkit because I’m
using CUDA for inference, which requires a MSVC toolchain. Just for fun, I
ported llama.cpp to Windows XP and ran <a href="https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct">a 360M model</a> on a 2008-era
laptop. It was magical to load that old laptop with technology that, at
the time it was new, would have been worth billions of dollars.</p>

<p>The bottleneck for GPU inference is video RAM, or VRAM. These models are,
well, <em>large</em>. The more RAM you have, the larger the model and the longer
the context window. Larger models are smarter, and longer contexts let you
process more information at once. <strong>GPU inference is not worth it below
8GB of VRAM</strong>. If <a href="https://huggingface.co/spaces/k-mktr/gpu-poor-llm-arena">“GPU poor”</a>, stick with CPU inference. On the
plus side, it’s simpler and easier to get started with CPU inference.</p>

<p>There are many utilities in llama.cpp, but this article is concerned with
just one: <strong><code class="language-plaintext highlighter-rouge">llama-server</code> is the program you want to run.</strong> It’s an HTTP
server (default port 8080) with a chat UI at its root, and <a href="https://github.com/ggerganov/llama.cpp/blob/ec450d3b/examples/server/README.md#api-endpoints">APIs for use
by programs</a>, including other user interfaces. A typical invocation:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ llama-server --flash-attn --ctx-size 0 --model MODEL.gguf
</code></pre></div></div>
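<p>The server speaks an OpenAI-compatible protocol, so you can smoke-test it
with nothing but curl. A hypothetical example, assuming the default port
and curl 7.82 or later for <code class="language-plaintext highlighter-rouge">--json</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ curl http://localhost:8080/v1/chat/completions \
    --json '{"messages": [{"role": "user", "content": "Say hello."}]}'
</code></pre></div></div>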

<p>The context size is the largest number of tokens the LLM can handle at
once, input plus output. Contexts typically range from 8K to 128K tokens,
and depending on the model’s tokenizer, normal English text is ~1.6 tokens
per word as counted by <code class="language-plaintext highlighter-rouge">wc -w</code>. If the model supports a large context you
may run out of memory. If so, set a smaller context size, like <code class="language-plaintext highlighter-rouge">--ctx-size
$((1&lt;&lt;13))</code> (i.e. 8K tokens).</p>
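<p>That ratio makes for quick capacity checks. A back-of-the-envelope
sketch, where the 1.6 figure is the rough rule of thumb above, not a real
tokenizer:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def estimate_tokens(text):
    # ~1.6 tokens per English word, per the rule of thumb above
    return int(len(text.split()) * 1.6)

# 5,000 words of prose is roughly 8,000 tokens: right at an 8K context
print(estimate_tokens("word " * 5000))  # 8000
</code></pre></div></div>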

<p>I do not yet understand what flash attention is about, and I don’t know
why <code class="language-plaintext highlighter-rouge">--flash-attn</code>/<code class="language-plaintext highlighter-rouge">-fa</code> is not the default (lower accuracy?), but you
should always request it because it reduces memory requirements when
active and is well worth the cost.</p>

<p>If the server started successfully, visit it (<a href="http://localhost:8080/">http://localhost:8080/</a>) to
try it out. Though of course you’ll need a model first.</p>

<h3 id="the-models">The models</h3>

<p><a href="https://huggingface.co/">Hugging Face</a> (HF) is “the GitHub of LLMs.” It’s an incredible
service that has earned that title. “Small” models are around a few GBs,
large models are hundreds of GBs, and HF <em>hosts it all for free</em>. With a
few exceptions that do not matter in practice, you don’t even need to sign
up to download models! (I’ve been so impressed that after a few days they
got a penny-pincher like me to pay for a pro account.) That means you can
immediately download and try any of the stuff I’m about to discuss.</p>

<p>If you look now, you’ll wonder, “There’s a lot of stuff here, so what the
heck am I supposed to download?” That was me one month ago. For llama.cpp,
the answer is <a href="https://github.com/ggerganov/ggml/blob/8a3d7994/docs/gguf.md">GGUF</a>. None of the models are natively in GGUF.
Instead GGUFs are in a repository with “GGUF” in the name, usually by a
third party: one of the heroic, prolific GGUF quantizers.</p>

<p>(Note how nowhere does the official documentation define what “GGUF”
stands for. Get used to that. This is a technological frontier, and if the
information exists at all, it’s not in the obvious place. If you’re
considering asking your LLM about this once it’s running: Sweet summer
child, we’ll soon talk about why that doesn’t work. As far as I can tell,
“GGUF” has no authoritative definition (<strong>update</strong>: <a href="https://github.com/ggerganov/ggml/issues/220">the U stands for
“Unified”</a>, but the rest is still ambiguous).)</p>

<p>Since llama.cpp is named after the Meta’s flagship model, their model is a
reasonable start, though it’s not my personal favorite. The latest is
Llama 3.2, but at the moment only the 1B and 3B models — that is, ~1
billion and ~3 billion parameters — work in llama.cpp. Those are a little
<em>too</em> small to be of much use, and your computer can likely do better if
it’s not a Raspberry Pi, even with CPU inference. Llama 3.1 8B is a better
option. (If you’ve got at least 24GB of VRAM then maybe you can even do
Llama 3.1 70B.)</p>

<p>If you search for Llama 3.1 8B you’ll find two options, one qualified
“instruct” and one with no qualifier. Instruct means it was trained to
follow instructions, i.e. to chat, and that’s nearly always what you want.
The other is the “base” model which can only continue a text. (Technically
the instruct model is still just completion, but we’ll get to that later.)
It would be great if base models were qualified “Base” but, for dumb path
dependency reasons, they’re usually not.</p>

<p>You will not find GGUF in the “Files” for the instruct model, nor can you
download the model without signing up in order to agree to the community
license. Go back to the search, add GGUF, and look for the matching GGUF
model: <a href="https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF">bartowski/Meta-Llama-3.1-8B-Instruct-GGUF</a>. bartowski is
one of the prolific and well-regarded GGUF quantizers. Not only will this
be in the right format for llama.cpp, you won’t need to sign up.</p>
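<p>Hugging Face serves model files at predictable <code class="language-plaintext highlighter-rouge">resolve</code> URLs, so a
download needs no special tooling. For example — the exact filename is
whichever Q4_K_M GGUF the repository lists:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ curl -L -O https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
</code></pre></div></div>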

<p>In “Files” you will now see many GGUFs. These are different quantizations
of the same model. The original model has <a href="https://en.wikipedia.org/wiki/Bfloat16_floating-point_format">bfloat16</a> tensors, but for
merely running the model we can throw away most of that precision with
minimal damage. It will be a tiny bit dumber and less knowledgeable, but
will require substantially fewer resources. <strong>The general recommendation,
which fits my experience, is to use <code class="language-plaintext highlighter-rouge">Q4_K_M</code></strong>, a 4-bit quantization. In
general, better to run a 4-bit quant of a larger model than an 8-bit quant
of a smaller model. Once you’ve got the basics understood, experiment with
different quants and see what you like!</p>
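<p>Back-of-the-envelope arithmetic tells you whether a quant will fit:
parameters times bits per weight, over 8 bits per byte, plus headroom for
context. A sketch, where ~4.5 effective bits for <code class="language-plaintext highlighter-rouge">Q4_K_M</code> is my rough
assumption rather than an official figure:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def quant_size_gb(params_billions, bits_per_weight):
    # parameters times bits per weight, over 8 bits per byte
    return params_billions * bits_per_weight / 8

print(quant_size_gb(8, 4.5))             # 4.5 GB: an 8B Q4_K_M fits in 8GB of VRAM
print(round(quant_size_gb(70, 4.5), 1))  # 39.4 GB: a 70B quant does not
</code></pre></div></div>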

<h3 id="my-favorite-models">My favorite models</h3>

<p>Models are trained for different trade-offs and differ in strengths and
weaknesses, so no model is best at everything — especially on “GPU-poor”
configurations. My desktop system has an RTX 3050 Ti with 8GB VRAM, and
its limitations have shaped my choices. I can comfortably run ~10B models,
and ~30B models just barely enough to test their capabilities. For ~70B I
rely on third-party hosts. My “t/s” numbers are all on this system running
4-bit quants.</p>

<p>This list omits “instruct” from the model name, but assume the instruct
model unless I say otherwise. A few are <em>bona fide</em> open source, at least
as far as LLMs practically can be, and I’ve noted the license when that’s
the case. The rest place restrictions on both use and distribution.</p>

<ul>
  <li>
    <p>Mistral-Nemo-2407 (12B) [Apache 2.0]</p>

    <p>A collaboration between <a href="https://mistral.ai/">Mistral AI</a> and Nvidia (“Nemo”), the
most well-rounded ~10B model I’ve used, and my default. Inference starts
at a comfortable 30 t/s. Its strengths are writing and proofreading,
and it can review code nearly as well as ~70B models. It was trained for
a context length of 128K, but its <a href="https://github.com/NVIDIA/RULER">effective context length is closer to
16K</a> — a limitation I’ve personally observed.</p>

    <p>The “2407” is a date (July 2024) as version number, a versioning scheme
I wholeheartedly support. A date tells you about its knowledge cut-off
and tech level. It sorts well. Otherwise LLM versioning is a mess. Just
as open source is bad with naming, AI companies do not comprehend
versioning.</p>
  </li>
  <li>
    <p>Qwen2.5-14B [Apache 2.0]</p>

    <p>Qwen models, by Alibaba Cloud, impressively punch above their weight at
all sizes. 14B inference starts at 11 t/s, with capabilities on par with
Mistral Nemo. If I could run 72B on my own hardware, it would probably
be my default. I’ve been trying it through Hugging Face’s inference API.
There’s a 32B model, but it’s impractical for my hardware, so I haven’t
spent much time with it.</p>
  </li>
  <li>
    <p>Gemma-2-2B</p>

    <p>Google’s model is popular, perhaps due to its playful demeanor. For me,
the 2B model <a href="https://github.com/skeeto/scratch/blob/master/userscript/reddit-llm-translate.user.js">is great for fast translation</a>. It’s amazing that LLMs
have nearly obsoleted Google Translate, and you can run it on your home
computer. Though it’s more resource-intensive, and refuses to translate
texts it finds offensive, which sounds like a plot element from a sci-fi
story. In my translation script, I send it text marked up with HTML.
Simply <em>asking</em> Gemma to preserve the markup Just Works! The 9B model is
even better, but slower, and I’d use it instead of 2B for translating my
own messages into another language.</p>
  </li>
  <li>
    <p>Phi3.5-Mini (4B) [MIT]</p>

    <p>Microsoft’s niche is training on synthetic data. The result is a model
that does well in tests, but doesn’t work so well in practice. For me,
its strength is document evaluation. I’ve loaded the context with up to
40K-token documents — it helps that it’s a 4B model — and successfully
queried accurate summaries and data listings.</p>
  </li>
  <li>
    <p>SmolLM2-360M [Apache 2.0]</p>

    <p>Hugging Face doesn’t just host models; their 360M model is unusually
good for its size. It fits on my 2008-era, 1G RAM, Celeron, and 32-bit
operating system laptop. It also runs well on older Raspberry Pis. It’s
creative, fast, converses competently, can write poetry, and is a fun toy
in cramped spaces.</p>
  </li>
  <li>
    <p>Mixtral-8x7B (48B) [Apache 2.0]</p>

    <p>Another Mistral AI model, and more of a runner up. 48B seems too large,
but this is a <a href="https://mistral.ai/news/mixtral-of-experts/">Mixture of Experts</a> (MoE) model. Inference uses only
13B parameters at a time. It’s reasonably suited to CPU inference on a
machine with at least 32G of RAM. The model retains more of its training
inputs, more like a database, but for reasons we’ll see soon, it isn’t
as useful as it might seem.</p>
  </li>
  <li>
    <p>Llama-3.1-70B and Llama-3.1-Nemotron-70B</p>

    <p>More models I cannot run myself, but which I access remotely. The latter
bears “Nemo” because it’s an Nvidia fine-tune. If I could run 70B models
myself, Nemotron might just be my default. I’d need to spend more time
evaluating it against Qwen2.5-72B.</p>
  </li>
</ul>

<p>Most of these models have <a href="https://huggingface.co/blog/mlabonne/abliteration">abliterated</a> or “uncensored” versions, in
which refusal is partially fine-tuned out at a cost of model degradation.
Refusals are annoying — such as Gemma refusing to translate texts it
dislikes — but they don’t happen often enough for me to make that trade-off. Maybe
I’m just boring. Also refusals seem to decrease with larger contexts, as
though “in for a penny, in for a pound.”</p>

<p>The next group are “coder” models trained for programming. In particular,
they have <em>fill-in-the-middle</em> (FIM) training for generating code inside
an existing program. I’ll discuss what that entails in a moment. As far as
I can tell, they’re no better at code review or other instruct-oriented
tasks. It’s the opposite: FIM training is done in the base model, with
instruct training applied later on top, so instruct works <em>against</em> FIM!
In other words, <strong>base model FIM outputs are markedly better</strong>, though you
lose the ability to converse with them.</p>

<p>There will be a section on evaluation later, but I want to note now that
<em>LLMs produce mediocre code</em>, even at the state-of-the-art. The rankings
here are relative to other models, not about overall capability.</p>

<ul>
  <li>
    <p>DeepSeek-Coder-V2-Lite (16B)</p>

    <p>A self-titled MoE model from <a href="https://www.deepseek.com/">DeepSeek</a>. It uses 2B parameters
during inference, making it as fast as Gemma 2 2B but as smart as
Mistral Nemo, striking a great balance, especially because it
out-competes ~30B models at code generation. If I’m playing around with
FIM, this is my default choice.</p>
  </li>
  <li>
    <p>Qwen2.5-Coder-7B [Apache 2.0]</p>

    <p>Qwen Coder is a close second. Output is nearly as good, but slightly
slower since it’s not MoE. It’s a better choice than DeepSeek if you’re
memory-constrained. While writing this article, Alibaba Cloud released a
new Qwen2.5-Coder-7B but failed to increment the version number, which
is horribly confusing. The community has taken to calling it Qwen2.5.1.
Remember what I said about AI companies and versions? (<strong>Update</strong>: One
day after publication, 14B and 32B coder models were released. I tried both,
and neither are quite as good as DeepSeek-Coder-V2-Lite, so my rankings
are unchanged.)</p>
  </li>
  <li>
    <p>Granite-8B-Code [Apache 2.0]</p>

    <p>IBM’s line of models is named Granite. In general Granite models are
disappointing, <em>except</em> that they’re unusually good at FIM. It’s tied
in second place with Qwen2.5 7B in my experience.</p>
  </li>
</ul>

<p>I also evaluated CodeLlama, CodeGemma, Codestral, and StarCoder. Their FIM
outputs were so poor as to be effectively worthless at that task, and I
found no reason to use these models. The negative effects of instruct
training were most pronounced for CodeLlama.</p>

<h3 id="the-user-interfaces">The user interfaces</h3>

<p>I pointed out llama.cpp’s built-in UI, and I’d used similar UIs with other
LLM software. As is typical, no UI is to my liking, especially in matters
of productivity, so I built my own, <strong><a href="https://github.com/skeeto/illume">Illume</a></strong>. This command
line program converts standard input into an API query, makes the query,
and streams the response to standard output. Should be simple enough to
integrate into any extensible text editor, but I only needed it for Vim.
Vimscript is miserable, probably the second worst programming language
I’ve ever touched, so my goal was to write as little as possible.</p>

<p>I created Illume to scratch my own itch, to support my exploration of the
LLM ecosystem. I actively break things and add features as needed, and I
make no promises about interface stability. <em>You probably don’t want to
use it.</em></p>

<p>Lines that begin with <code class="language-plaintext highlighter-rouge">!</code> are directives interpreted by Illume, chosen
because it’s unlikely to appear in normal text. A conversation alternates
between <code class="language-plaintext highlighter-rouge">!user</code> and <code class="language-plaintext highlighter-rouge">!assistant</code> in a buffer.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>!user
Write a Haiku about time travelers disguised as frogs.

!assistant
Green, leaping through time,
Frog tongues lick the future's rim,
Disguised in pond's guise.
</code></pre></div></div>

<p>It’s still a text editor buffer, so I can edit the assistant response,
reword my original request, etc. before continuing the conversation. For
composing fiction, I can request it to continue some text (which does not
require instruct training):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>!completion
Din the Wizard stalked the dim castle
</code></pre></div></div>

<p>I can stop it, make changes, add my own writing, and keep going. I ought
to spend more time practicing with it. If you introduce out-of-story note
syntax, the LLM will pick up on it, and then you can use notes to guide
the LLM’s writing.</p>

<p>While the main target is llama.cpp, I query different APIs, implemented by
different LLM software, with incompatibilities across them (a parameter
required by one API is forbidden by another), so directives must be
flexible and powerful: they can set arbitrary HTTP and JSON parameters.
Illume doesn’t try to abstract the API, but exposes it at a
low level, so effective use requires knowing the remote API. For example,
the “profile” for talking to llama.cpp looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>!api http://localhost:8080/v1
!:cache_prompt true
</code></pre></div></div>

<p>Where <code class="language-plaintext highlighter-rouge">cache_prompt</code> is a llama.cpp-specific JSON parameter (<code class="language-plaintext highlighter-rouge">!:</code>). The prompt
cache is nearly always better enabled, yet for some reason it’s disabled by
default. Other APIs refuse requests with this parameter, so I must
omit or otherwise disable it. The Hugging Face “profile” looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>!api https://api-inference.huggingface.co/models/{model}/v1
!:model Qwen/Qwen2.5-72B-Instruct
!&gt;x-use-cache false
</code></pre></div></div>

<p>For the sake of HF, Illume can interpolate JSON parameters into the URL.
The HF API also caches aggressively. I never want this, so I supply
an HTTP parameter (<code class="language-plaintext highlighter-rouge">!&gt;</code>) to turn it off.</p>

<p>Unique to llama.cpp is an <code class="language-plaintext highlighter-rouge">/infill</code> endpoint for FIM. It requires a model
with extra metadata, trained a certain way, which is usually not the
case. So while Illume can use <code class="language-plaintext highlighter-rouge">/infill</code>, I also added FIM configuration
so, after reading the model’s documentation and configuring Illume for
that model’s FIM behavior, I can do FIM completion through the normal
completion API on any FIM-trained model, even on non-llama.cpp APIs.</p>

<h3 id="fill-in-the-middle-fim-tokens">Fill-in-the-Middle (FIM) tokens</h3>

<p>It’s time to discuss FIM. To get to the bottom of FIM I needed to go to
the source of truth, the original FIM paper: <a href="https://arxiv.org/abs/2207.14255">Efficient Training of
Language Models to Fill in the Middle</a>. This allowed me to understand
how these models are FIM-trained, at least enough to put that training to
use. Even so, model documentation tends to be thin on FIM because they
expect you to run their code.</p>

<p>Ultimately an LLM can only predict the next token. So pick some special
tokens that don’t appear in inputs, use them to delimit a prefix,
suffix, and middle (PSM) — or sometimes ordered suffix-prefix-middle (SPM)
— in a large training corpus. Later in inference we can use those tokens
to provide a prefix and suffix, and let it “predict” the middle. Crazy, but
<em>this actually works!</em></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;PRE&gt;{prefix}&lt;SUF&gt;{suffix}&lt;MID&gt;
</code></pre></div></div>

<p>For example when filling the parentheses of <code class="language-plaintext highlighter-rouge">dist = sqrt(x*x + y*y)</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;PRE&gt;dist = sqrt(&lt;SUF&gt;)&lt;MID&gt;x*x + y*y
</code></pre></div></div>

<p>To have the LLM fill in the parentheses, we’d stop at <code class="language-plaintext highlighter-rouge">&lt;MID&gt;</code> and let the
LLM predict from there. Note how <code class="language-plaintext highlighter-rouge">&lt;SUF&gt;</code> is essentially the cursor. By the
way, this is basically how instruct training works, but instead of prefix
and suffix, special tokens delimit instructions and conversation.</p>

<p>Some LLM folks interpret the paper quite literally and use <code class="language-plaintext highlighter-rouge">&lt;PRE&gt;</code>, etc.
for their FIM tokens, although these look nothing like their other special
tokens. More thoughtful trainers picked <code class="language-plaintext highlighter-rouge">&lt;|fim_prefix|&gt;</code>, etc. Illume
accepts FIM templates, and I wrote templates for the popular models. For
example, here’s Qwen (PSM):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;|fim_prefix|&gt;{prefix}&lt;|fim_suffix|&gt;{suffix}&lt;|fim_middle|&gt;
</code></pre></div></div>

<p>Mistral AI prefers square brackets, SPM, and no “middle” token:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[SUFFIX]{suffix}[PREFIX]{prefix}
</code></pre></div></div>

<p>With these templates I could access the FIM training in models unsupported
by llama.cpp’s <code class="language-plaintext highlighter-rouge">/infill</code> API.</p>
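<p>For instance, the earlier <code class="language-plaintext highlighter-rouge">sqrt</code> fill rendered with the Qwen template,
with the model’s predicted middle at the end:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;|fim_prefix|&gt;dist = sqrt(&lt;|fim_suffix|&gt;)&lt;|fim_middle|&gt;x*x + y*y
</code></pre></div></div>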

<p>Besides just failing the prompt, the biggest problem I’ve had with FIM is
LLMs not knowing when to stop. For example, if I ask it to fill out this
function (i.e. assign something to <code class="language-plaintext highlighter-rouge">r</code>):

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">norm</span><span class="p">(</span><span class="n">x</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="k">return</span> <span class="n">r</span>
</code></pre></div></div>

<p>(Side note: Static types, including the hints here, produce better results
from LLMs, acting as guardrails.) It’s not unusual to get something like:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">norm</span><span class="p">(</span><span class="n">x</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="n">r</span> <span class="o">=</span> <span class="n">sqrt</span><span class="p">(</span><span class="n">x</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">*</span><span class="n">y</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">r</span>

<span class="k">def</span> <span class="nf">norm3</span><span class="p">(</span><span class="n">x</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">z</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="n">r</span> <span class="o">=</span> <span class="n">sqrt</span><span class="p">(</span><span class="n">x</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">*</span><span class="n">y</span> <span class="o">+</span> <span class="n">z</span><span class="o">*</span><span class="n">z</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">r</span>

<span class="k">def</span> <span class="nf">norm4</span><span class="p">(</span><span class="n">x</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">z</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">w</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="n">r</span> <span class="o">=</span> <span class="n">sqrt</span><span class="p">(</span><span class="n">x</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">*</span><span class="n">y</span> <span class="o">+</span> <span class="n">z</span><span class="o">*</span><span class="n">z</span> <span class="o">+</span> <span class="n">w</span><span class="o">*</span><span class="n">w</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">r</span>
</code></pre></div></div>

<p>Where the original <code class="language-plaintext highlighter-rouge">return r</code> became the return for <code class="language-plaintext highlighter-rouge">norm4</code>. Technically
it fits the prompt, but it’s obviously not what I want. So be ready to
mash the “stop” button when it gets out of control. The three coder models
I recommended exhibit this behavior less often. It might be more robust to
combine it with a non-LLM system that understands the code semantically
and automatically stops generation when the LLM begins generating tokens
in a higher scope. That would make more coder models viable, but this goes
beyond my own fiddling.</p>
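<p>A crude version of that guardrail needs no semantic understanding at all,
at least for indentation-scoped languages like Python: watch the generated
lines and stop at the first non-blank line that dedents past where the fill
began. A sketch, assuming output arrives line by line:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def stop_at_dedent(generated_lines, fill_indent):
    # Keep lines until a non-blank line dedents past the fill point,
    # i.e. the LLM has escaped into a higher scope.
    kept = []
    for line in generated_lines:
        if line.strip():
            indent = len(line) - len(line.lstrip())
            if indent in range(fill_indent):  # indent below fill_indent
                break
        kept.append(line)
    return kept

output = [
    "    r = sqrt(x*x + y*y)",
    "    return r",
    "",
    "def norm3(x, y, z):",
]
print(stop_at_dedent(output, 4))  # drops the runaway norm3 definition
</code></pre></div></div>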

<p>Figuring out FIM and putting it into action revealed to me that FIM is
still in its early stages, and hardly anyone is generating code via FIM. I
guess everyone’s just using plain old completion?</p>

<h3 id="so-what-are-llms-good-for">So what are LLMs good for?</h3>

<p>LLMs are fun, but what productive uses do they have? That’s a question
I’ve been trying to answer this past month, and it’s come up shorter than
I hoped. It might be useful to establish boundaries — tasks that LLMs
definitely cannot do.</p>

<p>First, <strong>LLMs are no good if correctness cannot be readily verified</strong>.
They are untrustworthy hallucinators. Often if you’re in a position to
verify LLM output, you didn’t need it in the first place. This is why
Mixtral, with its large “database” of knowledge, isn’t so useful. It also
means it’s <em>reckless and irresponsible to inject LLM output into search
results</em> — just shameful.</p>

<p>LLM enthusiasts, who ought to know better, fall into this trap anyway and
propagate hallucinations. It makes discourse around LLMs less trustworthy
than normal, and I need to approach LLM information with extra skepticism.
Case in point: Recall how “GGUF” doesn’t have an authoritative definition.
Search for one and you’ll find an obvious hallucination that made it all
the way into official IBM documentation. I won’t repeat it here so as not to
make things worse.</p>

<p>Second, <strong>LLMs have goldfish-sized working memory</strong>. That is, they’re held
back by small context lengths. Some models are trained on larger contexts,
but their <a href="https://github.com/NVIDIA/RULER">effective context length</a> is usually much smaller. In
practice, an LLM can hold several book chapters worth of comprehension “in
its head” at a time. For code it’s 2k or 3k lines (code is token-dense).
That’s the most you can work with at once. Compared to a human, it’s tiny.
There are tools like <a href="https://en.wikipedia.org/wiki/Retrieval-augmented_generation">retrieval-augmented generation</a> and fine-tuning
to mitigate it… <em>slightly</em>.</p>
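For a rough sense of that budget, a back-of-envelope sketch in Python; the tokens-per-line and tokens-per-word figures are assumptions for illustration, not measurements:

```python
# Back-of-envelope context budgeting with assumed density figures.
def code_lines(context_tokens, tokens_per_line=12):
    """Lines of code that fit in a given effective context."""
    return context_tokens // tokens_per_line

def prose_words(context_tokens, tokens_per_word=1.3):
    """Approximate words of prose that fit in the same budget."""
    return int(context_tokens / tokens_per_word)

budget = 32 * 1024            # a 32k-token effective context
print(code_lines(budget))     # roughly the "2k or 3k lines" above
print(prose_words(budget))    # tens of thousands of words of prose
```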

<p>Third, <strong>LLMs are poor programmers</strong>. At best they write code at maybe
the level of an undergraduate who’s read a lot of documentation. That sounds
better than it is. The typical fresh graduate enters the workforce knowing
practically nothing about software engineering. Day one on the job is the
first day of their <a href="/blog/2016/09/02/">real education</a>. In that sense, LLMs today
haven’t even begun their education.</p>

<p>To be fair, that LLMs work as well as they do is amazing! Thrown into the
middle of a program in <a href="/blog/2023/10/08/">my unconventional style</a>, LLMs figure it out
and make use of the custom interfaces. (Caveat: My code and writing are in
the training data of most of these LLMs.) So the more context, the better,
within the effective context length. The challenge is getting something
useful out of an LLM in less time than writing it myself.</p>

<p><em>Writing new code is the easy part</em>. The hard part is maintaining code,
and writing new code with that maintenance in mind. Even when an LLM
produces code that works, there’s no thought to maintenance, nor could
there be. In general the reliability of generated code follows an inverse
square law with length, and generating more than a dozen lines at a time is
fraught. I really tried, but never saw LLM output beyond 2–3 lines of code
which I would consider acceptable.</p>

<p>Quality varies substantially by language. LLMs are better at Python than
C, and better at C than assembly. I suspect it’s related to the difficulty
of the language and the quality of the input. It’s trained on lots of
terrible C — the internet is loaded with it after all — and probably the
only labeled x86 assembly it’s seen is crummy beginner tutorials. Ask it
to use SDL2 and it <a href="/blog/2023/01/08/">reliably produces the common mistakes</a> because
it’s been trained to do so.</p>

<p>What about boilerplate? That’s something an LLM could probably do with a
low error rate, and perhaps there’s merit to it. Though the fastest way to
deal with boilerplate is to not write it at all. Change your problem to
not require boilerplate.</p>

<p>Don’t just take my word for it; consider how it shows up in the economics:
If AI companies could deliver the productivity gains they claim, they
wouldn’t sell AI. They’d keep it to themselves and gobble up the software
industry. Or consider the software products produced by companies on the
bleeding edge of AI. It’s still the same old, bloated web garbage everyone
else is building. (My LLM research has involved navigating their awful web
sites, and it’s made me bitter.)</p>

<p>In code generation, hallucinations are less concerning. You already knew
what you wanted when you asked, so you can review it, and your compiler
will help catch problems you miss (e.g. calling a hallucinated method).
However, small context and poor code generation remain roadblocks, and I
haven’t yet made this work effectively.</p>

<p>So then, what can I do with LLMs? A list is apt because LLMs love lists:</p>

<ul>
  <li>
    <p>Proofreading has been most useful for me. I give it a document such as
an email or this article (~8,000 tokens), tell it to look over grammar,
call out passive voice, and so on, and suggest changes. I accept or
reject its suggestions and move on. Most suggestions will be poor, and
this very article was long enough that even ~70B models suggested
changes to hallucinated sentences. Regardless, there’s signal in the
noise, and it fits within the limitations outlined above. I’m still
trying to apply this technique (“find bugs, please”) to code review, but
so far success is elusive.</p>
  </li>
  <li>
    <p>Writing short fiction. Hallucinations are not a problem; they’re a
feature! Context lengths are the limiting factor, though perhaps you can
stretch it by supplying chapter summaries, also written by LLM. I’m
still exploring this. If you’re feeling lazy, tell it to offer you three
possible story branches at each turn, and you pick the most interesting.
Or even tell it to combine two of them! LLMs are clever and will figure
it out. Some genres work better than others, and concrete works better
than abstract. (I wonder if professional writers judge its writing as
poor as I judge its programming.)</p>
  </li>
  <li>
    <p>Generative fun. Have an argument with Benjamin Franklin (note: this
probably violates the <a href="https://ai.meta.com/llama/use-policy/">Acceptable Use Policy</a> of some models), hang
out with a character from your favorite book, or generate a new scene of
<a href="/blog/2023/06/22/#76-henry-iv">Falstaff’s blustering antics</a>. Talking to historical figures
has been educational: The character says something unexpected, I look it
up the old-fashioned way to see what it’s about, then learn something
new.</p>
  </li>
  <li>
    <p>Language translation. I’ve been browsing foreign language subreddits
through Gemma-2-2B translation, and it’s been insightful. (I had no idea
German speakers were so distrustful of artificial sweeteners.)</p>
  </li>
</ul>

<p>Despite the short list of useful applications, this is the most excited
I’ve been about a new technology in years!</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>An improved chkstk function on Windows</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2024/02/05/"/>
    <id>urn:uuid:381be450-559c-4521-911a-ba524dca7b64</id>
    <updated>2024-02-05T17:56:05Z</updated>
    <category term="c"/><category term="cpp"/><category term="win32"/><category term="rant"/>
    <content type="html">
      <![CDATA[<p>If you’ve spent much time developing with Mingw-w64 you’ve likely seen the
symbol <code class="language-plaintext highlighter-rouge">___chkstk_ms</code>, perhaps in an error message. It’s a little piece of
runtime provided by GCC via libgcc which ensures enough of the stack is
committed for the caller’s stack frame. The “function” uses a custom ABI
and is implemented in assembly. So is the subject of this article, a
slightly improved implementation soon to be included in <a href="/blog/2020/05/15/">w64devkit</a> as
libchkstk (<code class="language-plaintext highlighter-rouge">-lchkstk</code>).</p>

<p>The MSVC toolchain has an identical (x64) or similar (x86) function named
<code class="language-plaintext highlighter-rouge">__chkstk</code>. We’ll discuss that as well, and w64devkit will include x86 and
x64 implementations, useful when linking with MSVC object files. The new
x86 <code class="language-plaintext highlighter-rouge">__chkstk</code> in particular is also better than the MSVC definition.</p>

<p>A note on spelling: <code class="language-plaintext highlighter-rouge">___chkstk_ms</code> is spelled with three underscores, and
<code class="language-plaintext highlighter-rouge">__chkstk</code> is spelled with two. On x86, <a href="https://learn.microsoft.com/en-us/cpp/build/reference/decorated-names#FormatC"><code class="language-plaintext highlighter-rouge">cdecl</code> functions</a> are
decorated with a leading underscore, and so may be rendered, e.g. in error
messages, with one fewer underscore. The true name is undecorated, and the
raw symbol name is identical on x86 and x64. Further complicating matters,
libgcc defines a <code class="language-plaintext highlighter-rouge">___chkstk</code> with three underscores. As far as I can tell,
this spelling arose from confusion regarding name decoration, but nobody’s
noticed for the past 28 years. libgcc’s x64 <code class="language-plaintext highlighter-rouge">___chkstk</code> is obviously and
badly broken, so I’m sure nobody has ever used it anyway, not even by
accident thanks to the misspelling. I’ll touch on that below.</p>

<p>When referring to a particular instance, I will use a specific spelling.
Otherwise the term “chkstk” refers to the family. If you’d like to skip
ahead to the source for libchkstk: <strong><a href="https://github.com/skeeto/w64devkit/blob/master/src/libchkstk.S"><code class="language-plaintext highlighter-rouge">libchkstk.S</code></a></strong>.</p>

<h3 id="a-gradually-committed-stack">A gradually committed stack</h3>

<p>The header of a Windows executable lists two stack sizes: a <em>reserve</em> size
and an initial <em>commit</em> size. The first is the largest the main thread
stack can grow, and the second is the amount <a href="https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc">committed</a> when the
program starts. A program gradually commits stack pages <em>as needed</em> up to
the reserve size. Binutils <code class="language-plaintext highlighter-rouge">objdump</code> option <code class="language-plaintext highlighter-rouge">-p</code> lists the sizes. Typical
output for a Mingw-w64 program:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ objdump -p example.exe | grep SizeOfStack
SizeOfStackReserve      0000000000200000
SizeOfStackCommit       0000000000001000
</code></pre></div></div>

<p>The values are in hexadecimal, and this indicates 2MiB reserved and 4KiB
initially committed. With the Binutils linker, <code class="language-plaintext highlighter-rouge">ld</code>, you can set them at
link time using <code class="language-plaintext highlighter-rouge">--stack</code>. Via <code class="language-plaintext highlighter-rouge">gcc</code>, use <code class="language-plaintext highlighter-rouge">-Xlinker</code>. For example, to
reserve an 8MiB stack and commit half of it:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -Xlinker --stack=$((8&lt;&lt;20)),$((4&lt;&lt;20)) ...
</code></pre></div></div>

<p>MSVC <code class="language-plaintext highlighter-rouge">link.exe</code> similarly has <a href="https://learn.microsoft.com/en-us/cpp/build/reference/stack-stack-allocations"><code class="language-plaintext highlighter-rouge">/stack</code></a>.</p>

<p>The purpose of this mechanism is to avoid paying the <em>commit charge</em> for
unused stack. It made sense 30 years ago when stacks were a potentially
large portion of physical memory. These days it’s a rounding error, and it’s
silly that we’re still dealing with it. Using the above options you can choose
to commit the entire stack up front, at which point a chkstk helper is no
longer needed (<a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59532"><code class="language-plaintext highlighter-rouge">-mno-stack-arg-probe</code></a>, <a href="https://learn.microsoft.com/en-us/cpp/build/reference/gs-control-stack-checking-calls"><code class="language-plaintext highlighter-rouge">/Gs2147483647</code></a>). This
requires link-time control of the main module, which isn’t always an
option, like when supplying a DLL for someone else to run.</p>

<p>The program grows the stack by touching the singular <a href="https://devblogs.microsoft.com/oldnewthing/20220203-00/?p=106215">guard page</a>
mapped between the committed and uncommitted portions of the stack. This
action triggers a page fault, and the default fault handler commits the
guard page and maps a new guard page just below. In other words, the stack
grows one page at a time, in order.</p>
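A toy model of this mechanism in Python, with arbitrary addresses; the real fault handling of course happens in the kernel:

```python
PAGE = 4096

class Stack:
    """Toy model of Windows gradual stack commit (addresses arbitrary)."""
    def __init__(self, top, commit=PAGE):
        self.committed_low = top - commit   # pages above this are committed
        self.guard = self.committed_low - PAGE

    def touch(self, addr):
        page = addr - addr % PAGE
        if page >= self.committed_low:
            return "ok"       # already-committed memory
        if page == self.guard:
            # fault handler: commit the guard, map a new guard just below
            self.committed_low = self.guard
            self.guard -= PAGE
            return "grown"
        return "crash"        # leapt over the guard page

s = Stack(top=1 << 20)
s.touch((1 << 20) - PAGE - 1)   # within the guard page: stack grows
s.touch((1 << 20) - 9 * PAGE)   # far below the new guard: crash
```

A frame larger than a page can leap over the guard entirely, which is precisely the crash a chkstk call in the prologue prevents.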

<p>In most cases nothing special needs to happen. The guard page mechanism is
transparent and in the background. However, if a function stack frame
exceeds the page size then there’s a chance that it might leap over the
guard page, crashing the program. To prevent this, compilers insert a
chkstk call in the function prologue. Before local variable allocation,
chkstk walks down the stack — that is, towards lower addresses — nudging
the guard page with each step. (As a side effect it provides <a href="/blog/2017/06/21/">stack clash
protection</a> — the only security aspect of chkstk.) For example:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">callee</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="p">);</span>

<span class="kt">void</span> <span class="nf">example</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">char</span> <span class="n">large</span><span class="p">[</span><span class="mi">1</span><span class="o">&lt;&lt;</span><span class="mi">20</span><span class="p">];</span>
    <span class="n">callee</span><span class="p">(</span><span class="n">large</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Compiled with 64-bit <code class="language-plaintext highlighter-rouge">gcc -O</code>:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">example:</span>
    <span class="nf">movl</span>    <span class="kc">$</span><span class="mi">1048616</span><span class="p">,</span> <span class="o">%</span><span class="nb">eax</span>
    <span class="nf">call</span>    <span class="nv">___chkstk_ms</span>
    <span class="nf">subq</span>    <span class="o">%</span><span class="nb">rax</span><span class="p">,</span> <span class="o">%</span><span class="nb">rsp</span>
    <span class="nf">leaq</span>    <span class="mi">32</span><span class="p">(</span><span class="o">%</span><span class="nb">rsp</span><span class="p">),</span> <span class="o">%</span><span class="nb">rcx</span>
    <span class="nf">call</span>    <span class="nv">callee</span>
    <span class="nf">addq</span>    <span class="kc">$</span><span class="mi">1048616</span><span class="p">,</span> <span class="o">%</span><span class="nb">rsp</span>
    <span class="nf">ret</span>
</code></pre></div></div>

<p>I used GCC, but this is practically identical to the code generated by
MSVC and Clang. Note the call to <code class="language-plaintext highlighter-rouge">___chkstk_ms</code> in the function prologue
before allocating the stack frame (<code class="language-plaintext highlighter-rouge">subq</code>). Also note that it sets <code class="language-plaintext highlighter-rouge">eax</code>.
As a volatile register, this would normally accomplish nothing because
it’s done just before a function call, but recall that <code class="language-plaintext highlighter-rouge">___chkstk_ms</code> has
a custom ABI. That’s the argument to chkstk. Further note that it uses
<code class="language-plaintext highlighter-rouge">rax</code> on the return. That’s not a value returned by chkstk; rather,
x64 <em>chkstk preserves all registers</em>.</p>

<p>Well, maybe. The official documentation says that registers <a href="https://learn.microsoft.com/en-us/cpp/build/prolog-and-epilog">r10 and r11
are volatile</a>, but that information conflicts with Microsoft’s own
implementation. Just in case, I choose a conservative interpretation that
all registers are preserved.</p>

<h3 id="implementing-chkstk">Implementing chkstk</h3>

<p>In a high level language, chkstk might look something like so:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// NOTE: hypothetical implementation</span>
<span class="kt">void</span> <span class="nf">___chkstk_ms</span><span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">frame_size</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">volatile</span> <span class="kt">char</span> <span class="n">frame</span><span class="p">[</span><span class="n">frame_size</span><span class="p">];</span>  <span class="c1">// NOTE: variable-length array</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">i</span> <span class="o">=</span> <span class="n">frame_size</span> <span class="o">-</span> <span class="n">PAGE_SIZE</span><span class="p">;</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">-=</span> <span class="n">PAGE_SIZE</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">frame</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>  <span class="c1">// touch the guard page</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This wouldn’t work for a number of reasons, but if it did, <code class="language-plaintext highlighter-rouge">volatile</code>
would serve two purposes. First, forcing the side effect to occur. The
second is more subtle: The loop must happen in exactly this order, from
high to low. Without <code class="language-plaintext highlighter-rouge">volatile</code>, loop iterations would be independent — as
there are no dependencies between iterations — and so a compiler could
reverse the loop direction.</p>

<p>The store can happen anywhere within the guard page, so it’s not necessary
to align <code class="language-plaintext highlighter-rouge">frame</code> to the page. Simply touching at least one byte per page
is enough. This is essentially the definition of libgcc <code class="language-plaintext highlighter-rouge">___chkstk_ms</code>.</p>

<p>How many iterations occur? In <code class="language-plaintext highlighter-rouge">example</code> above, the stack frame will be
around 1MiB (2<sup>20</sup>). With pages of 4KiB (2<sup>12</sup>) that’s
256 iterations. The loop happens unconditionally, meaning <em>every function
call</em> requires 256 iterations of this loop. Wouldn’t it be better if the
loop ran only as needed, i.e. the first time? MSVC x64 <code class="language-plaintext highlighter-rouge">__chkstk</code> skips
iterations if possible, and the same goes for my new <code class="language-plaintext highlighter-rouge">___chkstk_ms</code>. Much
like <a href="/blog/2022/02/18/#my-getcommandlinew">the command line string</a>, the low address of the current
thread’s guard page is accessible through the <a href="https://en.wikipedia.org/wiki/Win32_Thread_Information_Block">Thread Information
Block</a> (TIB). A chkstk can cheaply query this address, looping only
when the stack must actually grow. (<a href="/blog/2023/03/23/">In contrast to Linux</a>, a thread’s
stack is fundamentally managed by the operating system.)</p>
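A Python model of the probe loop makes the payoff concrete; no real memory is touched, and the addresses are arbitrary, with frame and page sizes as in <code class="language-plaintext highlighter-rouge">example</code>:

```python
PAGE = 1 << 12                 # 4KiB pages

def probe_count(sp, frame_size, committed_low):
    """Model the chkstk loop: touch one page at a time from the
    committed boundary down to the new frame's low address."""
    frame_low = sp - frame_size
    probes = 0
    while committed_low > frame_low:   # step 7
        committed_low -= PAGE          # step 5
        probes += 1                    # step 6, the faulting touch
    return probes, committed_low

frame = 1 << 20                                 # ~1MiB stack frame
n1, low = probe_count(1 << 24, frame, 1 << 24)  # first call: 256 probes
n2, _   = probe_count(1 << 24, frame, low)      # after TIB query: 0 probes
```

A naive <code class="language-plaintext highlighter-rouge">___chkstk_ms</code> pays all 256 probes on every call; the TIB-aware version skips the loop whenever the committed region already covers the frame.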

<p>Taking that into account, an improved algorithm:</p>

<ol>
  <li>Push registers that will be used</li>
  <li>Compute the low address of the new stack frame (F)</li>
  <li>Retrieve the low address of the committed stack (C)</li>
  <li>Go to 7</li>
  <li>Subtract the page size from C</li>
  <li>Touch memory at C</li>
  <li>If C &gt; F, go to 5</li>
  <li>Pop registers to restore them and return</li>
</ol>

<p>An unconditional forward jump is a little unusual in pseudo-code, but
this closely matches my assembly. The loop causes page faults, and it’s
the slow, uncommon path. The common, fast path never executes 5–6. I
also chose small instructions in order to keep the function small and
reduce instruction cache pressure. My x64 implementation as of this
writing:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">___chkstk_ms:</span>
    <span class="nf">push</span> <span class="o">%</span><span class="nb">rax</span>              <span class="o">//</span> <span class="mi">1</span><span class="nv">.</span>
    <span class="nf">push</span> <span class="o">%</span><span class="nb">rcx</span>              <span class="o">//</span> <span class="mi">1</span><span class="nv">.</span>
    <span class="nf">neg</span>  <span class="o">%</span><span class="nb">rax</span>              <span class="o">//</span> <span class="mi">2</span><span class="nv">.</span> <span class="nb">rax</span> <span class="err">=</span> <span class="nv">frame</span> <span class="nv">low</span> <span class="nv">address</span>
    <span class="nf">add</span>  <span class="o">%</span><span class="nb">rsp</span><span class="p">,</span> <span class="o">%</span><span class="nb">rax</span>        <span class="o">//</span> <span class="mi">2</span><span class="nv">.</span> <span class="err">"</span>
    <span class="nf">mov</span>  <span class="o">%</span><span class="nb">gs</span><span class="p">:(</span><span class="mh">0x10</span><span class="p">),</span> <span class="o">%</span><span class="nb">rcx</span>  <span class="o">//</span> <span class="mi">3</span><span class="nv">.</span> <span class="nb">rcx</span> <span class="err">=</span> <span class="nv">stack</span> <span class="nv">low</span> <span class="nv">address</span>
    <span class="nf">jmp</span>  <span class="mi">1</span><span class="nv">f</span>                <span class="o">//</span> <span class="mi">4</span><span class="nv">.</span>
<span class="err">0:</span>  <span class="nf">sub</span>  <span class="kc">$</span><span class="mh">0x1000</span><span class="p">,</span> <span class="o">%</span><span class="nb">rcx</span>     <span class="o">//</span> <span class="mi">5</span><span class="nv">.</span>
    <span class="nf">test</span> <span class="o">%</span><span class="nb">eax</span><span class="p">,</span> <span class="p">(</span><span class="o">%</span><span class="nb">rcx</span><span class="p">)</span>      <span class="o">//</span> <span class="mi">6</span><span class="nv">.</span> <span class="nv">page</span> <span class="nv">fault</span> <span class="p">(</span><span class="nv">very</span> <span class="nv">slow</span><span class="err">!</span><span class="p">)</span>
<span class="err">1:</span>  <span class="nf">cmp</span>  <span class="o">%</span><span class="nb">rax</span><span class="p">,</span> <span class="o">%</span><span class="nb">rcx</span>        <span class="o">//</span> <span class="mi">7</span><span class="nv">.</span>
    <span class="nf">ja</span>   <span class="mb">0b</span>                <span class="o">//</span> <span class="mi">7</span><span class="nv">.</span>
    <span class="nf">pop</span>  <span class="o">%</span><span class="nb">rcx</span>              <span class="o">//</span> <span class="mi">8</span><span class="nv">.</span>
    <span class="nf">pop</span>  <span class="o">%</span><span class="nb">rax</span>              <span class="o">//</span> <span class="mi">8</span><span class="nv">.</span>
    <span class="nf">ret</span>                    <span class="o">//</span> <span class="mi">8</span><span class="nv">.</span>
</code></pre></div></div>

<p>I’ve labeled each instruction with its corresponding pseudo-code. Step 6
is unusual among chkstk implementations: It’s not a <em>store</em>, but a <em>load</em>,
still sufficient to fault the page. That <code class="language-plaintext highlighter-rouge">test</code> instruction is just two
bytes, and unlike other two-byte options, doesn’t write garbage onto the
stack — which <em>would</em> be allowed — nor use an extra register. I searched
through single byte instructions that can page fault, all of which involve
implicit addressing through <code class="language-plaintext highlighter-rouge">rdi</code> or <code class="language-plaintext highlighter-rouge">rsi</code>, but they increment <code class="language-plaintext highlighter-rouge">rdi</code> or
<code class="language-plaintext highlighter-rouge">rsi</code>, and would require another instruction to correct it.</p>

<p>Because of the return address and two <code class="language-plaintext highlighter-rouge">push</code> operations, the low stack
frame address is technically <em>too low</em> by 24 bytes. That’s fine. If this
exhausts the stack, the program is really cutting it close and the stack
is too small anyway. I could be more precise — which, as we’ll soon see,
is required for x86 <code class="language-plaintext highlighter-rouge">__chkstk</code> — but it would cost an extra instruction
byte.</p>

<p>On x64, <code class="language-plaintext highlighter-rouge">___chkstk_ms</code> and <code class="language-plaintext highlighter-rouge">__chkstk</code> have identical semantics, so name it
<code class="language-plaintext highlighter-rouge">__chkstk</code> — which I’ve done in libchkstk — and it works with MSVC. The
only practical difference between my chkstk and MSVC <code class="language-plaintext highlighter-rouge">__chkstk</code> is that
mine is smaller: 36 bytes versus 48 bytes. Largest of all, despite lacking
the optimization, is libgcc <code class="language-plaintext highlighter-rouge">___chkstk_ms</code>, weighing 50 bytes, or in
practice, due to an unfortunate Binutils default of padding sections, 64
bytes.</p>

<p>I’m no assembly guru, and I bet this can be even smaller without hurting
the fast path, but this is the best I could come up with at this time.</p>

<p><strong>Update</strong>: Stefan Kanthak, who has <a href="https://skanthak.homepage.t-online.de/msvcrt.html">extensively explored this
topic</a>, points out that large stack frame requests might overflow
my low frame address calculation at (2), effectively disabling the probe.
Such requests might occur from alloca calls or variable-length arrays
(VLAs) with untrusted sizes. As far as I’m concerned, such programs are
already broken, but it only cost a two-byte instruction to deal with it. I
have not changed this article, but the source in w64devkit <a href="https://github.com/skeeto/w64devkit/commit/50b343db">has been
updated</a>.</p>
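The wraparound is easy to demonstrate with Python integers reduced mod 2<sup>64</sup> standing in for 64-bit registers; the stack pointer value is arbitrary:

```python
M = 1 << 64  # 64-bit registers wrap modulo 2**64

def frame_low(rsp, frame_size):
    """Model the neg/add pair from the listing: rax = rsp - frame_size."""
    return (rsp - frame_size) % M

rsp = 0x7FFFF000
sane = frame_low(rsp, 1 << 20)  # below rsp: the probe loop can run
huge = frame_low(rsp, 1 << 63)  # wraps to an "address" above rsp

# The cmp/ja loop runs only while the committed boundary sits above the
# frame's low address, so a wrapped result silently disables probing.
print(sane < rsp, huge > rsp)   # True True
```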

<h3 id="32-bit-chkstk">32-bit chkstk</h3>

<p>On x86 <code class="language-plaintext highlighter-rouge">___chkstk_ms</code> has identical semantics to x64. Mine is a copy-paste
of my x64 chkstk but with 32-bit registers and an updated TIB lookup. GCC
was ahead of the curve on this design.</p>

<p>However, x86 <code class="language-plaintext highlighter-rouge">__chkstk</code> is <em>bonkers</em>. It not only commits the stack, but
also allocates the stack frame. That is, it returns with a different stack
pointer. The return pointer is initially <em>inside the new stack frame</em>, so
chkstk must retrieve it and return by other means. It must also precisely
compute the low frame address.</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">__chkstk:</span>
    <span class="nf">push</span> <span class="o">%</span><span class="nb">ecx</span>               <span class="o">//</span> <span class="mi">1</span><span class="nv">.</span>
    <span class="nf">neg</span>  <span class="o">%</span><span class="nb">eax</span>               <span class="o">//</span> <span class="mi">2</span><span class="nv">.</span>
    <span class="nf">lea</span>  <span class="mi">8</span><span class="p">(</span><span class="o">%</span><span class="nb">esp</span><span class="p">,</span><span class="o">%</span><span class="nb">eax</span><span class="p">),</span> <span class="o">%</span><span class="nb">eax</span> <span class="o">//</span> <span class="mi">2</span><span class="nv">.</span>
    <span class="nf">mov</span>  <span class="o">%</span><span class="nb">fs</span><span class="p">:(</span><span class="mh">0x08</span><span class="p">),</span> <span class="o">%</span><span class="nb">ecx</span>   <span class="o">//</span> <span class="mi">3</span><span class="nv">.</span>
    <span class="nf">jmp</span>  <span class="mi">1</span><span class="nv">f</span>                 <span class="o">//</span> <span class="mi">4</span><span class="nv">.</span>
<span class="err">0:</span>  <span class="nf">sub</span>  <span class="kc">$</span><span class="mh">0x1000</span><span class="p">,</span> <span class="o">%</span><span class="nb">ecx</span>      <span class="o">//</span> <span class="mi">5</span><span class="nv">.</span>
    <span class="nf">test</span> <span class="o">%</span><span class="nb">eax</span><span class="p">,</span> <span class="p">(</span><span class="o">%</span><span class="nb">ecx</span><span class="p">)</span>       <span class="o">//</span> <span class="mi">6</span><span class="nv">.</span> <span class="nv">page</span> <span class="nv">fault</span> <span class="p">(</span><span class="nv">very</span> <span class="nv">slow</span><span class="err">!</span><span class="p">)</span>
<span class="err">1:</span>  <span class="nf">cmp</span>  <span class="o">%</span><span class="nb">eax</span><span class="p">,</span> <span class="o">%</span><span class="nb">ecx</span>         <span class="o">//</span> <span class="mi">7</span><span class="nv">.</span>
    <span class="nf">ja</span>   <span class="mb">0b</span>                 <span class="o">//</span> <span class="mi">7</span><span class="nv">.</span>
    <span class="nf">pop</span>  <span class="o">%</span><span class="nb">ecx</span>               <span class="o">//</span> <span class="mi">8</span><span class="nv">.</span>
    <span class="nf">xchg</span> <span class="o">%</span><span class="nb">eax</span><span class="p">,</span> <span class="o">%</span><span class="nb">esp</span>         <span class="o">//</span> <span class="nv">?.</span> <span class="nb">al</span><span class="nv">locate</span> <span class="nv">frame</span>
    <span class="nf">jmp</span>  <span class="o">*</span><span class="p">(</span><span class="o">%</span><span class="nb">eax</span><span class="p">)</span>            <span class="o">//</span> <span class="mi">8</span><span class="nv">.</span> <span class="nv">return</span>
</code></pre></div></div>

<p>The main differences are:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">eax</code> is treated as volatile, so it is not saved</li>
  <li>The low frame address is precisely computed with <code class="language-plaintext highlighter-rouge">lea</code> (2)</li>
  <li>The frame is allocated at step (?) by swapping F and the stack pointer</li>
  <li>Post-swap F now points at the return address, so jump through it</li>
</ul>

<p>MSVC x86 <code class="language-plaintext highlighter-rouge">__chkstk</code> does not query the TIB (3), and so unconditionally
runs the loop. So there’s an advantage to my implementation besides size.</p>

<p>libgcc x86 <code class="language-plaintext highlighter-rouge">___chkstk</code> has this behavior, and so it’s also a suitable
<code class="language-plaintext highlighter-rouge">__chkstk</code> aside from the misspelling. Strangely, libgcc x64 <code class="language-plaintext highlighter-rouge">___chkstk</code>
<em>also</em> allocates the stack frame, which is never how chkstk was supposed
to work on x64. I can only conclude it’s never been used.</p>

<h3 id="optimization-in-practice">Optimization in practice</h3>

<p>Does the skip-the-loop optimization matter in practice? Consider a
function using a large-ish, stack-allocated array, perhaps to process
<a href="/blog/2023/08/23/">environment variables</a> or <a href="https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation">long paths</a>, each of which max out
around 64KiB.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">_Bool</span> <span class="nf">path_contains</span><span class="p">(</span><span class="kt">wchar_t</span> <span class="o">*</span><span class="n">name</span><span class="p">,</span> <span class="kt">wchar_t</span> <span class="o">*</span><span class="n">path</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">wchar_t</span> <span class="n">var</span><span class="p">[</span><span class="mi">1</span><span class="o">&lt;&lt;</span><span class="mi">15</span><span class="p">];</span>
    <span class="n">GetEnvironmentVariableW</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">var</span><span class="p">,</span> <span class="n">countof</span><span class="p">(</span><span class="n">var</span><span class="p">));</span>
    <span class="c1">// ... search for path in var ...</span>
<span class="p">}</span>

<span class="kt">int64_t</span> <span class="nf">getfilesize</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">path</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">wchar_t</span> <span class="n">wide</span><span class="p">[</span><span class="mi">1</span><span class="o">&lt;&lt;</span><span class="mi">15</span><span class="p">];</span>
    <span class="n">MultiByteToWideChar</span><span class="p">(</span><span class="n">CP_UTF8</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">wide</span><span class="p">,</span> <span class="n">countof</span><span class="p">(</span><span class="n">wide</span><span class="p">));</span>
    <span class="c1">// ... look up file size via wide path ...</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">example</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">path_contains</span><span class="p">(</span><span class="s">L"PATH"</span><span class="p">,</span> <span class="s">L"c:</span><span class="se">\\</span><span class="s">windows</span><span class="se">\\</span><span class="s">system32"</span><span class="p">))</span> <span class="p">{</span>
        <span class="c1">// ...</span>
    <span class="p">}</span>

    <span class="kt">int64_t</span> <span class="n">size</span> <span class="o">=</span> <span class="n">getfilesize</span><span class="p">(</span><span class="s">"π.txt"</span><span class="p">);</span>
    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Each call to these functions with such large local arrays is also a call
to chkstk. Though with a 64KiB frame, that’s only 16 iterations; barely
detectable in a benchmark. If the function touches the file system, which
is likely when processing paths, then chkstk doesn’t matter at all. My
starting example had a 1MiB array, or 256 chkstk iterations. That starts
to become measurable, though it’s also pushing the limits. At that point
you <a href="/blog/2023/09/27/">ought to be using a scratch arena</a>.</p>
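<p>The arithmetic above can be captured in a tiny helper. This is a
back-of-envelope sketch of my own (the function name is hypothetical),
assuming one probe per 4KiB page, the Windows page size:</p>

```c
// Back-of-envelope: chkstk touches the stack one page at a time, so a
// frame incurs roughly framesize/4096 probe iterations.
// (Hypothetical helper; assumes the 4KiB Windows page size.)
long chkstk_iterations(long framesize)
{
    return framesize >> 12;  // framesize / 4096
}
// 64KiB frame: 16 iterations; 1MiB frame: 256 iterations
```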

<p>So ultimately after writing an improved <code class="language-plaintext highlighter-rouge">___chkstk_ms</code> I could only
measure a tiny difference in contrived programs, and none in any real
application. Though there’s still one more benefit I haven’t yet
mentioned…</p>

<h3 id="the-first-thing-we-do-lets-kill-all-the-lawyers">“The first thing we do, let’s <a href="/blog/2023/06/22/#119-henry-vi">kill all the lawyers</a>”.</h3>

<p>My original motivation for this project wasn’t the optimization — which I
didn’t even discover until after I had started — but <em>licensing</em>. I hate
software licenses, and the <a href="/blog/2023/01/18/">tools I’ve written for w64devkit</a>
are dedicated to the public domain. Both source <em>and</em> binaries (as
distributed). I can do so because <a href="/blog/2023/02/15/">I don’t link runtime components</a>,
not even libgcc. Not <a href="/blog/2023/05/31/">even header files</a>. Every byte of code in those
binaries is my work or the work of my collaborators.</p>

<p>Every once in awhile <code class="language-plaintext highlighter-rouge">___chkstk_ms</code> rears its ugly head, and I have to
make a decision. Do I re-work my code to avoid it? Do I take the reins of
the linker and disable stack probes? I haven’t necessarily allocated a
large local array: A bit of luck with function inlining can combine
several smaller stack frames into one that’s just large enough to require
chkstk.</p>

<p>Since libgcc falls under the <a href="https://www.gnu.org/licenses/gcc-exception-3.1.html">GCC Runtime Library Exception</a>, if it’s
linked into my program through an “Eligible Compilation Process” — which I
believe includes w64devkit — then the GPL-licensed functions embedded in
my binary are legally siloed and the GPL doesn’t infect the rest of the
program. These bits are still GPL in isolation, and if someone were to
copy them out of the program then they’d be normal GPL code again. In
other words, it’s not a 100% public domain binary if libgcc was linked!</p>

<p>(If some FSF lawyer says I’m wrong, then this is an escape hatch through
which anyone can scrub the GPL from GCC runtime code, and then ignore the
runtime exception entirely.)</p>

<p>MSVC is worse. Hardly anyone follows its license, but fortunately for most
the license is practically unenforced. Its chkstk, which currently resides
in a loose <code class="language-plaintext highlighter-rouge">chkstk.obj</code>, falls into what Microsoft calls “Distributable
Code.” Its license requires “external end users to agree to terms that
protect the Distributable Code.” In other words, if you compile a program
with MSVC, you’re required to have a EULA including the relevant terms
from the Visual Studio license. You’re not legally permitted to distribute
software in the manner of w64devkit — no installer, just a portable zip
distribution — if that software has been built with MSVC. At least not
without special care that nobody takes. (Don’t worry, I won’t tell.)</p>

<h3 id="how-to-use-libchkstk">How to use libchkstk</h3>

<p>To avoid libgcc entirely you need <code class="language-plaintext highlighter-rouge">-nostdlib</code>. Otherwise it’s implicitly
offered to the linker, and you’d need to manually check if it picked up
code from libgcc. If <code class="language-plaintext highlighter-rouge">ld</code> complains about a missing chkstk, use <code class="language-plaintext highlighter-rouge">-lchkstk</code>
to get a definition. If you use <code class="language-plaintext highlighter-rouge">-lchkstk</code> when it’s not needed, nothing
happens, so it’s safe to always include.</p>
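<p>For reference, here is a minimal sketch, entirely my own and not from
w64devkit, of the kind of function that makes GCC emit a chkstk call in
the first place: the 64KiB local pushes the frame well past a page, so a
<code class="language-plaintext highlighter-rouge">-nostdlib</code> build of
code like this needs <code class="language-plaintext highlighter-rouge">-lchkstk</code>
on Windows targets.</p>

```c
#include <string.h>

// Hypothetical example, not from w64devkit: the 64KiB local array gives
// this function a frame larger than a page, so GCC emits a chkstk call
// on Windows targets, and a -nostdlib link then needs -lchkstk there.
int checksum64k(const char *src, int len)
{
    char buf[1<<16] = {0};  // 64KiB frame: triggers stack probes
    if (len > (int)sizeof(buf)) {
        len = (int)sizeof(buf);
    }
    memcpy(buf, src, (size_t)len);
    int sum = 0;
    for (int i = 0; i < (int)sizeof(buf); i++) {
        sum += buf[i];
    }
    return sum;
}
```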

<p>I also recently added <a href="https://github.com/skeeto/w64devkit/blob/master/src/libmemory.c">a libmemory</a> to w64devkit, providing tiny,
public domain definitions of <code class="language-plaintext highlighter-rouge">memset</code>, <code class="language-plaintext highlighter-rouge">memcpy</code>, <code class="language-plaintext highlighter-rouge">memmove</code>, <code class="language-plaintext highlighter-rouge">memcmp</code>, and
<code class="language-plaintext highlighter-rouge">strlen</code>. All compilers fabricate calls to these five functions even if
you don’t call them yourself, which is how they were selected. (Not
because I like them. <a href="/blog/2023/02/11/">I really don’t</a>.) If a <code class="language-plaintext highlighter-rouge">-nostdlib</code> build
complains about these, too, then add <code class="language-plaintext highlighter-rouge">-lmemory</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -nostdlib ... -lchkstk -lmemory
</code></pre></div></div>
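<p>To illustrate why these five functions come up at all, consider this
hypothetical example of mine (not from libmemory): neither
<code class="language-plaintext highlighter-rouge">memset</code> nor
<code class="language-plaintext highlighter-rouge">memcpy</code> is named in
the source, yet compilers typically lower the struct copy and the
zero-initializer into calls to them.</p>

```c
// Hypothetical illustration: no mem* function is named in this source,
// yet compilers typically lower the struct copy to a memcpy call and
// the zero-initializer to a memset call, which a -nostdlib build must
// then define somewhere (e.g. via -lmemory).
typedef struct { char data[4096]; } Big;

void copy_then_clear(Big *dst, Big *src)
{
    *dst = *src;     // likely compiled as memcpy(dst, src, 4096)
    Big zero = {0};  // likely compiled as memset(&zero, 0, 4096)
    *src = zero;
}
```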

<p>In MSVC the equivalent option is <code class="language-plaintext highlighter-rouge">/nodefaultlib</code>, after which you may see
missing chkstk errors, and perhaps more. <code class="language-plaintext highlighter-rouge">libchkstk.a</code> is compatible with
MSVC, and <code class="language-plaintext highlighter-rouge">link.exe</code> doesn’t care that the extension is <code class="language-plaintext highlighter-rouge">.a</code> rather than
<code class="language-plaintext highlighter-rouge">.lib</code>, so supply it at link time. Same goes for <code class="language-plaintext highlighter-rouge">libmemory.a</code> if you need
any of those, too.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cl ... /link /nodefaultlib libchkstk.a libmemory.a
</code></pre></div></div>

<p>While I despise licenses, I still take them seriously in the software I
distribute. With libchkstk I have another tool to get it under control.</p>

<hr />

<p>Big thanks to Felipe Garcia for reviewing and correcting mistakes in this
article before it was published!</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>My ranking of every Shakespeare play</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2023/06/22/"/>
    <id>urn:uuid:98eae9a1-cd7f-4d1c-be53-85058f1b2649</id>
    <updated>2023-06-22T19:10:25Z</updated>
    <category term="rant"/><category term="meatspace"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=36438620">on Hacker News</a>.</em></p>

<p>A few years ago I set out on a personal journey to study and watch a
performance of each of Shakespeare’s 37 plays. I’ve reached my goal and,
though it’s not a usual topic around here, I wanted to get my thoughts
down while fresh. I absolutely loved some of these plays and performances,
and so I’d like to highlight them, especially because my favorites are,
with one exception, not “popular” plays. Per tradition, I begin with my
least enjoyed plays and work my way up. All performances were either a
recording of a live stage production or an adaptation, so they’re also available to
you if you’re interested, though in most cases not for free. I’ll mention
notable performances when applicable. The availability of a great
performance certainly influenced my play rankings.</p>

<!--more-->

<p>Like many of you, I had assigned reading for several Shakespeare plays in
high school. I loathed these assignments. I wasn’t interested at the time,
nor was I mature enough to appreciate the writing. Even revisiting as an
adult, the conventional selections — <em>Romeo and Juliet</em>, <em>Julius Caesar</em>,
etc. — are not highly ranked on my list. For the next couple of decades I
thought that Shakespeare just wasn’t for me.</p>

<p>Then I watched <a href="https://www.youtube.com/watch?v=rbSN4Lv_N4g">the 1993 adaption of <em>Much Ado About Nothing</em></a> and it
instantly became one of my favorite films. Why didn’t we read <em>this</em> in
high school?! Reading <a href="https://shakespeare-navigators.ewu.edu/ado/index.html">the play with footnotes</a> helped to follow the
humor and allusions. Even with the film’s abridging, some of it still went
over my head. I soon discovered <em>Asimov’s Guide to Shakespeare</em> — yes,
<em>that</em> Asimov — which was exactly what I needed, and a perfect companion
while reading and watching the plays. If stumbling upon this turned out so
well, then I’d better keep going.</p>

<p>Wanting a solid set of the plays with good footnotes and editing — there
is no canonical version of the plays — I picked up a copy of <em>The Norton
Shakespeare</em>. Unfortunately it’s part of the college textbook racket, and
it shows. The collection is designed to be sold to students who will lug
them in bookbags, will typically open them face-up on a desk, and are
uninterested in their contents beyond class. It includes a short-term,
digital-only, DRMed component to prevent resale. After all, their target
audience will not read it again anyway. Though at least it’s complete and
compact, better for reference than reading.</p>

<p>In contrast, the Folger Shakespeare Library mass market paperbacks are
better for enthusiasts, both in form and format. They’re clearly built for
casual, comfortable reading. However, they’re not sold as a complete set,
and gathering used copies takes some work.</p>

<p>Also essential was <a href="https://en.wikipedia.org/wiki/BBC_Television_Shakespeare"><em>BBC Television Shakespeare</em></a>, produced between
1978 and 1985. Finding productions of the more obscure plays is tricky,
but it always provided a fallback. In some cases these were the best
performances anyway! When I mention “the BBC production” I mean this
series. Like many collections, they omit <em>The Two Noble Kinsmen</em> due to
unclear authorship, and for this reason I’m omitting it from my list as
well. As with any faithful production, I suggest subtitles on the first
viewing, as it aids with understanding. Shakespeare’s sentence structure
is sometimes difficult to parse by moderns, and on-screen text helps. (By
the way, a couple of handy SHA-1 sums for those who know how to use them:)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0ae909e5444c17183570407bd09a622d2827751e
55c77ed7afb8d377c9626527cc762bda7f3e1d83
</code></pre></div></div>

<p>As my list will show, my favorites are the comedies and histories,
particularly the two <a href="https://en.wikipedia.org/wiki/Henriad">Henriads</a>, each a group of four plays. The first —
<em>Richard II</em>, <em>1 Henry IV</em>, <em>2 Henry IV</em>, and <em>Henry V</em> — concerns events
around Henry V, in the late 14th and early 15th century. Those number
prefixes are <em>parts</em>, as in <em>Henry IV</em> has two parts. In my list I combine
parts as though a single play. The second — <em>1 Henry VI</em>, <em>2 Henry VI</em>, <em>3
Henry VI</em>, <em>Richard III</em> — is about the Wars of the Roses, spanning the
15th century. Asimov’s book was essential for filling in the substantial
historical background for these plays, and my journey was also in part a
history study.</p>

<p>I especially enjoy villain monologues, and plays with them rank higher as
a result. It’s said that everyone is the hero of their own story, but
Shakespeare’s villains may know that they’re villains and revel in it,
bragging directly to the audience about all the trouble they’re going to
cause. In some cases they mock the audience’s sacred values, which, in a
way, is like the stand-up comedy of Shakespeare’s time. Notable examples
are Edmund (<em>King Lear</em>), Aaron (<em>Titus Andronicus</em>), Richard III, Iago
(<em>Othello</em>), and Shylock (<em>The Merchant of Venice</em>).</p>

<p>As with literature even today, authors are not experts in moral reasoning
and protagonists are often, on reflection, incredibly evil. Shakespeare is
no different, especially for historical events and people, praising those
who create mass misery (e.g. tyrants waging wars) and vilifying those who
improve everyone’s lives (e.g. anyone who deals with money). Up to and
including Shakespeare’s time, <a href="https://acoup.blog/2022/07/29/collections-logistics-how-did-they-do-it-part-ii-foraging/">a pre-industrial army on the march was a
rolling humanitarian crisis</a>, even in “friendly” territory,
slaughtering and stealing its way through the country in order to keep
going. So, much like <em>suspension of disbelief</em>, there’s a <em>suspension of
morality</em> where I engage with the material on its own moral terms, however
illogical it may be.</p>

<p>Now finally my list. The beginning will be short and negative because, to
be frank, I disliked some of the plays. Even Shakespeare had to work under
constraints. In his time none were regarded as great works. They weren’t
even viewed as literature, but similarly to how we consider television
scripts today. Also, around 20% of plays credited to Shakespeare were
collaborations of some degree, though the collaboration details have been
long lost. For simplicity, I will just refer to the author as Shakespeare.</p>

<h3 id="37-timon-of-athens">(37) Timon of Athens</h3>

<p>I have nothing positive to say about this play. It’s about a man who
borrows and spends recklessly, then learns all the wrong lessons from the
predictable results.</p>

<h3 id="36-the-two-gentlemen-of-verona">(36) The Two Gentlemen of Verona</h3>

<p>Involves a couple of love triangles, a woman disguised as a man — a common
Shakespeare trope — and perhaps the worst ending to a play ever written.
The two “gentlemen” are terrible people and undeserving of their happy
ending. Though I enjoyed the scenes with Launce and Crab, the play’s fool
and his dog.</p>

<h3 id="35-troilus-and-cressida">(35) Troilus and Cressida</h3>

<p>Interesting that it’s set during the <em>Iliad</em> and features legendary
characters such as Achilles, Ajax, and Hector. I have no other positives
to note. Cressida’s abrupt change of character in the Greek camp later in
the play is baffling, as though part of the play has been lost, and ruins
an already dull play for me.</p>

<h3 id="34-the-winters-tale">(34) The Winter’s Tale</h3>

<p>A baby princess is lost, presumed dead, and raised by shepherds. She is
later rediscovered by her father as a young adult. It has a promising
start, but in the final act the main plot is hastily resolved off-stage,
seemingly replaced with a rewritten ending that nonsensically resolves a
secondary story line.</p>

<h3 id="33-cymbeline">(33) Cymbeline</h3>

<p>The title refers to a legendary early King of Britain and is set in the
first century, but it is primarily about his daughter. The plot is
complicated so I won’t summarize it here. It’s long and I just didn’t
enjoy it. This is the second play in the list to feature a woman disguised
as a man.</p>

<h3 id="32-the-tempest">(32) The Tempest</h3>

<p>A political exile stranded on an island in the Mediterranean gains magical
powers through study, with the help of a spirit creates a tempest that
strands his enemies on his island, then gently torments them until he’s
satisfied that he’s had his revenge. It’s an okay play.</p>

<p>More interesting is the historical context behind the play. It’s based
loosely on events around the founding of Jamestown, Virginia. Until this
play, Shakespeare and Jamestown were, in my mind, unrelated historical
events. In fact, Pocahontas very nearly met Shakespeare, missing him by
just a couple of years, but she did meet his rival, Ben Jonson. I spent
far more time catching up on real history, including reading the
fascinating <a href="https://en.wikipedia.org/wiki/True_Reportory"><em>True Reportory</em></a>, than I did on the play.</p>

<h3 id="31-the-taming-of-the-shrew">(31) The Taming of the Shrew</h3>

<p>About a man courting and “taming” an ill-tempered woman, the shrew. The
seeming moral of the play was outdated even in Shakespeare’s time, and
it’s unclear what was intended. Technically it’s a play within a play, and
an outer frame presents the play as part of an elaborate prank. However,
the outer frame is dropped and never revisited, indicating that perhaps
this part of the play was lost. The BBC production skips this framing
entirely and plays it straight.</p>

<h3 id="30-alls-well-that-ends-well">(30) All’s Well That Ends Well</h3>

<p>Helena, a low-born enterprising young woman, saves a king’s life. She’s in
love with a nobleman, Bertram, and the king orders him to marry her as
repayment. He spurns her solely due to her low upbringing and flees the
country. She gives chase, and eventually wins him over. Helena is a great
character, and Bertram is utterly undeserving of her, which ruins the play
for me in an unearned ending.</p>

<h3 id="29-antony-and-cleopatra">(29) Antony and Cleopatra</h3>

<p>A tragedy about people who we know for sure existed, the first such on the
list so far. The sequel to <em>Julius Caesar</em>, completing the story of the
Second Triumvirate. Historically interesting, but the title characters
were terrible, selfish people, including in the play, and they aren’t
interesting enough to make up for it.</p>

<p>I enjoyed the portrayal of Octavian as a shrewd politician.</p>

<h3 id="28-julius-caesar">(28) Julius Caesar</h3>

<p>A classic school reading assignment. Caesar’s death in front of the Statue
of Pompey is obviously poetic, and so every performance loves playing it
up. Antony’s speech is my favorite part of the play. I didn’t dislike this
play, but neither did I find it interesting when revisiting it as an adult.</p>

<h3 id="27-coriolanus">(27) Coriolanus</h3>

<p>About the career of a legendary Roman general and war hero who attempts to
enter politics. He despises the plebeians, which gets him into trouble,
but all he really wants is to please his mother. Stratford Festival has <a href="https://www.youtube.com/watch?v=06tR1wMWV_o">a
worthy adaption in a contemporary setting</a>.</p>

<h3 id="26-henry-viii">(26) Henry VIII</h3>

<p>He reigned from 1509 to 1547, but the play only covers Henry VIII’s first
divorce. It paved the way for the English Reformation, though the play has
surprisingly little to say about it, or about his murder spree. It’s set a few decades
after the events of <em>Richard III</em> — too distant to truly connect with the
second Henriad.</p>

<p>While I appreciate its historical context — with liberal dramatic license
— it’s my least favorite of the English histories. It’s not part of an
epic tetralogy, and the subject matter is mundane. My favorite scene is
Katherine (Catherine in the history books) firmly rejecting the court’s
jurisdiction and walking out. My favorite line: “No man’s pie is freed
from his ambitious finger.”</p>

<h3 id="25-romeo-and-juliet">(25) Romeo and Juliet</h3>

<p>Another classic reading assignment that requires no description. A
beautiful play, but I just don’t connect with its romantic core.</p>

<h3 id="24-the-merchant-of-venice">(24) The Merchant of Venice</h3>

<p>An infamously antisemitic play where a Jewish moneylender, Shylock, loans
to the titular merchant of Venice where the collateral is the original
“pound of flesh,” providing the source for that cliche. Though even in his
prejudice, Shakespeare can’t help but write multifaceted characters,
particularly with Shylock’s famous “If you prick us, do we not bleed?”
speech.</p>

<h3 id="23-twelfth-night">(23) Twelfth Night</h3>

<p>Twins, a young man and a woman, are separated by a shipwreck. The woman
disguises herself as a man, takes employment with a local duke, and
falls in love with him, but her employment requires her to carry love
letters to the duke’s love interest. In the meantime the brother arrives,
unaware his sister is in town in disguise, and everyone gets the twins
mixed up leading to comedy. It’s a fun play. The title has nothing to do
with the play, but refers to the holiday when the play was first
performed.</p>

<p>The play is the source of the famous quote, “Some are born great, some
achieve greatness, and some have greatness thrust upon them.” It’s used as
part of a joke, and when I heard it, I thought the play was mocking
some original source.</p>

<h3 id="22-pericles">(22) Pericles</h3>

<p>A Greek play about a royal family — father, mother, daughter — separated
by unfortunate — if contrived — circumstances, each thinking the others
dead, but all tearfully reunited in a happy ending. My favorite part is
the daughter, Marina, talking her way out of trouble: “She’s able to
freeze the god Priapus and undo a whole generation.”</p>

<p>The BBC production stirred me, particularly the scene where Pericles and
Marina are reunited.</p>

<h3 id="21-richard-ii">(21) Richard II</h3>

<p>Richard II (1367–1400), grandson of the famed Edward III, was a young
King of England. At least in the play, he carelessly makes dangerous
enemies of his friends, and so is deposed by Henry Bolingbroke, who goes
on to become Henry IV. The play is primarily about this abrupt transition
of power, and it is the first play of the first Henriad. The conflict in
this play creates tensions that will not be resolved until 1485, the end
of the Wars of the Roses. Shakespeare spends <em>seven</em> additional plays on
this huge, interesting subject.</p>

<p>For me, Richard II is the most dull of the Henriad plays. It’s a slow
start, but establishes the groundwork for the greater plays that follow.
The BBC production of the first Henriad has “linked” casting where the
same actors play the same roles through the four plays, which makes this
an even more important watch.</p>

<h3 id="20-othello">(20) Othello</h3>

<p>Another of the famous tragedies. Othello, an important Venetian general
and “the Moor of Venice,” is dispatched to Venice-controlled Cyprus to
defend against an attack by the Ottoman Turks. Iago, who has been
overlooked for promotion by Othello, treacherously seeks revenge, secretly
sabotaging all involved while they call him “honest Iago.” His schemes
quickly go well beyond revenge, though, and he continues sowing chaos just
for his own fun.</p>

<p>I watched a few adaptions, and I most enjoyed the <a href="https://www.youtube.com/watch?v=4dcwVLGyTkk">2015 Royal Shakespeare
Company <em>Othello</em></a>, which
places it in a modern setting and requires few changes to do so.</p>

<h3 id="19-the-comedy-of-errors">(19) The Comedy of Errors</h3>

<p>A fun, short play about a highly contrived situation: Two pairs of twins,
where each pair of brothers has been given the same name, are separated at
birth. As adults they all end up in the same town, and everyone mixes them
up leading to comedy. It’s the lightest of Shakespeare’s plays, but also
lacks depth.</p>

<h3 id="18-hamlet">(18) Hamlet</h3>

<p>Another common, more senior, high school reading assignment. Shakespeare’s
longest play, and probably the most subtle. In everything spoken between
Hamlet and his murderous uncle, Claudius, one must read between the lines.
Their real meanings are obscured by courtly language — familiar to
Shakespeare’s audience, but not moderns. Asimov is great for understanding
the political maneuvering, which is a lot like a game of chess. It made me
appreciate the play more than I would have otherwise.</p>

<p>You’d be hard-pressed to find something that beats the faithful,
star-studded <a href="https://www.youtube.com/watch?v=Tt_QkXy3uuQ">1996 major film adaption</a>.</p>

<h3 id="17-richard-iii">(17) Richard III</h3>

<p>The final play of the second Henriad. Much of the play is Richard III
winking at the audience, monologuing about his villainous plans, then
executing those plans without remorse. Makes cheering for the bad guy fun.
If you want to see an evil schemer get away with it, at least right up
until the end when he gets his comeuppance, this is the play for you. This
play is the source of the famous “My kingdom for a horse.”</p>

<p>I liked two different performances for different reasons. The <a href="https://www.youtube.com/watch?v=k20svFhRI44">1995 major
film</a> puts the play in the World War II era. It’s solid and does
well standing alone. The BBC production has linked casting with the three
parts of Henry VI, which allows one to enjoy it in full in its broader
context. It’s also well-performed, but obviously has less spectacle and a
lower budget.</p>

<h3 id="16-the-merry-wives-of-windsor">(16) The Merry Wives of Windsor</h3>

<p>The comedy spin-off of Henry IV. Allegedly, Elizabeth I liked the
character of John Falstaff from Henry IV so much — I can’t blame her! —
that she demanded another play with the character, and so Shakespeare
wrote this play. The play brings over several characters from Henry IV.
Unfortunately it’s in name only and they hardly behave like the same
characters. Despite this, it’s still fun and does not require knowledge of
Henry IV.</p>

<p>Falstaff ineptly attempts to seduce two married women, the titular wives,
who play along in order to get revenge on him. However, their husbands are
not in on the prank. One suspects infidelity and hatches his own plans.
The confusion leads to the comedy.</p>

<p>The <a href="https://www.youtube.com/watch?v=RA7j9XDu8F8">2018 Royal Shakespeare Company production</a> aptly puts it in
a modern suburban setting.</p>

<h3 id="15-titus-andronicus">(15) Titus Andronicus</h3>

<p>A play about a legendary Roman general committed to duty above all else,
even the lives of his own sons. He and his family become brutal victims of
political rivals, and in return gets his own brutal revenge. It’s by far
Shakespeare’s most violent and disturbing play. It’s a bit too violent
even for me, but it ranks this highly because Aaron the Moor is such a
fantastic character, another villain that loves winking at the audience.
His lines throughout the play make me smile: “If one good deed in all my
life I did, I do repent it from my very soul.”</p>

<p>I enjoyed the <a href="https://www.youtube.com/watch?v=OvZRvKf78yY">1999 major film</a>, which puts it in a contemporary
setting.</p>

<h3 id="14-king-lear">(14) King Lear</h3>

<p>The titular, mythological king of pre-Roman Britain wants to retire, and
so he divides his kingdom between his three daughters. However, after
petty selfishness on Lear’s part, he disowns the most deserving daughter,
while the other two scheme against one another.</p>

<p>Some of the scenes in this play are my favorite among Shakespeare, such as
Edmund’s monologue on bastards where he criticizes the status quo and
mocks the audience’s beliefs. It also has one of the best fools, who,
while playing dumb, is both observant and wise. That’s true of most of
Shakespeare’s fools, but it’s especially true in <em>King Lear</em> (“This is not
altogether fool, my lord.”). The fool uses his “tenure” to openly mock the
king to his face, the only character who can do so without repercussions.</p>

<p>My favorite performance was <a href="https://www.youtube.com/watch?v=1PkmXMHHOxQ">the 2015 Stratford Festival stage
production</a>, especially for its Edmund, Lear, and Fool.</p>

<h3 id="13-macbeth">(13) Macbeth</h3>

<p>The shortest tragedy, a common reading assignment, and a perfect example
of literature I could not appreciate without more maturity. Even the plays
I dislike have beautiful poetry, but I especially love it in <em>Macbeth</em>.</p>

<p>The history behind <em>Macbeth</em> is itself fascinating. The play was
custom-written for the newly-crowned King James I — of <em>King James Version</em> fame —
and even calls him out in the audience. James I was obsessed with witch
hunts, so the play includes witchcraft. The character Banquo was by
tradition considered to be his ancestor.</p>

<p>My favorite production by far — I watched a number of them! — was <a href="https://www.youtube.com/watch?v=HM3hsVrBMA4">the
2021 film</a>. It should be an approachable introduction for Shakespeare
newcomers more interested in drama than comedy. Notably for me, it departs
from typical productions in that Macbeth and Lady Macbeth do not scream at
each other — perhaps normally a side effect of speaking loudly for stage
performance. Particularly in Act 1, Scene 7 (“screw your courage to the
sticking place”). In the film they argue calmly, like a couple in a
genuine, healthy relationship, making the tragedy that much more tragic.</p>

<p>That being said, it drops the ball with the porter scene — a bit of comic
relief just after Macbeth murders Duncan. There’s knocking at the gate,
and the porter, charged with attending it, is hungover and takes his time.
In a monologue he imagines himself porter to Hell, and on each impatient
knock considers the different souls he would be greeting. Of all the
porter scenes I watched, the best porter was in the <a href="https://www.youtube.com/watch?v=oGZV-KwW4ZE">2017 Stratford Festival
production</a>, where he is both charismatic and hilarious. I wish I
could share a clip.</p>

<h3 id="12-king-john">(12) King John</h3>

<p>King John, brother of “<em>Coeur de Lion</em>” Richard I, ruled in the
early 13th century. His reign led to the Magna Carta, and he’s also the
Prince John of the Robin Hood legend, though because the play is a history
that paints John in a positive light, that legend isn’t included. The play
depicts fascinating,
real historical events and people, including <a href="https://en.wikipedia.org/wiki/Eleanor_of_Aquitaine">Eleanor of Aquitaine</a>.
It also has one of my favorite Shakespeare characters, Phillip the
Bastard, who gets all the coolest lines. I especially love his
introductory scene where his lineage is disputed by his half-brother and
Eleanor, impressed, essentially adopts him on the spot.</p>

<p><a href="https://www.youtube.com/watch?v=YkRBRoh_0QQ">The 2015 Stratford Festival stage performance</a> is wonderful, and
I’ve re-watched it a few times. The performances are all great.</p>

<h3 id="119-henry-vi">(11–9) Henry VI</h3>

<p>As previously noted, this is actually three plays. At 3–4 hours apiece,
it’s about the length of a modern television season. I thought it might
take a while to consume, but I was completely sucked in, watching and
studying the whole trilogy in a single weekend.</p>

<p>Henry V died young in 1422, and his infant son became Henry VI, leaving
England ruled by his uncles. As an adult he was a weak king, which allowed
the conflicts of the previously-mentioned <em>Richard II</em> to bubble up into
the Wars of the Roses, a bloody power conflict between the Lancasters and
Yorks. The play features historical people including Joan la Pucelle
(“Joan of Arc”), English war hero John Talbot, and <a href="https://en.wikipedia.org/wiki/Jack_Cade%27s_Rebellion">Jack Cade</a>.
<em>Richard III</em> wraps up the conflicts of <em>Henry VI</em>, forming the second
Henriad. When watching or reading, keep in mind that the play is
anti-French, anti-York, and (implicitly) pro-Tudor.</p>

<p>Most of the first part was probably not written by Shakespeare, but rather
adapted from an existing play to fill out the backstory. I think I can see
the “seams” between the original and the edits that introduce the roses.</p>

<p>I <em>loved</em> the BBC production of the second Henriad. Producing such an epic
story must be daunting, and it’s amazing what they could convey with such
limited budget and means. It has hilarious and clever cinematography for
the scene where the Countess of Auvergne attempts to trap Talbot (Part 1,
Act 2, Scene 3). Again, I wish I could share a clip!</p>

<h3 id="8-henry-v">(8) Henry V</h3>

<p>Due to his amazing victories, most notably <a href="https://en.wikipedia.org/wiki/Battle_of_Agincourt">at Agincourt</a> where, for
once, Shakespeare isn’t exaggerating the odds, Henry V is one of the great
kings of English history. This play is a followup to <em>Richard II</em> and
<em>Henry IV</em>, completing the first Henriad, and depicts Henry V’s war with
France. Outside of the classroom, this is one of Shakespeare’s most
popular plays.</p>

<p>The obvious choice for viewing is <a href="https://www.youtube.com/watch?v=okxEzUlnn_0">the 1989 major film</a>, which, by
borrowing a few scenes from <em>Henry IV</em>, attempts a standalone experience,
though with limited success. I watched it before <em>Henry IV</em>, and I could
not understand why the film was so sentimental about a character that
hadn’t even appeared yet. It probably has <a href="https://www.youtube.com/watch?v=A-yZNMWFqvM">the best Saint Crispin’s Day
Speech ever performed</a>, in part because it’s placed in a broader
context than originally intended. The <a href="https://www.youtube.com/watch?v=HS7OG9zcV-M">introduction is bold</a> as is
<a href="https://www.youtube.com/watch?v=mKHihAPr2Rc">Exeter’s ultimatum delivery</a>. It cleverly, and without changing his
lines, also depicts Montjoy, the French messenger, as sympathetic to the
English, also not originally intended. I didn’t realize this until I
watched other productions.</p>

<p>The BBC production is also worthy, in large part because of its linked
casting with <em>Richard II</em> and <em>Henry IV</em>. It’s also unabridged, including
the whole glove thing, for better or worse.</p>

<h3 id="76-henry-iv">(7–6) Henry IV</h3>

<p>People will think I’m crazy, but yes, I’m placing <em>Henry IV</em> above <em>Henry
V</em>. My reason is just two words: John Falstaff. This character is one of
Shakespeare’s greatest creations, and really makes these plays for me. As
previously noted, this is two plays mainly because John Falstaff was such
a huge hit. The sequel mostly retreads the same ground, but that’s fine!
I’ve read and re-read all the Falstaff scenes because they’re so fun. I
now have a habit of quoting Falstaff, and it drives my wife nuts.</p>

<p>The Falstaff role makes or breaks a <em>Henry IV</em> production, and my love for
this play is in large part thanks to the phenomenal BBC production. It has
a warm, charismatic Falstaff that <a href="https://www.youtube.com/watch?v=ImVoqdZPPak">perfectly nails the role</a>. It’s
great even beyond Falstaff, of course. At the end of part 2, I tear up
seeing Henry V test the chief justice. I adore this production. What a
masterpiece.</p>

<h3 id="5-a-midsummer-nights-dream">(5) A Midsummer Night’s Dream</h3>

<p>A popular, fun, frivolous play that I enjoyed even more than I expected,
where faeries interfere with Athenians who wander into their forest. The
“rude mechanicals” are charming, especially the naive earnestness of Nick
Bottom, making them my favorite part of the play.</p>

<p>My enjoyment is largely thanks to <a href="https://www.youtube.com/watch?v=v9GhqXz7EVw">a 2014 stage production</a> with
great performances all around, great cinematography, and incredible
effects. Highly recommended. Honorable mention goes to the great Nick
Bottom performances of the BBC production and the 1999 major film.</p>

<h3 id="4-as-you-like-it">(4) As You Like It</h3>

<p>A pastoral comedy about idyllic rural life, and the source of the famous
quote “All the world’s a stage.” A duke has deposed his brother, the
rightful duke, exiling him and his followers to the forest where the rest of the play
takes place. The main character, Rosalind, is one of the exiles, and,
disguised as a man named Ganymede, flees into the forest with her cousin.
There she runs into her also-exiled love interest, Orlando. While still
disguised as Ganymede, she roleplays as Rosalind — that is, <em>herself</em> — to
help him practice wooing her. Crazy and fun.</p>

<p>A couple of my favorite lines are “There’s no clock in the forest” and
“falser than vows made in wine.” It’s an unusually musical play, and has a
big, happy ending. The fool, Touchstone, is one of my favorite fools,
named such because he tests the character of everyone with whom he comes
in contact.</p>

<p>It ranks so highly because of <a href="https://www.pbs.org/video/as-you-like-it-8yykc1/">an endearing 2019 production by Kentucky
Shakespeare</a>, which sets the story in 19th century Kentucky. This is
the most amateur production I’ve shared so far — literally Shakespeare in
the park — but it’s just so enjoyable. Their Rosalind is fantastic and
really makes the play work. I’ve listened to just the audio of the play,
like a podcast, many times now.</p>

<h3 id="3-measure-for-measure">(3) Measure for Measure</h3>

<p>A comedy about justice and mercy. The duke of Vienna announces he will be
away on a trip to Poland, but secretly poses as a monk in order to keep a
finger on the pulse of his city. Unfortunately the man running the city in
his stead is corrupt, and the softhearted duke can’t help but pull strings
behind the scenes to undo the damage, and more. He sets up a scheme such
that, after his dramatic return as duke, the plot is unraveled while
simultaneously testing the character of all involved.</p>

<p>I love so many of the characters and elements of this play. I smile when
the duke jumps into action, my heart wrenches at <a href="https://www.youtube.com/watch?v=paAYJUx9MfQ">Isabella’s impassioned
speech for mercy</a> (“it is excellent to have a giant’s strength,
but it is tyrannous to use it like a giant”), I admire the provost’s
selfless loyalty to the duke, I laugh when Lucio the “fantastic” keeps
putting his foot in his mouth, and I cry when Mariana begs Isabella to
forgive. All around a wonderful play.</p>

<p>Like so many already, a big part of my love for the play is <a href="https://www.crackle.com/watch/f70e0859-c7fa-4dae-961f-130bed2980eb/bbc-television-shakespeare:-measure-for-measure">the BBC
production</a>, which is full of great performances, particularly
the duke, Isabella, and Lucio.</p>

<h3 id="2-much-ado-about-nothing">(2) Much Ado About Nothing</h3>

<p>As the play that finally got me interested in Shakespeare, of course it’s
near the top of the list. Forget Romeo and Juliet: Benedick and Beatrice
are Shakespeare’s greatest romantic pairing!</p>

<p>Don Pedro, Prince of Aragon, stops in Messina with his soldiers while
returning from a military action. While in town there’s a matchmaking plot
and lots of eavesdropping, and then chaos created by the wicked Don John,
brother to Don Pedro. It’s a fun, light, hilarious play. It also features
another of Shakespeare’s great comic characters, Dogberry, famous for his
malapropisms.</p>

<p>This is a very popular play with tons of productions, though I only
watched a few of them. The previously-mentioned 1993 adaptation remains my
favorite. It does some abridging, but honestly, it makes the play better
and improves the comedic beats.</p>

<h3 id="1-loves-labours-lost">(1) Love’s Labour’s Lost</h3>

<p>Finally, my favorite play of all, and an unusual one to be at the top of
the list. Much of the play is subtle parody and so makes for a poor first
play for newcomers, who would not be familiar enough with Shakespeare’s
language to distinguish parody from the genuine article.</p>

<p>The King of Navarre and three lords swear an oath to seclude themselves
in study, swearing off the company of women. Then the French princess and
her court arrive, the four men secretly write love letters in violation
of their oaths, and comedy ensues. There are also various eccentric side
characters mixed into the plot to spice it up. It’s all a ton of fun and
ends with an inept play within a play about the “nine worthies.”</p>

<p>The major reason I love this play so much is <a href="https://www.youtube.com/watch?v=VAotbh5CVqM">a <em>literally perfect</em> 2017
production by Stratford Festival</a>. I love every aspect of this
production such that I can’t even pick a favorite element. I was hooked
within the first minute.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>w64devkit: (Almost) Everything You Need</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/09/25/"/>
    <id>urn:uuid:e594c82d-a2e1-4035-8527-1b998045ceeb</id>
    <updated>2020-09-25T00:04:11Z</updated>
    <category term="c"/><category term="cpp"/><category term="win32"/><category term="rant"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=24586556">on Hacker News</a>.</em></p>

<p><a href="/blog/2020/05/15/">This past May</a> I put together my own C and C++ development
distribution for Windows called <a href="https://github.com/skeeto/w64devkit"><strong>w64devkit</strong></a>. The <em>entire</em>
release weighs under 80MB and requires no installation. Unzip and run it
in-place anywhere. It’s also entirely offline. It will never
automatically update, or even touch the network. In mere seconds any
Windows system can become a reliable development machine. (To further
increase reliability, <a href="https://jacquesmattheij.com/why-johnny-wont-upgrade/">disconnect it from the internet</a>.) Despite
its simple nature and small packaging, w64devkit is <em>almost</em> everything
you need to develop <em>any</em> professional desktop application, from a
command line utility to a AAA game.</p>

<!--more-->

<p>I don’t mean this in some <a href="/blog/2016/04/30/">useless Turing-complete sense</a>, but in
a practical, <em>get-stuff-done</em> sense. It’s much more a matter of
<em>know-how</em> than of tools or libraries. So then what is this “almost”
about?</p>

<ul>
  <li>
    <p>The distribution does not have WinAPI documentation. It’s notoriously
<a href="http://laurencejackson.com/win32/">difficult to obtain</a> and, besides, unfriendly to redistribution.
It’s essential for interfacing with the operating system and difficult
to work without. Even a dead tree reference book would suffice.</p>
  </li>
  <li>
    <p>Depending on what you’re building, you may still need specialized
tools. For instance, game development requires <a href="https://www.blender.org/">tools for editing art
assets</a>.</p>
  </li>
  <li>
    <p>There is no formal source control system. Git is excluded per the
issues noted in the announcement, and my next option, <a href="https://wiki.debian.org/UsingQuilt">Quilt</a>,
has similar limitations. However, <code class="language-plaintext highlighter-rouge">diff</code> and <code class="language-plaintext highlighter-rouge">patch</code> <em>are</em> included,
and are sufficient for a kind of old-school, patch-based source
control. I’ve used it successfully when dogfooding w64devkit in a
fresh Windows installation.</p>
  </li>
</ul>
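<p>To make that patch-based workflow concrete, here’s a minimal sketch with
hypothetical file names: record a change as a unified diff, then re-apply it
later to a pristine copy of the source.</p>

```shell
mkdir -p pristine work
printf 'int main(void) { return 0; }\n' > pristine/main.c
printf 'int main(void) { return 1; }\n' > work/main.c

# Record the change; diff exits nonzero when files differ, hence the || true
diff -u pristine/main.c work/main.c > fix-return.patch || true

# Later, reproduce the change against a fresh copy of the original
cp pristine/main.c work/main.c
patch work/main.c < fix-return.patch
```

<p>The resulting patch files are plain text, so they can be mailed around or
archived like any other source artifact.</p>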

<h3 id="everything-else">Everything else</h3>

<p>As I said in my announcement, w64devkit includes a powerful text editor
that fulfills all text editing needs, from code to documentation. The
editor includes a tutorial (<code class="language-plaintext highlighter-rouge">vimtutor</code>) and complete, built-in manual
(<code class="language-plaintext highlighter-rouge">:help</code>) in case you’re not yet familiar with it.</p>

<p>What about navigation? Use the included <a href="https://github.com/universal-ctags/ctags">ctags</a> to generate a
tags database (<code class="language-plaintext highlighter-rouge">ctags -R</code>), then <a href="http://vimdoc.sourceforge.net/htmldoc/tagsrch.html#tagsrch.txt">jump instantly</a> to any
definition at any time. No need for <a href="https://old.reddit.com/r/vim/comments/b3yzq4/a_lsp_client_maintainers_view_of_the_lsp_protocol/">that Language Server Protocol
rubbish</a>. This does not mean you must laboriously type identifiers
as you work. Use <a href="https://georgebrock.github.io/talks/vim-completion/">built-in completion</a>!</p>

<p>Build system? That’s also covered, via a Windows-aware unix-like
environment that includes <code class="language-plaintext highlighter-rouge">make</code>. <a href="/blog/2017/08/20/">Learning how to use it</a> is a
breeze. Software is by its nature unavoidably complicated, so <a href="/blog/2017/03/30/">don’t
make it more complicated than necessary</a>.</p>

<p>What about debugging? Use the debugger, GDB. Performance problems? Use
the profiler, gprof. Inspect compiler output either by asking for it
(<code class="language-plaintext highlighter-rouge">-S</code>) or via the disassembler (<code class="language-plaintext highlighter-rouge">objdump -d</code>). No need to go online for
the <a href="https://godbolt.org/">Godbolt Compiler Explorer</a>, as slick as it is. If the compiler
output is insufficient, use <a href="/blog/2015/07/10/">SIMD intrinsics</a>. In the worst case
there are two different assemblers available. Real time graphics? Use an
operating system API like OpenGL, DirectX, or Vulkan.</p>

<p>w64devkit <em>really is</em> nearly everything you need in a <a href="https://www.youtube.com/watch?v=W3ml7cO96F0&amp;t=1h25m50s">single, no
nonsense, fully-<em>offline</em> package</a>! It’s difficult to emphasize this
point as much as I’d like. When interacting with the broader software
ecosystem, I often despair that <a href="https://www.youtube.com/watch?v=ZSRHeXYDLko">software development has lost its
way</a>. This distribution is my way of carving out an escape from some
of the insanity. As a C and C++ toolchain, w64devkit by default produces
lean, sane, trivially-distributable, offline-friendly artifacts. All
runtime components in the distribution are <a href="https://drewdevault.com/dynlib">static link only</a>,
so no need to distribute DLLs with your application either.</p>

<h3 id="customize-the-distribution-own-the-toolchain">Customize the distribution, own the toolchain</h3>

<p>While most users would likely stick to my published releases, building
w64devkit is a two-step process with a single build dependency, Docker.
Anyone can easily customize it for their own needs. Don’t care about
C++? Toss it to shave 20% off the distribution. Need to tune the runtime
for a specific microarchitecture? Tweak the compiler flags.</p>

<p>One of the intended strengths of open source is users can modify
software to suit their needs. With w64devkit, you <em>own the toolchain</em>
itself. It is <a href="https://research.swtch.com/deps">one of your dependencies</a> after all. Unfortunately
the build initially requires an internet connection even when working
from source tarballs, but at least it’s a one-time event.</p>

<p>If you choose to <a href="https://github.com/nothings/stb">take on dependencies</a>, and you build those
dependencies using w64devkit, all the better! You can tweak them to your
needs and choose precisely how they’re built. You won’t be relying on
the goodwill of internet randos nor the generosity of a free package
registry.</p>

<h3 id="customization-examples">Customization examples</h3>

<p>Building existing software using w64devkit is probably easier than
expected, particularly since much of it has already been “ported” to
MinGW and Mingw-w64. Just don’t bother with GNU Autoconf configure
scripts. They never work in w64devkit despite having everything they
technically need. With that caveat, here’s a demonstration of building
some popular software.</p>

<p>One of <a href="/blog/2016/09/02/">my coworkers</a> uses his own version of <a href="https://www.chiark.greenend.org.uk/~sgtatham/putty/">PuTTY</a>
patched to play more nicely with Emacs. If you wanted to do the same,
grab the source tarball, unpack it using the provided tools, then in the
unpacked source:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make -C windows -f Makefile.mgw
</code></pre></div></div>

<p>You’ll have a custom-built putty.exe, as well as the other tools. If you
have any patches, apply those first!</p>

<p>Would you like to embed an extension language in your application? Lua
is a solid choice, in part because it’s such a well-behaved dependency.
After unpacking the source tarball:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make PLAT=mingw
</code></pre></div></div>

<p>This produces a complete Lua compiler, runtime, and library. It’s not
even necessary to use the Makefile, as it’s nearly as simple as “<code class="language-plaintext highlighter-rouge">cc
*.c</code>” — painless to integrate or embed into any project.</p>

<p>Do you enjoy NetHack? Perhaps you’d like to <a href="https://bilious.alt.org/">try a few of the custom
patches</a>. This one is a little more complicated, but I was able to
build NetHack 3.6.6 like so:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sys/winnt/nhsetup.bat
$ make -C src -f Makefile.gcc cc="cc -fcommon" link="cc"
</code></pre></div></div>

<p>NetHack has <a href="https://wiki.gentoo.org/wiki/Gcc_10_porting_notes/fno_common">a bug necessitating <code class="language-plaintext highlighter-rouge">-fcommon</code></a>. If you have any
patches, apply them with <code class="language-plaintext highlighter-rouge">patch</code> before the last step. I won’t belabor it
here, but with just a little more effort I was also able to produce a
NetHack binary with curses support via <a href="https://pdcurses.org/">PDCurses</a> — statically-linked
of course.</p>

<p>How about my archive encryption tool, <a href="https://github.com/skeeto/enchive">Enchive</a>? The one that
<a href="/blog/2018/04/13/">even works with 16-bit DOS compilers</a>. It requires nothing special
at all!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make
</code></pre></div></div>

<p>w64devkit can also host parts of itself: Universal Ctags, Vim, and NASM.
This means you can modify and recompile these tools without going
through the Docker build. Sadly <a href="https://frippery.org/busybox/">busybox-w32</a> cannot host itself,
though it’s close. I’d <em>love</em> if w64devkit could fully host itself, and
so Docker — and therefore an internet connection and such — would only
be needed to bootstrap, but unfortunately that’s not realistic given the
state of the GNU components.</p>

<h3 id="offline-and-reliable">Offline and reliable</h3>

<p>Software development has increasingly become <a href="https://deftly.net/posts/2017-06-01-measuring-the-weight-of-an-electron.html">dependent on a constant
internet connection</a>. Robust, offline tooling and development is
undervalued.</p>

<p>Consider: Does your current project depend on an external service? Do
you pay for this service to ensure that it remains up? If you pull your
dependencies from a repository, how much do you trust those who maintain
the packages? <a href="https://drewdevault.com/2020/02/06/Dependencies-and-maintainers.html">Do you even know their names?</a> What would be your
project’s fate if that service went down permanently? It will someday,
though hopefully only after your project is dead and forgotten. If you
have the ability to work permanently offline, then you already have
happy answers to all these questions.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>No, PHP Doesn't Have Closures</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/09/25/"/>
    <id>urn:uuid:286e6fd2-0532-4848-8d4a-10101d1ffa53</id>
    <updated>2019-09-25T21:10:43Z</updated>
    <category term="lang"/><category term="rant"/>
    <content type="html">
      <![CDATA[<p>The PHP programming language is bizarre and, if nothing else, worthy of
anthropological study. The only consistent property of PHP <a href="https://eev.ee/blog/2012/04/09/php-a-fractal-of-bad-design/">is how badly
it’s designed</a>, yet it somehow remains widely popular. There’s
a social dynamic at play here that science has yet to unlock.</p>

<p>I don’t say this because I hate PHP. There’s no reason for that: I don’t
write programs in PHP, never had to use it, and don’t expect to ever
need it. Despite this, I just can’t look away from PHP in the same way I
can’t look away from a car accident.</p>

<p>I recently came across a link to the PHP manual, and morbid curiosity
caused me to look through it. It’s fun to pick an arbitrary section
of the manual and see how many crazy design choices I can spot, or at
least see what sort of strange terminology the manual has invented to
describe a common concept. This time around, one such section was on
<a href="https://www.php.net/manual/en/functions.anonymous.php">anonymous functions</a>, including closures. It was even worse than
I expected.</p>

<p>In some circumstances, closures can be a litmus test. Closure semantics
are not complex, but they’re subtle and <a href="/blog/2014/06/06/">a little tricky</a> until you
get the hang of them. If you’re interviewing a candidate, toss in a question
or two about closures. Either they’re familiar and get it right away, or
they’re unfamiliar and get nothing right. The latter is when it’s most
informative. PHP itself falls clearly into the latter. Not only that,
the example of a “closure” in the manual demonstrates a “closure”
closing over a global variable!</p>

<p>I’d been told for years that PHP has closures, and I took that claim at
face value. In fact, PHP has had “closures” since 5.3.0, released in
June 2009, so I’m over a decade late in investigating it. However, as
far as I can tell, nobody’s ever pointed out that PHP “closures” are, in
fact, not actually closures.</p>

<h3 id="anonymous-functions-and-closures">Anonymous functions and closures</h3>

<p>Before getting into why they’re not closures, let’s go over how it
works, starting with a plain old anonymous function. PHP <em>does</em> have
anonymous functions — the easy part.</p>

<div class="language-php highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span> <span class="n">foo</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">return</span> <span class="k">function</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The function <code class="language-plaintext highlighter-rouge">foo</code> returns a function that returns 1. In PHP 7 you can
call the returned function immediately like so:</p>

<div class="language-php highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$r</span> <span class="o">=</span> <span class="nf">foo</span><span class="p">()();</span>  <span class="c1">// $r = 1</span>
</code></pre></div></div>

<p>In PHP 5 this is a syntax error because, well, it’s PHP and its parser
is <a href="/blog/2008/08/29/">about as clunky as Matlab’s</a>.</p>

<p>In a well-designed language, you’d expect that this could also be a
closure. That is, it <em>closes over</em> local variables, and the function may
continue to access those variables later. For example:</p>

<div class="language-php highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span> <span class="n">bar</span><span class="p">(</span><span class="nv">$n</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="k">function</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nv">$n</span><span class="p">;</span>
    <span class="p">};</span>
<span class="p">}</span>

<span class="nf">bar</span><span class="p">(</span><span class="mi">1</span><span class="p">)();</span>  <span class="c1">// error: Undefined variable: n</span>
</code></pre></div></div>

<p>This fails because you must explicitly tell PHP what variables you
intend to access inside the anonymous function with <code class="language-plaintext highlighter-rouge">use</code>:</p>

<div class="language-php highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span> <span class="n">bar</span><span class="p">(</span><span class="nv">$n</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="k">function</span><span class="p">()</span> <span class="k">use</span> <span class="p">(</span><span class="nv">$n</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nv">$n</span><span class="p">;</span>
    <span class="p">};</span>
<span class="p">}</span>

<span class="nf">bar</span><span class="p">(</span><span class="mi">1</span><span class="p">)();</span>  <span class="c1">// 1</span>
</code></pre></div></div>

<p>If this actually closed over <code class="language-plaintext highlighter-rouge">$n</code>, this would be a legitimate closure.
Having to tell the language exactly which variables are being closed
over would be pretty dumb, but it still meets the definition of a
closure.</p>

<p>But here’s the catch: It’s not actually closing over any variables. The
names listed in <code class="language-plaintext highlighter-rouge">use</code> are actually extra, hidden parameters bound to the
current value of those variables. In other words, <strong>this is nothing more
than partial function evaluation</strong>.</p>

<div class="language-php highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span> <span class="n">bar</span><span class="p">(</span><span class="nv">$n</span><span class="p">)</span> <span class="p">{</span>
    <span class="nv">$f</span> <span class="o">=</span> <span class="k">function</span><span class="p">()</span> <span class="k">use</span> <span class="p">(</span><span class="nv">$n</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nv">$n</span><span class="p">;</span>
    <span class="p">};</span>
    <span class="nv">$n</span><span class="o">++</span><span class="p">;</span>  <span class="c1">// never used!</span>
    <span class="k">return</span> <span class="nv">$f</span><span class="p">;</span>
<span class="p">}</span>

<span class="nv">$r</span> <span class="o">=</span> <span class="nf">bar</span><span class="p">(</span><span class="mi">1</span><span class="p">)();</span>  <span class="c1">// $r = 1</span>
</code></pre></div></div>

<p>Here’s the equivalent in JavaScript using <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_objects/Function/bind#Syntax">the <code class="language-plaintext highlighter-rouge">bind()</code> method</a>:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">bar</span><span class="p">(</span><span class="nx">n</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">let</span> <span class="nx">f</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">m</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nx">m</span><span class="p">;</span>
    <span class="p">};</span>
    <span class="k">return</span> <span class="nx">f</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="kc">null</span><span class="p">,</span> <span class="nx">n</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is actually more powerful than PHP’s “closures” since any arbitrary
expression can be used for the bound argument. In PHP it’s limited to a
couple of specific forms. If JavaScript didn’t have proper closures, and
instead we all had to rely on <code class="language-plaintext highlighter-rouge">bind()</code>, nobody would claim that
JavaScript had closures. It shouldn’t be different for PHP.</p>
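<p>To make the contrast concrete, here’s a sketch in JavaScript (illustrative
names, not from the PHP manual) placing a genuine closure next to a
bind()-style partial application:</p>

```javascript
function counterWithClosure() {
    let n = 0;
    return function() { return ++n; };  // closes over the variable n itself
}

function counterWithBind(n) {
    let f = function(m) { return m + 1; };
    return f.bind(null, n);  // captures only the current value, like PHP's use
}

let real = counterWithClosure();
console.log(real(), real());  // 1 2 -- shared state survives across calls
let fake = counterWithBind(0);
console.log(fake(), fake());  // 1 1 -- the bound value never changes
```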

<h3 id="references">References</h3>

<p>PHP <em>does</em> have references, and binding a reference to an anonymous
function is kinda, sorta like a closure. But that’s still just partial
function evaluation, only with a reference as the bound argument.</p>

<p>Here’s how to tell these reference captures aren’t actually closures:
They work equally well for global variables as for local variables. So it’s
still not <em>closing over</em> a lexical environment, just binding a reference
to a parameter.</p>

<div class="language-php highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$counter</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="k">function</span> <span class="n">bar</span><span class="p">(</span><span class="nv">$n</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">global</span> <span class="nv">$counter</span><span class="p">;</span>
    <span class="nv">$f</span> <span class="o">=</span> <span class="k">function</span><span class="p">()</span> <span class="k">use</span> <span class="p">(</span><span class="o">&amp;</span><span class="nv">$n</span><span class="p">,</span> <span class="o">&amp;</span><span class="nv">$counter</span><span class="p">)</span> <span class="p">{</span>
        <span class="nv">$counter</span><span class="o">++</span><span class="p">;</span>
        <span class="k">return</span> <span class="nv">$n</span><span class="p">;</span>
    <span class="p">};</span>
    <span class="nv">$n</span><span class="o">++</span><span class="p">;</span>  <span class="c1">// now has an effect</span>
    <span class="k">return</span> <span class="nv">$f</span><span class="p">;</span>
<span class="p">}</span>

<span class="nv">$r</span> <span class="o">=</span> <span class="nf">bar</span><span class="p">(</span><span class="mi">1</span><span class="p">)();</span>  <span class="c1">// $r = 2, $counter = 1</span>
</code></pre></div></div>

<p>In the example above, there’s no difference between <code class="language-plaintext highlighter-rouge">$n</code>, a local
variable, and <code class="language-plaintext highlighter-rouge">$counter</code>, a global variable. It wouldn’t make sense for
a closure to close over a global variable.</p>
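<p>A quick illustrative contrast (again my sketch, not the article’s): a closure closes over a fresh lexical binding on every call, so each instance carries independent state, which a bound reference to one shared global can never provide:</p>

```javascript
// Each call to counter() creates a new lexical environment with its own n;
// the returned function closes over that particular binding.
function counter() {
    let n = 0;
    return function() { return ++n; };
}

const a = counter();
const b = counter();
a();  // 1
a();  // 2
b();  // 1, independent of a
```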

<h3 id="emacs-lisp-partial-function-application">Emacs Lisp partial function application</h3>

<p>Emacs Lisp famously didn’t get lexical scope, and therefore closures,
<a href="/blog/2016/12/22/">until fairly recently</a>. It was — and still is by default — a
dynamic scope oddball. However, it’s long had an <code class="language-plaintext highlighter-rouge">apply-partially</code>
function for partial function application. It returns a closure-like
object, and did so when the language didn’t have proper closures. So it
can be used to create a “closure” just like PHP:</p>

<div class="language-lisp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">bar</span> <span class="p">(</span><span class="nv">n</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">apply-partially</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">m</span><span class="p">)</span> <span class="nv">m</span><span class="p">)</span> <span class="nv">n</span><span class="p">))</span>
</code></pre></div></div>

<p>This works regardless of lexical or dynamic scope, because this
construct isn’t really a closure, just like PHP’s isn’t a closure. In
PHP, its partial function evaluation is built directly into the language
with special <code class="language-plaintext highlighter-rouge">use</code> syntax.</p>
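<p>For comparison, here’s a rough JavaScript rendition (my sketch, not the article’s) of what <code>apply-partially</code> offers: pre-filling the leading arguments of a function. The interface only needs to store values, which is why such a facility could exist before Emacs Lisp had lexical scope — though this JavaScript version happens to use a closure internally:</p>

```javascript
// Return a function with the first arguments of f already filled in.
function applyPartially(f, ...bound) {
    return function(...rest) {
        return f.apply(null, bound.concat(rest));
    };
}

const add = (a, b) => a + b;
const add5 = applyPartially(add, 5);
add5(3);  // 8
```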

<h3 id="monkey-see-monkey-do">Monkey see, monkey do</h3>

<p>Why does the shell command language use sigils? Because it’s built atop
interactive command line usage, where bare words are taken literally and
variables are the exception. Why does Perl use sigils? Because it was
originally designed as an alternative to shell scripts, so it mimicked
that syntax. Why does PHP use sigils? Because Perl did.</p>

<p>The situation with closures follows that pattern, and it comes up all
over PHP. Its designers see a feature in another language, but don’t
really understand its purpose or semantics. So when they attempt to add
that feature to PHP, they get it disastrously wrong.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>From Vimperator to Tridactyl</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2018/09/20/"/>
    <id>urn:uuid:85e7dab1-88f8-34d2-c4d9-7a35d5978b20</id>
    <updated>2018-09-20T15:01:46Z</updated>
    <category term="web"/><category term="rant"/><category term="debian"/><category term="vim"/>
    <content type="html">
      <![CDATA[<p>Earlier this month I experienced a life-changing event — or so I
thought it would be. It was fully anticipated, and I had been dreading
the day for almost a year, wondering what I was going to do. Could I
overcome these dire straits? Would I ever truly accept the loss, or
will I become a cranky old man who won’t stop talking about how great
it all used to be?</p>

<p>So what was this <a href="https://utcc.utoronto.ca/~cks/space/blog/web/Firefox57ComingExplosion">big event</a>? On September 5th, Mozilla
officially and fully ended support for XUL extensions (<a href="https://en.wikipedia.org/wiki/XUL">XML User
Interface Language</a>), a.k.a. “legacy” extensions. The last
Firefox release to support these extensions was Firefox 52 ESR, the
browser I had been using for some time. A couple days later, Firefox
60 ESR entered Debian Stretch to replace it.</p>

<p>The XUL extension API was never well designed. It was clunky, quirky,
and the development process for extensions was painful, <a href="http://steve-yegge.blogspot.com/2007/01/pinocchio-problem.html">requiring
frequent restarts</a>. It was bad enough that I was never interested
in writing my own extensions. Poorly-written extensions unfairly gave
Firefox a bad name, causing <a href="https://utcc.utoronto.ca/~cks/space/blog/web/FirefoxResignedToLeaks">memory leaks</a> and other issues, and
Firefox couldn’t tame the misbehavior.</p>

<p>Yet this extension API was <em>incredibly powerful</em>, allowing for rather
extreme UI transformations that really did turn Firefox into a whole
new browser. For the past 15 years I wasn’t using Firefox so much as a
highly customized browser <em>based on</em> Firefox. It’s how Firefox has
really stood apart from everyone else, including Chrome.</p>

<p>The wide open XUL extension API was getting in the way of Firefox
moving forward. Continuing to support it required sacrifices that
Mozilla was less and less willing to make. To replace it, they
introduced the WebExtensions API, modeled very closely after Chrome’s
extension API. These extensions are sandboxed, much less trusted, and
the ecosystem more closely resembles the “app store” model (Ugh!).
This is great for taming poorly-behaved extensions, but they are <em>far</em>
less powerful and capable.</p>

<p>The powerful, transformative extension I’d <a href="/blog/2009/04/03/">been using the past
decade</a> was Vimperator — and occasionally with temporary stints in
its fork, Pentadactyl. It overhauled most of Firefox’s interface,
turning it into a Vim-like modal interface. In normal mode I had single
keys bound to all sorts of useful functionality.</p>

<p>The problem is that Vimperator is an XUL extension, and it’s not
possible to fully implement using the WebExtensions API. It needs
capabilities that WebExtensions will likely never provide. Losing XUL
extensions would mean being thrown back 10 years in terms of my UI
experience. The possibility of having to use the web without it
sounded unpleasant.</p>

<p>Fortunately there was a savior on the horizon already waiting for me:
<a href="https://github.com/tridactyl/tridactyl"><strong>Tridactyl</strong></a>! It is essentially a from-scratch rewrite
of Vimperator using the WebExtensions API. To my complete surprise,
these folks have managed to recreate around 85% of what I had within
the WebExtensions limitations. It will never be 100%, but it’s close
enough to keep me happy.</p>

<h3 id="what-matters-to-me">What matters to me</h3>

<p>There are some key things Vimperator gave me that I was afraid of
losing.</p>

<ul>
  <li>Browser configuration from a text file.</li>
</ul>

<p>I keep all <a href="/blog/2012/06/23/">my personal configuration dotfiles under source
control</a>. It’s a shame that Firefox, despite being so
flexible, has never supported this approach to configuration.
Fortunately Vimperator filled this gap with its <code class="language-plaintext highlighter-rouge">.vimperatorrc</code> file,
which could not only be used to configure the extension but also access
nearly everything on the <code class="language-plaintext highlighter-rouge">about:config</code> page. It’s the killer feature
Firefox never had.</p>

<p>Since WebExtensions are sandboxed, they cannot (normally) access files.
Fortunately there’s a workaround: <a href="https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Native_messaging"><strong>native messaging</strong></a>. It’s a
tiny, unsung backdoor that closes the loop on some vital features.
Tridactyl makes it super easy to set up (<code class="language-plaintext highlighter-rouge">:installnative</code>), and doing so
enables the <code class="language-plaintext highlighter-rouge">.tridactylrc</code> file to be loaded on startup. Due to
WebExtensions limitations it’s not nearly as powerful as the old
<code class="language-plaintext highlighter-rouge">.vimperatorrc</code> but it covers most of my needs.</p>

<ul>
  <li>Edit any text input using a real text editor.</li>
</ul>

<p>In Vimperator, when a text input is focused I could press CTRL+i to
pop up my <code class="language-plaintext highlighter-rouge">$EDITOR</code> (Vim, Emacs, etc.) to manipulate the input much
more comfortably. This is <em>so</em>, so nice when writing long form content
on the web. The alternative is to copy-paste back and forth, which is
tedious and error prone.</p>

<p>Since WebExtensions are sandboxed, they cannot (normally) start
processes. Again, native messaging comes to the rescue and allows
Tridactyl to reproduce this feature perfectly.</p>

<ul>
  <li>Mouseless browsing.</li>
</ul>

<p>In Vimperator I could press <code class="language-plaintext highlighter-rouge">f</code> or <code class="language-plaintext highlighter-rouge">F</code> to enter a special mode that
allowed me to simulate a click to a page element, usually a hyperlink.
This could be used to navigate without touching the mouse. It’s really
nice for “productive” browsing, where my fingers are already on home
row due to typing (programming or writing), and I need to switch to a
browser to look something up. I rarely touch the mouse when I’m in
productive mode.</p>

<p>This actually mostly works fine under WebExtensions, too. However, due
to sandboxing, WebExtensions aren’t active on any of Firefox’s “meta”
pages (configuration, errors, etc.), or Mozilla’s domains. This means
no mouseless navigation on these pages.</p>

<p>The good news is that <strong>Tridactyl has better mouseless browsing than
Vimperator</strong>. Its “tag” overlay is alphabetic rather than numeric, so
it’s easier to type. When it’s available, the experience is better.</p>

<ul>
  <li>Custom key bindings for <em>everything</em>.</li>
</ul>

<p>In normal mode, which is the usual state Vimperator/Tridactyl is in,
I’ve got useful functionality bound to single keys. There’s little
straining for the CTRL key. I use <code class="language-plaintext highlighter-rouge">d</code> to close a tab, <code class="language-plaintext highlighter-rouge">u</code> to undo it.
In my own configuration I use <code class="language-plaintext highlighter-rouge">w</code> and <code class="language-plaintext highlighter-rouge">e</code> to change tabs, and <code class="language-plaintext highlighter-rouge">x</code> and
<code class="language-plaintext highlighter-rouge">c</code> to move through the history. I can navigate to any “quickmark” in
three keystrokes. It’s all very fast and fluid.</p>

<p>Since WebExtensions are sandboxed, extensions have limited ability to
capture these keystrokes. If the wrong browser UI element is focused,
they don’t work. If the current page is one of those
extension-restricted pages, these keys don’t work.</p>

<p>The worse problem of all, by <em>far</em>, is that <strong>WebExtensions are not
active until the current page has loaded</strong>. This is the most glaring
flaw in WebExtensions, and I’m surprised it still hasn’t been addressed.
It negatively affects every single extension I use. What this means for
Tridactyl is that for a second or so after navigating a link, I can’t
interact with the extension, and the inputs are completely lost. <em>This
is incredibly frustrating.</em> I have to wait on slow, remote servers to
respond before regaining control of my own browser, and I often forget
about this issue, which results in a bunch of eaten keystrokes. (Update:
Months have passed and I’ve never gotten used to this issue. It
irritates me a hundred times every day. This is by far Firefox’s worst
design flaw.)</p>

<h3 id="other-extensions">Other extensions</h3>

<p>I’m continuing to use <a href="https://github.com/gorhill/uBlock"><strong>uBlock Origin</strong></a>. Nothing changes. As
I’ve said before, an ad-blocker is by far the most important security
tool on your computer. If you practice good computer hygiene,
malicious third-party ads/scripts are the biggest threat vector for
your system. A website telling you to turn off your ad-blocker should
be regarded as suspiciously as being told to turn off your virus
scanner (for all you Windows users who are still using one).</p>

<p>The opposite of mouseless browsing is keyboardless browsing. When I’m
<em>not</em> being productive, I’m often not touching the keyboard, and
navigating with just the mouse is most comfortable. However, clicking
little buttons is not. So instead of clicking the backward and forward
buttons, I prefer to swipe the mouse, i.e. make a gesture.</p>

<p>I previously used FireGestures, an XUL extension. <del>I’m now using
<a href="https://github.com/Robbendebiene/Gesturefy"><strong>Gesturefy</strong></a></del>. (Update: Gesturefy doesn’t support ESR
either.) I also considered <a href="https://addons.mozilla.org/en-US/firefox/addon/foxy-gestures/">Foxy Gestures</a>, but it doesn’t currently
support ESR releases. Unfortunately all mouse gesture WebExtensions
suffer from the page load problem: any gesture given before the page
loads is lost. It’s less of an annoyance than with Tridactyl, but it
still trips me up. They also don’t work on extension-restricted pages.</p>

<p>Firefox 60 ESR is the first time I’m using a browser supported by
<a href="https://github.com/gorhill/uMatrix"><strong>uMatrix</strong></a> — another blessing from the author of uBlock
Origin (Raymond Hill) — so I’ve been trying it out. Effective use
requires some in-depth knowledge of how the web works, such as the
same-origin policy, etc. It’s not something I’d recommend for most
people.</p>

<p><a href="https://github.com/greasemonkey/greasemonkey"><strong>GreaseMonkey</strong></a> was converted to the WebExtensions API awhile
back. As a result it’s a bit less capable than it used to be, and I had
to adjust a couple of <a href="https://greasyfork.org/en/users/2022-skeeto">my own scripts</a> before they’d work again. I
use it as a “light extension” system.</p>

<h3 id="xul-alternatives">XUL alternatives</h3>

<p>Many people have suggested using one of the several Firefox forks that are
maintaining XUL compatibility. I haven’t taken this seriously for a
couple of reasons:</p>

<ul>
  <li>Maintaining a feature-complete web browser like Firefox is a <em>very</em>
serious undertaking, and I trust few organizations to do it correctly.
Firefox and Chromium forks have <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=887875">a poor security track record</a>.</li>
</ul>

<p>Even the Debian community gave up on that idea long ago, and they’ve
made a special exception that allows recent versions of Firefox and
Chrome into the stable release. Web browsers are huge and complex
because web standards are huge and complex (a situation that concerns
me in the long term). The <a href="https://www.cvedetails.com/product/3264/Mozilla-Firefox.html?vendor_id=452">vulnerabilities that pop up regularly are
frightening</a>.</p>

<p>In <em>Back to the Future Part II</em>, Biff Tannen was thinking too small.
Instead of a sports almanac, he should have brought a copy of the CVE
database.</p>

<p>This is why I also can’t just keep using an old version of Firefox. If I
was unhappy with, say, the direction of Emacs 26, I could keep using
Emacs 25 essentially forever, frozen in time. However, Firefox is
<em>internet software</em>. <a href="https://utcc.utoronto.ca/~cks/space/blog/tech/InternetSoftwareDecay">Internet software decays and must be
maintained</a>.</p>

<ul>
  <li>The community has already abandoned XUL extensions.</li>
</ul>

<p>Most importantly, the Vimperator extension is no longer maintained.
There’s no reason to stick around this ghost town.</p>

<h3 id="special-tridactyl-customizations">Special Tridactyl customizations</h3>

<p>The syntax for <code class="language-plaintext highlighter-rouge">.tridactylrc</code> is a bit different than <code class="language-plaintext highlighter-rouge">.vimperatorrc</code>,
so I couldn’t just reuse my old configuration file. Key bindings are
simple enough to translate, and quickmarks are configured almost the
same way. However, it took me some time to figure out the rest.</p>

<p>With Vimperator I’d been using Firefox’s obscure “bookmark keywords”
feature, where a bookmark is associated with a single word. In
Vimperator I’d use this as a prefix when opening a new tab to change the
context of the location I was requesting.</p>

<p>For example, to visit the Firefox subreddit I’d press <code class="language-plaintext highlighter-rouge">o</code> to start
opening a new tab, then <code class="language-plaintext highlighter-rouge">r firefox</code>. I had <code class="language-plaintext highlighter-rouge">r</code> registered via
<code class="language-plaintext highlighter-rouge">.vimperatorrc</code> as the bookmark keyword for the URL template
<code class="language-plaintext highlighter-rouge">https://old.reddit.com/r/%s</code>.</p>

<p>WebExtensions doesn’t expose bookmark keywords, and keywords are likely
to be removed in a future Firefox release. So instead someone showed me
this trick:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set searchurls.r   https://old.reddit.com/r/%s
set searchurls.w   https://en.wikipedia.org/w/index.php?search=%s
set searchurls.wd  https://en.wiktionary.org/wiki/?search=%s
</code></pre></div></div>

<p>These lines in <code class="language-plaintext highlighter-rouge">.tridactylrc</code> recreate the old functionality. Works
like a charm!</p>

<p>Another initial annoyance is that WebExtensions only exposes the X
clipboard (<code class="language-plaintext highlighter-rouge">XA_CLIPBOARD</code>), not the X selection (<code class="language-plaintext highlighter-rouge">XA_PRIMARY</code>).
However, I nearly always use the X selection for copy-paste, so it was
like I didn’t have any clipboard access. (Honestly, I’d prefer
<code class="language-plaintext highlighter-rouge">XA_CLIPBOARD</code> didn’t exist at all.) Again, native messaging routes
around the problem nicely, and it’s trivial to configure:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set yankto both
set putfrom selection
</code></pre></div></div>

<p>There’s an experimental feature, <code class="language-plaintext highlighter-rouge">guiset</code> to remove most of Firefox’s
UI elements, so that it even looks nearly like the old Vimperator. As
of this writing, this feature works poorly, so I’m not using it. It’s
really not important to me anyway.</p>

<h3 id="todays-status">Today’s status</h3>

<p>So I’m back to about 85% of the functionality I had before the
calamity, which is far better than I had imagined. Other than the
frequent minor annoyances, I’m pretty satisfied.</p>

<p>In exchange I get better mouseless browsing and much better performance.
I’m not kidding, the difference Firefox Quantum makes is night and day.
<del>In my own case, Firefox 60 ESR is using <em>one third</em> of the memory of
Firefox 52 ESR</del> (Update: after more experience with it, I realize it’s
just as much of a memory hog as before), and I’m not experiencing the
gradual memory leak. <del>This really makes a difference on my laptop with
4GB of RAM.</del></p>

<p>So was it worth giving up that 15% capability for these improvements?
Perhaps it was. Now that I’ve finally made the leap, I’m feeling a lot
better about the whole situation.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Ten Years of Blogging</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/09/01/"/>
    <id>urn:uuid:17953fb0-0161-343b-5bc6-7a569cedb128</id>
    <updated>2017-09-01T03:47:36Z</updated>
    <category term="rant"/><category term="meta"/>
    <content type="html">
      <![CDATA[<p>As of today, I’ve been blogging for 10 years. In this time I’ve written
302,000 words across <a href="/index/">343 articles</a> — a rate of one article every
week and a half. These articles form a record of my professional
progress, touching on both “hard” technical skills and “soft”
communication skills. My older articles are a personal reminder of how
far I’ve come. They are proof that I’m not as stagnant as I sometimes
feel, and it helps me to sympathize with others who are currently in
those earlier stages of their own career.</p>

<p>That index where you can find these 343 articles is sorted
newest-first, because it correlates with best-first. It’s a trend I
hope to continue.</p>

<h3 id="history">History</h3>

<p>Before blogging, I had a simple Penn State student page showcasing a
few small — and, in retrospect, trivial — side projects (<a href="https://github.com/skeeto/pngarch">1</a>,
<a href="https://github.com/skeeto/binitools">2</a>, <a href="https://github.com/skeeto/mandelbrot">3</a>), none of which went anywhere. Around the beginning
of my final semester of college, I was inspired by Mark Dominus’ <a href="https://blog.plover.com/">The
Universe of Discourse</a> and Andy Owen’s <a href="http://web.archive.org/web/20110608003905/http://ultra-premium.com/b/">Friendly Robot
Overlord</a> (gone since 2011) to start my own <a href="http://blosxom.sourceforge.net/">blosxom</a>
blog. It would be an outlet to actively discuss my projects. Some time
later GitHub was founded, and I <a href="/blog/2011/08/05/">switched to a static blog</a>
hosted by <a href="https://pages.github.com/">GitHub Pages</a>, which is where it lives to this day.</p>

<p>It’s been more challenging to manage all this content than I ever
anticipated. It’s like maintaining a large piece of software, except
it’s naturally more fragile. Any time I make a non-trivial change to
the CSS, I have to inspect the archives to check if I broke older
articles. If I did, sometimes it’s a matter of further adjusting the
CSS. Other times I’ll mass edit a couple hundred articles in order to
normalize some particular aspect, such as heading consistency or image
usage. (Running a macro over an Emacs’ <a href="https://www.gnu.org/software/emacs/manual/html_node/emacs/Dired.html">Dired</a> buffer is great
for this.)</p>

<p>I decided in those early days to Capitalize Every Word of the
Title, and I’ve stuck with this convention purely out of
consistency even though it’s looked weird to me for years. I don’t
want to edit the old titles, and any hard changeover date would be
even weirder (in the index listing).</p>

<p>With more foresight and experience, I could have laid down better
conventions for myself from the beginning. Besides the universal
impossibility of already having experience before it’s needed, there’s
also the issue that the internet’s conventions have changed, too. This
blog is currently built with HTML5, but this wasn’t always the case —
especially considering that the blog predates HTML5. When <a href="/blog/2009/06/30/">I switched to
HTML5</a>, I also adjusted some of my own conventions to match,
since, at the time, I was still writing articles in raw HTML.</p>

<p>The mobile revolution also arrived since starting this blog. Today,
about one third of visitors read the blog from a mobile device. I’ve
also adjusted the CSS to work well on these devices. To that third of
you: I hope you’re enjoying the experience!</p>

<p>Just in case you haven’t tried it, the blog also works really well with
terminal-based browsers, such as Lynx and ELinks. Go ahead and give it a
shot. The header that normally appears at the top of the page is
actually at the bottom of the HTML document structure. It’s out of the
way for browsers that ignore CSS.</p>

<p>If that’s not enough, last year I also spent effort making the printed
style of my articles look nice. Take a look at the printed version of
this article (i.e. print preview, print to PDF), and make sure to turn
off the little headers added by the browser. A media selector provides
a separate print stylesheet. Chrome / Chromium has consistently had
the best results, followed by Firefox. Someday I’d like for browsers
to be useful as typesetting engines for static documents — as an
alternative to LaTeX and Groff — but they’ve still got a ways to go
with paged media. (Trust me, I’ve tried.)</p>

<p>With more foresight, I could have done something better with my
permanent URLs. Notice how they’re just dates and don’t include the
title. URLs work better when they include human-meaningful context.
Ideally I should be able to look at any one of my URLs and know what
it’s about. Again, this decision goes all the way back to those early
days when I first configured blosxom, not knowing any better.</p>

<p><a href="https://www.w3.org/Provider/Style/URI">URLs are forever</a>, and I don’t want to break all my old links.
Consistency is better than any sort of correction. I’m also practically
limited to one article per day, though this has never been a real
problem.</p>

<h3 id="motivation">Motivation</h3>

<p>For me, an important motivation for writing is to say something unique
about a topic. For example, I’m not interested in writing a tutorial
unless either no such tutorial already exists, or there’s some vital
aspect all the existing tutorials miss. Each article should add new
information to the internet, either raw information or through
assembling existing information in a new way or with a unique
perspective.</p>

<p>I also almost constantly feel like I’m behind the curve, like I don’t
know enough or I don’t have enough skill. As many of you know, the
internet is really effective at causing these feelings. Every day lots
of talented people are sharing interesting and complicated things
across all sorts of topics. For topics that overlap my own desired
expertise, those projects have me thinking, “Man, I have <em>no idea</em> how
to even start doing that. There’s so much I don’t know, and so much
more I need to learn.” Writing articles as I learn is a great way to
keep on top of new subjects.</p>

<p>This is tied to another problem: I have a tendency to assume the
things I’ve known for awhile are common knowledge. This shows up in a
few ways. First, if everyone knows what I know, plus they know a bunch
of things that I don’t know, then I’ve got a lot of catching up to do.
That’s another source of feeling behind the curve.</p>

<p>Second, when writing an article on a topic where I’ve got years of
experience, I leave out way too many important details, assuming the
reader already knows them. When an article I regard as valuable gets a
weak response, it’s probably related to this issue.</p>

<p>Third, after <a href="/blog/2016/09/02/">three years of teaching</a>, it seems like it’s
becoming more difficult to put myself in the student’s shoes. I’m
increasingly further away from my own days learning those early topics,
and it’s harder to remember the limitations of my knowledge at that
time. Having this blog really helps, and I’ve re-read some of my older
articles to recall my mindset at the time.</p>

<p>Another way the blog helps is that it’s like having my own textbook.
When teaching a topic to someone — and not necessarily a formal mentee
— or even when just having a discussion, I will reference my articles
when appropriate. Since they’re designed to say something unique, my
article may be the only place to find certain information in a
conveniently packaged form.</p>

<p>Finally, the last important motivating factor is that I want to
<a href="http://slatestarcodex.com/2016/07/25/how-the-west-was-won/">spread my preferred memes</a>. <em>Obviously</em> the way I do things is
the Right Way, and the people who do things differently (the Wrong
Way) are stupid and wrong. By writing about the sorts of topics and
technologies I enjoy — C, low-level design, Emacs, Vim, programming on
unix-like systems, my personal programming style — I’m encouraging
others to follow my lead. Surely I’m responsible for at least one
Emacs convert out there!</p>

<h3 id="formal-professional-writing">“Formal” professional writing</h3>

<p>Despite having three or four novels worth of (mostly) technical writing
here, my formal workplace writing leaves much to be desired. I’m no
standout in this area. In the same period of time I’ve written just a
handful of formal memos, each about the same length as a long blog post.</p>

<p>Why so few? These memos are <em>painful</em> to write. In order to be
officially recognized, the formal memo process must be imposed upon
me. What this means is that, compared to a blog post, these memos take
at least an order of magnitude longer to write.</p>

<p>The process involves more people, dragged out over a long period of
time. The writing is a lot less personal, which, in my opinion, makes it
drier and less enjoyable. After the initial draft is complete, I have to
switch to vastly inferior tools: emailing the same Microsoft Office
documents back and forth between various people, without any proper
source control. The official, mandated memo template was created by
someone who didn’t know how to effectively operate a word processor, who
had a poor sense of taste (ALL CAPS HEADINGS), and who obviously had no
training in typesetting or style.</p>

<p>At the end of this long process, it’s filed into a system with no
practical search capability, and where it will be quietly purged after
five years, never to be seen again. Outside of the reviewers who were
directly involved in the memo process, somewhere between zero and two
people will have actually read it. Literally.</p>

<p>Arguably the memo might be more polished than a blog post. I’m skeptical of
this, but suppose that’s true. I’d still <em>much</em> rather have written ten
less-polished blog posts than one more-polished memo. That’s also ten
shots to produce, by chance, a more valuable article than the single,
precious memo.</p>

<p>Let’s broaden the scope to academic papers. Thanks to some great
co-workers — all three of whom are smarter and handsomer than me — a
year ago I finally got a published academic paper under my belt (and
more to come): <a href="https://skeeto.s3.amazonaws.com/share/p15-coffman.pdf"><em>ROP Gadget Prevalence and Survival under
Compiler-based Binary Diversification Schemes</em></a> (and I said
<em>memos</em> were dry!). A ton of work went into this paper, and it’s far
more substantial than any memo or single blog post. The process was a
lot more pleasant (LaTeX instead of Word), and the results are
definitely much more polished than a typical blog post. It reads well
and has interesting information to present.</p>

<p>This all sounds great until you consider the impact. According to
ACM’s statistics, the paper has been accessed 130 times as of this
writing. (Yes, providing an unofficial link to the paper like I just
did above doesn’t help, but I ran out of those crappy “free” links.
Sue me.) Sure, the PDF might have been passed around in untrackable
ways, but I also bet a lot of those accesses were just downloads that
were never read. So let’s split the difference and <strong>estimate around
130 people read it</strong>.</p>

<p>What kind of impact does a blog post have? Talking about these numbers
feels a bit taboo, like discussing salaries, but it’s important for the
point I’m about to make.</p>

<ul>
  <li>July 2017: <a href="/blog/2017/07/02/">Rolling Shutter Simulation in C</a>
    <ul>
      <li>10,400 unique visitors</li>
      <li>#22 most popular</li>
    </ul>
  </li>
  <li>August 2017: <a href="/blog/2017/08/20/">A Tutorial on Portable Makefiles</a>
    <ul>
      <li>14,300 unique visitors</li>
      <li>#16 most popular</li>
    </ul>
  </li>
  <li>June 2017: <a href="/blog/2017/06/15/">Switching to the Mutt Email Client</a>
    <ul>
      <li>23,100 unique visitors</li>
      <li>#5 most popular</li>
    </ul>
  </li>
  <li>May 2015: <a href="/blog/2015/05/15/">Raw Linux Threads via System Calls</a>
    <ul>
      <li>40,500 unique visitors</li>
      <li>#1 most popular, not even counting the Japanese publication</li>
    </ul>
  </li>
</ul>

<p>Note that all but the last have been published for less time than our
paper. The average time on these pages is between 5 and 6 minutes, so
these are actual readers, not just visitors who take one glance and
leave. Thanks to the information age, <strong>a technical blog article on an
established blog can reach an audience 100 times larger than a journal
for a fraction of the effort and cost</strong>. There are other benefits, too:</p>

<ol>
  <li>
    <p>I get immediate feedback in the form of comments and email (<em>open
peer review</em>).</p>
  </li>
  <li>
    <p>The content is available for free (<em>open access</em>). It’s trivial to
link and share blog articles.</p>
  </li>
  <li>
    <p>Even more, this entire blog is in the public domain. If you don’t
believe me, check out the public domain dedication in the footer of
this page. It’s been there for years, and you can <a href="https://github.com/skeeto/skeeto.github.com">verify that
yourself</a>. Every single change to this blog in the past 6 years
has been publicly documented (<em>transparency</em>).</p>
  </li>
  <li>
    <p>When I write about a topic, I make it a goal to provide the code and
data to try it for yourself (<em>open data and materials</em>). This code
and data is also either in the public domain, or as close to it as
possible.</p>
  </li>
  <li>
    <p>Link aggregators and social media are great at finding the best stuff
and making it more visible (<em>censorship resistance</em>). When I have a
big hit, it’s often Reddit or Hacker News driving people to my
article. Sometimes it’s other blogs.</p>
  </li>
</ol>

<p>In 2017, a blog is by far the most effective way to publish and share
written information, and blog posts <a href="http://daniellakens.blogspot.com/2017/04/five-reasons-blog-posts-are-of-higher.html">can have higher scientific
quality than journal articles</a>. More people should be publishing
through blog articles than through traditional forms of publication,
which are far less effective.</p>

<p>Since this has proven to be such an effective medium, I’m going to
make a promise right here, right now. And as I explained above with
transparency, there are no take-backs on this one. If you’re reading
this, it’s on the public record forever. <strong>I promise to deliver
another 10 years of free, interesting, useful content.</strong> This stuff is
going to be even better, too. On this blog’s 20th anniversary,
September 1st, 2027, we’ll see how I did.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>The Vulgarness of Abbreviated Function Templates</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/10/02/"/>
    <id>urn:uuid:048f746a-de7f-3357-8409-cfd531363726</id>
    <updated>2016-10-02T23:59:59Z</updated>
    <category term="c"/><category term="cpp"/><category term="rant"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p>The <code class="language-plaintext highlighter-rouge">auto</code> keyword has been a part of C and C++ since the very
beginning, originally as one of the four <em>storage class specifiers</em>:
<code class="language-plaintext highlighter-rouge">auto</code>, <code class="language-plaintext highlighter-rouge">register</code>, <code class="language-plaintext highlighter-rouge">static</code>, and <code class="language-plaintext highlighter-rouge">extern</code>. An <code class="language-plaintext highlighter-rouge">auto</code> variable has
“automatic storage duration,” meaning it is automatically allocated at
the beginning of its scope and deallocated at the end. It’s the
default storage class for any variable without external linkage or
without <code class="language-plaintext highlighter-rouge">static</code> storage, so the vast majority of variables in a
typical C program are automatic.</p>

<p>In C and C++ <em>prior to C++11</em>, the following definitions are
equivalent because the <code class="language-plaintext highlighter-rouge">auto</code> is implied.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">square</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">x2</span> <span class="o">=</span> <span class="n">x</span> <span class="o">*</span> <span class="n">x</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x2</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">int</span>
<span class="nf">square</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">auto</span> <span class="kt">int</span> <span class="n">x2</span> <span class="o">=</span> <span class="n">x</span> <span class="o">*</span> <span class="n">x</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x2</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>As a holdover from <em>really</em> old school C, unspecified types in C are
implicitly <code class="language-plaintext highlighter-rouge">int</code>, and even today you can get away with weird stuff
like this:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* C only */</span>
<span class="n">square</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">auto</span> <span class="n">x2</span> <span class="o">=</span> <span class="n">x</span> <span class="o">*</span> <span class="n">x</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x2</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>By “get away with” I mean in terms of the compiler accepting this as
valid input. Your co-workers, on the other hand, may become violent.</p>

<p>Like <code class="language-plaintext highlighter-rouge">register</code>, as a storage class <code class="language-plaintext highlighter-rouge">auto</code> is an historical artifact
without direct practical use in modern code. However, as a <em>concept</em>
it’s indispensable for the specification. In practice, automatic
storage means the variable lives on “the” stack (or <a href="http://clang.llvm.org/docs/SafeStack.html">one of the
stacks</a>), but the specifications make no mention of a
stack. In fact, the word “stack” doesn’t appear even once. Instead
it’s all described in terms of “automatic storage,” rightfully leaving
the details to the implementations. A stack is the most sensible
approach the vast majority of the time, particularly because it’s both
thread-safe and re-entrant.</p>

<h3 id="c11-type-inference">C++11 Type Inference</h3>

<p>One of the major changes in C++11 was repurposing the <code class="language-plaintext highlighter-rouge">auto</code> keyword,
moving it from a storage class specifier to a <em>type specifier</em>. In
C++11, the compiler <strong>infers the type of an <code class="language-plaintext highlighter-rouge">auto</code> variable from its
initializer</strong>. In C++14, it’s also permitted for a function’s return
type, inferred from the <code class="language-plaintext highlighter-rouge">return</code> statement.</p>

<p>This new specifier is very useful in idiomatic C++ with its
ridiculously complex types. Transient variables, such as variables
bound to iterators in a loop, don’t need a redundant type
specification. It keeps code <em>DRY</em> (“Don’t Repeat Yourself”). It also
makes templates easier to write, since it makes the compiler do more of
the work. The necessary type information is already semantically present,
and the compiler is a lot better at dealing with it.</p>

<p>With this change, the following is valid in both C and C++11, and, by
<em>sheer coincidence</em>, has the same meaning, but for entirely different
reasons.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">square</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">auto</span> <span class="n">x2</span> <span class="o">=</span> <span class="n">x</span> <span class="o">*</span> <span class="n">x</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x2</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In C the type is implied as <code class="language-plaintext highlighter-rouge">int</code>, and in C++11 the type is inferred
from the type of <code class="language-plaintext highlighter-rouge">x * x</code>, which, in this case, is <code class="language-plaintext highlighter-rouge">int</code>. The prior
example with <code class="language-plaintext highlighter-rouge">auto int x2</code>, valid in C++98 and C++03, is no longer
valid in C++11 since <code class="language-plaintext highlighter-rouge">auto</code> and <code class="language-plaintext highlighter-rouge">int</code> are redundant type specifiers.</p>

<p>Occasionally I wish I had something like <code class="language-plaintext highlighter-rouge">auto</code> in C. If I’m writing a
<code class="language-plaintext highlighter-rouge">for</code> loop from 0 to <code class="language-plaintext highlighter-rouge">n</code>, I’d like the loop variable to be the same
type as <code class="language-plaintext highlighter-rouge">n</code>, even if I decide to change the type of <code class="language-plaintext highlighter-rouge">n</code> in the future.
For example,</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">foo</span> <span class="o">*</span><span class="n">foo</span> <span class="o">=</span> <span class="n">foo_create</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">foo</span><span class="o">-&gt;</span><span class="n">n</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
    <span class="cm">/* ... */</span><span class="p">;</span>
</code></pre></div></div>

<p>The loop variable <code class="language-plaintext highlighter-rouge">i</code> should be the same type as <code class="language-plaintext highlighter-rouge">foo-&gt;n</code>. If I decide
to change the type of <code class="language-plaintext highlighter-rouge">foo-&gt;n</code> in the struct definition, I’d have to
find and update every loop. The idiomatic C solution is to <code class="language-plaintext highlighter-rouge">typedef</code>
the integer, using the new type both in the struct and in loops, but I
don’t think that’s much better.</p>

<h3 id="abbreviated-function-templates">Abbreviated Function Templates</h3>

<p>Why is all this important? Well, I was recently reviewing some C++ and
came across this odd specimen. I’d never seen anything like it before.
Notice the use of <code class="language-plaintext highlighter-rouge">auto</code> for the parameter types.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">set_odd</span><span class="p">(</span><span class="k">auto</span> <span class="n">first</span><span class="p">,</span> <span class="k">auto</span> <span class="n">last</span><span class="p">,</span> <span class="k">const</span> <span class="k">auto</span> <span class="o">&amp;</span><span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">bool</span> <span class="n">toggle</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(;</span> <span class="n">first</span> <span class="o">!=</span> <span class="n">last</span><span class="p">;</span> <span class="n">first</span><span class="o">++</span><span class="p">,</span> <span class="n">toggle</span> <span class="o">=</span> <span class="o">!</span><span class="n">toggle</span><span class="p">)</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">toggle</span><span class="p">)</span>
            <span class="o">*</span><span class="n">first</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Given the other uses of <code class="language-plaintext highlighter-rouge">auto</code> as a type specifier, this kind of makes
sense, right? The compiler infers the type from the input argument.
But, as you should often do, put yourself in the compiler’s shoes for
a moment. Given this function definition in isolation, can you
generate any code? Nope. The compiler needs to see the call site
before it can infer the type. Even more, different call sites may use
different types. That <strong>sounds an awful lot like a template</strong>, eh?</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">V</span><span class="p">&gt;</span>
<span class="kt">void</span>
<span class="nf">set_odd</span><span class="p">(</span><span class="n">T</span> <span class="n">first</span><span class="p">,</span> <span class="n">T</span> <span class="n">last</span><span class="p">,</span> <span class="k">const</span> <span class="n">V</span> <span class="o">&amp;</span><span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">bool</span> <span class="n">toggle</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(;</span> <span class="n">first</span> <span class="o">!=</span> <span class="n">last</span><span class="p">;</span> <span class="n">first</span><span class="o">++</span><span class="p">,</span> <span class="n">toggle</span> <span class="o">=</span> <span class="o">!</span><span class="n">toggle</span><span class="p">)</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">toggle</span><span class="p">)</span>
            <span class="o">*</span><span class="n">first</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is <strong>a proposed feature called <em>abbreviated function
templates</em></strong>, part of <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4361.pdf"><em>C++ Extensions for Concepts</em></a>. It’s
intended to be shorthand for the template version of the function. GCC
4.9 implements it as an extension, which is why the author was unaware
of its unofficial status. In March 2016 it was established that
<a href="http://honermann.net/blog/2016/03/06/why-concepts-didnt-make-cxx17/">abbreviated function templates <strong>would <em>not</em> be part of
C++17</strong></a>, but may still appear in a future revision.</p>

<p>Personally, I find this use of <code class="language-plaintext highlighter-rouge">auto</code> to be vulgar. It overloads the
keyword with a third definition. This isn’t unheard of — <code class="language-plaintext highlighter-rouge">static</code> also
serves a number of unrelated purposes — but while similar to the
second form of <code class="language-plaintext highlighter-rouge">auto</code> (type inference), this proposed third form is
very different in its semantics (far more complex) and overhead
(potentially very costly). I’m glad it’s been rejected so far.
Templates better reflect the nature of this sort of code.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Shamus Young's Twenty-Sided Tale E-book</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2015/09/03/"/>
    <id>urn:uuid:0d11edb9-17ba-336b-25b4-3cc479ba9f03</id>
    <updated>2015-09-03T19:20:09Z</updated>
    <category term="media"/><category term="rant"/>
    <content type="html">
      <![CDATA[<p>Last month I assembled and edited <a href="http://www.shamusyoung.com/twentysidedtale/?cat=1">Shamus Young’s Twenty-Sided
Tale</a>, originally a series of 84 blog articles, into an e-book.
The book is 75,000 words — about the average length of a novel —
recording the complete story of one of Shamus’ <em>Dungeons and Dragons</em>
campaigns. Since he’s <a href="http://www.shamusyoung.com/twentysidedtale/?p=23755">shared the e-book on his blog</a>, I’m now
free to pull back the curtain on this little project.</p>

<ul>
  <li>Download: <a href="https://nullprogram.s3.amazonaws.com/tst/twenty-sided-tale.epub">twenty-sided-tale.epub</a></li>
  <li>Repository: <a href="https://github.com/skeeto/twenty-sided-tale">https://github.com/skeeto/twenty-sided-tale</a></li>
</ul>

<p>To build the book yourself, you will only need <code class="language-plaintext highlighter-rouge">make</code> and <code class="language-plaintext highlighter-rouge">pandoc</code>.</p>

<p><img src="/img/twenty-sided-tale-cover.jpg" alt="" /></p>

<h3 id="why-did-i-want-this">Why did I want this?</h3>

<p>Ever since <a href="/blog/2013/04/27/">I got a tablet</a> a couple years ago, I’ve
completely switched over to e-books. Prior to the tablet, if there was
an e-book I wanted to read, I’d have to read from a computer monitor
while sitting at a desk. Anyone who’s tried it can tell you it’s not a
comfortable way to read for long periods, so I reserved the effort
only for e-book-only books that were <em>really</em> worth it. However,
once comfortable with the tablet, I gave away nearly all my paper
books from my bookshelves at home. The remaining use of paper books is
because either an e-book version isn’t reasonably available or the
book is very graphical, not suited to read/view on a screen (full
image astronomy books, <em>Calvin and Hobbes</em> collections).</p>

<p>As far as formats go, I prefer PDF and ePub, depending on the contents
of the book. Technical books fare better as PDFs due to elaborate
typesetting used for diagrams and code samples. For prose-oriented
content, particularly fiction, ePub is the better format due to its
flexibility and looseness. <em>Twenty-Sided Tale</em> falls in this latter
category. The reader gets to decide the font, size, color, contrast,
and word wrapping. I kept the ePub’s CSS to a bare minimum so as not to
get in the reader’s way. Unfortunately I’ve found that most ePub
readers are awful at rendering content, so while technically you could
do the same fancy typesetting with ePub, it rarely works out well.</p>

<h3 id="the-process">The Process</h3>

<p>To start, I spent about 8 hours with Emacs manually converting each
article into Markdown and concatenating them into a single document.
The ePub is generated from the Markdown using the <a href="http://pandoc.org/">Pandoc</a>
“universal document converter.” The markup includes some HTML, because
Markdown alone, even Pandoc’s flavor, isn’t expressive enough for the
typesetting needs of this particular book. This means it can only
reasonably be transformed into HTML-based formats.</p>
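<p>As a rough sketch (the file names are hypothetical, not the
repository’s actual rules), the heart of such a build is a single
Pandoc invocation:</p>

```make
# Hypothetical sketch: one concatenated Markdown file in, ePub out.
twenty-sided-tale.epub: book.md cover.jpg
	pandoc -f markdown -t epub \
	    --epub-cover-image=cover.jpg \
	    -o $@ book.md
```

<p>Pandoc passes embedded raw HTML through untouched when the output
format is HTML-based, which is exactly what makes the mixed
Markdown/HTML source workable here.</p>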

<p>Pandoc <a href="https://www.masteringemacs.org/article/how-to-write-a-book-in-emacs">isn’t good enough</a> for some kinds of publishing, but it
was sufficient here. The one feature I really wished it had was
support for tagging arbitrary document elements with CSS classes
(images, paragraphs, blockquotes, etc.), effectively extending
Markdown’s syntax. Currently only headings support extra attributes.
Such a feature would have allowed me to bypass all use of HTML, and
the classes could maybe have been re-used in other output formats,
like LaTeX.</p>

<p>Once I got the book in a comfortable format, I spent another 1.5 weeks
combing through the book fixing up punctuation, spelling, grammar,
and, in some cases, wording. It was my first time editing a book —
fiction in particular — and in many cases I wasn’t sure of the
correct way to punctuate and capitalize some particular expression. Is
“Foreman” capitalized when talking about a particular foreman? What
about “Queen?” How are quoted questions punctuated when the sentence
continues beyond the quotes? As an official source on the matter, I
consulted the <em>Chicago Manual of Style</em>. The <a href="http://www.chicagomanualofstyle.org/facsimile/CMSfacsimile_all.pdf">first edition is free
online</a>. It’s from 1906, but style really hasn’t changed <em>too</em>
much over the past century!</p>

<p>The original articles were written over a period of three years.
Understandably, Shamus forgot how some of the story’s proper names
were spelled over this time period. There wasn’t a wiki to check. Some
proper names had two, three, or even four different spellings.
Sometimes I picked the most common usage, sometimes the first usage,
and sometimes I had to read the article’s comments written by the
game’s players to see how they spelled their own proper names.</p>

<p>I also sunk time into a stylesheet for a straight HTML version of the
book, with the images embedded within the HTML document itself. This
will be one of the two outputs if you build the book in the
repository.</p>

<h3 id="a-process-to-improve">A Process to Improve</h3>

<p>Now I’ve got a tidy, standalone e-book version of one of my favorite
online stories. When I want to re-read it again in the future, it will
be as comfortable as reading any other novel.</p>

<p>This has been a wonderful research project into a new domain (for me):
<a href="http://www.antipope.org/charlie/blog-static/2010/04/common-misconceptions-about-pu-1.html">writing and editing</a>, style, and today’s tooling for writing
and editing. As a software developer, the latter overlaps my expertise
and is particularly fascinating. A note to entrepreneurs: There’s
<em>massive</em> room for improvement in this area. Compared to software
development, the processes in place today for professional writing and
editing are, by my estimate, about 20 years behind. It’s a place where
Microsoft Word is still the industry standard. Few authors and editors
are using source control or leveraging the powerful tools available
for creating and manipulating their writing.</p>

<p>Unfortunately it’s not so much a technical problem as it is a
social/educational one. The tools mostly exist in one form or another,
but they’re not being put to use. Even if an author or editor learns
or builds a more powerful set of tools, they must still interoperate
with people who do not. Looking at it optimistically, this is a
potential door into the industry for myself: a computer whiz editor
who doesn’t require Word-formatted manuscripts; who can make the
computer reliably and quickly perform the tedious work. Or maybe that
idea only works in fiction.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Tag Feeds for null program</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/06/08/"/>
    <id>urn:uuid:f47e5404-cc4a-3cc0-01ce-a844c04721b8</id>
    <updated>2014-06-08T05:53:46Z</updated>
    <category term="meta"/><category term="rant"/>
    <content type="html">
      <![CDATA[<p>I just added a <a href="/tags/">formal tags page</a> along with individual feeds
for each tag. I’ve had tags for a couple of years now, but they were
really only useful for traveling sideways to similar articles. So now,
if you’re only interested in a subset of my content, you can subscribe
to one or more tags rather than the main Atom feed.</p>

<p>What prompted this? In <a href="/blog/2014/06/04/">my <em>Emacs Chat</em></a>, Sacha asked me if this
blog was part of <a href="http://planet.emacsen.org/">Planet Emacsen</a> (currently, it’s not). If
my tags are accurate, only about 25% of my articles are about Emacs,
so most of my blog isn’t relevant there. Tag feeds will go a long way
to help support these “planet” aggregators, should they want to
include my articles. For example, Planet Emacsen would use <a href="/tags/emacs/feed/">my Emacs
feed</a>.</p>

<h3 id="static-site-generation">Static Site Generation</h3>

<p>I couldn’t practically support these extra feeds until recently.
Remember, this blog <a href="/blog/2011/08/05/">is statically generated</a>. More feeds means
more content to generate, because articles are duplicated in whole for
each feed. In past years, Jekyll would probably take on the order of
an hour to do all this for a single build. Fortunately, Jekyll has
improved dramatically, especially in the past year or so, and these
feeds have little impact on the total build time. It’s currently
around 10 seconds or so. Not bad at all!</p>

<p>A consequence of being statically generated is that you can’t ask for
a combination of tags as a single feed. It would be a combinatorial
nightmare (billions of feeds). Plus, the request would have to
normalize the tag order (e.g. alphabetical) or else the combinatorial
explosion would be far worse (i.e. exceeding the number of atoms in the
universe). So I hope you can forgive me for asking you to subscribe to
each tag individually.</p>

<h3 id="duplicate-articles">Duplicate Articles</h3>

<p>What if an article matches multiple tags? It will appear in each feed
where it’s tagged, possibly showing up multiple times in your web feed
reader. Fortunately, this is where Atom saves the day! I’m leveraging
Atom’s prudent design to make this work cleanly. Articles’ UUIDs are
consistent across all of these feeds, so if your web feed reader is
smart enough, it will recognize these as being the same article. For
example, this article is <code class="language-plaintext highlighter-rouge">f47e5404-cc4a-3cc0-01ce-a844c04721b8</code> regardless of which feed
you see it in.</p>
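<p>Concretely, the element that matters is <code class="language-plaintext highlighter-rouge">id</code>. Trimmed down to a
sketch, this entry carries the same <code class="language-plaintext highlighter-rouge">id</code> in every feed that includes it:</p>

```xml
<entry>
  <title>Tag Feeds for null program</title>
  <id>urn:uuid:f47e5404-cc4a-3cc0-01ce-a844c04721b8</id>
  <updated>2014-06-08T05:53:46Z</updated>
</entry>
```

<p>A reader that keys on <code class="language-plaintext highlighter-rouge">id</code> can collapse the copies from the main
feed, the meta feed, and the rant feed into a single article.</p>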

<p>Unfortunately, <a href="/blog/2013/09/04/">Elfeed</a> isn’t smart enough for this. Sorry! In
order to better support all the broken RSS feeds out there, <a href="/blog/2013/09/09/">I had to
compromise on entry keying</a>. I couldn’t trust RSS feeds to provide
me a reasonably unique key, so, transitively, Elfeed doesn’t fully
trust Atom’s UUIDs either. These RSS feeds are broken largely because
<a href="/blog/2013/09/23/">RSS itself is a broken mess</a>. When making new feeds in the
future, please use Atom!</p>

<p>Atom <em>requires</em> that every feed and article have a permanent, globally
unique ID. It
doesn’t matter where you get the feed from. You could subscribe to the
same exact feed at three different URLs (mirrors perhaps) and your
reader could reliably use the UUIDs to avoid duplication. Or, if
you’re subscribed to an aggregator like Planet Emacsen, and it
includes content from a feed to which you’re also directly subscribed,
your reader client should be able to merge these articles. In
comparison, RSS not only doesn’t require UUIDs, it actively
discourages them with its broken <code class="language-plaintext highlighter-rouge">guid</code> tag, so merging content from
multiple sources is impossible with RSS.</p>

<p>Anyway, if most of my content doesn’t suit you, you can now subscribe
to the subset that does. Aren’t Atom feeds cool?</p>

]]>
    </content>
  </entry>
  <entry>
    <title>The Julia Programming Language</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/03/06/"/>
    <id>urn:uuid:35f29378-6e92-3d11-43df-9c40139f142e</id>
    <updated>2014-03-06T23:55:44Z</updated>
    <category term="rant"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p><em>Update 2020: This is an old, outdated review. With the benefit of more
experience, I no longer agree with my criticisms in this article.</em></p>

<p>Julia is a new programming language primarily intended for scientific
computing. It’s attempting to take on roles that are currently
occupied by Matlab, its clones, and R. “Matlab done right” could very
well be its tag-line, but it’s more than that. It has a beautiful type
system, it’s homoiconic, and its generic function support would make a
Lisp developer jealous. It still has a long way to go, but, except
for some unfortunate issues, it’s off to a great start.</p>

<p>Speaking strictly in terms of the language, doing better than Matlab
isn’t really a significant feat. Among major programming languages,
<a href="/blog/2008/08/29/">Matlab’s awfulness and bad design</a> is
<a href="http://old.reddit.com/r/lolphp">second only to PHP</a>. Octave fixes a lot of the Matlab language,
but it can only go so far.</p>

<p>For both Matlab and R, the real strength is the enormous library of
toolboxes and functionality available to help solve seemingly any
scientific computing task. Plus the mindshare and the community. Julia
has none of this yet. The language is mostly complete, but it will
take years to build up its own package library to similar standards.</p>

<p>If you’re curious about learning more, the <a href="http://julia.readthedocs.org/en/latest/manual/">Julia manual</a>
covers the entire language as it currently exists. Unfortunately
anything outside the language proper and its standard library is
under-documented at this time.</p>

<h3 id="a-beautiful-type-system">A Beautiful Type System</h3>

<p>One of the first things you’ll be told is that Julia is <em>dynamically
typed</em>. That is, statically typed (C++, Java, Haskell) versus
dynamically typed (Lisp, Python, JavaScript). However, Julia has the
rather unique property that it straddles the two, and it could
be argued to belong to one or the other.</p>

<p>The defining characteristic of static typing is that <em>bindings</em> (i.e.
variables) have types. In dynamic typing, only values and objects have
types. In Julia, all bindings have a type, making it like a statically
typed language. If no type is explicitly declared, that type is <code class="language-plaintext highlighter-rouge">Any</code>,
an abstract supertype of all types. This comes into play with generic
functions.</p>

<p>Both abstract and concrete types can be parameterized by other types,
and certain values. The <code class="language-plaintext highlighter-rouge">::</code> syntax is used to declare a type.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>type Point {T}
  x::T
  y::T
end
</code></pre></div></div>

<p>This creates a <code class="language-plaintext highlighter-rouge">Point</code> constructor function. When calling the
constructor, the parameter type can be implicit, derived from the type
of its arguments, or explicit. Because both <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> are declared
with the same type parameter, the constructor’s arguments must share a type.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Implicit type:
Point(1, -1)
# =&gt; Point{Int64}(1,-1)

# Explicit type:
Point{Float64}(1.1, -1.0)
# =&gt; Point{Float64}(1.1,-1.0)

Point(1, 1.0)
# ERROR: no method Point{T}(Int64,Float64)
</code></pre></div></div>

<p>The type can be constrained using <code class="language-plaintext highlighter-rouge">&lt;:</code>. If <code class="language-plaintext highlighter-rouge">Point</code> is declared like
the following it is restricted to real numbers. This is just like
Java’s <code class="language-plaintext highlighter-rouge">Point&lt;T extends Number&gt;</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>type Point {T &lt;: Real}
  x::T
  y::T
end
</code></pre></div></div>

<p>Unlike most languages, arrays aren’t built directly into the language.
They’re implemented almost entirely in Julia itself using this type
system. The special part is that they get literal syntax.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[1, 2, 3]
# =&gt; Array{Int64,1}

[1.0 2.0; 3.0 4.0]
# =&gt; Array{Float64,2}
</code></pre></div></div>

<p>Each Array is parameterized by the type of value it holds and by an
integer, indicating its rank.</p>

<h4 id="the-billion-dollar-mistake">The Billion Dollar Mistake</h4>

<p>Julia has avoided what some call <a href="http://yinwang0.wordpress.com/2013/06/03/null/">The Billion Dollar Mistake</a>:
null references. In languages such as Java, <code class="language-plaintext highlighter-rouge">null</code> is allowed in place
of any object of any type. This allowance has led to many run-time
bugs that, if <code class="language-plaintext highlighter-rouge">null</code> didn’t exist, would have been caught at compile
time.</p>

<p>Julia has no <code class="language-plaintext highlighter-rouge">null</code> and so there’s no way to make this mistake, though
some kinds of APIs are harder to express without it.</p>

<h3 id="generic-functions">Generic Functions</h3>

<p>All of Julia’s functions are <em>generic</em>, including that <code class="language-plaintext highlighter-rouge">Point</code>
constructor above. Different methods can be defined for the same
function name, but for different types. In Common Lisp and Clojure,
generic functions are an opt-in feature, so most functions are not
generic.</p>

<p>Note that this is significantly different from function <em>overloading</em>,
where the specific function to call is determined at compile time. With
multimethods, the method to call is chosen by the run-time types of
its arguments. One of Julia’s notable achievements is that its
multimethods have very high performance. There’s usually more of a
trade-off.</p>

<p>Julia’s operators are functions with special syntax. For example, the
<code class="language-plaintext highlighter-rouge">+</code> function,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>+(3, 4)
# =&gt; 7
</code></pre></div></div>

<p>A big advantage is that operators can be passed around as first-class
values.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>map(-, [1, 2, 3])
# [-1, -2, -3]
</code></pre></div></div>

<p>Because all functions are generic, operators can have methods defined
for specific types, effectively becoming operator overloading (but
better!).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>function +(p1::Point, p2::Point)
  return Point(p1.x + p2.x, p1.y + p2.y)
end

Point(1,1) + Point(1, 2)
# =&gt; Point{Int64}(2,3)
</code></pre></div></div>

<p>(Note that to write this method correctly, either <code class="language-plaintext highlighter-rouge">Point</code> or the
method should probably <a href="http://julia.readthedocs.org/en/latest/manual/conversion-and-promotion/">promote</a> its arguments.)</p>

<h3 id="foreign-function-interface">Foreign Function Interface</h3>

<p>Julia has a <em>really</em> slick foreign function interface (FFI). Libraries
don’t need to be explicitly loaded and call interfaces don’t have to
be declared ahead of time. That’s all taken care of automatically.</p>

<p>I’m not going to dive into the details, but basically all you have to
do is indicate the library, the function, the return type, and then
pass the arguments.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ccall((:clock, "libc"), Int32, ())
# =&gt; 2292761
</code></pre></div></div>

<p>Generally this would be wrapped up nicely in a regular function and
the caller would have no idea an FFI is being used. Unfortunately
structs aren’t yet supported.</p>

<h3 id="julias-problems">Julia’s Problems</h3>

<p>Not everything is elegant, though. There are some strange design
decisions. The two big ones for me are strings and modules.</p>

<h4 id="confused-strings">Confused Strings</h4>

<p>Julia has a <code class="language-plaintext highlighter-rouge">Char</code> type that represents a Unicode code point. It’s a
32-bit value. So far so good. However, a <code class="language-plaintext highlighter-rouge">String</code> is <em>not</em> a sequence
of these. A Julia string is a byte-array of UTF-8 encoded characters.</p>

<p>Indexing into a string operates on <em>bytes</em> rather than characters.
Attempting to index into the middle of a character results in an
error. Yuck!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"naïvety"[4]
# ERROR: invalid UTF-8 character index
</code></pre></div></div>

<p>I don’t understand why this behavior was chosen. This would make sense
if Julia were an old language and this had been designed before Unicode was
established (e.g. C). But, no, this is a brand new language. There’s
no excuse not to get this right the first time. I suspect it has to do
with Julia’s FFI.</p>

<h4 id="clunky-closed-modules">Clunky, Closed Modules</h4>

<p>Julia’s module system looks like it was taken right out of Scheme’s
R6RS. This isn’t a good thing.</p>

<p>A <code class="language-plaintext highlighter-rouge">module</code> definition wraps the entire module up in a single
syntactic unit. Here’s an example from the documentation. According to
the style guide, the body of the module is not indented.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>module MyModule
using Lib
export MyType, foo

type MyType
  x
end

bar(x) = 2x
foo(a::MyType) = bar(a.x) + 1

import Base.show
show(io, a::MyType) = print(io, "MyType $(a.x)")
end
</code></pre></div></div>

<p>That final <code class="language-plaintext highlighter-rouge">end</code> seals the module for good. There’s no opening the
module back up to define or redefine new functions or types. If you
want to change something you have to reload the entire module, which
will obsolete any type instances.</p>

<p>Compare this to Clojure, where the module isn’t wrapped up in a
syntactical construct.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(ns my.module
  (:require [clojure.set :refer [rename-keys]]))
</code></pre></div></div>

<p>Common Lisp’s <code class="language-plaintext highlighter-rouge">defpackage</code> also works like this. At any time you can
jump into a namespace and make new definitions.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(in-ns 'my.module)
</code></pre></div></div>

<p>This is absolutely essential to <a href="/blog/2012/10/31/">interactive development</a>. The
lack of this makes Julia <em>far</em> less dynamic than it should be.
Combined with the lack of a printer, <strong>Julia is not currently suitable
as an interactive interpreter subprocess</strong> (Slime, Cider, Skewer,
etc.).</p>

<p>This is a real shame, because I’d like to start playing around with
Julia, but right now it feels like a chore. It’s needlessly restricted
to a C++/Java style workflow.</p>

<p>I’ll probably revisit Julia once it’s had a few more years to mature.
Then we’ll see if things have improved enough for real use.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>My Grading Process</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/10/13/"/>
    <id>urn:uuid:8c5aecd0-c7f0-314d-d4a8-88014a195e08</id>
    <updated>2013-10-13T02:56:31Z</updated>
    <category term="java"/><category term="rant"/><category term="reddit"/>
    <content type="html">
      <![CDATA[<p>My GitHub activity, including this blog, has really slowed down for
the past month because I’ve spent a lot of free time grading homework
for a <a href="http://apps.ep.jhu.edu/courses/605/707">design patterns class</a>, taught by a colleague at the
<a href="http://engineering.jhu.edu/">Whiting School of Engineering</a>. Conveniently for me, all of my
interaction with the students is through e-mail. It’s been a great
exercise of <a href="/blog/2013/09/03/">my new e-mail setup</a>, which itself has definitely
made this job easier. It’s kept me very organized through the whole
process.</p>

<p><img src="/img/screenshot/github-dropoff.png" alt="" /></p>

<p>Each assignment involves applying two or three design patterns to a
crude (in my opinion) XML parsing library. Students are given a
tarball containing the source code for the library, in both Java and
C++. They pick a language, modify the code to use the specified
patterns, zip/archive up the result, and e-mail me their
zipfile/tarball.</p>

<p>It took me the first couple of weeks to work out an efficient grading
workflow, and, at this point, I can accurately work my way through
most new homework submissions rapidly. On my end I already know the
original code base. All I really care about is the student’s changes.
In software development this sort of thing is expressed as a <em>diff</em>,
preferably in the <a href="http://en.wikipedia.org/wiki/Diff#Unified_format"><em>unified diff</em></a> format. This is called a
<em>patch</em>. It describes precisely what was added and removed, and
provides a bit of context around each change. The context greatly
increases the readability of the patch and, as a bonus, allows it to
be applied to a slightly different source. Here’s a part of a patch
recently submitted to Elfeed:</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh">diff --git a/tests/elfeed-tests.el b/tests/elfeed-tests.el
index 31d5ad2..fbb78dd 100644
</span><span class="gd">--- a/tests/elfeed-tests.el
</span><span class="gi">+++ b/tests/elfeed-tests.el
</span><span class="p">@@ -144,15 +144,15 @@</span>
   (with-temp-buffer
     (insert elfeed-test-rss)
     (goto-char (point-min))
<span class="gd">-    (should (eq (elfeed-feed-type (xml-parse-region)) :rss)))
</span><span class="gi">+    (should (eq (elfeed-feed-type (elfeed-xml-parse-region)) :rss)))
</span>   (with-temp-buffer
     (insert elfeed-test-atom)
     (goto-char (point-min))
<span class="gd">-    (should (eq (elfeed-feed-type (xml-parse-region)) :atom)))
</span><span class="gi">+    (should (eq (elfeed-feed-type (elfeed-xml-parse-region)) :atom)))
</span>   (with-temp-buffer
     (insert elfeed-test-rss1.0)
     (goto-char (point-min))
<span class="gd">-    (should (eq (elfeed-feed-type (xml-parse-region)) :rss1.0))))
</span><span class="gi">+    (should (eq (elfeed-feed-type (elfeed-xml-parse-region)) :rss1.0))))
</span>
 (ert-deftest elfeed-entries-from-x ()
   (with-elfeed-test
</code></pre></div></div>

<p>I’d <em>really</em> prefer to receive patches like this as homework
submissions but this is probably too sophisticated for most students.
Instead, the first thing I do is create a patch for them from their
submission. Most students work off of their previous submission, so I
just run <code class="language-plaintext highlighter-rouge">diff</code> between their last submission and the current one.
While I’ve got a lot of the rest of the process automated with
scripts, I unfortunately cannot script patch generation. Each
student’s submission follows a unique format for that particular
student and some students are not even consistent between their own
assignments. About half the students also include generated files
alongside the source so I need to clean this up too. Generating the
patch is by far the messiest part of the whole process.</p>
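The diff step itself is the easy part once a submission tree is cleaned up. A minimal sketch (directory names are hypothetical stand-ins for a student's previous and current submissions, not the actual course layout):

```shell
# Hypothetical layout: hw1/ is the previous submission, hw2/ the current one.
mkdir -p hw1/src hw2/src
printf 'class Parser {}\n' > hw1/src/Parser.java
printf 'class Parser { void parse() {} }\n' > hw2/src/Parser.java

# -r recurses, -u produces unified diffs, and -N treats absent files as
# empty so newly added files appear in the patch. diff exits non-zero when
# the trees differ, so guard it when scripting.
diff -ruN hw1 hw2 > student.patch || true

grep -c '^+++' student.patch   # one changed file, so one +++ header
```

The resulting `student.patch` is what actually gets read and graded.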

<p>I grade almost entirely from the patch. 100% correct submissions are
usually only a few hundred lines of patch and I can spot all of the
required parts within a few minutes. Very easy. It’s the incorrect
submissions that consume most of my time. I have to figure out what
they’re doing, determine what they <em>meant</em> to do, and distill that
down into discrete discussion items along with point losses. In either
case I’ll also add some of my own opinions on their choice of style,
though this has no effect on the final grade.</p>

<p>For each student’s submission, I commit to a private Git repository
the raw, submitted archive file, the generated patch, and a grade
report written in Markdown. After the due date and once all the
submitted assignments are graded, I reply to each student with their
grade report. On a few occasions there’s been a back and forth
clarification dialog that has resulted in the student getting a higher
score. (That’s a hint to any students who happen to read this!)</p>

<p>Even ignoring the time it takes to generate a patch, there are still
disadvantages to not having students submit patches. One is the size:
about 60% of my current e-mail storage, which goes all the way back to
2006, comes from this class in the past month alone. It’s been a
lot of bulky attachments. I’ll delete all of the attachments once the
semester is over.</p>

<p>Another is that the students are unaware of the amount of changes they
make. Some of these patches contain a significant number of trivial
changes — breaking long lines in the original source, changing
whitespace within lines, etc. If students focused on crafting a tidy
patch they might try to avoid including these types of changes in
their submissions. I like to imagine this process being similar to
submitting a patch to an open source project. Patches should describe
a concise set of changes, and messy patches are rejected outright. The
Git staging area is all about crafting clean patches like this.</p>

<p>If there was something else I could change it would be to severely
clean up the original code base. When compiler warnings are turned on,
compiling it emits a giant list of warnings. The students are already
starting at an unnecessary disadvantage, missing out on a very
valuable feature: because of all the existing noise they can’t
effectively use compiler warnings themselves. Any new warnings would
be lost in the noise. This has also led to many of those
trivial/unrelated changes: some students are spending time fixing the
warnings.</p>

<p>I want to go a lot further than warnings, though. I’d make sure the
original code base had absolutely no issues listed by <a href="http://pmd.sourceforge.net/">PMD</a>,
<a href="http://findbugs.sourceforge.net/">FindBugs</a>, or <a href="http://checkstyle.sourceforge.net/">Checkstyle</a> (for the Java
version, that is). Then I could use all of these static analysis tools
on students’ submissions to quickly spot issues. It’s as simple as
<a href="https://github.com/skeeto/sample-java-project/blob/master/build.xml">using my starter build configuration</a>. In fact, I’ve used
these tools a number of times in the past to perform detailed code
reviews for free (<a href="http://old.reddit.com/r/javahelp/comments/1inzs7/_/cb6ojr2">1</a>, <a href="http://old.reddit.com/r/reviewmycode/comments/1a2fty/_/c8tpme2">2</a>, <a href="http://old.reddit.com/r/javahelp/comments/1balsp/_/c958num">3</a>). Providing an
extensive code analysis for each student for each assignment would
become a realistic goal.</p>

<p>I’ve expressed all these ideas to the class’s instructor, my
colleague, so maybe some things will change in future semesters. If
I’m offered the opportunity again — assuming I didn’t screw this
semester up already — I’m still unsure if I would want to grade a
class again. It’s a lot of work for, optimistically, what amounts to
the same pay rate I received as an engineering intern in college. This
first experience at grading has been very educational, making me
appreciate those who graded my own sloppy assignments in college, and
that’s provided value beyond the monetary compensation. Next time
around wouldn’t be as educational, so my time could probably be better
spent on other activities, even if it’s writing open source software
for free.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Moving to Openbox</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/06/25/"/>
    <id>urn:uuid:8e15a68e-5ad4-356b-7d5b-5c854e4c5302</id>
    <updated>2012-06-25T00:00:00Z</updated>
    <category term="rant"/><category term="git"/><category term="debian"/><category term="reddit"/>
    <content type="html">
      <![CDATA[<p>With <a href="/blog/2012/06/23/">my dotfiles repository established</a> I now
have a common configuration and environment for Bash, Git, Emacs
(separate repository), and even Firefox! This wouldn’t normally be
possible because Firefox doesn’t have tidy dotfiles by default, but
the wonderful <a href="/blog/2009/04/03/">Pentadactyl</a> made it possible. My
script sets up keybindings, bookmark keywords, and quickmarks so that
my browser feels identical across all my computers. Now that it’s easy
to add tweaks, I’m sure I’ll be putting more in there in the future.</p>

<p>However, one major application remained and I was really itching to
capture its configuration too, since even my web browser is part of
the experience. I could drop my dotfiles into a new computer within
minutes and be ready to start hacking, except for my desktop
environment. This was still a tedious, manual step, plagued by the
configuration propagation issue. I wouldn’t want to get too fancy with
keybindings since I couldn’t rely on them being everywhere.</p>

<p>The problem was I was using KDE at the time and KDE’s configuration
isn’t really version-friendly. Some of it is binary, making it
unmergable, it doesn’t play well between different versions, and it’s
unclear what needs to be captured and what can be ignored.</p>

<p>I wasn’t exactly a <em>happy</em> KDE user and really felt no attachment to
it. I had only been using it a few months. I’ve used a number of
desktops since 2004, the main ones being Xfce (couple years), IceWM
(couple years), xmonad (8 months), and Gnome 2 (the rest of the
time). Gnome 2 was my fallback, the familiar environment where I could
feel at home and secure — that is, until Gnome 3 / Unity. The coming
of Gnome 3 marked the death of Gnome 2. It became harder and harder to
obtain version 2 and I lost my fallback.</p>

<p>I gave Gnome 3 and Unity each a couple of weeks but I just couldn’t
stand them. Unremovable mouse hotspots, all new alt-tab behavior,
regular crashing (after restoring old alt-tab behavior), and extreme
unconfigurability even with a third-party tweak tool. I jumped for KDE
4, hoping to establish a comfortable fallback for myself.</p>

<p>KDE is pretty and configurable enough for me to get work done. There’s
a lot of bloat (“activities” and widgets), but I can safely ignore
it. The areas where it’s lacking didn’t bother me much, like the
inability/non-triviality of custom application launchers.</p>

<p>My short time with Gnome 3 and now with KDE 4 did herald a new, good
change to my habits: keyboard application launching. I got used to
using the application menu to type my application name and launch
it. I <em>did</em> use dmenu during my xmonad trial, but I didn’t quite make
a habit out of it. It was also on a slower computer, slow enough for
dmenu to be a problem. For years I was just launching things from a
terminal. However, the Gnome and KDE menus both have a big common
annoyance. If you want to add a custom item, you need to write a
special desktop file and save it to the right location. Bleh! dmenu
works right off your <code class="language-plaintext highlighter-rouge">PATH</code> — the way it <em>should</em> work — so no
special work needed.</p>

<p>Gnome 2 <em>has</em> been revived with a fork called MATE, but with the lack
of a modern application launcher, I’m now too spoiled to be
interested. Plus I wanted to find a suitable environment that I could
integrate with my dotfiles repository.</p>

<p>After being a little embarrassed at
<a href="http://www.terminally-incoherent.com/blog/2012/05/18/show-me-your-desktop-4/">Luke’s latest <em>Show Me Your Desktop</em></a>
(what kind of self-respecting Linux geek uses a heavyweight desktop?!)
I shopped around for a clean desktop environment with a configuration
that would version properly. Perhaps I might find that perfect desktop
environment I’ve been looking for all these years, if it even
exists. It wasn’t too long before I ended up in Openbox. I’m pleased
to report that I’m exceptionally happy with it.</p>

<p>Its configuration is two XML files and a shell script. The XML can be
generated by a GUI configuration editor and/or edited by hand. The GUI
was nice for quickly seeing what Openbox could do when I first logged
into it, so I <em>did</em> use it once and find it useful. The configuration
is very flexible too! I created keyboard bindings to slosh windows
around the screen, resize them, move them across desktops, maximize in
only one direction, change focus in a direction, and launch specific
applications (for example super-n launches a new terminal
window). It’s like the perfect combination of tiling and stacking
window managers. Not only is it more configurable than KDE, but it’s
done cleanly.</p>

<p>Openbox is pretty close to the perfect environment I want. There are
still some annoying little bugs, mostly related to window positioning,
but they’ve mostly been fixed. The problem is that they haven’t made
an official release for a year and a half, so these fixes aren’t yet
available. I might normally think to myself, “Why haven’t I been using
Openbox for years?” but I know better than that. Versions of Openbox
from just two years ago, like the one in Debian Squeeze (the current
stable), <em>aren’t very good</em>. So I haven’t actually been missing out on
anything. This is something really new.</p>

<p>I’m not using a desktop environment on top of Openbox, so there are no
panels or any of the normal stuff. This is perfectly fine for me; I
have better things to spend that real estate on. I <em>am</em> using a window
composite manager called <code class="language-plaintext highlighter-rouge">xcompmgr</code> to make things pretty through
proper transparency and subtle drop shadows. Without panels, there
were a couple problems to deal with. I was used to my desktop
environment performing removable drive mounting and wireless network
management for me, so I needed to find standalone applications to do
the job.</p>

<p>Removable filesystems can be mounted the old fashioned way, where I
create a mount point, find the device name, then mount the device on
the mount point as root. This is annoying and unacceptable after
experiencing automounting for years. I found two applications to do
this: Thunar, Xfce’s file manager; and <code class="language-plaintext highlighter-rouge">pmount</code>, a somewhat-buggy
command-line tool.</p>

<p>I chose Wicd to do network management. It has both a GTK client and an
ncurses client, so I can easily manage my wireless network
connectivity with and without a graphical environment — something I
could have used for years now (goodbye <code class="language-plaintext highlighter-rouge">iwconfig</code>)! Unfortunately Wicd
is rigidly inflexible, allowing only one network interface to be up at
a time. This is a problem when I want to be on both a wired and
wireless network at the same time. For example, sometimes I use my
laptop as a gateway between a wired and wireless network. In these
cases I need to shut down Wicd and go back to manual networking for
awhile.</p>

<p>The next issue was wallpapers. I’ve always liked having
<a href="http://reddit.com/r/EarthPorn">natural landscape wallpapers</a>. So far,
I could move onto a new computer and have everything functionally
working, but I’d have a blank gray background. KDE 4 got me used to
slideshow wallpaper, changing the landscape image to a new one every
10-ish minutes. For a few years now, I’ve made a habit of creating a
<code class="language-plaintext highlighter-rouge">.wallpapers</code> directory in my home directory and dumping interesting
wallpapers in there as I come across them. When picking a new
wallpaper, or telling KDE where to look for random wallpapers, I’d
grab one from there. I’ve decided to continue this with my dotfiles
repository.</p>

<p>I wrote a shell script that uses <code class="language-plaintext highlighter-rouge">feh</code> to randomly set the root
(wallpaper) image every 10 minutes. It gets installed in <code class="language-plaintext highlighter-rouge">.wallpapers</code>
from the dotfiles repository. Openbox runs this script in the
background when it starts. I don’t actually store the hundreds of
images in my repository. There’s a <code class="language-plaintext highlighter-rouge">fetch.sh</code> that grabs them all from
Amazon S3 automatically. This is just another small step I take after
running the dotfiles install script. Any new images I throw in
<code class="language-plaintext highlighter-rouge">.wallpapers</code> get put into the rotation, but only for that computer.</p>
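A sketch of such a script (names and paths hypothetical, not the actual script): pick a random file from the wallpaper directory and hand it to feh. The real version wraps the body in a loop with a ten-minute sleep and is started in the background by Openbox:

```shell
#!/bin/sh
# Sketch only: rotate the root window image once. A real script would loop:
#   while :; do <this body>; sleep 600; done
dir="${WALLPAPER_DIR:-$HOME/.wallpapers}"

# Demo fallback so the sketch runs anywhere: fake a wallpaper directory.
if [ ! -d "$dir" ]; then
    dir="$(mktemp -d)"
    touch "$dir/forest.jpg" "$dir/coast.jpg"
fi

wall="$(find "$dir" -type f | shuf -n 1)"

# feh paints the X root window; skip quietly when feh or X isn't available.
if command -v feh >/dev/null 2>&1; then
    feh --bg-fill "$wall" 2>/dev/null || true
fi
echo "selected: $wall"
```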

<p>I’ve now got all this encoded into my configuration files and checked
into my dotfiles repository. It’s <em>incredibly</em> satisfying to have this
in common across each of my computers and to have it instantly
available on any new installs. I’m that much closer to having <em>the</em>
ideal (and ultimately unattainable) computing experience!</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Why Do Developers Prefer Certain Kinds of Tools?</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/04/29/"/>
    <id>urn:uuid:4dd7c07d-982d-3ff6-5cdd-70db7c3800bb</id>
    <updated>2012-04-29T00:00:00Z</updated>
    <category term="rant"/><category term="emacs"/><category term="git"/>
    <content type="html">
      <![CDATA[<p>In my experience, software developers generally prefer some flavor of
programmer’s tools when it comes to getting things done. We like plain
text, text editors, command line programs, source control, markup, and
shells. In contrast, non-developer computer users generally prefer
WYSIWYG word processors and GUIs. Developers often have somewhere
between a distaste and a
<a href="http://terminally-incoherent.com/blog/2008/10/16/wysiwyg-is-a-lie/">revulsion</a>
to WYSIWYG editors.</p>

<p>Why is this? What are programmers looking for that other users aren’t?
What I believe it really comes down to is one simple idea: <strong>clean
state transformations</strong>. I’m talking about modifying data, text or
binary, in a precise manner with the possibility of verifying the
modification for correctness in the future.</p>

<p>Think of a file produced by a word processor. It may be some
proprietary format, like Word’s old .doc format, or, more likely as
we move into the future, it’s in some bloated XML format that’s dumped
into a .zip file. In either case, it’s a blob of data that requires a
complex word processor to view and manipulate. It’s opaque to source
control, so even merging documents requires a capable, full word
processor.</p>

<p>For example, say you’ve received such a document from a colleague by
e-mail, for editing. You’ve read it over and think it looks good,
except you want to italicize a few words in the document. To do that,
you open up the document in a word processor and go through looking
for the words you want to modify. When you’re done you click save.</p>

<p>The problem is: did you accidentally make any other changes? Maybe you
had to reply to an e-mail while you were in the middle of it and you
accidentally typed an extra letter into the document. It would be easy
to miss, and you’re probably not set up to easily check what changes
you’ve made.</p>

<p>I am aware that modern word processors have a feature that can show
changes made, which can then be committed to the document. This is
really crude compared to a good source control management system. Due
to the nature of WYSIWYG, you’re still not seeing all of the
changes. There could be invisible markup changes and there’s no way to
know. It’s an example of a single program trying to do too many
unrelated things, so that it ends up doing many things poorly.</p>

<p>With source code, the idea of patches comes up frequently. The program
<code class="language-plaintext highlighter-rouge">diff</code>, given two text files, can produce a patch file describing
their differences. The complementary program is <code class="language-plaintext highlighter-rouge">patch</code>, which can
take the output from <code class="language-plaintext highlighter-rouge">diff</code> and one of the original files, and use it
to produce the other file. As an example, say you have this source
file <code class="language-plaintext highlighter-rouge">example.c</code>,</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Hello, world."</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>If you change the string and save it as a different file, then run
<code class="language-plaintext highlighter-rouge">diff -u</code> (<code class="language-plaintext highlighter-rouge">-u</code> for unified, producing a diff with extra context), you
get this output,</p>

<div class="language-udiff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">--- example.c  2012-04-29 21:50:00.250249543 -0400
</span><span class="gi">+++ example2.c   2012-04-29 21:50:09.514206233 -0400
</span><span class="p">@@ -1,5 +1,5 @@</span>
 int main()
 {
<span class="gd">-    printf("Hello, world.");
</span><span class="gi">+    printf("Goodbye, world.");
</span>     return 0;
 }
</code></pre></div></div>

<p>This is very human readable. It states which two files are being
compared, where they differ, some context around the difference
(beginning with a space), which lines were removed (beginning with
<code class="language-plaintext highlighter-rouge">-</code>), and which were added (beginning with <code class="language-plaintext highlighter-rouge">+</code>). A diff like this is capable of
describing changes to any number of files in a row, so it can all fit
comfortably in a single patch file.</p>
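<p>As an aside, the same unified format can be generated programmatically.
Here’s a minimal sketch using Python’s standard <code class="language-plaintext highlighter-rouge">difflib</code> module (my own
illustration, not part of the <code class="language-plaintext highlighter-rouge">diff</code> and <code class="language-plaintext highlighter-rouge">patch</code> tools above):</p>

```python
import difflib

# The two versions of the example file, as lists of lines.
old = ['int main()\n',
       '{\n',
       '    printf("Hello, world.");\n',
       '    return 0;\n',
       '}\n']
new = [line.replace('Hello', 'Goodbye') for line in old]

# unified_diff emits the same headers, @@ hunk markers, and
# -/+ lines as diff -u does.
patch = ''.join(difflib.unified_diff(old, new,
                                     fromfile='example.c',
                                     tofile='example2.c'))
print(patch)
```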

<p>If you made changes to a codebase and calculated a diff, you could
send the patch (the diff) to other people with the same codebase and
they could use it to reproduce your exact changes. By looking at it,
they know exactly what changed, so it’s not some mystery to them. This
patch is a <em>clean transformation</em> from one source code state to
another.</p>

<p>More than that: you can send it to people with a similar, but not
exactly identical, codebase and they could still likely apply your
changes. This process is really what source control is all about: an
easy way to coordinate and track patches from many people. A good
version history is going to be a tidy set of patches that take the
source code in its original form and add a feature or fix a bug
through a series of concise changes.</p>

<p>On a side note, you could efficiently store a series of changes to a
file by storing the original document along with a series of
relatively small patches. This is called delta encoding. This is how
both source control and video codecs usually store data on disk.</p>
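<p>As a toy sketch of delta encoding (my own illustration, using numbers
rather than documents): store the first value in full, then only the
differences.</p>

```python
def delta_encode(samples):
    # Keep the first value intact, then store only the change per step.
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(encoded):
    # Rebuild each value by applying the stored differences in order.
    out = [encoded[0]]
    for delta in encoded[1:]:
        out.append(out[-1] + delta)
    return out

samples = [1000, 1002, 1001, 1005, 1004]
encoded = delta_encode(samples)
assert encoded == [1000, 2, -1, 4, -1]
assert delta_decode(encoded) == samples
```

<p>The deltas are small relative to the values, so they store and compress
well, which is the same reason source control and video codecs benefit from
the technique.</p>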

<p>Anytime I’m outside of this world of precision I start to get
nervous. I feel sloppy and become distrustful of my tools, because I
generally can’t verify that they’re doing what I think they’re
doing. This applies not just to source code, but also writing. I’m
typing this article in Emacs and when I’m done I’ll commit it to
Git. If I make any corrections, I’ll verify that my changes are what I
wanted them to be (via <a href="http://philjackson.github.com/magit/">Magit</a>)
before committing and publishing them.</p>

<p>One of my long-term goals with my work is to try to do as much as
possible with my precision developer tools. I’ve already got
<a href="/blog/2011/11/28/">basic video editing</a> and
<a href="/blog/2012/04/10/">GIF creation</a> worked out. I’m still working out a
happy process for documents (i.e. LaTeX and friends) and
presentations.</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>The Great Tab Mistake</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2011/01/12/"/>
    <id>urn:uuid:a63437a6-e395-36b7-1bd4-b28e8fef0572</id>
    <updated>2011-01-12T00:00:00Z</updated>
    <category term="rant"/>
    <content type="html">
      <![CDATA[<!-- 12 January 2011 -->
<p>
I'm a window manager vagrant, wandering from window manager to window
manager over the years. When I started using GNU/Linux about 7 years
ago, I was using KDE. Years later, Gnome, then Xfce, a short time with
FVWM, a couple years of IceWM. I spent most of 2010 with xmonad. And I
recently started using Fluxbox.
</p>
<p>
Now that I've learned how to make effective use of Fluxbox I can't
help but think that a disheartening mistake was made years ago when
the window manager concepts were established. Just like title bars and
the iconic minimize, maximize, and close buttons, I think tabs should
have been a staple of all window managers, with a common accompanying
API. This is the Great Tab Mistake.
</p>
<p>
Tabs are now an important feature of browsers, which is probably where
tabs have the most recognition among casual computer users. You'll
also find them in some terminal emulators, text editors, spreadsheets,
and word processors.
</p>
<p>
I don't really know much of the true history of tabs, but this is my
guess. Tabs really made their big debut in the browser wars of the
1990s. The web browsers of the day competed for market share, two of
them in particular. Just as you'd expect of many kinds of competing
software, they differentiated themselves on features (which actually
still causes problems to this day).
</p>
<p>
One major feature to pop up was tabbed browsing. Exploring the
hyperlink graph of the web from multiple positions at once became a
lot cheaper with tabs. The only way to do it otherwise would be to
open multiple windows, which required interacting with the frequently
clumsy windowing systems (i.e. alt-tabbing). Even though the web is an
arbitrary directed graph, in practice we explore a hierarchical
sub-graph of it, a tree.
</p>
<p>
With only one page open at a time, there's a lot of backing in and out
to explore the full hierarchy. It's more of a depth-first search. Easy
to get lost.
</p>
<p class="center">
  <img src="/img/tabs/one-tab.png" alt=""/>
</p>
<p>
With multiple pages open in different tabs, we have a hold of several
points of the hierarchy at once. We can advance further down each part
of the hierarchy, creating more page instances as needed, at
leisure. It's more of a breadth-first search.
</p>
<p class="center">
  <img src="/img/tabs/multi-tab.png" alt=""/>
</p>
<p>
However, the early tab paradigm was flawed, which may have partly
caused the Great Tab Mistake, though this flaw has recently begun
to be corrected by newer browsers like Chromium. The tabs are meant
to be analogous to the tabs in a real-world booklet. The tab hangs off
the end and switching tabs changes the entire page. Those early tabs
were placed <i>below</i> elements that were tied to the page,
particularly the address bar. Changing tabs changed content
above <i>and</i> below the tab.
</p>
<p class="center">
  <img src="/img/tabs/tabs-small.png" alt=""/>
</p>
<p>
These tabs were too low! They should have been higher, just below the
title bar, if not part of the title bar itself. This brings me back to
my point: tabs are <i>really</i> a feature of the window manager,
because it's really a way to manage separate, but associated, windows!
</p>
<p>
Why am I only realizing this now? One of the features of Fluxbox is
tabs as part of the window manager. Any window's title bar can be
dragged into another window's title bar and they'll be combined as a
single tabbed window. Key bindings can be assigned to move between
tabs in the same way as a browser. If Firefox had hooks into this
system that would allow me to substitute window manager tabs in place
of its tabs, that's definitely what I'd be doing. Instead, I've got
two different tab systems serving the same purpose.
</p>
<p>
Why should everyone re-invent the wheel, each with their own quirks
and configuration, duplicating a system that already exists solely for
that purpose? It's a mistake!
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Java is Death By A Thousand Paper Cuts</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/08/13/"/>
    <id>urn:uuid:beca38a0-bd9d-3304-02c9-1941ced67121</id>
    <updated>2010-08-13T00:00:00Z</updated>
    <category term="java"/><category term="rant"/>
    <content type="html">
      <![CDATA[<!-- 13 August 2010 -->
<p>
At least it is for me. This past week at work I've been furiously
rushing work on a project written in Java. It's completely my fault
that I'm using Java since I'm the one who picked the language. I
wanted to use Java the platform, but I don't know any other JVM
languages well enough to use them in place of Java the language for a
project at work.
</p>
<p>
It's all sorts of little things that make writing Java so exhausting
for me. At the end of the day I just feel irritated. I hate the
absolute lack of functional programming and that I have to specify
everything at such a low level. The whole reason we program in
high-level languages is so we can express algorithms more concisely,
but Java fails at this.
</p>
<p>
Here's an example of what I'm talking about, something I basically did
a few times today. Let's say you have an array of
floats, <code>nums</code> and you want to sum them and return the
result (or maybe use it in another expression). In Lisp it's very
straightforward.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">reduce</span> <span class="ss">'+</span> <span class="nv">nums</span><span class="p">)</span></code></pre></figure>
<p>
"Reduce the sequence <code>nums</code> by addition." Notice that it's
more about saying <i>what</i> I want to do rather than <i>how</i> to
do it. I don't have to introduce any temporary storage or
iterators. To do the same thing in Java it will look something like
this.
</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kt">double</span> <span class="n">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span>
<span class="k">for</span> <span class="o">(</span><span class="kt">double</span> <span class="n">num</span> <span class="o">:</span> <span class="n">nums</span><span class="o">)</span> <span class="o">{</span>
    <span class="n">sum</span> <span class="o">+=</span> <span class="n">num</span><span class="o">;</span>
<span class="o">}</span>
<span class="k">return</span> <span class="n">sum</span><span class="o">;</span></code></pre></figure>
<p>
If you're using an older Java without the enhanced looping construct
it gets even uglier. I had to introduce a variable for accumulation
and a second variable for iteration. This sort of thing has to be
done <i>all over the place</i> in Java, and it greatly increases the
cognitive overload when reading Java code.
</p>
<p>
This instruction is more about telling the computer the <i>how</i>
rather than my overall intention. One problem with telling it
the <i>how</i> is that I've unnecessarily locked in a specific
algorithm and ordering. The literal instruction says that the
numbers <i>must</i> be added in order, sequentially. My Lisp
instruction doesn't do that.
</p>
<p>
It gets even worse when you complicate it slightly by adding a second
array and, say, multiplying it pairwise with the first.
</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kt">double</span> <span class="n">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span>
<span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">nums1</span><span class="o">.</span><span class="na">length</span><span class="o">;</span> <span class="n">i</span><span class="o">++)</span> <span class="o">{</span>
    <span class="n">sum</span> <span class="o">+=</span> <span class="n">nums1</span><span class="o">[</span><span class="n">i</span><span class="o">]</span> <span class="o">*</span> <span class="n">nums2</span><span class="o">[</span><span class="n">i</span><span class="o">];</span>
<span class="o">}</span>
<span class="k">return</span> <span class="n">sum</span><span class="o">;</span></code></pre></figure>
<p>
Now the loop gets more complex. I have to tell it <i>how</i> to
increment the iterator. I have to tell it to check the bounds of the
array. The iterator is a misdirection because the actual number stored
in it isn't what's important. Again, the Lisp method is much more
concise.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">reduce</span> <span class="ss">'+</span> <span class="p">(</span><span class="nb">map</span> <span class="ss">'list</span> <span class="ss">'*</span> <span class="nv">nums1</span> <span class="nv">nums2</span><span class="p">))</span></code></pre></figure>
<p>
"Map the two sequences by multiplication into a list, then reduce it
by addition." Unfortunately we start to leak a little bit into
the <i>how</i> here. I am telling it that the intermediate structure
should be a list, because <code>map</code> forces me to pick a
representation. Besides that, I am <i>only</i> describing my overall
intention and not the obvious details.
</p>
<p>
So with Java my days become filled with the tedious low-level
algorithm descriptions that I have to hammer out over and over and
over. Death by a thousand paper cuts.
</p>
<p>
Lisp isn't the only language that has a (generally) much better
approach; it's just my favorite. :-) Most languages with at least some
decent functional facilities will also do the above concisely.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Pen and Paper RPG Wishlist</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/07/20/"/>
    <id>urn:uuid:23d2d154-1ba3-3df3-0ba5-84d76159b5ff</id>
    <updated>2010-07-20T00:00:00Z</updated>
    <category term="rant"/><category term="game"/>
    <content type="html">
      <![CDATA[<!-- 20 July 2010 -->
<p>
As I get more involved with tabletop RPGs, specifically Dungeons and
Dragons, I find there are some related attributes that I wish these
game systems had. While I'm sure there are systems that do have some of
these, I wish whatever I happen to be using had all of them.
</p>
<p>
<b>Print friendly</b>. The source material tends to be very colorful
and graphical. While this can be a good thing, especially when
illustrating monsters (Show, don't tell!), it's bad if you want to print
out your own materials. I want the crucial information available in a
crisp, clean monochrome form of some sort. Not only could I reproduce
material for use in notes and handouts, but I could create my own
condensed sets of information by composing these crisp forms.
</p>
<p>
For example, in the D&amp;D monster manual each monster has a nice
concise block containing all the information — defenses, health,
abilities, etc. — needed to use that monster. This is great, but it's
on a brownish background, in a red-ish box. So close to being what I
want. But even then, do I have legal permission to reproduce this
information? And so ...
</p>
<p>
<b>Licensing</b>. The closest thing tabletop gaming has to a Free
Software license would be
the <a href="http://en.wikipedia.org/wiki/Open_Game_License"> Open
Game License</a> (OGL), which is still pretty restrictive. I would
love for the source materials to be licensed at least loosely enough
that I could print out my own copies for cheap (assuming they are
print friendly, per above). Have some new players sitting down at the
table? To get them started, give them that stapled-together player
handbook you printed out. There's RPG evangelism for you.
</p>
<p>
The <a href="http://www.fudgerpg.com/fudge.html">Fudge role-playing
game system</a> has both these attributes down pretty well. The Fudge
manual is a very print-friendly PDF with explicit permission to share it
with your friends. However, just
as <a href="http://en.wikipedia.org/wiki/Yacc"> yacc</a> is a compiler
compiler, Fudge is really a game system <i>system</i>, a system for
creating game systems, so it's only part of what is needed to play a
game.
</p>
<p>
<b>Useful software tools</b>. One specific example is character
creation software. Creating a new character can be burdensome,
especially for a new player. Software that allows a player to select
some basic options from a menu and produce a printable, error-free
character sheet can save a lot of time.
</p>
<p>
Fourth edition Dungeons and Dragons has a character builder, but it is
a humongous piece of junk. It's proprietary, Windows-only, bulky, and
slow. For a program that merely generates printouts based on a few
user selections from some simple menus, it has some <i>extremely</i>
excessive system requirements (much higher than the ones they
claim). And it requires a reboot to install too. A human can produce
the same results by hand inside of a half hour, so for a computer
there is virtually no computation involved. So what is it doing? Worst
of all, the fourth edition license expressly forbids competing
character creation software, so no one can legally produce a
reasonable one. All this thing should be is a database of available
character abilities, some character sheet logic, and a PostScript
printer.
</p>
<p>
Fortunately there are some decent, generic world generation tools for
GMs out there, such
as <a href="http://www.inkwellideas.com/roleplaying_tools/random_inn/">
random inn
generators</a>, <a href="http://inkwellideas.com/roleplaying_tools/random_dungeon/">
random
dungeon</a> <a href="http://www.velvet-edge.com/RisusMonkeyMap.html">
generators</a>, and so
on. And <a href="http://www.dizzydragon.net/adventuregenerator/home">
another one</a>. <a href="/blog/2010/05/13/">I've mentioned this
before</a>.
</p>
<p>
If you know any systems that fit the above descriptions well, go ahead
and link them in the comments!
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>The Problem with String Stored Regex</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/04/23/"/>
    <id>urn:uuid:b821299d-d1f4-307c-4d93-d8a829b7e5c2</id>
    <updated>2010-04-23T00:00:00Z</updated>
    <category term="lisp"/><category term="rant"/>
    <content type="html">
      <![CDATA[<!-- 23 April 2010 -->
<p>
While regular expressions
have <a href="http://en.wikipedia.org/wiki/Regular_expression#Patterns_for_non-regular_languages">
limited usefulness</a>, especially in larger programs, they're still
very handy to have from time to time. It's usually difficult to write
a lexer or tokenizer without one. Because of this several languages
build them right into the language itself, rather than tacked on as a
library. It allows the regular expressions to be stored literally in
the code, treated as its own type, rather than inside a string. The
problem with storing a regular expression inside a string is that it
can easily make an already complex regular expression much more
complex. This is <b>because there are two levels of parsing going
on</b>.
</p>
<p>
Consider this regular expression where we match an alphanumeric word
inside of quotes. I'm going to use slashes to delimit the regular
expression itself.
</p>
<pre>
/"\w+"/
</pre>
<p>
Notice there is no escaping going on. The backslash is there to
indicate a special sequence <code>\w</code>, which is equal
to <code>[a-zA-Z0-9_]</code>. This will
get <a href="http://swtch.com/~rsc/regexp/regexp1.html"> parsed and
compiled into some form in memory</a> before it is run by a
program. If the language doesn't directly support regular expressions
then we usually can't put it in the code as is, since the language
parser won't know how to deal with it. The solution is to store it
inside of a string.
</p>
<p>
However, our regular expression contains quotes, and these will need to
be escaped in a quote-delimited string. But I no longer need
slashes to delimit my regular expression.
</p>
<pre>
"\"\w+\""
</pre>
<p>
Did you notice the error yet? If not, stop and think about it for a
minute. Our special sequence <code>\w</code> will not make it intact
to the regular expression compiler. That backslash will escape
the <code>w</code> during the string parsing step, leaving only
the <code>w</code>. The string we typed will get parsed into a series
of characters in memory, performing escapes along the way, and
then <i>that</i> sequence will be handed to the regular expression
compiler. So we have to fix it,
</p>
<pre>
"\"\\w+\""
</pre>
<p>
That's getting hard to understand, compared to the original. Now let's
throw a curve-ball into this: let's match a backslash at the beginning
of the word. The normal regular expression looks like this now,
</p>
<pre>
/"\\\w+"/
</pre>
<p>
We have to escape our backslash to make it a literal backslash, so it
takes two of them. Now, when we want to do this in a string-stored
regular expression we have to escape both of those backslashes
again. It looks like this,
</p>
<pre>
"\"\\\\\\w+\""
</pre>
<p>
Now to match a single backslash we have to insert four backslashes!
Quite unfortunately, <a href="/blog/2009/05/29/"> Emacs Lisp doesn't
directly support regular expressions</a> even though the language has
a lot of emphasis on text parsing, so a lot of Elisp code is riddled
with this sort of thing. Elisp is especially difficult because
sometimes, such as during prompts, you can enter a regular expression
directly and can ignore the layer of string parsing. It's a very
conscious effort to remember which situation you're in at different
times.
</p>
<p>
Perl, Ruby, and JavaScript have regular expressions as part of the
language and it makes a lot of sense for these languages; they tend to
do a lot of text parsing. Python does it partially, with
its <code>r'</code> syntax. Any string preceded with an <code>r</code>
loses its escape rules, but it also means you can't match both single
or double quotes without falling back to a normal string with
escaping. Common Lisp may be able to do it with a
<a href="http://www.lispworks.com/documentation/lw51/CLHS/Body/02_b.htm">
reader macro</a>, but I've never seen it done.
</p>
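<p>
A quick Python illustration of the raw string difference (a sketch of my
own; the patterns are the same ones used above):
</p>

```python
import re

# The raw string spells out the regex /"\w+"/ directly; without
# the r prefix, every backslash must be doubled.
assert r'"\w+"' == '"\\w+"'

# Both forms compile to the same pattern and match a quoted word.
assert re.search(r'"\w+"', 'he said "hello" today').group() == '"hello"'

# The backslash-matching pattern /"\\\w+"/ needs six backslashes
# when written as a normal string.
assert r'"\\\w+"' == '"\\\\\\w+"'
```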
<p>
Remember those two levels of parsing when writing string stored
regex. It helps avoid hair-pullingly annoying mistakes.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Your BitTorrent Client is Probably Defective by Design</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/10/26/"/>
    <id>urn:uuid:6044435b-c026-3f19-0044-3204e1d60648</id>
    <updated>2009-10-26T00:00:00Z</updated>
    <category term="rant"/><category term="debian"/>
    <content type="html">
      <![CDATA[<!-- 26 October 2009 -->
<p>
Your BitTorrent client probably has DRM in it, even if it's Free
software. Torrent files (<code>.torrent</code>) may contain a special,
but unofficial, "private" flag. When it does, clients are supposed to
disable decentralized tracking features, regardless of the user's
desires. This is a direct analog to the copy-prevention flag that PDFs
may have set, which tells PDF viewers to cripple themselves, except
that your Free PDF reader is actually more likely to ignore it.
</p>
<p>
It's impossible to simply open the torrent file and turn off the
flag. The client has to be modified, fixing the purposeful defect, to
ignore it. Note, simpler clients that don't have these features in the
first place don't have this problem, since they don't have any
features to disable.
</p>
<p>
The private flag exists because modern BitTorrent clients can
function without a central tracker. If the central tracker is down, or
if the user doesn't want to use it, the client can fetch a list of
peers in the torrent from a worldwide <a
href="http://en.wikipedia.org/wiki/Distributed_hash_table">
distributed hash table</a>. It's one big decentralized BitTorrent
tracker (though any arbitrary data can be inserted into it). Clients
also have the ability to tell each other about peers when they are
doing their normal data exchange. Thanks to this, clients can
transcend central trackers and join the larger global torrent of
peers. It makes for healthier torrents.
</p>
<p>
Anyone who knows a few peers involved with a torrent can join in,
regardless of their ability to talk to the central tracker. But
private tracker sites don't want their torrents to be available
to those outside of their control, so they proposed an
addition to the BitTorrent spec for a "private" flag. Clients with
decentralized capabilities are advised to cripple that ability when the
flag is on, so no peer lists will leak outside the private
tracker. This flag was never accepted into the official spec, and I
hope it never is.
</p>
<p>
Unfortunately the private trackers set an ultimatum: obey the private
flag or find your client banned. The client developers fell in line,
and as far as I am aware, no publicly available clients will use
decentralized tracking while the flag is on. At one point, the
BitComet client ignored the flag and was banned for some time until it
was "fixed".
</p>
<p>
The private flag wasn't placed in front with the rest of the metadata
where it belonged. It's intentionally placed at the end of the torrent
file inside of the info section. This means that the flag is part of
the info_hash property of the torrent, which is the global identifier
for the torrent. Unset or remove the private flag and the hash
changes, creating a whole new torrent without any seeds.
</p>
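<p>
To see this concretely, here is a sketch (my own illustration, with a
made-up, minimal info dictionary and a hand-rolled bencoder) of how the
info_hash changes when the flag is stripped:
</p>

```python
import hashlib

def bencode(value):
    # Minimal bencoder for the types found in a torrent's info dictionary.
    if isinstance(value, int):
        return b'i%de' % value
    if isinstance(value, bytes):
        return b'%d:%s' % (len(value), value)
    if isinstance(value, dict):
        pairs = sorted(value.items())  # bencoded dicts are key-sorted
        return b'd' + b''.join(bencode(k) + bencode(v) for k, v in pairs) + b'e'
    raise TypeError('unsupported type')

info = {b'name': b'example', b'piece length': 262144, b'private': 1}
with_flag = hashlib.sha1(bencode(info)).hexdigest()

del info[b'private']  # strip the private flag
without_flag = hashlib.sha1(bencode(info)).hexdigest()

# A different info_hash: to the swarm, these are two unrelated torrents.
assert with_flag != without_flag
```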
<p>
This is DRM, an artificial restriction imposed on the user. It's
insulting. Users should be the ones that control what happens with
their computers. The reasonable approach to a private flag is that,
when the private flag is enabled, decentralized tracking is turned off
by default, but can be re-enabled by the user should they desire. That
way the desired behavior is indicated but the user has the final say,
not some unrelated website operator.
</p>
<p>
I rarely use private trackers, since they are nearly pointless, but I
still find this private flag set on public torrents, probably from
someone simply reposting the torrent file from a private site. It's
annoying to run into. It makes the torrents weaker.
</p>
<p>
Debian, which is my distribution of choice, is generally good about
removing DRM from the software it distributes. For example, the PDF
readers in the repositories have their DRM disabled (i.e. xpdf). So
why not do the same thing for all the intentionally defective
BitTorrent clients?
</p>
<p>
I went on the Debian IRC channel and brought up the issue only to find
out that everyone thought a little DRM was reasonable. So then I <a
href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=549607"> filed
a bug report on it</a>, which was simply closed <a
href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=549607#10">citing
that the DRM is a beneficial "feature"</a> and that removing the
intentional defect would make the clients "poorer". They also insisted
that it's part of the spec <a
href="http://wiki.theory.org/BitTorrentSpecification"> when it's
not</a>. I'm really disappointed in Debian now.
</p>
<p>
Now, I <i>could</i> modify a client to ignore the flag, but it's not
useful if I am the only one not running DRM. It takes two to tango. A
client used by many people would have to be fixed before it becomes
beneficial.
</p>
<p>
So when someone asks for an example of Free Software or Open Source
software with DRM in it, you can point to BitTorrent clients.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Web Pages Are Liquids</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/08/05/"/>
    <id>urn:uuid:498363b5-e0de-3c99-6e1c-c1f6dc63fdfc</id>
    <updated>2009-08-05T00:00:00Z</updated>
    <category term="rant"/><category term="web"/>
    <content type="html">
      <![CDATA[<!-- 5 August 2009 -->
<p class="abstract">
Update November 2011: I've since spent a lot more time with widescreen
monitors, and the web has changed a bit, so I've somewhat changed my mind
about this topic, as you can see by the page around you.
</p>
<p>
Web pages aren't a static medium, like books, brochures, or
pamphlets. <a href="http://www.sitepoint.com/article/liquid-design/">
The web is not print</a>. Accordingly, the layout of web pages should
not be locked to some static width, but instead flow to fill the width
of the browser like a liquid. <b>Web pages should normally have a
liquid layout.</b>
</p>
<p>
One of the most obvious problems with the fixed layout occurs when the
browser window is stretched wider than the designer had intended.
</p>
<p class="center">
  <img src="/img/diagram/web-waste.png" alt="There are vast empty
       margins on either side of the page content."/>
</p>
<p>
I, as a user, have little control over my viewing of the website. I'm stuck
reading through a keyhole. It gets much worse if the browser isn't as
wide as the designer intended: a horizontal scrollbar appears and
navigation becomes very difficult. My laptop runs at a resolution of
1024x768, and I frequently come across pages where this is an
issue. And according to Jakob Nielsen, in 2006 <a
href="http://www.useit.com/alertbox/screen_resolution.html"> 77% of
users' screens were 1024 pixels wide <i>or less</i></a>.
</p>
<p>
See the liquid for yourself right here: adjust the width of your
browser and watch this text flow to fill the screen. You can also
bring it in pretty far before you clip an image and the horizontal
scrollbar appears. The exact width depends only on the widest image
being displayed. This also comes into play if you adjust the font
size.
</p>
<p>
Using a liquid layout <a href="http://www.evolt.org/node/15177">
allows the page to work well with a wide variety of screen widths</a>,
and most importantly, gives users lots of control over how they view
the site. It's very unfortunate that (in my experience) most websites
employ a poor, fixed layout. Even web design "expert" websites will
ironically hand out web design tips from within these annoying
confines. One of the biggest culprits driving this is Wordpress, which
has this flawed layout by default.
</p>
<p>
The very worst offenders tend to be websites with little actual
content, like corporate websites or "artist" portfolios. The less
usable the page, the less I wanted to be there anyway.
</p>
<p>
So <i>please</i> drop the fancy, low-usability web designs for
something with much better usability. Your users will probably
appreciate it.
</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Ad-blocking and the Regrettable URL Format</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/08/02/"/>
    <id>urn:uuid:b5672a07-1e2d-39c7-6017-95df38dcf6af</id>
    <updated>2009-08-02T00:00:00Z</updated>
    <category term="rant"/><category term="web"/>
    <content type="html">
      <![CDATA[<!-- 2 August 2009 -->
<p>
I use <a href="http://adblockplus.org/en/">Adblock Plus</a> to block
advertisements and, more importantly, invisible privacy-breaking
trackers (most people aren't even aware of these). I think ad-blocking
is actually easier than ever, because ads are served from a relatively
small number of domains, rather than from the websites
themselves. Instead of patterns matching parts of a path, I can just
block domains.
</p>
<p>
Adblock Plus emphasizes this by providing, by default, a pattern
matching the server root. Example,
</p>
<pre>
http://ads.example.com/*
</pre>
<p>
But sometimes advertising websites are trickier, and their sub-domain
is a fairly unique string,
</p>
<pre>
http://ldp38fm.example.com/*
</pre>
<p>
That pattern isn't very useful. I want something more like,
</p>
<pre>
http://*.example.com/*
</pre>
<p>
Unfortunately Adblock Plus doesn't provide this pattern automatically
yet, so I have to do it manually. I think this pattern is less obvious
because the URL format is actually broken. Notice we have two
matching globs (*) rather than just one, even though I am simply
blocking everything under a certain level.
</p>
<p>
Tim Berners-Lee <a href="http://en.wikipedia.org/wiki/URL#History">
regrets the format of the URL</a>, and I agree with him. This is what
URLs like <code>http://ads.example.com/spy-tracker.js</code>
<i>should</i> look like,
</p>
<pre>
http://com/example/ads/spy-tracker.js
</pre>
<p>
It's a single coherent hierarchy with each level in order. This makes
so much more sense! If I wanted to block example.com and all its
sub-domains, the pattern is much simpler and less error prone,
</p>
<pre>
http://com/example/*
</pre>
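<p>
As a sketch of the idea, a little Python can mechanically rewrite
today's URLs into that single top-down hierarchy (the function name is
mine, purely for illustration):
</p>

```python
from urllib.parse import urlsplit

def rewrite(url):
    """Rewrite a URL into one coherent hierarchy: scheme://tld/domain/.../path"""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")  # e.g. ["ads", "example", "com"]
    return "%s://%s%s" % (parts.scheme, "/".join(reversed(labels)), parts.path)

print(rewrite("http://ads.example.com/spy-tracker.js"))
# http://com/example/ads/spy-tracker.js
```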
<p>
To anyone who ever reinvents the web: please get it right next time.
</p>
<p>
<b>Update</b>: There is significant further discussion in the comments.
</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Television Commercials</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/07/28/"/>
    <id>urn:uuid:c6d489af-bf07-3477-c5ef-6ca905d69a00</id>
    <updated>2009-07-28T00:00:00Z</updated>
    <category term="rant"/>
    <content type="html">
      <![CDATA[<!-- 28 July 2009 -->
<p>
First, let me note that I don't watch television. At least not in the
sense of sitting on the couch, turning it on, and flipping through the
stations. I can't stand the compressed audio, the constant, loud
commercial interruptions, and general lack of control over my
viewing. VCRs, and more recently PVRs, have mitigated these last two
points, but not enough to grab my interest.
</p>
<p>
The way I see it, there are four ways to access television. Here is
the matrix,
</p>
<!--
<table>
<tr><td colspan="2"></td><th colspan="2">Cost-free?</th></tr>
<tr><td colspan="2"></td><td>yes</td><td>no</td></tr>
<tr><th rowspan="2">Commercial interruption</th><td>yes</td>
<td>best case</td><td>happy medium</td></tr>
<tr><td>no</td><td>happy medium</td><td>worst case</td></tr>
</table>
-->
<p class="center">
<img src="/img/diagram/tv-table.png" alt=""/>
</p>
<p>
For an "acceptable" situation we have cost-free television, but with
advertising, in broadcast and streaming television. And in the
opposite "acceptable" situation we have ad-free television, but with a
monthly fee, in premium television. I think these two are acceptable
compromises. Someone else can foot the bill, or you can foot the bill.
</p>
<p>
In a few cases, such as viewer-supported television like PBS, it's
both cost-free and ad-free. This is pretty nice. You can have your
cake and eat it too.
</p>
<p>
However, most television is <i>only</i> legitimately available in the
worst case situation! Not only do you have to pay to access it, but
one-third of it is annoying, unwanted advertising. This is awful, and
it is one reason why I choose not to participate.
</p>
<p>
Luckily, there is another "best case" option which provides quick
access to most television shows of the world: peer-to-peer
file-sharing. Unfortunately, it doesn't include live television, and
it's usually not quite legal. We have the technology to distribute
large amounts of data to huge numbers of people at practically no
cost, but a bunch of old, out-of-date laws stand in the way. It's a
shame. I think <a
href="http://tech.slashdot.org/comments.pl?sid=1179957&amp;cid=27382545">this
quote</a> by "muuh-gnu" sums it up well,
</p>
<blockquote>
  <p>
    We have 2009. Everybody and their dog has a computer, which is
    designed to copy stuff. Also we have broadband which is, again,
    designed to ... move stuff around the world. So is what you're
    actually pointlessly advocating is that we collectively should
    ... actually what?  Abstain from using a common technology in
    order to make absurdly archaic 50's business models of
    "manufacturing and selling single copies" viable in day and age
    when everybody <i>can</i> manufacture and distribute those copies
    themselves?
  </p>
</blockquote>
<p>
It's a good thing some bad laws don't get in the way of progress
<i>too</i> much.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Elisp Wishlist</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/05/29/"/>
    <id>urn:uuid:41fa774c-1f9e-3ef1-1029-69b775475150</id>
    <updated>2009-05-29T00:00:00Z</updated>
    <category term="rant"/><category term="emacs"/><category term="elisp"/><category term="lisp"/>
    <content type="html">
      <![CDATA[<!-- 29 May 2009 -->
<p class="abstract">
<b>Update:</b> It looks like all these wishes, except the last one,
may actually be coming
true! <a href="http://lists.gnu.org/archive/html/emacs-devel/2010-04/msg00665.html">
Guile can run Elisp better than Emacs</a>! The idea is that the Elisp
engine is replaced with Guile — the GNU project's Scheme
implementation designed to be used as an extension language — and
written in Scheme is an Elisp compiler that targets Guile's VM. The
extension language of Emacs then becomes Scheme, but Emacs is still
able to run all the old Elisp code. At the same time Elisp itself,
which I'm sure many people will continue to use, gets an upgrade of
arbitrary precision, closures, and better performance.
</p>
<p>
I've been using elisp a lot lately, but unfortunately it's missing a
lot of features that one would find in a more standard lisp. The
following are some features I wish elisp had. Many of these could be
fit into a generic "be more like Scheme or Common Lisp". Some of these
features would break the existing mountain of elisp code out there,
requiring a massive rewrite, which is likely the main reason they are
being held back.
</p>
<p>
<b>Closures</b>, and maybe continuations. Closures are one of the
features I miss the most when writing elisp. They would allow the
implementation of Scheme-style lazy evaluation with <code>delay</code>
and <code>force</code>, among other neat tools. Continuations would
just be a neat thing to have, though they come with a performance
penalty.
</p>
<p>
Closures would also pretty much require that Emacs switch to lexical
scoping.
</p>
<p>
<b>Arbitrary precision</b>. Really, any higher order language's
numbers should be bignums. Emacs 22 <i>does</i> come with the Calc
package which provides arbitrary precision via
<code>defmath</code>. Perl does something like this with the bignum
module.
</p>
<p>
<b>Packages/namespaces</b>. Without namespaces, every Emacs
package prefixes its functions and variables with its name
(e.g. <code>dired-</code>). Some real namespaces would be useful for
large projects.
</p>
<p>
<b>C interface</b>. This is something GNU Emacs will never have
because Richard Stallman considers Emacs shared libraries support to
be <a href="http://www.emacswiki.org/emacs/DynamicallyExtendingEmacs">
a GPL threat</a>. If Emacs could be dynamically extended some useful
libraries could be linked in and exposed to elisp.
</p>
<p>
<b>Concurrency</b>. If some elisp is being executed Emacs will lock
up. This is a particular problem for Gnus. Again, Emacs would really
need to switch to lexical scoping before this could happen. Threading
would be nice.
</p>
<p>
<b>Speed</b>. Emacs lisp is pretty slow, even when compiled. Lexical
scoping would help with performance (compile time vs. run time
binding).
</p>
<p>
<b>Regex type</b>. I mention this last because I think this would be
really cool, and I am not aware of any other lisps that do it. Emacs
does regular expressions with strings, which is silly and
cumbersome. Backslashes need extra escaping, for example. Instead, I
would rather have a regex type like Perl and Javascript have. So
instead of,
</p>
<pre>
(string-match "\\w[0-9]+" "foo525")
</pre>
<p>
we have,
</p>
<pre>
(string-match /\w[0-9]+/ "foo525")
</pre>
<p>
Naturally there would be a <code>regexpp</code> predicate for checking
its type. There could also be a function for compiling a regexp from a
string into a regexp object. As a bonus, I would also like to use it
directly as a function,
</p>
<pre>
(/\w[0-9]+/ "foo525")
</pre>
<p>
I think a regexp type would really give elisp an edge, and would be
entirely appropriate for a text editor. It could also be done without
breaking anything (keep string-style regexp support).
</p>
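<p>
For what it's worth, Python gets close to this today: regexps compile
from strings into first-class pattern objects, complete with the type
predicate and compiling function described above. Only the literal
<code>/.../</code> syntax is missing. A quick sketch:
</p>

```python
import re

# Compile a string into a first-class regexp object.
pattern = re.compile(r"\w[0-9]+")

# The equivalent of a regexpp predicate: an ordinary type check.
assert isinstance(pattern, re.Pattern)

# Use the object directly, much like the (string-match ...) call above;
# \w[0-9]+ first matches at index 2 of "foo525".
match = pattern.search("foo525")
assert match.group() == "o525"
```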
<p>
There is more commentary over at EmacsWiki: <a
href="http://www.emacswiki.org/emacs/WhyDoesElispSuck"> Why Does Elisp
Suck</a>.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>URL Shortening</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/04/16/"/>
    <id>urn:uuid:a9b40c57-21dd-31fb-e1a0-da44677cd727</id>
    <updated>2009-04-16T00:00:00Z</updated>
    <category term="rant"/>
    <content type="html">
      <![CDATA[<!-- 16 April 2009 -->
<p>
<img src="/img/diagram/redirect.png" alt="" class="left"/>

There has been a lot of talk online about the fragility of URL
shortening services, particularly in relation to Twitter and its 140
character limit on posts (based on SMS limits). These services create
a single point of failure and break mechanisms of the web that we rely
on. Several solutions have been proposed, so over the next couple
years we get to see which ones end up getting adopted.
</p>
<p>
There are many different URL shortening services out there. They take
a large URL, generate a short URL, and store the pair in a
database. Several of these services have already shut down in response
to abuse by spammers who hide fraudulent URLs behind shortened
ones. If these services ever went down all at once, these shortened
URLs would rot, destroying many of the connections that make up the
world wide web. This is called the link rot apocalypse, and it has
some people worried.
</p>
<p>
I am not very worried about this, though. I don't use Twitter, or any
other service that puts such ridiculous restrictions on message
sizes. Nor do I think information on Twitter is very important. Also,
this mass link rot will occur gradually, slow enough to be dealt with.
</p>
<p>
In any case, short URLs may be useful sometimes, especially if a URL
needs to be memorized or if the URL is extremely long. Or, it could be
used to get around a <a
href="http://www.boutell.com/newfaq/misc/urllength.html"> design
flaw</a> in an <a
href="http://en.wikipedia.org/wiki/Internet_explorer"> inferior
browser</a>.
</p>
<p>
One idea that I have not yet seen implemented is simple data
compression. When a short URL is needed, a user can apply a
compression algorithm to the URL. The original URL can be recovered
from this alone, so we don't have to rely on third parties to store
any data.
</p>
<p>
I have doubts this would work in practice, though. Generic compression
algorithms cannot compress such a small amount of data because their
overhead is too large in relation. Go ahead, try pushing a URL through
gzip. It will only get longer. We would need a special URL compression
algorithm.
</p>
<p>
For example, I could harvest a large number of URLs from around the
web, probably sticking to a single language, and use it to make a <a
href="http://en.wikipedia.org/wiki/Huffman_coding">Huffman coding</a>
frequency table. Then I use this to break URLs into symbols to
encode. The ".com/" symbol would likely be mapped to one or two
bits. Finally, this compressed URL is encoded in base 64 for use. The
client, who already has the same URL frequency table, would use it to
decode the URL.
</p>
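<p>
Python's zlib can approximate this experiment today: its preset
dictionary feature plays the role of the shared frequency table, with
both ends agreeing on common URL fragments out of band. A rough
sketch, where the fragment list is just made up for illustration:
</p>

```python
import base64
import gzip
import zlib

URL = "http://en.wikipedia.org/wiki/Huffman_coding"

# Generic compression only adds overhead at this size: the result is longer.
assert len(gzip.compress(URL.encode())) > len(URL)

# Shared dictionary of common URL fragments; both ends must have a copy.
ZDICT = b"https://http://www.ftp.index.html.com/.org/.net/en.wikipedia.org/wiki/"

def shorten(url):
    c = zlib.compressobj(level=9, zdict=ZDICT)
    data = c.compress(url.encode()) + c.flush()
    return base64.urlsafe_b64encode(data).decode().rstrip("=")

def lengthen(short):
    data = base64.urlsafe_b64decode(short + "=" * (-len(short) % 4))
    d = zlib.decompressobj(zdict=ZDICT)
    return (d.decompress(data) + d.flush()).decode()

assert lengthen(shorten(URL)) == URL
```

<p>
Even with the dictionary, the savings on a short URL are marginal at
best, which matches my suspicion that this wouldn't work well in
practice.
</p>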
<p>
URLs don't seem to have too many common bits, so I doubt this would
work well. I should give it a shot to see how well it works.
</p>
<p>
We probably need to stick with lookup tables mapping short strings to
long strings. Instead of using a third party, which can disappear with
the valuable data, we do the URL shortening at the same location as
the data. If the URL shortening mechanism disappears, so does the
data, so the loss of the shortened URLs wouldn't matter thanks to this
coupling. Getting the shortened URL to users can be tricky, though.
</p>
<p>
<a href="http://revcanonical.appspot.com/">One proposal</a> wants to
set the <code>rev</code> attribute of the <code>link</code> tag to
"canonical" and point to the short URL.
</p>
<figure class="highlight"><pre><code class="language-html" data-lang="html"><span class="nt">&lt;link</span> <span class="na">rev=</span><span class="s">"canonical"</span> <span class="na">href=</span><span class="s">"http://example.com/FbVT"</span><span class="nt">&gt;</span></code></pre></figure>
<p>
To understand this one must first understand the <code>rel</code>
attribute. <code>rel</code> defines how the linked URL is related to
the current document. <code>rev</code> is the opposite, describing how
the current page is related to the linked page. To say
<code>rev="canonical"</code> means "I am the canonical URL for this
page".
</p>
<p>
However, I don't think this will get far. Several search engines, <a
href="http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html">
including Google</a>, have already adopted a
<code>rel="canonical"</code> for regular use. It's meant to be placed
with the short URL and will cause search engines to treat it as if it
was a <a href="http://en.wikipedia.org/wiki/HTTP_301">301
redirect</a>. This won't help someone find the short URL from the long
URL, though. It is also likely to be confused with the
<code>rev</code> attribute by webmasters.
</p>
<p>
The <code>rev</code> attribute is also considered too difficult to
understand, which is why it was removed from HTML5.
</p>
<p>
Another idea rests in just using the <code>rel</code> attribute by
setting it to various values: "short", "shorter", "shortlink",
"alternate shorter", "shorturi", "shortcut", "short_url". <a
href="http://wiki.snaplog.com/short_url"> This website</a> does a good
job of describing why they are all not very good (misleading, ugly, or
wrong), and it goes on to recommend "shorturl".
</p>
<p>
<s>I went with this last one and added a "short permalink" link in all
of my posts.</s> (<i>Removed after changing web hosts.</i>) This
points to a 28 letter link that will 301 redirect to the canonical post
URL. In order to avoid trashing my root namespace, all of the short
URLs begin with an asterisk. The 4 letter short code is derived from
the post's internal name.
</p>
<p>
I also took the time to make a long version of the URL that is more
descriptive. It contains the title of the post in the URL so a user
has an idea of the destination topic before following through. The
title is actually complete fluff and simply ignored. Naturally this
link's <code>rel</code> attribute is set to "longurl".
</p>
<p>
Keep your eyes open to see where this URL shortening stuff ends up
going.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Avoid Zip Archives</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/03/22/"/>
    <id>urn:uuid:d8e0047c-9ec9-3553-f6ac-5e8528aa82ca</id>
    <updated>2009-03-22T00:00:00Z</updated>
    <category term="rant"/><category term="compression"/><category term="crypto"/>
    <content type="html">
      <![CDATA[<!-- 22 March 2009 -->
<p>
<img src="/img/misc/onion.jpg" class="right"
     title="Onion on Lettuce by swatjester, cc-by-sa 2.0"/>

In a <a href="/blog/2009/03/16"> previous post</a> about the LZMA
compression algorithm, I made a negative comment about zip archives
and moved on. I would like to go into more detail about it now.
</p>
<p>
A zip archive serves three functions all-in-one: compression, archive,
and encryption. On a unix-like system, these functions would normally
be provided by three separate tools, like tar, gzip/bzip2, and GnuPG. The
unix philosophy says to "write programs that do one thing and do it
well".
</p>
<p>
So in the case of zip archives, we are doing three things poorly when,
instead, we should be using three separate tools that each do one
thing well.
</p>
<p>
When we use three different tools, our encrypted archive is a lot like
an onion. On the outside we have encryption. After we peel that off by
decrypting it, we have compression, and after removing that layer,
finally the archive. This is reflected in the filename:
<code>.tar.gz.gpg</code>. As a side note, if GPG didn't already
support it, we could add base-64 encoding if needed as another layer
on the onion: <code>.tar.gz.gpg.b64</code>.
</p>
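<p>
The layering is easy to see in code. Here's a minimal Python sketch
that builds and peels a two-layer onion with the standard library
(encryption, the third layer, is left out because the standard library
has no OpenPGP support):
</p>

```python
import gzip
import io
import tarfile

payload = b"hello, onion"

# Layer 1: the archive (tar), no compression involved yet.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Layer 2: compression (gzip), wrapped around the entire archive.
onion = gzip.compress(buf.getvalue())  # i.e. a .tar.gz

# Peel it back off, outermost layer first.
with tarfile.open(fileobj=io.BytesIO(gzip.decompress(onion))) as tar:
    assert tar.extractfile("hello.txt").read() == payload
```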
<p>
By using separate tools, we can also swap different tools in and out
without breaking any spec. Previously I mentioned using LZMA, which
could be used in place of gzip or bzip2. Instead of
<code>.tar.gz.gpg</code> you can have <code>.tar.lzma.gpg</code>. Or
you can swap out GPG for encryption and use, say, <a
href="http://ciphersaber.gurus.org/">CipherSaber</a> as
<code>.tar.lzma.cs2</code>. If we use a single one-size-fits-all
format, we are limited by the spec.
</p>
<h4>Compression</h4>
<p>
Both zip and gzip basically use the same compression algorithm. The
zip spec actually allows for a variety of other compression
algorithms, but you cannot rely on other tools to support them.
</p>
<p>
Zip archives are also inside out. Instead of <a
href="http://en.wikipedia.org/wiki/Solid_archive"> solid
compression</a>, which is what happens in tarballs, each file is
compressed individually. Redundancy between different files cannot be
exploited. The equivalent would be an inside out tarball:
<code>.gz.tar</code>. This would be produced by first individually
gzipping each file in a directory tree, then archiving them with
tar. This results in larger archive sizes.
</p>
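<p>
The size penalty is easy to demonstrate. In this Python sketch, two
files with identical contents compress to roughly half the size when
concatenated first (solid, tarball-style) versus compressed
individually (zip-style):
</p>

```python
import random
import zlib

# Two files with identical contents: maximum inter-file redundancy.
random.seed(0)
file_a = bytes(random.getrandbits(8) for _ in range(4096))
file_b = file_a

# Zip-style: each file compressed on its own; redundancy between
# the files cannot be exploited.
individual = len(zlib.compress(file_a)) + len(zlib.compress(file_b))

# Solid, tarball-style: concatenate first, then compress; the second
# file becomes little more than a back-reference to the first.
solid = len(zlib.compress(file_a + file_b))

assert solid < individual
```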
<p>
However, there is an advantage to inside out archives: random
access. We can access a file in the middle of the archive without
having to take the whole thing apart. In general use, this sort of
thing isn't really needed, and solid compression would be more useful.
</p>
<h4>Archive</h4>
<p>
In a zip archive, timestamp resolution is limited to 2 seconds, which
is based on the old FAT filesystem time resolution. If your system
supports finer timestamps, you will lose information. But really, this
isn't a big deal.
</p>
<p>
It also does not store file ownership information, but this is also
not a big deal. It may even be desirable as a privacy measure.
</p>
<p>
Actually, the archive part of zip seems to be pretty reasonable, and
better than I thought it was. There don't seem to be any really
annoying problems with it.
</p>
<p>
Tar still has advantages over zip. Zip doesn't quite allow the same
range of filenames as unix-like systems do, but it does allow
characters like * and ?. What happens when you extract files with
names containing these characters on an inferior operating system that
forbids them will depend on the tool.
</p>
<h4>Encryption</h4>
<p>
Encryption is where zip has been awful in the past. The original
spec's encryption algorithm had serious flaws and no one should even
consider using it today.
</p>
<p>
Since then, AES encryption has been worked into the standard and
implemented differently by different tools. Unless the same zip tool
is used on each end, you can't be sure AES encryption will work.
</p>
<p>
By placing encryption as part of the file spec, each tool has to
implement its own encryption, probably leaving out considerations like
using secure memory. These tools are concentrating on archiving and
compression, and so encryption will likely not be given a solid
effort.
</p>
<p>
In the implementations I know of, the archive index isn't encrypted,
so someone could open it up and see lots of file metadata, including
filenames.
</p>
<p>
When you encrypt a tarball with GnuPG, you have all the flexibility of
PGP available. Asymmetric encryption, web of trust, multiple strong
encryption algorithms, digital signatures, strong key management,
etc. It would be unreasonable for an archive format to have this kind
of thing built in.
</p>
<h4>Conclusion</h4>
<p>
You are almost always better off using a tarball rather than a zip
archive. Unfortunately the receiver of an archive will often be unable
to open anything else, so you may have no choice.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Don't Write Your Own E-mail Validator</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2008/12/24/"/>
    <id>urn:uuid:0e9536bd-6f02-332b-e974-442f31779080</id>
    <updated>2008-12-24T00:00:00Z</updated>
    <category term="rant"/><category term="perl"/>
    <content type="html">
      <![CDATA[<!-- 24 December 2008 -->
<p>
Gmail has a nice feature: when delivering e-mail, everything including
and after a <code>+</code> in a Gmail address is ignored. For example,
mail arriving at all of these addresses would go to the same place if
they were Gmail addresses,
</p>
<pre>
account@example.com
account+nullprogram@example.com
account+slashdot@example.com
</pre>
<p>
Thanks to this feature, when a user acquires a Gmail account, Google
is actually providing about a googol (as in the number
10<sup>100</sup>) different e-mail addresses to that user! Quite
appropriate, really.
</p>
<p>
I have seen other mailers do similar things, like ignoring everything
after dashes. A nice advantage to this is when registering at a new
website I can customize my e-mail address for them by, say, throwing
the website name in it. Because I have a googol of e-mail addresses
available, it is impossible to run out, so I can give every person I
meet their own version of my address. The custom address can come in
handy for sorting and filtering, and it will also tell me who is
selling out my e-mail address. This, of course, assumes that someone
isn't stripping out the extra text in my address to counter the Gmail
feature.
</p>
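<p>
The delivery-side normalization is trivial. Here's a sketch of what
Gmail presumably does (the function name is mine, and real local parts
can be quoted and even contain <code>@</code>, which this toy
ignores):
</p>

```python
def canonical(address):
    """Drop a Gmail-style +tag from the local part of an address."""
    local, _, domain = address.rpartition("@")
    local = local.split("+", 1)[0]  # "account+slashdot" -> "account"
    return local + "@" + domain

assert canonical("account+nullprogram@example.com") == "account@example.com"
assert canonical("account+slashdot@example.com") == "account@example.com"
assert canonical("account@example.com") == "account@example.com"
```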
<p>
However, in my personal experience, most websites will not permit
<code>+</code>'s in addresses. This is completely ridiculous, because
it means that <b>virtually every website will incorrectly invalidate
perfectly valid e-mail addresses</b>. Even major websites, like
<i>coca-cola.com</i>, screw this up. They see the <code>+</code> in
the address and give up.
</p>
<p>
In fact, if I do a Google search for "email validation regex" right
now, 9 of the first 10 results return websites with regular
expressions that are complete garbage and will toss out many common,
valid addresses. The only useful result was at the fifth spot (linked
below).
</p>
<p>
For the love of Stallman's beard, <b>stop writing your own e-mail
address validators!</b>
</p>
<p>
Why shouldn't you even bother writing your own? Because the proper
Perl regular expression for <a
href="http://www.ietf.org/rfc/rfc0822.txt?number=822">RFC822</a> is <a
href="http://ex-parrot.com/~pdw/Mail-RFC822-Address.html">over 6
kilobytes in length</a>! Follow that link and look at that. This is
the <i>smallest</i> regular expression you would need to get it right.
</p>
<p>
If you <i>really</i> insist on having a nice short one and don't want
to use a validation library, which, again, is a stupid idea and you
<i>should</i> be using a library, then use the dumbest, most liberal
expression you can. (Just don't forget the security issues.) Like
this,
</p>
<pre>
.+@.+
</pre>
<p>
Seriously, if you add anything else you will almost surely make it
incorrectly reject valid addresses. Note that e-mail addresses can
contain spaces, and even more than one <code>@</code>! These are
valid addresses,
</p>
<pre>
"John Doe"@example.com
"@TheBeach"@example.com
</pre>
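<p>
A quick Python demonstration: the liberal pattern accepts all of these
valid addresses, while a typical hand-rolled pattern (this particular
one is my invention, but it's representative of what the search
results offer) wrongly rejects every one of them:
</p>

```python
import re

LIBERAL = re.compile(r".+@.+")
# Representative of the garbage regexes found online: far stricter
# than RFC 822 actually allows.
NAIVE = re.compile(r"^[A-Za-z0-9._]+@[A-Za-z0-9.-]+$")

valid = [
    "account+nullprogram@example.com",
    '"John Doe"@example.com',
    '"@TheBeach"@example.com',
]
for address in valid:
    assert LIBERAL.fullmatch(address)      # liberal: all accepted
    assert not NAIVE.fullmatch(address)    # naive: all wrongly rejected
```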
<p>
I have not yet found a website that will accept either of these, even
though both are completely valid addresses. Even MS Outlook, which I
use at work (allowing me to verify this), will refuse to send e-mail
to these addresses (Gmail accepts them just fine). Hmmm... maybe having
an address like these is a good anti-spam measure!
</p>
<p>
So if your e-mail address is <code>"John Doe"@example.com</code> no
one using Outlook can send you e-mail, which sounds like a feature to
me, really.
</p>
<p>
So, everyone, please stop writing e-mail validation regular
expressions. The work has been done, and you will only get it wrong,
guaranteed.
</p>
<p>
This is a similar rant I came across while writing mine: <a
href="http://www.santosj.name/general/stop-doing-email-validation-the-wrong-way/">
Stop Doing Email Validation the Wrong Way</a>.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>A GNU Octave Feature</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2008/08/29/"/>
    <id>urn:uuid:a8aec192-263a-38d3-5d0d-86824c93fd4f</id>
    <updated>2008-08-29T00:00:00Z</updated>
    <category term="octave"/><category term="rant"/><category term="lang"/>
    <content type="html">
      <![CDATA[<!-- 29 August 2008 -->
<p>
At work they recently moved me to a new project. It is a Matlab-based
data analysis thing. I haven't really touched Matlab in over a year
(the last time I used Matlab at work), and, instead, use GNU Octave at
home when the language is appropriate. I got so used to Octave that I
found a pretty critical feature missing from Matlab's implementation:
treat an expression as if it were of the type of its output.
</p>
<p>
Let's say we want to index into the result of a function. Take, for
example, the magic square function, <code>magic()</code>. This spits
out a
<a href="http://en.wikipedia.org/wiki/Magic_square">magic square</a>
of the given size. In Octave we can generate a 4x4 magic square and
chop out the middle 2x2 portion in one line.
</p>
<pre>
octave> magic(4)(2:3,2:3)
ans =

   11   10
    7    6
</pre>
<p>
Or, perhaps more clearly,
</p>
<pre>
octave> [magic(4)](2:3,2:3)
ans =

   11   10
    7    6
</pre>
<p>
Try this in Matlab and you will get a big, fat error. You have to
assign the magic square to a temporary variable to do the same
thing. I kept trying to do this sort of thing in Matlab and was
thinking to myself, "I <i>know</i> I can do this somehow!". Nope, I
was just used to having Octave.
</p>
<p>
Where this really shows is when you want to reshape a matrix into a
nice, simple vector. If you have a matrix <code>M</code> and want to
count the number of NaN's it has, you can't just apply
the <code>sum()</code> function over <code>isnan()</code> because it
only does sums of columns. You can get around this with a special
index, <code>(:)</code>.
</p>
<p>
So, to sum all elements in <code>M</code> directly,
</p>
<pre>
octave> sum(M(:))
</pre>
<p>
In Octave, to count NaN's with <code>isnan()</code>,
</p>
<pre>
octave> sum(isnan(M)(:))
</pre>
<p>
Again, Matlab won't let you index the result of <code>isnan()</code>
directly. Stupid. I guess the Matlab way to do this is to
apply <code>sum()</code> twice.
</p>
<p> Every language I can think of handles this properly. C, C++, Perl,
Ruby, etc. It is strange that Matlab itself doesn't have it. Score one
more for Octave.
</p>
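<p>
For comparison, here's a pure-Python rendering of the same two
operations, with a hardcoded <code>magic4()</code> standing in for
Octave's <code>magic(4)</code>. Indexing a function's result directly
is completely unremarkable here:
</p>

```python
import math

def magic4():
    """Hardcoded 4x4 magic square, a stand-in for Octave's magic(4)."""
    return [[16,  2,  3, 13],
            [ 5, 11, 10,  8],
            [ 9,  7,  6, 12],
            [ 4, 14, 15,  1]]

# Chop the middle 2x2 out of the result directly, no temporary
# variable, just like magic(4)(2:3,2:3):
middle = [row[1:3] for row in magic4()[1:3]]
assert middle == [[11, 10], [7, 6]]

# Likewise, count NaNs by consuming isnan's output directly:
M = [[1.0, float("nan")], [float("nan"), 4.0]]
nan_count = sum(math.isnan(x) for row in M for x in row)
assert nan_count == 2
```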
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Proposal for a Free Musical</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2007/09/19/"/>
    <id>urn:uuid:0caada3b-00c6-3741-9d18-5cb9e8720fbd</id>
    <updated>2007-09-19T00:00:00Z</updated>
    <category term="rant"/>
    <content type="html">
      <![CDATA[<p>An idea I had some time ago would be for a free (as in speech) musical
(or play). It could be licensed under the GPL or something like it
(the GNU Free Musical License, GFML, perhaps?). For example, think of
the scripts and music scores as the “source code” for the musical,
where these documents must be provided to ticket-holders upon request
in digital form, such as on a CD, or printed. Just like free software,
we are mostly concerned with preserving the freedom of the
user/audience. These are <em>free musicals</em>.</p>

<p>Have you ever watched a great musical and your skin tingled at the
orchestra’s crescendo during the hero’s solo? Or perhaps you choked up
at a dramatic scene? These moments should not be locked up so that
they cannot be shared. Free musicals would be another step towards a
free culture where these scenes are not lost, where everyone is free
to share his or her own culture. This freedom does not exist fully
today. For instance, I can write a story about vampires but I can’t
write one about Jedi.</p>

<p>Just as free software developers don’t have to starve, neither do free
musical composers and writers. The word free refers to freedom, not
price. A free musical author can distribute the musical to a producer
for any price he or she wants.</p>

<p>You see, I was involved in my high school musicals growing up and I
remember how they had to pay some steep royalties to put these shows
on. One year, to help pay for the show we had collected spare change
from students during lunch periods. Even for all this expense, we
weren’t even permitted to make copies of the music and scripts as
needed for use in the production (these cost extra). We were being
dominated by the musical’s publisher.</p>

<p>This, of course, did not stop these extra copies from being made. It’s
just another bad law which should have been and was ignored. Remember,
the act of breaking laws itself is not wrong. You get to decide right
and wrong for yourself. No one, especially a politician, can do this
for you. I would wager that, in the US at least, just about everyone
breaks some law at least once a month. Okay, back to free musicals.</p>

<p>If there were free musicals from which to choose, once a high school
(or anyone) obtained a copy of a musical’s source code they would be
free to put on a production without paying any special or additional
per-seat or per-ticket royalties. They could make as many copies of
scripts and scores as needed without having to break any laws. They
could even send copies to other schools.</p>

<p>Let’s say a choir teacher (or whoever is directing things) goes out
and sees a free musical somewhere. She enjoys the show so much she
wants to have her students perform it as the next year’s production.
As a ticket-holder, she requests and receives the source code for this
show. That’s it! She can put on this show for the cost of a single
ticket. In other cases, someone might be feeling generous and make the
source code available to anyone at no cost to anyone who asks.</p>

<p>Since I have had the idea, I have dreamed of writing a free musical.
Unfortunately, my writing skills are poor and my music skills are even
worse. I have arranged music for a marching band in the past, but have
done no serious composition. Maybe some day I will be good enough to
write one. I would like to learn <a href="http://lilypond.org/web/">GNU LilyPond</a> sometime and
writing a free musical would be good practice.</p>

<p>Of course, an existing musical could always be liberated (expensively,
without doubt) and turned into a free musical.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>YouTube with Free Software</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2007/09/05/"/>
    <id>urn:uuid:a7994b60-4be3-3f07-8dbd-96e2a3ed0908</id>
    <updated>2007-09-05T00:00:00Z</updated>
    <category term="rant"/>
    <content type="html">
      <![CDATA[<p><strong>Update 2009-6-30</strong>: <em>Thanks to HTML 5 and the <code class="language-plaintext highlighter-rouge">video</code> tag I will be
self-hosting videos from now on. This information is only
historical.</em></p>

<p>As I have stated previously, I love <a href="http://www.fsf.org/">free software</a> and I try to
use free software exclusively whenever I can (it is very difficult to
find employment in computer engineering where no proprietary software
is used). This can pose a problem when I want to watch
<a href="http://www.youtube.com/">YouTube</a> videos because I do not use the proprietary,
non-free Flash player. The free Flash players currently handle YouTube
either poorly or not at all. I also find Flash annoying enough that I
am not interested in using these free players anyway (and as a bonus,
fewer ads!).</p>

<p>Like everyone else with an e-mail address, I get links to videos on
YouTube from my friends. I also post videos there myself under the
name “throwaway0” as it is convenient not only for me, but also for
anyone who wants to watch the videos. Now, if the only way to watch
these videos was with proprietary software, I would not encourage
this. In fact, not too long ago this was true of most online video,
which was limited to a “choose your poison” type situation between the
proprietary, worthless Windows Media and QuickTime formats. No poison
for me, thanks.</p>

<p>I have discovered several solutions to watching YouTube with free
software. There are two steps involved: getting the video and playing
the video. For the first you have <a href="http://rg3.github.io/youtube-dl/">youtube-dl</a> <del>and
then you have <a href="http://www.mozilla.com/en-US/firefox/">Firefox</a> <a href="https://addons.mozilla.org/en-US/firefox/addon/3590">Fast Video Download</a></del>.</p>

<p>youtube-dl is a Python script that you can easily install on your
system. You just give it a YouTube URL and it does all the work. It
feels a bit like wget. The Firefox add-on will add a nice little icon
like this,</p>

<p><img src="/img/fast-video-download.png" alt="" /></p>

<p>Clicking this icon will download the video. Either way, you will end
up with an .flv file somewhere that you want to watch.</p>

<p>To watch the video file, you can use <a href="http://www.mplayerhq.hu/">mplayer</a> or <a href="http://www.videolan.org/vlc/">VLC</a>.
As far as I know, these players handle the videos entirely with free
software. The videos play fine on my system, except that seeking does
not work. I use <a href="http://www.debian.org/">Debian GNU/Linux</a>, so I am pretty confident
that my system is strictly free software.</p>
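
<p>Put together, the whole process is just two commands. Here is a sketch
(the video URL is a placeholder, not a real video):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Fetch the video; youtube-dl saves it as an .flv file
youtube-dl "http://www.youtube.com/watch?v=XXXXXXXXXXX"

# Watch it with a free player
mplayer ./*.flv
</code></pre></div></div>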

<p>Now you can watch YouTube without having to fall victim to proprietary
software.</p>

]]>
    </content>
  </entry>

</feed>
