Articles tagged trick at null program

Protecting paths in macro expansions by extending UTF-8

2024-03-05T03:15:12Z

After a year I’ve finally come up with an elegant solution to a vexing u-config problem. The pkg-config format uses macros to generate build flags through recursive expansion. Some flags embed file system paths, but to the macro system it’s all strings. The output is also ultimately just one big string, which the receiving shell splits into fields. If a path contains spaces, or shell metacharacters, u-config must escape them so that shells treat them as part of a token. But how can u-config itself distinguish incidental spaces in paths from deliberate spaces between flags? What about other shell metacharacters in paths? My solution is to extend UTF-8 to encode metadata that survives macro expansion.

As usual, it helps to begin with a concrete example of the problem. The following is a conventional .pc file much like you’d find on your own system:

prefix=/usr
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include

Name: Example
Version: 1.0
Description: An example .pc file
Cflags: -I${includedir}
Libs: -L${libdir} -lexample

It begins by defining the library’s installation prefix from which it derives additional paths, which are finally used in the package fields that generate build flags (Cflags, Libs). If I run u-config against this configuration:

$ pkg-config --cflags --libs example
-I/usr/include -L/usr/lib -lexample

Typically prefix is populated by the library’s build system, which knows where the library is to be installed. In some situations that’s not possible, and there is no opportunity to set prefix to a meaningful path. In that case, pkg-config can automatically override it (--define-prefix) with a path relative to the .pc file, making the installation relocatable. This works quite well on Windows, where it’s the default:

$ pkg-config --cflags --libs example
-IC:/Users/me/example/include -LC:/Users/me/example/lib -lexample

This just works… so long as the path does not contain spaces. If so, it risks splitting into separate fields. The .pc format supports quoting to control how such output is escaped. Regions between quotes are escaped in the output so that they retain their spaces when field split in the shell. If a .pc file author is careful, they’d write it with quotes:

Cflags: -I"${includedir}"
Libs: -L"${libdir}" -lexample

The paths are carefully placed within quoted regions so that they come out properly:

$ pkg-config --cflags example
-IC:/Program\ Files/example/include

Almost nobody writes their .pc files this way! The convention is not to quote. My original solution was to implicitly wrap prefix in quotes on assignment, which fixes the vast majority of .pc files. That effectively looks like this in the “virtual” .pc file:

prefix="C:/Program Files/example"
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include

So the important region is quoted, its spaces preserved. However, the occasional library author actively supporting Windows inevitably runs into this problem, and their system’s pkg-config implementation does not quote prefix. They soon figure out explicit quoting and apply it, which then undermines u-config’s implicit quoting. The quotes essentially cancel out:

"$includedir" -> ""C:/Program Files/example"/include"

The quoted regions are inverted and nothing happens. Though this is a small minority, the libraries that do this and the ones you’re likely to use on Windows are correlated. I was stumped: How to support quoted and unquoted .pc files simultaneously?

Extending UTF-8

I recently had the thought: What if u-config somehow tracked which spans of a string were paths? prefix is initially a path span, and u-config would then track it through macro expansion and concatenation. Soon after that I realized it’s even simpler: Encode the spaces in a path as a value other than space, but also a value that cannot appear in the input. Recall that certain octets can never appear in UTF-8 text: the 8 values whose highest 5 bits are set. That would be the first octet of a 5-octet, or longer, code point, but those are forbidden.

11111xxx

When paths enter the macro system, special characters are encoded as one of these 8 values. They’re converted back to their original ASCII values during output encoding, escaped. It doesn’t interact with the pkg-config quoting mechanism, so there’s no quote cancellation. Both quoting cases are supported equally.

For example, if space is mapped onto \xff (255), then:

in:  C:/Program Files/foo    -> C:/Program\xffFiles/foo
out: C:/Program\xffFiles/foo -> C:/Program\ Files/foo

Which prints the same regardless of ${includedir} or "${includedir}". Problem solved!

More metacharacters

That’s not the only complication. Outputs may deliberately include shell metacharacters, though typically these are Makefile fragments. For example, the default value of ${pc_top_builddir} is $(top_builddir), which make will later expand. While these characters are special to a shell, and certainly special to make, they must not be escaped.

What if a path contains these characters? The pkg-config quoting mechanism won’t help. It’s only concerned with spaces, and $(...) prints the same quoted or not. As before, u-config must track provenance: whether or not such characters originated from a path.

If $PKG_CONFIG_TOP_BUILD_DIR is set, then pc_top_builddir is set to this environment variable, useful when the result isn’t processed by make. In this case it’s a path, and $(...) ought to be escaped. Even without $ it must be quoted, because the parentheses would still invoke a subshell. But who would put parentheses in a path? Lo and behold!

C:/Program Files (x86)/example

Again, extending UTF-8 solves this as well: Encode $, (, and ) in paths using three of those forbidden octets, and escape them on the way out, allowing unencoded instances to go straight through.

in:  C:/Program\xffFiles\xff\xfdx86\xfe/example
out: C:/Program\ Files\ \(x86\)/example
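As a rough sketch of the round trip (not u-config’s actual implementation), the two steps might look like the following. The function names path_encode and output_escape are hypothetical, and while \xff, \xfd, and \xfe match the examples above, the octet for $ (\xfc here) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Assumed mapping: the examples use \xff for space, \xfd for '(' and
// \xfe for ')'. The octet chosen for '$' (\xfc) is a guess.
static const unsigned char plain[]   = {' ',  '(',  ')',  '$'};
static const unsigned char encoded[] = {0xff, 0xfd, 0xfe, 0xfc};

// Replace special characters in a path with reserved octets, in place.
void path_encode(unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        const unsigned char *p = memchr(plain, s[i], sizeof(plain));
        if (p) {
            s[i] = encoded[p - plain];
        }
    }
}

// Write the shell-ready form of src into dst and return its length.
// Reserved octets become a backslash plus the original character.
// Everything else, including a raw "$(...)", is copied unchanged.
size_t output_escape(unsigned char *dst, size_t cap,
                     const unsigned char *src, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        assert(cap - n >= 2);  // worst case: two octets per input octet
        unsigned char c = src[i];
        const unsigned char *p = memchr(encoded, c, sizeof(encoded));
        if (p) {
            dst[n++] = '\\';
            c = plain[p - encoded];
        }
        dst[n++] = c;
    }
    return n;
}
```

Only octets that passed through path_encode are ever escaped, which is what lets $(top_builddir) from a .pc file pass through untouched.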

This makes pc_top_builddir straightforward: default to a raw string, otherwise a path-encoded environment variable (note: s8 is a string type and upsert is a hash map):

    s8 top_builddir = s8("$(top_builddir)");
    if (envvar_set) {
        top_builddir = s8pathencode(envvar, perm);
    }
    *upsert(&global, s8("pc_top_builddir"), perm) = top_builddir;

For a particularly wild case, consider deliberately using a uname -m command substitution to construct a path, i.e. the path contains the target machine architecture (i686, x86_64, etc.):

Cflags: -I${prefix}/$(uname -m)/include

(Not that I condone such nonsense. This is merely a reality of real world .pc files.) With prefix automatically set as above, this will print:

-IC:/Program\ Files\ \(x86\)/example/$(uname -m)/include

Path parentheses are escaped because they came from a path, but command substitution passes through because it came from the .pc source. Quite cool!

Test cross-architecture without leaving home

2021-08-21T23:59:33Z

I like to test my software across different environments, on strange platforms, and with alternative implementations. Each has its own quirks and oddities that can shake bugs out earlier. C is particularly good at this since it has such a wide selection of compilers and runs on everything. For instance I count at least 7 distinct C compilers in Debian alone. One advantage of writing portable software is access to a broader testing environment, and it’s one reason I prefer to target standards rather than specific platforms.

However, I’ve long struggled with architecture diversity. My work and testing has been almost entirely on x86, with ARM as a distant second (Raspberry Pi and friends). Big endian hosts are particularly rare. However, I recently learned a trick for quickly and conveniently accessing many different architectures without even leaving my laptop: QEMU User Emulation. Debian and its derivatives support this very well and require almost no setup or configuration.

Cross-compilation Example

While there are many options, my main cross-testing architecture has been PowerPC. It’s 32-bit big endian, while I’m generally working on 64-bit little endian, which is exactly the sort of mismatch I’m going for. I use a Debian-supplied cross-compiler and qemu-user tools. The binfmt support is especially slick, so that’s how I usually use it.

# apt install gcc-powerpc-linux-gnu qemu-user-binfmt

binfmt_misc is a kernel module that teaches Linux how to recognize arbitrary binary formats. For instance, there’s a Wine binfmt so that Linux programs can transparently exec(3) Windows .exe binaries. In the case of QEMU User Mode, binaries for foreign architectures are loaded into a QEMU virtual machine configured in user mode. In user mode there’s no guest operating system, and instead the virtual machine translates guest system calls to the host operating system.

The first package gives me powerpc-linux-gnu-gcc. The prefix is the architecture tuple describing the instruction set and system ABI. To try this out, I have a little test program that inspects its execution environment:

#include <stdio.h>

int main(void)
{
    char *w = "?";
    switch (sizeof(void *)) {
    case 1: w = "8";  break;
    case 2: w = "16"; break;
    case 4: w = "32"; break;
    case 8: w = "64"; break;
    }

    char *b = "?";
    switch (*(char *)(int []){1}) {
    case 0: b = "big";    break;
    case 1: b = "little"; break;
    }

    printf("%s-bit, %s endian\n", w, b);
}

When I run this natively on x86-64:

$ gcc test.c
$ ./a.out
64-bit, little endian

Running it on PowerPC via QEMU:

$ powerpc-linux-gnu-gcc -static test.c
$ ./a.out
32-bit, big endian

Thanks to binfmt, I could execute it as though the PowerPC binary were a native binary. With just a couple of environment variables in the right place, I could pretend I’m developing on PowerPC — aside from emulation performance penalties of course.
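To make the mismatch concrete, here’s a hypothetical sketch (not taken from any real test suite) of the classic bug a big endian target exposes: loading a little endian 32-bit field by copying bytes in host order. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Non-portable: reinterprets the bytes in host order, so the result
// differs between little and big endian hosts.
uint32_t load32_host(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

// Portable: assembles a little endian value the same way on any host.
uint32_t load32_le(const unsigned char *p)
{
    return (uint32_t)p[0] <<  0 | (uint32_t)p[1] <<  8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

// Returns 1 if the host-order shortcut agrees with the portable
// loader on this machine, 0 otherwise.
int host_matches_le(void)
{
    static const unsigned char buf[] = {0x78, 0x56, 0x34, 0x12};
    assert(load32_le(buf) == 0x12345678);  // holds on every host
    return load32_host(buf) == load32_le(buf);
}
```

Built natively on x86-64, host_matches_le returns 1, so a test relying on load32_host passes. Built with powerpc-linux-gnu-gcc and run through QEMU, it returns 0 and the shortcut is caught.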

However, you might have noticed I pulled a sneaky on ya: -static. So far what I’ve shown only works with static binaries. There’s no dynamic loader available to run dynamically-linked binaries. Fortunately this is easy to fix in two steps. The first step is to install the dynamic linker for PowerPC:

# apt install libc6-powerpc-cross

The second is to tell QEMU where to find it since, unfortunately, it cannot currently do so on its own.

$ export QEMU_LD_PREFIX=/usr/powerpc-linux-gnu

Now I can leave out the -static:

$ powerpc-linux-gnu-gcc test.c
$ ./a.out
32-bit, big endian

A practical example: Remember binitools? I’m now ready to run its fuzz-generated test suite on this cross-testing platform.

$ git clone https://github.com/skeeto/binitools
$ cd binitools/
$ make check CC=powerpc-linux-gnu-gcc
...
PASS: 668/668

Or if I’m going to be running make often:

$ export CC=powerpc-linux-gnu-gcc
$ make -e check

Recall: make’s -e flag passes the environment through, so I don’t need to pass CC=... on the command line each time.

When setting up a test suite for your own programs, consider how difficult it would be to run the tests under customized circumstances like this. The easier it is to run your tests, the more they’re going to be run. I’ve run into many projects with such overly-complex test builds that even enabling sanitizers in the test suite was a pain, let alone cross-architecture testing.

Dependencies? There might be a way to use Debian’s multiarch support to install these packages, but I haven’t been able to figure it out. You likely need to build dependencies yourself using the cross compiler.

Testing with Go

None of this is limited to C (or even C++). I’ve also successfully used this to test Go libraries and programs cross-architecture. This isn’t nearly as important since it’s harder to write unportable Go than C — e.g. dumb pointer tricks are literally labeled “unsafe”. However, Go (gc) trivializes cross-compilation and is statically compiled, so it’s incredibly simple. Once you’ve installed qemu-user-binfmt it’s entirely transparent:

$ GOARCH=mips64 go test

That’s all there is to cross-platform testing. If for some reason binfmt doesn’t work (WSL) or you don’t want to install it, there’s just one extra step (package named example):

$ GOARCH=mips64 go test -c
$ qemu-mips64-static example.test

The -c option builds a test binary but doesn’t run it, instead allowing you to choose where and how to run it.

It even works with cgo — if you’re willing to jump through the same hoops as with C of course:

package main

// #include <stdint.h>
// uint16_t v = 0x1234;
// char *hi = (char *)&v + 0;
// char *lo = (char *)&v + 1;
import "C"
import "fmt"

func main() {
	fmt.Printf("%02x %02x\n", *C.hi, *C.lo)
}

With go run on x86-64:

$ CGO_ENABLED=1 go run example.go
34 12

Via QEMU User Mode:

$ export CGO_ENABLED=1
$ export GOARCH=mips64
$ export CC=mips64-linux-gnuabi64-gcc
$ export QEMU_LD_PREFIX=/usr/mips64-linux-gnuabi64
$ go run example.go
12 34

I was pleasantly surprised how well this all works.

One dimension

Despite the variety, all these architectures are still “running” the same operating system, Linux, and so they only vary on one dimension. For most programs primarily targeting x86-64 Linux, PowerPC Linux is practically the same thing, while x86-64 OpenBSD is foreign territory despite sharing an architecture and ABI (System V). Testing across operating systems still requires spending the time to install, configure, and maintain these extra hosts. That’s an article for another time.

Well-behaved alias commands on Windows

2021-02-08T20:32:45Z

Since its inception I’ve faced a dilemma with w64devkit, my all-in-one Mingw-w64 toolchain and development environment distribution for Windows. A major goal of the project is no installation: unzip anywhere and it’s ready to go as-is. However, full functionality requires alias commands, particularly for BusyBox applets, and the usual solutions are neither available nor viable. It seemed that an installer was needed to assemble this last puzzle piece. This past weekend I finally discovered a tidy and complete solution that solves this problem for good.

That solution is a small C source file, alias.c. This article is about why it’s necessary and how it works.

Hard and symbolic links

Some alias commands are for convenience, such as a cc alias for gcc so that build systems need not assume any particular C compiler. Others are essential, such as an sh alias for “busybox sh” so that it’s available as a shell for make. These aliases are usually created with links, hard or symbolic. A GCC installation might include (roughly) a symbolic link created like so:

ln -s gcc cc

BusyBox looks at its argv[0] on startup, and if it names an applet (ls, sh, awk, etc.), it behaves like that applet. Typically BusyBox aliases are installed as hard links to the original binary, and there’s even a busybox --install to set these up. Both kinds of aliases are cheap and effective.

ln busybox sh
ln busybox ls
ln busybox awk
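The argv[0] dispatch can be sketched in a few lines of C. This is only an illustration with a hypothetical three-entry applet table and a made-up function name, find_applet, not BusyBox’s real lookup:

```c
#include <assert.h>
#include <string.h>

// Hypothetical applet table. Real BusyBox has far more entries.
static const char *const applets[] = {"awk", "ls", "sh"};

// Return the applet name selected by argv[0], or NULL if argv[0]
// does not name a known applet. Directory components are ignored.
const char *find_applet(const char *argv0)
{
    assert(argv0);
    const char *base = strrchr(argv0, '/');
    base = base ? base + 1 : argv0;
    for (size_t i = 0; i < sizeof(applets)/sizeof(*applets); i++) {
        if (!strcmp(base, applets[i])) {
            return applets[i];
        }
    }
    return NULL;
}
```

A main function would call find_applet(argv[0]) and jump to the matching applet’s entry point. A Windows build would also need to handle backslash separators and a trailing .exe, which this sketch leaves out.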

Unfortunately links are not supported by .zip files on Windows. They’d need to be created by a dedicated installer. As a result, I’ve strongly recommended that users run “busybox --install” at some point to establish the BusyBox alias commands. While w64devkit works without them, it works better with them. Still, that’s an installation step!

An alternative option is to simply include a full copy of the BusyBox binary for each applet — all 150 of them — simulating hard links. BusyBox is small, around 4kB per applet on average, but it’s not quite that small. Since the .zip format doesn’t use block compression — files are compressed individually — this duplication will appear in the .zip itself. My 573kB BusyBox build duplicated 150 times would double the distribution size and increase the installation footprint by 25%. It’s not worth the cost.

Since .zip is so limited, perhaps I should use a different distribution format that supports links. However, another w64devkit goal is making no assumptions about what other tools are installed. Windows natively supports .zip, even if that support isn’t so great (poor performance, low composability, missing features, etc.). With nothing more than the w64devkit .zip on a fresh, offline Windows installation, you can begin efficiently developing professional, native applications in under a minute.

Scripts as aliases

With links off the table, the next best option is a shell script. On unix-like systems shell scripts are an effective tool for creating complex alias commands. Unlike links, they can manipulate the argument list. For instance, w64devkit includes a c99 alias to invoke the C compiler configured to use the C99 standard. To do this with a shell script:

#!/bin/sh
exec cc -std=c99 "$@"

This prepends -std=c99 to the argument list and passes through the rest untouched via the Bourne shell’s special case "$@". Because I used exec, the shell process becomes the compiler in place. The shell doesn’t hang around in the background. It’s just gone. This is really quite elegant and powerful.

The closest available on Windows is a .bat batch file. However, like some other parts of DOS and Windows, the Batch language was designed as though its designer once glimpsed someone using a unix shell, perhaps looking over their shoulder, then copied some of the ideas without understanding them. As a result, it’s not nearly as useful or powerful. Here’s the Batch equivalent:

@cc -std=c99 %*

The @ is necessary because Batch prints its commands by default (Bourne shell’s -x option), and @ disables it. Windows lacks the concept of exec(3), so the Batch file interpreter cmd.exe continues running alongside the compiler. A little wasteful but that hardly matters. What does matter though is that cmd.exe doesn’t behave itself! If you, say, Ctrl+C to cancel compilation, you will get the infamous “Terminate batch job (Y/N)?” prompt which interferes with other programs running in the same console. The so-called “batch” script isn’t a batch job at all: It’s interactive.

I tried to use Batch files for BusyBox applets, but this issue came up constantly and made this approach impractical. Nearly all BusyBox applets are non-interactive, and lots of things break when they aren’t. Worst of all, you can easily end up with layers of cmd.exe clobbering each other to ask if they should terminate. It was frustrating.

The prompt is hardcoded in cmd.exe and cannot be disabled. Since so much depends on cmd.exe remaining exactly the way it is, Microsoft will never alter this behavior either. After all, that’s why they made PowerShell a new, separate tool.

Speaking of PowerShell, could we use that instead? Unfortunately not:

1. It’s installed by default on Windows, but is not necessarily enabled. One of my own use cases for w64devkit involves systems where PowerShell is disabled by policy. A common policy is it can be used interactively but not run scripts (“Running scripts is disabled on this system”).
2. PowerShell is not a first class citizen on Windows, and will likely never be. Even under the friendliest policy it’s not normally possible to put a PowerShell script on the PATH and run it by name. (I’m sure there are ways to make this work via system-wide configuration, but that’s off the table.)
3. Everything in PowerShell is broken. For example, it does not support input redirection with files, and instead you must use the cat-like command, Get-Content, to pipe file contents. However, Get-Content translates its input and quietly damages your data. There is no way to disable this “feature” in the version of PowerShell that ships with Windows, meaning it cannot accomplish the simplest of tasks. This is just one of many ways that PowerShell is broken beyond usefulness.

Item (2) also affects w64devkit. It has a Bourne shell, but shell scripts are still not first class citizens since Windows doesn’t know what to do with them. Fixing would require system-wide configuration, antithetical to the philosophy of the project.

Solution: compiled shell “scripts”

My working solution is inspired by an insanely clever hack used by my favorite media player, mpv. The Windows build is strange at first glance, containing two binaries, mpv.exe (large) and mpv.com (tiny). Is that COM as in an old-school 16-bit DOS binary? No, that’s just a trick that works around a Windows limitation.

The Windows technology is broken up into subsystems. Console programs run in the Console subsystem. Graphical programs run in the Windows subsystem. The original WSL was a subsystem. Unfortunately this design means that a program must statically pick a subsystem, hardcoded into the binary image. The program cannot select a subsystem dynamically. For example, this is why Java installations have both java.exe and javaw.exe, and Emacs has emacs.exe and runemacs.exe. Different binaries for different subsystems.

On Linux, a program that wants to do graphics just talks to the Xorg server or Wayland compositor. It can dynamically choose to be a terminal application or a graphical application. Or even both at once. This is exactly the behavior of mpv, and it faces a dilemma on Windows: With subsystems, how can it be both?

The trick is based on the environment variable PATHEXT which tells Windows how to prioritize executables with the same base name but different file extensions. If I type mpv and it finds both mpv.exe and mpv.com, which binary will run? It will be the first listed in PATHEXT, and by default that starts with:

PATHEXT=.COM;.EXE;.BAT;...

So it will run mpv.com, which is actually a plain old PE+ .exe in disguise. The Windows subsystem mpv.exe gets the shortcut and file associations while Console subsystem mpv.com catches command line invocations and serves as console liaison as it invokes the real mpv.exe. Ingenious!
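The selection rule can be sketched as follows. This is a simplified illustration of the priority order only, not how Windows actually implements the search, and the function name pick_by_pathext is hypothetical:

```c
#include <assert.h>
#include <string.h>

// Return the index of the candidate extension that appears earliest
// in the semicolon-separated pathext list, or -1 if none appear.
// Matching is exact here. A real lookup would ignore case.
int pick_by_pathext(const char *pathext, const char *const *exts, int n)
{
    assert(pathext && exts);
    for (const char *p = pathext; *p;) {
        size_t len = strcspn(p, ";");
        for (int i = 0; i < n; i++) {
            if (strlen(exts[i]) == len && !memcmp(p, exts[i], len)) {
                return i;
            }
        }
        p += len;
        p += *p == ';';  // step over the separator, if any
    }
    return -1;
}
```

Given the candidates .EXE and .COM and the default list above, the .COM entry is chosen because it comes first in PATHEXT, regardless of the order of the candidates.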

I realized I can pull a similar trick to create command aliases — not the .com trick, but the miniature flagger program. If only I could compile each of those Batch files to tiny, well-behaved .exe files so that it wouldn’t rely on the badly-behaved cmd.exe…

Tiny C programs

Years ago I wrote about tiny, freestanding Windows executables. That research paid off here since that’s exactly what I want. The alias command program need only manipulate its command line, invoke another program, then wait for it to finish. This doesn’t require the C library, just a handful of kernel32.dll calls. My alias command programs can be so small that it would no longer matter that I have 150 of them, and I get complete control over their behavior.

To compile, I use -nostdlib and -ffreestanding to disable all system libraries, -lkernel32 to pull that one back in, -Os (optimize for size), and -s (strip) all to make the result as small as possible.

I don’t want to write a little program for each alias command. Instead I’ll use a couple of C defines, EXE and CMD, to inject the target command at compile time. So this Batch file:

@target arg1 arg2 %*

Is equivalent to this alias compilation:

gcc -DEXE="target.exe" -DCMD="target arg1 arg2" \
    -s -Os -nostdlib -ffreestanding -o alias.exe alias.c -lkernel32

The EXE string is the actual module name, so the .exe extension is required. The CMD string replaces the first complete token of the command line string (think argv[0]) and may contain arbitrary additional arguments (e.g. -std=c99). Both are handled as wide strings (L"...") since the alias program uses the wide Win32 API in order to be fully transparent. Though unfortunately at this time it makes no difference: All currently aliased programs use the “ANSI” API since the underlying C and C++ standard libraries only use the ANSI API. (As far as I know, nobody has ever written fully-functional C and C++ standard libraries for Windows, not even Microsoft.)

You might wonder why the heck I’m gluing strings together for the arguments. These will need to be parsed (word split, etc.) by someone else, so shouldn’t I construct an argv array instead? That’s not how it works on Windows: Programs receive a flat command string and are expected to parse it themselves following the format specification. When you write a C program, the C runtime does this for you to provide the usual argv array.

This is upside down. The caller creating the process already has arguments split into an argv array (or something like it), but Win32 requires the caller to encode the argv array as a string following a special format so that the recipient can immediately decode it. Why marshal rather than pass structured data in the first place? Why does Win32 only supply a decoder (CommandLineToArgvW) and not an encoder (e.g. the missing ArgvToCommandLine)? Hey, I don’t make the rules; I just have to live with them.
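For a sense of what that missing encoder has to do, here is a minimal sketch of quoting a single argument under the backslash and double quote rules the Microsoft C runtime uses when decoding. It is not part of alias.c. The name quote_arg is hypothetical, and it uses narrow strings for brevity where real Win32 code would use wide ones.

```c
#include <assert.h>
#include <string.h>

// Encode one argument so that the C runtime's parser decodes it back
// to the same string. dst needs room for 2*strlen(arg)+3 bytes.
// Returns the encoded length, not counting the terminating null.
size_t quote_arg(char *dst, const char *arg)
{
    assert(dst && arg);
    size_t n = 0;
    if (*arg && !arg[strcspn(arg, " \t\n\v\"")]) {
        n = strlen(arg);  // nothing special: copy verbatim
        memcpy(dst, arg, n + 1);
        return n;
    }
    dst[n++] = '"';
    for (;;) {
        size_t slashes = 0;
        while (*arg == '\\') {
            arg++;
            slashes++;
        }
        if (!*arg) {
            // Double trailing backslashes so the closing quote
            // remains a delimiter.
            for (size_t i = 0; i < slashes*2; i++) dst[n++] = '\\';
            break;
        } else if (*arg == '"') {
            // Double the backslashes, then escape the quote itself.
            for (size_t i = 0; i < slashes*2 + 1; i++) dst[n++] = '\\';
            dst[n++] = '"';
        } else {
            // Backslashes not followed by a quote are literal.
            for (size_t i = 0; i < slashes; i++) dst[n++] = '\\';
            dst[n++] = *arg;
        }
        arg++;
    }
    dst[n++] = '"';
    dst[n] = 0;
    return n;
}
```

A full command line is these encoded arguments joined by spaces. The first token (the program name) is decoded under slightly different rules, and cmd.exe has its own metacharacters on top, so this covers only the argument part.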

You can look at the original source for the details, but the summary is that I supply my own xstrlen(), xmemcpy(), and partial Win32 command line parser — just enough to identify the first token, even if that token is quoted. It glues the strings together, calls CreateProcessW, waits for it to exit (WaitForSingleObject), retrieves the exit code (GetExitCodeProcess), and exits with the same status. (The stuff that comes for free with exec(3).)
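The token replacement step can be sketched portably. This is not the real alias.c: it uses narrow strings instead of wide ones, leaves out the CreateProcessW call, simplifies the first-token rule, and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

// Return a pointer just past the first token of a command line. The
// token ends at the first space or tab outside double quotes. This
// is a simplification of the real Win32 rules.
const char *skip_first_token(const char *cmdline)
{
    int quoted = 0;
    for (; *cmdline; cmdline++) {
        if (*cmdline == '"') {
            quoted = !quoted;
        } else if (!quoted && (*cmdline == ' ' || *cmdline == '\t')) {
            break;
        }
    }
    return cmdline;
}

// Build the new command line in dst: cmd followed by everything
// after the original first token. Returns the resulting length.
size_t build_cmdline(char *dst, size_t cap, const char *cmd,
                     const char *cmdline)
{
    const char *rest = skip_first_token(cmdline);
    size_t cmdlen = strlen(cmd);
    size_t restlen = strlen(rest);
    assert(cmdlen + restlen < cap);
    memcpy(dst, cmd, cmdlen);
    memcpy(dst + cmdlen, rest, restlen + 1);  // include the null
    return cmdlen + restlen;
}
```

With cmd set to "cc -std=c99", an invocation such as "C:\w64 devkit\bin\c99.exe" -O2 main.c becomes cc -std=c99 -O2 main.c, with the remaining arguments passed through byte for byte.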

This all compiles to a 4kB executable, mostly padding, which is small enough for my purposes. These compress to an acceptable 1kB each in the .zip file. Smaller would be nicer, but this would require at minimum a custom linker script, and even smaller would require hand-crafted assembly.

This lingering issue solved, w64devkit now works better than ever. The alias.c source is included in the kit in case you need to make any of your own well-behaved alias commands.

Brute Force Incognito Browsing

2018-09-06T14:07:13Z

Both Firefox and Chrome have a feature for creating temporary private browsing sessions. Firefox calls it Private Browsing and Chrome calls it Incognito Mode. Both work essentially the same way. A temporary browsing session is started without carrying over most existing session state (cookies, etc.), and no state (cookies, browsing history, cached data, etc.) is preserved after ending the session. Depending on the configuration, some browser extensions will be enabled in the private session, and their own internal state may be preserved.

The most obvious use is for visiting websites that you don’t want listed in your browsing history. Another use for more savvy users is to visit websites with a fresh, empty cookie file. For example, some news websites use a cookie to track the number of visits and require a subscription after a certain number of “free” articles. Manually deleting cookies is a pain (especially without a specialized extension), but opening the same article in a private session is two clicks away.

For web development there’s yet another use. A private session is a way to view your website from the perspective of a first-time visitor. You’ll be logged out and will have little or no existing state.

However, sometimes it just doesn’t go far enough. Some of those news websites have adapted, and in addition to counting the number of visits, they’ve figured out how to detect private sessions and block them. I haven’t looked into how they do this — maybe something to do with local storage, or detecting previously cached content. Sometimes I want a private session that’s truly fully isolated. The existing private session features just aren’t isolated enough or they behave differently, which is how they’re being detected.

Some time ago I put together a couple of scripts to brute force my own private sessions when I need them, generally for testing websites in a guaranteed fresh, fully-functioning instance. It also lets me run multiple such sessions in parallel. My scripts don’t rely on any private session feature of the browser, so the behavior is identical to a real browser, making it undetectable.

The downside is that, for better or worse, no browser extensions are carried over. In some ways this can be considered a feature, but a lot of the time I would like my ad-blocker to carry over. Your ad-blocker is probably the most important security software on your computer, so you should hesitate to give it up.

Another downside is that both Firefox and Chrome have some irritating first-time behaviors that can’t be disabled. The intent is to be newbie-friendly but it just gets in my way. For example, both bug me about logging into their browser platforms. Firefox starts with two tabs. Chrome creates a popup to ask me to configure a printer. Both start with a junk URL in the location bar so I can’t just middle-click paste (i.e. the X11 selection clipboard) into it. It’s definitely not designed for my use case.

Firefox

Here’s my brute force private session script for Firefox:

#!/bin/sh -e
DIR="${XDG_CACHE_HOME:-$HOME/.cache}"
mkdir -p -- "$DIR"
TEMP="$(mktemp -d -- "$DIR/firefox-XXXXXX")"
trap "rm -rf -- '$TEMP'" INT TERM EXIT
firefox -profile "$TEMP" -no-remote "$@"

It creates a temporary directory under $XDG_CACHE_HOME and tells Firefox to use the profile in that directory. No such profile exists, of course, so Firefox creates a fresh profile.

In theory I could just create a new profile alongside the default within my existing ~/.mozilla directory. However, I’ve never liked Firefox’s profile feature, especially with the intentionally unpredictable way it stores the profile itself: behind a random path. I also don’t trust it to be fully isolated and to fully clean up when I’m done.

Before starting Firefox, I register a trap with the shell to clean up the profile directory regardless of what happens. It doesn’t matter if Firefox exits cleanly, if it crashes, or if I CTRL-C it to death.

The -no-remote option prevents the new Firefox instance from joining onto an existing Firefox instance, which it really prefers to do even though it’s technically supposed to be a different profile.

Note the "$@", which passes arguments through to Firefox — most often the URL of the site I want to test.

Chromium

I don’t actually use Chrome but rather the open source version, Chromium. I think this script will also work with Chrome.

#!/bin/sh -e
DIR="${XDG_CACHE_HOME:-$HOME/.cache}"
mkdir -p -- "$DIR"
TEMP="$(mktemp -d -- "$DIR/chromium-XXXXXX")"
trap "rm -rf -- '$TEMP'" INT TERM EXIT
chromium --user-data-dir="$TEMP" \
         --no-default-browser-check \
         --no-first-run \
         "$@" >/dev/null 2>&1

It’s exactly the same as the Firefox script and only the browser arguments have changed. I tell it not to ask about being the default browser, and --no-first-run disables some of the irritating first-time behaviors.

Chromium is very noisy on the command line, so I also redirect all output to /dev/null.

If you’re on Debian like me, its version of Chromium comes with a --temp-profile option that handles the throwaway profile automatically. So the script can be simplified:

#!/bin/sh -e
chromium --temp-profile \
         --no-default-browser-check \
         --no-first-run \
         "$@" >/dev/null 2>&1

In my own use case, these scripts have fully replaced the built-in private session features. In fact, since Chromium is not my primary browser, my brute force private session script is how I usually launch it. I only run it to test things, and I always want to test using a fresh profile.

Render Multimedia in Pure C

2017-11-03T22:31:15Z

Update 2020: I’ve produced many more examples over the years (even more).

In a previous article I demonstrated video filtering with C and a unix pipeline. Thanks to the ubiquitous support for the ridiculously simple Netpbm formats — specifically the “Portable PixMap” (.ppm, P6) binary format — it’s trivial to parse and produce image data in any language without image libraries. Video decoders and encoders at the ends of the pipeline do the heavy lifting of processing the complicated video formats actually used to store and transmit video.

Naturally this same technique can be used to produce new video in a simple program. All that’s needed are a few functions to render artifacts — lines, shapes, etc. — to an RGB buffer. With a bit of basic sound synthesis, the same concept can be applied to create audio in a separate audio stream — in this case using the simple (but not as simple as Netpbm) WAV format. Put them together and a small, standalone program can create multimedia.
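To show how little is involved on the video side, here is a minimal sketch of a frame writer for the P6 format. It is not the code from the project discussed below, and the name ppm_write is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

// Write one binary PPM (P6) frame: a short text header followed by
// width*height RGB triples, one octet per channel. Returns 1 on
// success and 0 on failure.
int ppm_write(FILE *f, const unsigned char *rgb, int width, int height)
{
    assert(width > 0 && height > 0);
    size_t n = (size_t)width * height * 3;
    fprintf(f, "P6\n%d %d\n255\n", width, height);
    return fwrite(rgb, 1, n, f) == n && !ferror(f);
}
```

Calling this once per frame on standard output produces a stream of concatenated images, which is the form a video encoder or player at the end of the pipeline consumes.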

Here’s the demonstration video I’ll be going through in this article. It animates and visualizes various in-place sorting algorithms (see also). The elements are rendered as colored dots, ordered by hue, with red at 12 o’clock. A dot’s distance from the center is proportional to its corresponding element’s distance from its correct position. Each dot emits a sinusoidal tone with a unique frequency when it swaps places in a particular frame.

Original credit for this visualization concept goes to w0rthy.

All of the source code (less than 600 lines of C), ready to run, can be found here:

https://github.com/skeeto/sort-circle

On any modern computer, rendering is real-time, even at 60 FPS, so you may be able to pipe the program’s output directly into your media player of choice. (If not, consider getting a better media player!)

$ ./sort | mpv --no-correct-pts --fps=60 -

VLC requires some help from ppmtoy4m:

$ ./sort | ppmtoy4m -F60:1 | vlc -

Or you can just encode it to another format. Recent versions of libavformat can input PPM images directly, which means x264 can read the program’s output directly:

$ ./sort | x264 --fps 60 -o video.mp4 /dev/stdin

By default there is no audio output. I wish there was a nice way to embed audio with the video stream, but this requires a container and that would destroy all the simplicity of this project. So instead, the -a option captures the audio in a separate file. Use ffmpeg to combine the audio and video into a single media file:

$ ./sort -a audio.wav | x264 --fps 60 -o video.mp4 /dev/stdin
$ ffmpeg -i video.mp4 -i audio.wav -vcodec copy -acodec mp3 \
         combined.mp4

You might think you’ll be clever by using mkfifo (i.e. a named pipe) to pipe both audio and video into ffmpeg at the same time. This will only result in a deadlock since neither program is prepared for this. One will be blocked writing one stream while the other is blocked reading on the other stream.

Several years ago my intern and I used the exact same pure C rendering technique to produce these raytracer videos:

I also used this technique to illustrate gap buffers.

Pixel format and rendering

This program really only has one purpose: rendering a sorting video with a fixed, square resolution. So rather than write generic image rendering functions, some assumptions will be hard coded. For example, the video size will just be hard coded and assumed square, making it simpler and faster. I chose 800x800 as the default:

#define S     800

Rather than define some sort of color struct with red, green, and blue fields, color will be represented by a 24-bit integer (long). I arbitrarily chose red to be the most significant 8 bits. This has nothing to do with the order of the individual channels in Netpbm since these integers are never dumped out. (This would have stupid byte-order issues anyway.) “Color literals” are particularly convenient and familiar in this format. For example, the constant for pink: 0xff7f7fUL.

In practice the color channels will be operated upon separately, so here are a couple of helper functions to convert the channels between this format and normalized floats (0.0–1.0).

static void
rgb_split(unsigned long c, float *r, float *g, float *b)
{
    *r = ((c >> 16) / 255.0f);
    *g = (((c >> 8) & 0xff) / 255.0f);
    *b = ((c & 0xff) / 255.0f);
}

static unsigned long
rgb_join(float r, float g, float b)
{
    unsigned long ir = roundf(r * 255.0f);
    unsigned long ig = roundf(g * 255.0f);
    unsigned long ib = roundf(b * 255.0f);
    return (ir << 16) | (ig << 8) | ib;
}

Originally I decided the integer form would be sRGB, and these functions handled the conversion to and from sRGB. Since it had no noticeable effect on the output video, I discarded it. In more sophisticated rendering you may want to take this into account.

The RGB buffer where images are rendered is just a plain old byte buffer with the same pixel format as PPM. The ppm_set() function writes a color to a particular pixel in the buffer, assumed to be S by S pixels. The complement to this function is ppm_get(), which will be needed for blending.

static void
ppm_set(unsigned char *buf, int x, int y, unsigned long color)
{
    buf[y * S * 3 + x * 3 + 0] = color >> 16;
    buf[y * S * 3 + x * 3 + 1] = color >>  8;
    buf[y * S * 3 + x * 3 + 2] = color >>  0;
}

static unsigned long
ppm_get(unsigned char *buf, int x, int y)
{
    unsigned long r = buf[y * S * 3 + x * 3 + 0];
    unsigned long g = buf[y * S * 3 + x * 3 + 1];
    unsigned long b = buf[y * S * 3 + x * 3 + 2];
    return (r << 16) | (g << 8) | b;
}

Since the buffer is already in the right format, writing an image is dead simple. I like to flush after each frame so that observers generally see clean, complete frames. It helps in debugging.

static void
ppm_write(const unsigned char *buf, FILE *f)
{
    fprintf(f, "P6\n%d %d\n255\n", S, S);
    fwrite(buf, S * 3, S, f);
    fflush(f);
}

Dot rendering

If you zoom into one of those dots, you may notice it has a nice smooth edge. Here’s one rendered at 30x the normal resolution. I did not render, then scale this image in another piece of software. This is straight out of the C program.

In an early version of this program I used a dumb dot rendering routine. It took a color and a hard, integer pixel coordinate. All the pixels within a certain distance of this coordinate were set to the color, and everything else was left alone. This had two bad effects:

Dots jittered as they moved around since their positions were rounded to the nearest pixel for rendering. A dot would be centered on one pixel, then suddenly centered on another pixel. This looked bad even when those pixels were adjacent.
There’s no blending between dots when they overlap, making the lack of anti-aliasing even more pronounced.

Instead the dot’s position is computed in floating point and is actually rendered as if it were between pixels. This is done with a shader-like routine that uses smoothstep — just as found in shader languages — to give the dot a smooth edge. That edge is blended into the image, whether that’s the background or a previously-rendered dot. The input to the smoothstep is the distance from the floating point coordinate to the center (or corner?) of the pixel being rendered, maintaining that between-pixel smoothness.

Rather than dump the whole function here, let’s look at it piece by piece. I have two new constants to define the inner dot radius and the outer dot radius. It’s smooth between these radii.

#define R0    (S / 400.0f)  // dot inner radius
#define R1    (S / 200.0f)  // dot outer radius

The dot-drawing function takes the image buffer, the dot’s coordinates, and its foreground color.

static void
ppm_dot(unsigned char *buf, float x, float y, unsigned long fgc);

The first thing to do is extract the color components.

    float fr, fg, fb;
    rgb_split(fgc, &fr, &fg, &fb);

Next determine the range of pixels over which the dot will be drawn. These are based on the two radii and will be used for looping.

    int miny = floorf(y - R1 - 1);
    int maxy = ceilf(y + R1 + 1);
    int minx = floorf(x - R1 - 1);
    int maxx = ceilf(x + R1 + 1);

Here’s the loop structure. Everything else will be inside the innermost loop. The dx and dy are the floating point distances from the center of the dot.

    for (int py = miny; py <= maxy; py++) {
        float dy = py - y;
        for (int px = minx; px <= maxx; px++) {
            float dx = px - x;
            /* ... */
        }
    }

Use the x and y distances to compute the distance and smoothstep value, which will be the alpha. Within the inner radius the color is on 100%. Outside the outer radius it’s 0%. Elsewhere it’s something in between.

            float d = sqrtf(dy * dy + dx * dx);
            float a = smoothstep(R1, R0, d);
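
The smoothstep() function itself isn't shown. A minimal sketch, assuming the usual clamped Hermite polynomial from shader languages. Note that the call above passes the edges in descending order (R1, then R0); the formula below handles that, though shader specifications generally leave that case undefined.

```c
/* Clamped Hermite interpolation between two edges. Returns 0.0 at
 * edge0, 1.0 at edge1, and a smooth curve in between. The edges may
 * be given in either order as long as they differ. */
static float
smoothstep(float edge0, float edge1, float x)
{
    float t = (x - edge0) / (edge1 - edge0);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return t * t * (3.0f - 2.0f * t);
}
```

With edges (R1, R0), a distance at or inside R0 yields 1.0 and a distance at or beyond R1 yields 0.0, matching the alpha behavior described above.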

Get the background color, extract its components, and blend the foreground and background according to the computed alpha value. Finally write the pixel back into the buffer.

            unsigned long bgc = ppm_get(buf, px, py);
            float br, bg, bb;
            rgb_split(bgc, &br, &bg, &bb);

            float r = a * fr + (1 - a) * br;
            float g = a * fg + (1 - a) * bg;
            float b = a * fb + (1 - a) * bb;
            ppm_set(buf, px, py, rgb_join(r, g, b));

That’s all it takes to render a smooth dot anywhere in the image.

Rendering the array

The array being sorted is just a global variable. This simplifies some of the sorting functions since a few are implemented recursively. They can call for a frame to be rendered without needing to pass the full array. With the dot-drawing routine done, rendering a frame is easy:

#define N     360           // number of dots

static int array[N];

static void
frame(void)
{
    static unsigned char buf[S * S * 3];
    memset(buf, 0, sizeof(buf));
    for (int i = 0; i < N; i++) {
        float delta = abs(i - array[i]) / (N / 2.0f);
        float x = -sinf(i * 2.0f * PI / N);
        float y = -cosf(i * 2.0f * PI / N);
        float r = S * 15.0f / 32.0f * (1.0f - delta);
        float px = r * x + S / 2.0f;
        float py = r * y + S / 2.0f;
        ppm_dot(buf, px, py, hue(array[i]));
    }
    ppm_write(buf, stdout);
}

The buffer is static since it will be rather large, especially if S is cranked up. Otherwise it’s likely to overflow the stack. The memset() fills it with black. If you wanted a different background color, here’s where you change it.

For each element, compute its delta from the proper array position, which becomes its distance from the center of the image. The angle is based on its actual position. The hue() function (not shown in this article) returns the color for the given element.
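
Since hue() isn't shown, here is a hypothetical stand-in to make the idea concrete: walk the fully saturated color wheel in six linear segments, with element 0 mapping to red. The real function in the repository may be written differently.

```c
#define N 360  /* number of dots, as in the article */

/* Map an element value (0..N-1) to a fully saturated 0xRRGGBB color.
 * Hypothetical stand-in for the article's hue(), which is not shown. */
static unsigned long
hue(int v)
{
    float h = v * 6.0f / N;   /* position on the wheel, 0.0 to 6.0 */
    int sector = (int)h;      /* which sixth of the wheel */
    unsigned long up = (unsigned long)((h - sector) * 255.0f + 0.5f);
    unsigned long down = 255 - up;
    switch (sector) {
    case 0:  return 0xff0000UL | (up << 8);     /* red to yellow   */
    case 1:  return 0x00ff00UL | (down << 16);  /* yellow to green */
    case 2:  return 0x00ff00UL | up;            /* green to cyan   */
    case 3:  return 0x0000ffUL | (down << 8);   /* cyan to blue    */
    case 4:  return 0x0000ffUL | (up << 16);    /* blue to magenta */
    default: return 0xff0000UL | down;          /* magenta to red  */
    }
}
```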

With the frame() function complete, all I need is a sorting function that calls frame() at appropriate times. Here are a couple of examples:

static void
shuffle(int array[N], uint64_t *rng)
{
    for (int i = N - 1; i > 0; i--) {
        uint32_t r = pcg32(rng) % (i + 1);
        swap(array, i, r);
        frame();
    }
}

static void
sort_bubble(int array[N])
{
    int c;
    do {
        c = 0;
        for (int i = 1; i < N; i++) {
            if (array[i - 1] > array[i]) {
                swap(array, i - 1, i);
                c = 1;
            }
        }
        frame();
    } while (c);
}

Synthesizing audio

To add audio I need to keep track of which elements were swapped in this frame. When producing a frame I need to generate and mix tones for each element that was swapped.

Notice the swap() function above? That’s not just for convenience. That’s also how things are tracked for the audio.

static int swaps[N];

static void
swap(int a[N], int i, int j)
{
    int tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
    swaps[(a - array) + i]++;
    swaps[(a - array) + j]++;
}

Before we get ahead of ourselves I need to write a WAV header. Without getting into the purpose of each field, just note that the header has 13 fields, followed immediately by 16-bit little endian PCM samples. There will be only one channel (mono).

#define HZ    44100         // audio sample rate

static void
wav_init(FILE *f)
{
    emit_u32be(0x52494646UL, f); // "RIFF"
    emit_u32le(0xffffffffUL, f); // file length
    emit_u32be(0x57415645UL, f); // "WAVE"
    emit_u32be(0x666d7420UL, f); // "fmt "
    emit_u32le(16,           f); // struct size
    emit_u16le(1,            f); // PCM
    emit_u16le(1,            f); // mono
    emit_u32le(HZ,           f); // sample rate (i.e. 44.1 kHz)
    emit_u32le(HZ * 2,       f); // byte rate
    emit_u16le(2,            f); // block size
    emit_u16le(16,           f); // bits per sample
    emit_u32be(0x64617461UL, f); // "data"
    emit_u32le(0xffffffffUL, f); // byte length
}

Rather than tackle the annoying problem of figuring out the total length of the audio ahead of time, I just wave my hands and write the maximum possible number of bytes (0xffffffff). Most software that can read WAV files will understand this to mean the entire rest of the file contains samples.
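
The emit_*() helpers aren't shown above. A minimal sketch, assuming they do nothing more than write the value one byte at a time in the named byte order; the versions in the repository may differ in detail.

```c
#include <stdio.h>

/* Write the low 32 bits of v, least significant byte first. */
static void
emit_u32le(unsigned long v, FILE *f)
{
    fputc((v >>  0) & 0xff, f);
    fputc((v >>  8) & 0xff, f);
    fputc((v >> 16) & 0xff, f);
    fputc((v >> 24) & 0xff, f);
}

/* Write the low 32 bits of v, most significant byte first. */
static void
emit_u32be(unsigned long v, FILE *f)
{
    fputc((v >> 24) & 0xff, f);
    fputc((v >> 16) & 0xff, f);
    fputc((v >>  8) & 0xff, f);
    fputc((v >>  0) & 0xff, f);
}

/* Write the low 16 bits of v, least significant byte first. A negative
 * int sample converts to unsigned modulo 2^n, so its low 16 bits are
 * the two's complement encoding that PCM expects. */
static void
emit_u16le(unsigned v, FILE *f)
{
    fputc((v >> 0) & 0xff, f);
    fputc((v >> 8) & 0xff, f);
}
```

Writing byte by byte sidesteps the host's own byte order entirely, which is why the four-character tags can be spelled as big endian integers.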

With the header out of the way all I have to do is write 1/60th of a second worth of samples to this file each time a frame is produced. That’s 735 samples (1,470 bytes) at 44.1kHz.

The simplest place to do audio synthesis is in frame() right after rendering the image.

#define FPS   60            // output framerate
#define MINHZ 20            // lowest tone
#define MAXHZ 1000          // highest tone

static void
frame(void)
{
    /* ... rendering ... */

    /* ... synthesis ... */
}

With the largest tone frequency at 1kHz, Nyquist says we only need to sample at 2kHz. 8kHz is a very common sample rate and gives some headroom, making it a good choice. However, I found that audio encoding software was a lot happier to accept the standard CD sample rate of 44.1kHz, so I stuck with that.

The first thing to do is to allocate and zero a buffer for this frame’s samples.

    int nsamples = HZ / FPS;
    static float samples[HZ / FPS];
    memset(samples, 0, sizeof(samples));

Next determine how many “voices” there are in this frame. This is used to mix the samples by averaging them. If an element was swapped more than once this frame, it’s a little louder than the others — i.e. it’s played twice at the same time, in phase.

    int voices = 0;
    for (int i = 0; i < N; i++)
        voices += swaps[i];

Here’s the most complicated part. I use sinf() to produce the sinusoidal wave based on the element’s frequency. I also use a parabola as an envelope to shape the beginning and ending of this tone so that it fades in and fades out. Otherwise you get the nasty, high-frequency “pop” sound as the wave is given a hard cut off.

    for (int i = 0; i < N; i++) {
        if (swaps[i]) {
            float hz = i * (MAXHZ - MINHZ) / (float)N + MINHZ;
            for (int j = 0; j < nsamples; j++) {
                float u = 1.0f - j / (float)(nsamples - 1);
                float parabola = 1.0f - (u * 2 - 1) * (u * 2 - 1);
                float envelope = parabola * parabola * parabola;
                float v = sinf(j * 2.0f * PI / HZ * hz) * envelope;
                samples[j] += swaps[i] * v / voices;
            }
        }
    }
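
To see the shape of that envelope in isolation, the same arithmetic can be pulled out into a small function. The name envelope() is just for illustration; the program computes it inline as shown above.

```c
/* Envelope for sample j of an nsamples-long tone (nsamples > 1):
 * 0.0 at the first and last sample, 1.0 in the middle. */
static float
envelope(int j, int nsamples)
{
    float u = 1.0f - j / (float)(nsamples - 1);
    float parabola = 1.0f - (u * 2 - 1) * (u * 2 - 1);
    return parabola * parabola * parabola;
}
```

Cubing the parabola keeps the peak at 1.0 while flattening the ends, so the tone starts and stops at exactly zero amplitude and ramps gently in between.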

Finally I write out each sample as a signed 16-bit value. I flush the frame audio just like I flushed the frame image, keeping them somewhat in sync from an outsider’s perspective.

    for (int i = 0; i < nsamples; i++) {
        int s = samples[i] * 0x7fff;
        emit_u16le(s, wav);
    }
    fflush(wav);

Before returning, reset the swap counter for the next frame.

    memset(swaps, 0, sizeof(swaps));

Font rendering

You may have noticed there was text rendered in the corner of the video announcing the sort function. There’s font bitmap data in font.h which gets sampled to render that text. It’s not terribly complicated, but you’ll have to study the code on your own to see how that works.

Learning more

This simple video rendering technique has served me well for some years now. All it takes is a bit of knowledge about rendering. I learned quite a bit just from watching Handmade Hero, where Casey writes a software renderer from scratch, then implements a nearly identical renderer with OpenGL. The more I learn about rendering, the better this technique works.

Before writing this post I spent some time experimenting with using a media player as an interface to a game. For example, rather than render the game using OpenGL or similar, render it as PPM frames and send it to the media player to be displayed, just as game consoles drive television sets. Unfortunately the latency is horrible (multiple seconds), so that idea just doesn’t work. So while this technique is fast enough for real time rendering, it’s no good for interaction.

Rolling Shutter Simulation in C

2017-07-02T18:35:16Z

The most recent Smarter Every Day (#172) explains a phenomenon that results from rolling shutter. You’ve likely seen this effect in some of your own digital photographs. When a CMOS digital camera captures a picture, it reads one row of the sensor at a time. If the subject of the picture is a fast-moving object (relative to the camera), then the subject will change significantly while the image is being captured, giving strange, unreal results:

In the Smarter Every Day video, Destin illustrates the effect by simulating rolling shutter using a short video clip. In each frame of the video, a few additional rows are locked in place, showing the effect in slow motion, making it easier to understand.

At the end of the video he thanks a friend for figuring out how to get After Effects to simulate rolling shutter. After thinking about this for a moment, I figured I could easily accomplish this myself with just a bit of C, without any libraries. The video above this paragraph is the result.

I previously described a technique to edit and manipulate video without any formal video editing tools. A unix pipeline is sufficient for doing minor video editing, especially without sound. The program at the front of the pipe decodes the video into a raw, uncompressed format, such as YUV4MPEG or PPM. The tools in the middle losslessly manipulate this data to achieve the desired effect (watermark, scaling, etc.). Finally, the tool at the end encodes the video into a standard format.

$ decode video.mp4 | xform-a | xform-b | encode out.mp4

For the “decode” program I’ll be using ffmpeg now that it’s back in the Debian repositories. You can throw a video in virtually any format at it and it will write PPM frames to standard output. For the encoder I’ll be using the x264 command line program, though ffmpeg could handle this part as well. Without any filters in the middle, this example will just re-encode a video:

$ ffmpeg -i input.mp4 -f image2pipe -vcodec ppm pipe:1 | \
    x264 -o output.mp4 /dev/stdin

The filter tools in the middle only need to read and write in the raw image format. They’re a little bit like shaders, and they’re easy to write. In this case, I’ll write a C program that simulates rolling shutter. The filter could be written in any language that can read and write binary data from standard input to standard output.

Update: It appears that input PPM streams are a rather recent feature of libavformat (a.k.a. lavf, used by x264). Support for PPM input first appeared in libavformat 3.1 (released June 26th, 2016). If you’re using an older version of libavformat, you’ll need to stick ppmtoy4m in front of x264 in the processing pipeline.

$ ffmpeg -i input.mp4 -f image2pipe -vcodec ppm pipe:1 | \
    ppmtoy4m | \
    x264 -o output.mp4 /dev/stdin

Video filtering in C

In the past, my go-to for raw video data has been loose PPM frames and YUV4MPEG streams (via ppmtoy4m). Fortunately, over the years a lot of tools have gained the ability to manipulate streams of PPM images, which is a much more convenient format. Despite being raw video data, YUV4MPEG is still a fairly complex format with lots of options and annoying colorspace concerns. PPM is simple RGB without complications. The header is just text:

P6
<width> <height>
<maximum depth>

The maximum depth is virtually always 255. A smaller value reduces the image’s dynamic range without reducing the size. A larger value involves byte-order issues (endian). For video frame data, the file will typically look like:

P6
1920 1080
255

Unfortunately the format is actually a little more flexible than this. Except for the new line (LF, 0x0A) after the maximum depth, the whitespace is arbitrary and comments starting with # are permitted. Since the tools I’m using won’t produce comments, I’m going to ignore that detail. I’ll also assume the maximum depth is always 255.
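
For anyone who does need to handle that extra flexibility, here is a rough sketch of a more tolerant header parser. It is not what this program uses. The function names are invented for illustration, it still insists on a maximum depth of 255, and it does not guard against absurdly large dimensions.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Skip whitespace and #-comments. Returns 0 if EOF is hit first. */
static int
skip_space(FILE *f)
{
    int c;
    while ((c = getc(f)) != EOF) {
        if (c == '#') {
            while ((c = getc(f)) != EOF && c != '\n')
                ;  /* discard the rest of the comment line */
        } else if (!isspace(c)) {
            ungetc(c, f);
            return 1;
        }
    }
    return 0;
}

/* Read one unsigned decimal field. Returns 0 if no digits were found. */
static int
read_uint(FILE *f, size_t *out)
{
    size_t v = 0;
    int c, ndigits = 0;
    if (!skip_space(f))
        return 0;
    while ((c = getc(f)) != EOF && isdigit(c)) {
        v = v * 10 + (size_t)(c - '0');
        ndigits++;
    }
    if (c != EOF)
        ungetc(c, f);
    *out = v;
    return ndigits > 0;
}

/* Parse a P6 header, leaving f positioned at the first pixel byte. */
static int
ppm_header(FILE *f, size_t *width, size_t *height)
{
    size_t depth;
    if (getc(f) != 'P' || getc(f) != '6')
        return 0;
    if (!read_uint(f, width) || !read_uint(f, height) || !read_uint(f, &depth))
        return 0;
    if (depth != 255)
        return 0;
    return isspace(getc(f)) != 0;  /* the single separator before the data */
}
```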

Here’s the structure I used to represent a PPM image, just one frame of video. I’m using a flexible array member to pack the data at the end of the structure.

struct frame {
    size_t width;
    size_t height;
    unsigned char data[];
};

Next a function to allocate a frame:

static struct frame *
frame_create(size_t width, size_t height)
{
    struct frame *f = malloc(sizeof(*f) + width * height * 3);
    f->width = width;
    f->height = height;
    return f;
}

We’ll need a way to write the frames we’ve created.

static void
frame_write(struct frame *f)
{
    printf("P6\n%zu %zu\n255\n", f->width, f->height);
    fwrite(f->data, f->width * f->height, 3, stdout);
}

Finally, a function to read a frame, reusing an existing buffer if possible. The most complex part of the whole program is just parsing the PPM header. The %*c in the scanf() specifically consumes the line feed immediately following the maximum depth.

static struct frame *
frame_read(struct frame *f)
{
    size_t width, height;
    if (scanf("P6 %zu%zu%*d%*c", &width, &height) < 2) {
        free(f);
        return 0;
    }
    if (!f || f->width != width || f->height != height) {
        free(f);
        f = frame_create(width, height);
    }
    fread(f->data, width * height, 3, stdin);
    return f;
}

Since this program will only be part of a pipeline, I’m not worried about checking the results of fwrite() and fread(). If something goes wrong with the pipes, the process will typically be killed by SIGPIPE on its next write. However, if we’re out of video data and get an EOF, scanf() will fail, indicating the EOF, which is normal and can be handled cleanly.

An identity filter

That’s all the infrastructure we need to build an identity filter that passes frames through unchanged:

int main(void)
{
    struct frame *frame = 0;
    while ((frame = frame_read(frame)))
        frame_write(frame);
}

Processing a frame is just a matter of adding some stuff to the body of the while loop.
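
As a quick illustration of what that "stuff" might be, here is a sketch of a color-inverting filter step. The frame_invert() function is a made-up example and is not part of the rolling shutter program; the struct is repeated so the snippet stands alone.

```c
#include <stddef.h>

struct frame {
    size_t width;
    size_t height;
    unsigned char data[];
};

/* Invert every channel of every pixel in place. */
static void
frame_invert(struct frame *f)
{
    size_t n = f->width * f->height * 3;
    for (size_t i = 0; i < n; i++)
        f->data[i] = 255 - f->data[i];
}
```

Calling frame_invert(frame) just before frame_write(frame) in the loop would turn the identity filter into a negative filter.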

A rolling shutter filter

For the rolling shutter filter, in addition to the input frame we need an image to hold the result of the rolling shutter. Each input frame will be copied into the rolling shutter frame, but a little less will be copied from each frame, locking a little bit more of the image in place.

int
main(void)
{
    int shutter_step = 3;
    size_t shutter = 0;
    struct frame *f = frame_read(0);
    if (!f)
        return 0;  // empty input: nothing to simulate
    struct frame *out = frame_create(f->width, f->height);
    while (shutter < f->height && (f = frame_read(f))) {
        size_t offset = shutter * f->width * 3;
        size_t length = f->height * f->width * 3 - offset;
        memcpy(out->data + offset, f->data + offset, length);
        frame_write(out);
        shutter += shutter_step;
    }
    free(out);
    free(f);
}

The shutter_step controls how many rows are captured per frame of video. Generally capturing one row per frame is too slow for the simulation. For a 1080p video, that’s 1,080 frames for the entire simulation: 18 seconds at 60 FPS or 36 seconds at 30 FPS. If this program were to accept command line arguments, controlling the shutter rate would be one of the options.
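
A minimal sketch of how such an option might be parsed, assuming the step arrives as a single decimal argument. The function name and the upper bound are arbitrary choices for illustration, not part of rshutter.c.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a positive rows-per-frame count from a command line argument.
 * Returns 0 if the argument is not a valid count. */
static int
parse_shutter_step(const char *arg)
{
    char *end;
    errno = 0;
    long v = strtol(arg, &end, 10);
    if (errno || end == arg || *end != '\0' || v < 1 || v > 1000000)
        return 0;
    return (int)v;
}
```

In main() this could replace the hard-coded 3 when argv[1] is present, bailing out with an error message if the result is 0.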

Putting it all together:

$ ffmpeg -i input.mp4 -f image2pipe -vcodec ppm pipe:1 | \
    ./rolling-shutter | \
    x264 -o output.mp4 /dev/stdin

Here are some of the results for different shutter rates: 1, 3, 5, 8, 10, and 15 rows per frame. Feel free to right-click and “View Video” to see the full resolution video.

Source and original input

This post contains the full source in parts, but here it is all together:

rshutter.c

Here’s the original video, filmed by my wife using her Nikon D5500, in case you want to try it for yourself:

It took much longer to figure out the string-pulling contraption to slowly spin the fan at a constant rate than it took to write the C filter program.

Followup Links

On Hacker News, morecoffee shared a video of the second order effect (direct link), where the rolling shutter speed changes over time.

A deeper analysis of rolling shutter: Playing detective with rolling shutter photos.

Render the Mandelbrot Set with jq

2016-09-15T02:39:13Z

One of my favorite data processing tools is jq, a command line JSON processor. It’s essentially awk for JSON. You supply a small script composed of transformations and filters, and jq evaluates the filters on each input JSON object, producing zero or more outputs per input. My most common use case is converting JSON data into CSV with jq’s @csv filter, which is then fed into SQLite (another favorite) for analysis.

On a recent pass over the manual, the while and until filters caught my attention, lighting up my Turing-completeness senses. These filters allow jq to compute an arbitrary recurrence, such as the Mandelbrot set.

Setting that aside for a moment, I said before that an input could produce zero or more outputs. The zero is when it gets filtered out, and one output is the obvious case. Some filters produce multiple outputs from a single input. There are a number of situations when this happens, but the important one is the range filter. For example,

$ echo 6 | jq 'range(1; .)'
1
2
3
4
5

The . is the input object, and range is producing one output for every number between 1 and . (exclusive). If an expression has multiple filters producing multiple outputs, under some circumstances jq will produce a Cartesian product: every combination is generated.

$ echo 4 | jq -c '{x: range(1; .), y: range(1; .)}'
{"x":1,"y":1}
{"x":1,"y":2}
{"x":1,"y":3}
{"x":2,"y":1}
{"x":2,"y":2}
{"x":2,"y":3}
{"x":3,"y":1}
{"x":3,"y":2}
{"x":3,"y":3}

So if my goal is the Mandelbrot set, I can use this to generate the complex plane, over which I will run the recurrence. For input, I’ll use a single object with the keys x, dx, y, and dy, defining the domain and range of the image. A reasonable input might be:

{"x": [-2.5, 1.5], "dx": 0.05, "y": [-1.5, 1.5], "dy": 0.1}

The “body” of the until will be the Mandelbrot set recurrence.

z(n+1) = z(n)^2 + c

As you might expect, jq doesn’t have support for complex numbers, so the components will be computed explicitly. I’ve worked it out before, so borrowing that I finally had my script:

#!/bin/sh
echo '{"x": [-2.5, 1.5], "dx": 0.05, "y": [-1.5, 1.5], "dy": 0.1}' | \
  jq -jr "{ \
     ci: range(.y[0]; .y[1] + .dy; .dy), \
     cr: range(.x[0]; .x[1]; .dx), \
     k: 0, \
     r: 0, \
     i: 0, \
   } | until(.r * .r + .i * .i > 4 or .k == 94; { \
         cr,
         ci,
         k: (.k + 1),
         r: (.r * .r - .i * .i + .cr),
         i: (.r * .i * 2 + .ci) \
       }) \
   | [.k + 32] | implode"

It iterates to a maximum depth of 94: the number of printable ASCII characters, except space. The final two filters convert the output to ASCII characters, and the -j and -r options tell jq to produce joined, raw output. So, if you have jq installed and an exactly 80-character wide terminal, go ahead and run that script. You should see something like this:

!!!!!!!!!!!!!!!!!!!"""""""""""""""""""""""""""""""""""""""""""""""""""
!!!!!!!!!!!!!!!!!"""""""""""""""""""""""""""""""""""""""""""""""""""""
!!!!!!!!!!!!!!!"""""""""""""""###########"""""""""""""""""""""""""""""
!!!!!!!!!!!!!!"""""""""#########################""""""""""""""""""""""
!!!!!!!!!!!!"""""""################$$$$$%3(%%$$$####""""""""""""""""""
!!!!!!!!!!!"""""################$$$$$$%%&'+)+J%$$$$####"""""""""""""""
!!!!!!!!!!"""################$$$$$$$%%%&()D8+(&%%$$$$#####""""""""""""
!!!!!!!!!""################$$$$$$$%%&&'(.~~~~2(&%%%%$$######""""""""""
!!!!!!!!""##############$$$$$$%%&'(((()*.~~~~-*)(&&&2%$$#####"""""""""
!!!!!!!""#############$$$$%%%%&&',J~0:~~~~~~~~~~4,./0/%$######""""""""
!!!!!!!"###########$$%%%%%%%&&&(.,^~~~~~~~~~~~~~~~~~4'&%$######"""""""
!!!!!!"#######$$$%%','''''''''(+4~~~~~~~~~~~~~~~~~~~1)3%$$######""""""
!!!!!!###$$$$$$%%%&'*04,-C-+))+8~~~~~~~~~~~~~~~~~~~~~/(&$$#######"""""
!!!!!!#$$$$$$%%%%&'(+2~~~~~~~/0~~~~~~~~~~~~~~~~~~~~~~?'%$$$######"""""
!!!!!!$$$$$&&&&'(,-.6~~~~~~~~~A~~~~~~~~~~~~~~~~~~~~~~(&%$$$######"""""
!!!!!!`ce~~ku{~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~,('&%$$$#######""""
!!!!!!$$$$$&&&&'(,-.6~~~~~~~~~A~~~~~~~~~~~~~~~~~~~~~~(&%$$$######"""""
!!!!!!#$$$$$$%%%%&'(+2~~~~~~~/0~~~~~~~~~~~~~~~~~~~~~~?'%$$$######"""""
!!!!!!###$$$$$$%%%&'*04,-C-+))+8~~~~~~~~~~~~~~~~~~~~~/(&$$#######"""""
!!!!!!"#######$$$%%','''''''''(+4~~~~~~~~~~~~~~~~~~~1)3%$$######""""""
!!!!!!!"###########$$%%%%%%%&&&(.,^~~~~~~~~~~~~~~~~~4'&%$######"""""""
!!!!!!!""#############$$$$%%%%&&',J~0:~~~~~~~~~~4,./0/%$######""""""""
!!!!!!!!""##############$$$$$$%%&'(((()*.~~~~-*)(&&&2%$$#####"""""""""
!!!!!!!!!""################$$$$$$$%%&&'(.~~~~2(&%%%%$$######""""""""""
!!!!!!!!!!"""################$$$$$$$%%%&()D8+(&%%$$$$#####""""""""""""
!!!!!!!!!!!"""""################$$$$$$%%&'+)+L%$$$$####"""""""""""""""
!!!!!!!!!!!!"""""""################$$$$$%3(%%$$$####""""""""""""""""""
!!!!!!!!!!!!!!"""""""""#########################""""""""""""""""""""""
!!!!!!!!!!!!!!!"""""""""""""""###########"""""""""""""""""""""""""""""
!!!!!!!!!!!!!!!!!"""""""""""""""""""""""""""""""""""""""""""""""""""""
!!!!!!!!!!!!!!!!!!!"""""""""""""""""""""""""""""""""""""""""""""""""""
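
For comparison, the per-point computation in that script can be sketched in C. This is only a restatement of the until() loop for one point c = cr + ci*i, with invented function names; it does not reproduce the script's grid generation.

```c
/* Count iterations (1..94) until the point escapes or the cap is hit,
 * mirroring the until() condition and update in the jq script. */
static int
mandel_depth(double cr, double ci)
{
    double r = 0.0, i = 0.0;
    int k = 0;
    while (r * r + i * i <= 4.0 && k != 94) {
        double nr = r * r - i * i + cr;  /* real part of z^2 + c */
        i = 2.0 * r * i + ci;            /* imaginary part, using the old r */
        r = nr;
        k++;
    }
    return k;
}

/* The character printed for that point: depth offset past the space. */
static char
mandel_char(double cr, double ci)
{
    return (char)(mandel_depth(cr, ci) + 32);
}
```

Points that never escape hit the cap of 94 and print as ~, while points far outside the set escape after one step and print as !, which is the pattern visible in the output above.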

Tweaking the input parameters, it scales up nicely:

As demonstrated by the GIF, it’s very slow compared to more reasonable implementations, but I wouldn’t expect otherwise. It could be turned into a zoom animation just by feeding it more input objects with different parameters. It will produce one full “image” per input. Capturing an animation is left as an exercise for the reader.

Lisp Let in GNU Octave

2012-02-08T00:00:00Z

In BrianScheme, the standard Lisp binding form let isn’t a special form. That is, it’s not a hard-coded language feature. It’s built on top of lambda. In any lexically-scoped Lisp, the expression,

(let ((x 10)
      (y 20))
  (* x y))

can also be written as,

((lambda (x y)
   (* x y))
 10 20)

BrianScheme’s let is just a macro that transforms into a lambda expression. This is also what made it so important to implement lambda lifting, to optimize these otherwise-expensive forms.

It’s possible to achieve a similar effect in GNU Octave (but not Matlab, due to its flawed parser design). The language permits simple lambda expressions, much like Python.

> f = @(x) x + 10;
> f(4)
ans = 14

It can be used to create a scope in a language that’s mostly devoid of scope. For example, I can avoid assigning a value to a temporary variable just because I need to use it in two places. This one-liner generates a random 3D unit vector.

(@(v) v / norm(v))(randn(1, 3))

The anonymous function is called inside the same expression where it’s created. In practice, doing this is stupid. It’s confusing and there’s really nothing to gain by being clever, doing it in one line instead of two. Most importantly, there’s no macro system that can turn this into a new language feature. However, I enjoyed using this technique to create a one-liner that generates n random unit vectors.

n = 1000;
p = (@(v) v ./ repmat(sqrt(sum(abs(v) .^ 2, 2)), 1, 3))(randn(n, 3));

Why was I doing this? I was using the Monte Carlo method to double-check my solution to this math problem:

What is the average straight line distance between two points on a sphere of radius 1?

I was also demonstrating to Gavin that simply choosing two angles is insufficient, because the points the angles select are not evenly distributed over the surface of the sphere. I generated this video, where the poles are clearly visible due to the uneven selection by two angles.

This took hours to render with gnuplot! Here are stylized versions: Dark and Light.

Poor Man's Video Editing

2011-11-28T00:00:00Z

I’ve done all my video editing in a very old-school, unix-style way. I actually have no experience with real video editing software, which may explain why I tolerate the manual process. Instead, I use several open source tools, none of which are designed specifically for video editing.

MPlayer
ImageMagick (or any batch image editing tool)
ppmtoy4m
The WebM encoder (or your preferred encoder)

The first three are usually available from your Linux distribution repositories, making them trivial to obtain. The last one is easy to obtain and compile.

~~If you’re using a modern browser, you should have noticed my portrait on the left-hand side changed recently~~ (update: it’s been removed). That’s an HTML5 WebM video — currently with Ogg Theora fallback due to a GitHub issue. To cut the video down to that portrait size, I used the above four tools on the original video.

WebM seems to be becoming the standard HTML5 video format. Google is pushing it and it’s supported by all the major browsers, except Safari. So, unless something big happens, I plan on going with WebM for web video in the future.

To begin, as I’ve done before, split the video into its individual frames,

mplayer -vo jpeg -ao dummy -benchmark video_file

The -benchmark option tells mplayer to go as fast as possible, rather than at normal playback speed.

Next look through the output frames and delete any unwanted frames, such as the first and last few seconds of video. With the desired frames remaining, use ImageMagick, or any batch image editing software, to crop out the relevant section of the images. This can be done in parallel with xargs’ -P option to take advantage of multiple cores, as long as disk I/O isn’t the bottleneck.

ls *.jpg | xargs -I{} -P5 convert {} -crop 312x459+177+22 {}.ppm

That crops out a 312 by 459 section of the image, with the top-left corner at (177, 22). Any other convert filters can be stuck in there too. Notice the output format is the portable pixmap (ppm), which is significant because it won’t introduce any additional loss and, most importantly, it is required by the next tool.

If I’m happy with the result, I use ppmtoy4m to pipe the new frames to the encoder,

cat *.ppm | ppmtoy4m | vpxenc --best -o output.webm -

As the name implies, ppmtoy4m converts a series of portable pixmap files into a YUV4MPEG2 (y4m) video stream. YUV4MPEG2 is the bitmap of the video world: gigantic, lossless, uncompressed video. It’s exactly the kind of thing you want to hand to a video encoder. If you need to specify any video-specific parameters, ppmtoy4m is the tool that needs to know it. For example, to set the framerate to 10 FPS,

... | ppmtoy4m -F 10:1 | ...

ppmtoy4m is a classically-trained unix tool: stdin to stdout. No need to dump that raw video to disk, just pipe it right into the WebM encoder. If you choose a different encoder, it might not support reading from stdin, especially if you do multiple passes. A possible workaround would be a named pipe,

mkfifo video.y4m
cat *.ppm | ppmtoy4m > video.y4m &
otherencoder video.y4m
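
To see the named pipe mechanics in isolation, here is a minimal sketch with stand-ins for the real tools: printf plays the part of the ppmtoy4m pipeline, and cat plays an encoder that insists on a file name. The temporary directory and the fake frame lines are made up for illustration.

```sh
# Named-pipe sketch; printf and cat are stand-ins, not real video tools.
tmpdir=$(mktemp -d)
fifo="$tmpdir/video.y4m"
mkfifo "$fifo"

# Producer: blocks until something opens the FIFO for reading.
printf 'frame1\nframe2\nframe3\n' > "$fifo" &

# Consumer: opens the FIFO by name, as an encoder would open a file.
frames=$(cat "$fifo" | wc -l | tr -d ' ')

wait            # reap the background producer
rm -r "$tmpdir"
echo "consumer read $frames lines"
```

Nothing ever lands on disk: the producer and consumer run in lockstep through the pipe. This only helps single-pass encoders, since a FIFO can’t be rewound.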

For WebM encoding, I like to use the --best option, telling the encoder to take its time to do a good job. To do two passes and get even more quality per byte (--passes=2) a pipe cannot be used and you’ll need to write the entire raw video onto the disk. If you try to pipe it anyway, vpxenc will simply crash rather than give an error message (as of this writing). This had me confused for a while.

To produce Ogg Theora instead of WebM, ffmpeg2theora is a great tool. It’s well-behaved on the command line and can be dropped in place of vpxenc.

To do audio, encode your audio stream with your favorite audio encoder (Vorbis, Lame, etc.) then merge them together into your preferred container. For example, to add audio to a WebM video (i.e. Matroska), use mkvmerge from MKVToolNix,

mkvmerge --webm -o combined.webm video.webm audio.ogg

Extra notes update: There’s a bug in imlib2 where it can’t read PPM files that have no initial comment, so some tools, including GIMP and QIV, can’t read PPM files produced by ImageMagick. Fortunately ppmtoy4m is unaffected. However, there is a bug in ppmtoy4m where it can’t read PPM files with a depth other than 8 bits. Fix this by giving the option -depth 8 to ImageMagick’s convert.

Some Cool Shell Aliases

2011-11-03T00:00:00Z

Over the last couple of years I’ve worked out some cool shell tricks, which I use as aliases. Like any good software developer, if I notice a pattern I take steps to generalize and reduce it. For shells, it might be as simple as replacing a regularly typed long command with a short alias, but the coolest tricks are the ones that reduce an entire habit.

The first one is the singleton pattern. Say you have a terminal program that should only have a single process instance but should only start on demand. Some programs may enforce that rule, if it makes sense to, but some do not.

In my case, that program was rtorrent. I only want a single instance of this program running at a time, but I also don’t want to have to think about whether or not I’ve started it already. I always run it in screen so that I can detach it and let it run in the background. My shell habits looked like this.

# Assume it's there already
$ screen -r rtorrent
# If not, fire it up
$ screen -S rtorrent rtorrent

If I needed to start rtorrent for the first time I was often typing in that first command just to see it fail. Fortunately, it really does fail: the exit code is non-zero. This allows me to make this cool alias,

alias rt='screen -r rtorrent || screen -S rtorrent rtorrent'

Either it attaches to the existing instance or fires a new one up for me and attaches me to that one. Now, there is a race condition here. That “or” operator isn’t atomic, so something else might spawn an rtorrent instance in between check and creation. Since I’m only ever running this by hand, and there is only one of me, that’s not a problem.
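
The same check-then-create pattern works for any program that should live in exactly one screen session. A minimal sketch, assuming only the -r and -S options shown above; attach_or_start is a hypothetical helper name, not something screen provides.

```sh
# attach_or_start SESSION COMMAND [ARGS...]
# Hypothetical helper: reattach to SESSION, or create it running COMMAND.
attach_or_start() {
    session=$1
    shift
    # screen -r exits non-zero when there is nothing to attach to,
    # so the right-hand side only runs in that case.
    screen -r "$session" || screen -S "$session" "$@"
}

# Same behavior as the rt alias above:
#   attach_or_start rtorrent rtorrent
```

The race condition is unchanged, so this is still only suitable for commands run by hand.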

The next trick has to do with my habit of throwing up a temporary web server when I need to share files. I noticed that I would launch it, run it for a minute, kill it, run one or two commands, and launch it again. For example, if I’m working on a program and I want to share the build with someone else, I might drop out of it just to do something with git and rebuild. Once again, my alias fix involves screen,

alias httpd='screen -S http python -m SimpleHTTPServer 8080'

Rather than kill the server only to restart it again, I always run it in screen. So instead I detach, but I don’t even need to bother reattaching.

This next one is my Emacs alias. Emacs has the really, really cool ability to become a daemon. You can launch a daemon instance, then connect to it as needed with clients to do your editing (emacsclient --create-frame or just -c). This allows your Emacs session to live for a long time, preserving all your buffers. Long-living sessions are an old Lisp tradition. Also, being a daemon eliminates any lengthy startup penalty, because it only happens once after reboot.

$ emacs --daemon
$ emacsclient -c
# Close it and sometime later start another client
$ emacsclient -c

This is another case of the single-instance problem. However, Emacs is really smart about managing this by itself. It has an argument, --alternate-editor (-a), which allows you to specify another editor to use in case the daemon isn’t started.

emacsclient -ca nano

The most important part of this option is its hidden feature. When the argument is empty it defaults to launching a daemon. No need to launch it manually, it’s just one command.

alias e='emacsclient -cna ""'

Naturally, Emacs gets to be in one of the coveted, single-letter slots. I also set one up for terminal mode Emacs (-t instead of -c),

alias et='emacsclient -ta ""'

And just to teach the editor heathens a lesson or two, this command has a second alias,

alias vi='emacsclient -ta ""'

The final trick is one I just figured out this week, and it involves passphrase agents. Just in case you are not familiar, both ssh and gpg have daemons which will securely store your passphrases for you.

Update June 2012: I have a better solution for this problem.

OpenSSH is loaded with extremely useful functionality. One of them is key authentication. Rather than use a password to log into a system, you can prove your identity cryptographically — you solve a math problem that only you have the information to solve. This is invaluable to Git, because it allows for passwordless access to remote repositories. You can host a repository for a bunch of users without the awkward password step (“Pst … your password is passwordABC. Change it after you first log in.”). Instead, they all send you their public keys.

To use this feature, you first generate a public/private keypair for yourself, which gets stored in ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub.

$ ssh-keygen

At one point in this process you will be asked for a passphrase, which is used to encrypt your private key. At first you might wonder why you bother with the key at all if you’re going to encrypt it. Rather than enter a password to log into a system, you have to enter a passphrase, which is even worse because it’s longer. So you don’t bother with a passphrase. This is dangerous because it’s practically the same as storing your login password in a file! If someone got a hold of your private key file, they have full access to your systems.

This is where ssh-agent comes in. It runs as a service in the background. You register your private key with it with ssh-add and it queries for your passphrase, storing it in memory. It’s very careful about this. It’s stored on a memory page that has been registered with the operating system such that it’s never written to permanent storage (swap). When the process dies, or zeros out that memory, the passphrase is completely lost.

GnuPG’s daemon, gpg-agent, works very similarly. It holds onto your PGP passphrase so you can perform a number of actions with it without needing to enter it a bunch of times.

gpg and ssh know how to communicate with their agents through information stored in environment variables. However, this creates a problem when launching the agents. They can’t change the environment of their parent process, your shell. The easiest way to do it is to reverse the relationship, with the agent becoming the parent of your shell.

$ exec ssh-agent bash

This tells ssh-agent to launch a new shell with the proper environment. In case you’re not familiar, the exec causes the new shell to replace the current one. It’s the same exec() as the POSIX function. You could leave it off, but you’ll be left with nested shells, which can be very confusing.

GnuPG’s agent is launched like this,

$ exec gpg-agent --daemon bash

My problem was that I often want both of these at the same time. I could run them back to back, but making them into a single, alias-able command is tricky. This naive expression will not work,

$ exec ssh-agent bash && exec gpg-agent --daemon bash

The first exec causes the current shell to end, so the second part is never evaluated. I can’t simply remove the execs and live with nested shells because the next agent isn’t launched until the first agent’s shell dies. The very cool solution is to chain them together!

alias agent='exec ssh-agent gpg-agent --daemon bash'

The current shell turns into the ssh-agent, which spawns gpg-agent with the proper environment for ssh, which forwards its environment along to spawn a new shell with the proper environment for both.
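
The chain works because each agent sets its environment variables and then runs the rest of its command line, which inherits them. Here is a small sketch of that same shape using env as a stand-in for both agents, since it also sets a variable and then runs whatever follows. The variable names and socket paths are made up.

```sh
# env plays the role of each agent: set a variable, then run the rest.
out=$(env FIRST_AGENT_SOCK=/tmp/first \
      env SECOND_AGENT_SOCK=/tmp/second \
      sh -c 'echo "$FIRST_AGENT_SOCK $SECOND_AGENT_SOCK"')
echo "$out"
```

The innermost shell sees both variables. Unlike the agents, env replaces itself with the command rather than staying around as a parent, but the environment flows down the chain the same way.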

If I ever need another agent I just add it to the chain. That command should probably be at the end of my .bashrc or something, but it’s just an alias for now. Sometimes I log into X first, sometimes ssh first, so I’m not sure what the correct place would be.

Throw Up a Quick HTTP Server

2010-09-21T00:00:00Z

Ever since I learned this neat trick from Luke a few months ago I've been using it almost weekly. If you have Python installed then you have a miniature web server at your fingertips.

python -m SimpleHTTPServer

Bam! If I need to transfer large files directly to someone over the Internet, I'll throw up one of these and give them the address. Each of my home computers has an 8xxx port forwarded to it in my router's NAT configuration, so they're all ready to do this any time I need it.

When I needed this before, I would use my own Emacs web server, but the Python solution above is even more convenient most of the time. Plus it does directory listings, which I never bothered to add to my web server.

The reason I bring it up right now is because it finally saved me a lot of time at work. I was performing a half dozen Ubuntu installs for a system we're building, but none of these computers have network access, except to each other. I was using the Ubuntu DVD, which includes a larger software selection than the CD.

Well, it seems no one ever actually uses the DVD like this, because it doesn't work right now with the current DVD. You see, apt doesn't treat the DVD as just any repository. It has its own special cache for them, keeping track of what packages are on what CDs and DVDs. Well, the problem is that the CD is named differently than the DVD, and apt (and apt-cdrom) keeps looking for the CD when it should be looking for the DVD, foolishly ignoring any advice I give in the configuration. Python SimpleHTTPServer to the rescue!

I mounted the DVD and ran the HTTP server at the DVD's root. Then I added my localhost server to the sources.list. Worked like a charm! apt had no idea it was pulling packages off the DVD.

Middleman Parallelization

2010-09-08T00:00:00Z

I recently discovered a very clever tool called Middleman. It's a quick way to set up and manage a multiple-process workload queue. The process output and display is done inside of a screen session, so if it's going to take a while you can just detach and check on it again later. In the past I used make's -j option to do this, but that's always a pain to set up.

It is composed of three programs: mdm.screen, mdm-run, and mdm-sync. The first is the top level supervisor that you use to launch the enhanced shell script. The second prefixes every command to be run in parallel. The third prefixes the final command that depends on all of the individual processes.

The linked Middleman page has a good example, but I'll share my own anyway. I used it over the weekend to download a long series of videos with youtube-dl. Because the transfer rate for a single video is throttled I wanted to grab several at a time, but I also didn't want to grab them all at the same time. Here's the dumb version of the script, download.sh, that does them all at once.

#!/bin/sh
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX0 &
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX1 &
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX2 &
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX3 &
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX4 &
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX5 &
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX6 &
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX7 &
youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX8 &

With Middleman all I had to do was this,

#!/bin/sh
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX0
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX1
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX2
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX3
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX4
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX5
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX6
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX7
mdm-run youtube-dl -t http://www.youtube.com/watch?v=XXXXXXXXXX8

Then just launch the script with mdm.screen. It defaults to 6 processes at a time, but you can adjust it to whatever you want with the -n switch. I used 4.

$ mdm.screen -n 4 ./download.sh

There is a screen window that lists the process queue and highlights the currently active jobs. I could switch between screen windows to see the output from individual processes and see how they were doing.

From the perspective of the shell script, the first four commands finish instantly but the fifth command will block. As soon as Middleman sees one of the first four processes complete the fifth one will begin work, returning control to the shell script, and the sixth command will block, since the queue is full again.

I'm sure I'll be using this more in the future, especially for tasks like batch audio and video encoding. I bet this could be useful on a cluster.

Identifying Files

2010-05-20T00:00:00Z

At work I currently spend about a third of my time doing data reduction, and it's become one of my favorite tasks. (I've done it on my own too). Data come in from various organizations and sponsors in all sorts of strange formats. We have a bunch of fancy analysis tools to work on the data, but they aren't any good if they can't read the format. So I'm tasked with writing tools to convert incoming data into a more useful format.

If the source file is a text-based file it's usually just a matter of writing a parser — possibly including a grammar — after carefully studying the textual structure. Binary files are trickier. Fortunately, there are a few tools that come in handy for identifying the format of a strange binary file.

The first is the standard utility found on any unix-like system: file. I have no idea if it has an official website because it's a term that's impossible to search for. It tries to identify a file based on the magic numbers and other tests, none based on the actual file name. I've never been lucky enough to have file recognize a strange format at work. But silence speaks volumes: it means the data are not packed into something common, like a simple zip archive.

Next, I take a look at the file with ent, a pseudo-random number sequence test program. This will reveal how compressed (or even encrypted) data are. If ent says the data are very dense, say 7 bits per byte or more, the format is employing a good compression algorithm. The next step would be tackling that so I can start over on the uncompressed contents. If it's something like 4 bits per byte there's no compression. If it's in between then it might be employing a weak, custom compression algorithm. I've always seen the latter two.
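
For a rough idea of the bits-per-byte figure, Shannon entropy can be computed with standard tools. This is a sketch of the calculation only, not a replacement for ent, and the function name entropy is made up.

```sh
# entropy FILE
# Print the Shannon entropy of FILE in bits per byte (0.00 to 8.00).
entropy() {
    # od prints every byte as an unsigned decimal; tr puts one per line.
    od -A n -v -t u1 "$1" | tr -s ' ' '\n' | awk '
        NF { count[$1]++; total++ }
        END {
            for (b in count) {
                p = count[b] / total
                h -= p * log(p) / log(2)
            }
            printf "%.2f\n", h
        }'
}
```

A file containing one repeated byte scores 0.00, and a file with two equally common byte values scores 1.00. Well compressed or encrypted data gets close to 8.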

Next I dive in with a hex editor. I use a combination of Emacs' hexl-mode and the standard BSD tool hexdump (for something more static). One of the first things I like to identify is byte order, and in a hex dump it's often obvious.

In general, better designed formats use big endian, also known as network order. That's the standard ordering used in communication, regardless of the native byte ordering of the network clients. The amateur, home-brew formats are generally less thoughtful and dump out whatever the native format is, usually little endian because that's what x86 is. Worse, they'll also generate data on architectures that are big endian, so you can get it both ways without any warning. In that case your conversion tool has to be sensitive to byte order and find some way to identify which ordering a file is using. A time-stamp field is very useful here, because a 64-bit time-stamp read with the wrong byte order will give a very unreasonable date.

For example, here's something I see often.

eb 03 00 00 35 00 00 00 66 1e 00 00

That's most likely 3 4-byte values, in little endian byte order. The zeros make the integers stand out.

eb 03 00 00  35 00 00 00  66 1e 00 00

We can tell it's little endian because the non-zero digits are on the left. This information will be useful in identifying more bytes in the file.
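
To check a guess like this without leaving the shell, the four bytes can be combined by hand with shell arithmetic. A minimal sketch: le32 and be32 are hypothetical helper names, and the arguments are hex byte values in the order they appear in the dump.

```sh
# Combine four hex bytes (in file order) into one unsigned 32-bit value.
le32() { echo $(( 0x$1 | (0x$2 << 8) | (0x$3 << 16) | (0x$4 << 24) )); }
be32() { echo $(( (0x$1 << 24) | (0x$2 << 16) | (0x$3 << 8) | 0x$4 )); }

le32 eb 03 00 00   # first value, read as little endian
le32 35 00 00 00   # second value
le32 66 1e 00 00   # third value
be32 eb 03 00 00   # same bytes as the first value, read as big endian
```

Read as little endian, the three values come out as 1003, 53, and 7782, all plausible small integers. Read as big endian, the first one becomes a number close to four billion (with 64-bit shell arithmetic), which is the kind of unreasonable result that gives the wrong byte order away.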

Next I'd look for headers, common strings of bytes, so that I can identify larger structures in the data. I've never had to reverse engineer a format ... yet. I'm not sure if I could. Once I got this far I've always been able to research the format further and find either source code or documentation, revealing everything to me.

If the file contains strings I'll dump them out with strings. I haven't found this too useful at work, but it's been useful at home.

And there's something still useful beyond these. Something I made myself at home for a completely different purpose, but I've exploited its side effects: my PNG Archiver. The original purpose of the tool is to store a file in an image, as images are easier to share with others. The side effect is that by viewing the image I get to see the structure of the file. For example, here's my laptop's /bin/ls, very roughly labeled.

It's easy to spot the different segments of the ELF format. Higher entropy sections are more brightly colored. Strings, being composed of ASCII-like text, have their MSBs unset, which is why they're darker. Any non-compressed format will have an interesting profile like this. Here's a Word doc, an infamously horrible format,

And here's some Emacs bytecode. You can tell the code vectors apart from the constants section below it.

If you find yourself having to inspect strange files, keep these tools around to make the job easier.

Magick Thumbnails

2009-12-21T00:00:00Z

For a long time I couldn't figure out how to make decent thumbnails with ImageMagick. Specifically, I wanted to create uniform sized thumbnails from arbitrary images. Over the weekend I came across the ImageMagick Examples page, which shows exactly how to do this. Here's the command for a 150x150 thumbnail,

convert orig.jpg -thumbnail 150x150^ -gravity center \
        -extent 150x150 thumb.jpg

It cuts out the largest square possible from the center of the image and resizes that to 150x150. This capability has actually only been available for 2 years now! It wasn't there last time I needed it.
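
The crop it describes is easy to work out by hand. This is a sketch of the arithmetic only, not of what ImageMagick does internally. center_square is a made-up name, and it prints the square in the same WxH+X+Y geometry notation that convert uses.

```sh
# center_square WIDTH HEIGHT
# Print the largest centered square as SIDExSIDE+X+Y.
center_square() {
    w=$1 h=$2
    if [ "$w" -lt "$h" ]; then side=$w; else side=$h; fi
    x=$(( (w - side) / 2 ))
    y=$(( (h - side) / 2 ))
    echo "${side}x${side}+${x}+${y}"
}

center_square 1600 1200
center_square 600 800
```

For a 1600x1200 image that prints 1200x1200+200+0, and for a 600x800 portrait it prints 600x600+0+100. Scaling that square down to 150x150 gives the thumbnail.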

I can think of one way to improve it: instead of selecting the center, it could select the area with the highest information density. This could be measured by edge detection, corner detection, or some other statistical method. It would be selected by changing the gravity option to, say, "entropy".

I'm listing this here mostly for my own future reference. :-)

GNU Screen

2009-03-05T00:00:00Z

Another useful program I use every week is GNU Screen. It provides virtual terminals at a single terminal. It's a bit like a window manager for a text terminal. If you are a command line junkie (and if you are at all serious about computing, you should be), this is an essential piece of software.

The main reason I use screen is for its persistence. If I am running a long-running job on a remote machine (i.e. over ssh), like a large apt-get upgrade, I'll put it in a screen session. This way I can log out and, later, log in from anywhere and check on it. I have even used it to persist nethack sessions, though this isn't really necessary.

The only annoying part is that all of its mappings are underneath C-a (ctrl+a), which is a very common Emacs/bash command, which I use a lot. To get the effect of C-a inside screen, you have to do it twice in a row because screen captures the first one.

If you don't already use it, try it out sometime.

Readline Wrap

2009-02-20T00:00:00Z

I came across a very interesting tool the other day called rlwrap. It wraps the readline library over just about any interactive text input program. The readline library provides basic editing and history. It's handy for those programs that don't provide their own line editing facilities.

It tries to be as transparent as possible, detecting yes/no prompts and passwords, so it should still be reasonable under those conditions.

If you can't think of anything to try it with, try it with cat. Instant line editor!

rlwrap cat > some-file.txt

Or with Festival.

rlwrap festival --tts

It will also turn incorrectly compiled shells on your system into something usable. On my system (Debian GNU/Linux), csh isn't usable without rlwrap.

rlwrap csh

Movie DNA

2007-12-11T00:00:00Z

Update: A follow-up of this post has a script that can do the montage part of the job much faster than ImageMagick.

Brendan Dawes has this interesting idea he calls Cinema Redux. An entire film is distilled down to a single image. You take one frame from each second of the movie, shrink that frame down to 8x6 pixels, then line them up in a montage with 60 frames per row. Each row then represents one minute of film. There are 8 examples on his website.
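
The size of the finished image follows directly from the running time. A quick sketch of that arithmetic in shell, where montage_size is a made-up name and the frames are assumed to be tiled with no gaps:

```sh
# montage_size SECONDS
# One 8x6 frame per second, 60 frames per row.
montage_size() {
    frames=$1
    rows=$(( (frames + 59) / 60 ))      # round up to whole rows
    echo "$(( 60 * 8 ))x$(( rows * 6 ))"
}

montage_size 7000
```

A film of 7000 seconds, just under two hours, comes out at 480x702 pixels.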

I was interested in trying this for myself, but I couldn’t find any of his code, which he had written in Java, to do it myself. Then it hit me: I really don’t need to write anything to do this! Here is how you can make your own using only two tools: mplayer and ImageMagick.

Originally I thought that I may need to write a small Perl script to glue these two things together, but found, after digging through man pages, that this was completely unnecessary. There are two steps involved and each tool does one step: first, grab all of the frames, and second, make a montage out of those frames. Grabbing the frames is one call to mplayer,

mplayer -vo jpeg:outdir=frames -ao dummy -vf framestep=30,scale=8:6 \
        video_file

What we are doing here is dumping every 30th frame (assuming 30 frames-per-second) into a directory named frames. These images will be named by consecutive 8-digit numbers. These frames are also resized down to 8x6 pixels. If you are converting a video with a different aspect ratio, such as a wide-screen movie without letter-boxing, you will need to adjust this. A wide-screen film would be 16x9.

Next, we glue these frames together with ImageMagick,

montage -geometry +0+0 -background black -tile 60x "frames/*jpg" \
        montage.jpg

This will create the montage in the file montage.jpg. There is something important to note here. See how the file glob is quoted so that the shell will not expand it? That’s because listing 7000 frames pushes the limits of the system in passing command line arguments. ImageMagick knows about file globs and will do this internally.

/download/cinrdx.sh

And that’s it! I put these together into that handy shell script that will also remove the frames after the montage has been successfully created. The process takes between 6 and 12 hours, depending on the length of the movie. It takes the movie running time to produce all the frame files, then it spends the rest of the time creating the montage, which is disappointingly slow. (Maybe I could write a Perl script that does it faster?) The script will create a montage out of just about any video you throw at it, thanks to mplayer. Example usage for DVDs,

$ ./cinrdx.sh dvd://

I did it on three movies so far: Gladiator, Tron, and The Matrix. I did it to Tron and The Matrix because I wanted to see if these movies have a dominant color scheme.

To inspect the coloring of these films, I took a hue histogram. Tron is very obvious: lots of blues and cyans dominate,

I was expecting to see a lot of green show up in The Matrix, but was a little bit disappointed,

To get these histograms, I loaded the images into GNU Octave and converted them to HSV so that the red channel is really the hue channel. Then I had the GIMP make the histograms by providing the histogram of the “red” (read hue) channel. I dropped an HSV color bar below with some image editing.

octave> m = imread("movie.jpg");
octave> [x map] = rgb2ind(m);
octave> map = rgb2hsv(map);
octave> imwrite("movie-hsv.jpg", x, map);

See if you can find some really interesting things to do with this.

PNG Archiver - Share Files Within Images

2007-09-08T00:00:00Z

This is one of my projects.

PNG Archiver

The original idea for this project came from Sean Howard’s Gameplay Mechanic’s #012. The basic idea here is that image files are the second easiest type of data to share on the Internet (the first being text). Sharing anything other than images may be difficult, so why not store files within an image as the image data? This is not steganography as the data is not being hidden. In fact, the data is quite obvious because we are trying to make the data as compact as possible in the image.

My “PNG Archiver” is usable but should still be considered alpha quality software. I am adding support for different types of PNGs (currently it does 8-bit RGB only), but I have found that using the libpng library gives me headaches. The archiver can actually only store a single file (just as gzip doesn’t know what a file is). This is because I do not want to duplicate the work of real file archivers like tar. To store multiple files, make a “png-tarball”.

The PNG Archiver stores a checksum in the image that allows it to verify that the data was received correctly. This also allows it to automatically scan the image for data. When it reads in a piece that fulfills the checksum it assumes that it found the data you are looking for. You can decorate the image with text or a border and the archiver should still find the data as long as you didn’t disturb it. (examples of this on the project page)