Automatic Deletion of Incomplete Output Files

August 07, 2016

nullprogram.com/blog/2016/08/07/

Conventionally, a program that creates an output file will delete its incomplete output should an error occur while writing the file. It’s risky to leave behind a file that the user may rightfully confuse for a valid file. They might not have noticed the error.

For example, compression programs such as gzip, bzip2, and xz when given a compressed file as an argument will create a new file with the compression extension removed. They write to this file as the compressed input is being processed. If the compressed stream contains an error in the middle, the partially-completed output is removed.

There are exceptions of course, such as programs that download files over a network. The partial result has value, especially if the transfer can be continued from where it left off. The convention is to append another extension, such as “.part”, to indicate a partial output.

The straightforward solution is to always delete the file as part of error handling. A non-interactive program would report the error on standard error, delete the file, and exit with an error code. However, there are at least two situations where error handling would be unable to operate: unhandled signals (usually including a segmentation fault) and power failures. A partial or corrupted output file will be left behind, possibly looking like a valid file.

A common, more complex approach is to name the file differently from its final name while being written. If written successfully, the completed file is renamed into place. This is already required for durable replacement, so it’s basically free for many applications. In the worst case, where the program is unable to clean up, the obviously incomplete file is left behind only wasting space.

Looking to be more robust, I had the following misguided idea: Rely completely on the operating system to perform cleanup in the case of a failure. Initially the file would be configured to be automatically deleted when the final handle is closed. This takes care of all abnormal exits, and possibly even power failures. The program can just exit on error without deleting the file. Once written successfully, the automatic-delete indicator is cleared so that the file survives.

The target application for this technique supports both Linux and Windows, so I would need to figure it out for both systems. On Windows, there’s the flag FILE_FLAG_DELETE_ON_CLOSE. I’d just need to find a way to clear it. On POSIX, file would be unlinked while being written, and linked into the filesystem on success. The latter turns out to be a lot harder than I expected.

Solution for Windows

I’ll start with Windows since the technique actually works fairly well here — ignoring the usual, dumb Win32 filesystem caveats. This is a little surprising, since it’s usually Win32 that makes these things far more difficult than they should be.

The primary Win32 function for opening and creating files is CreateFile. There are many options, but the key is FILE_FLAG_DELETE_ON_CLOSE. Here’s how an application might typically open a file for output.

DWORD access = GENERIC_WRITE;
DWORD create = CREATE_ALWAYS;
DWORD flags = FILE_FLAG_DELETE_ON_CLOSE;
HANDLE f = CreateFile("out.tmp", access, 0, 0, create, flags, 0);

This special flag asks Windows to delete the file as soon as the last handle to to file object is closed. Notice I said file object, not file, since these are different things. The catch: This flag is a property of the file object, not the file, and cannot be removed.

However, the solution is simple. Create a new link to the file so that it survives deletion. This even works for files residing on a network shares.

CreateHardLink("out", "out.tmp", 0);
CloseHandle(f);  // deletes out.tmp file

The gotcha is that the underlying filesystem must be NTFS. FAT32 doesn’t support hard links. Unfortunately, since FAT32 remains the least common denominator and is still widely used for removable media, depending on the application, your users may expect support for saving files to FAT32. A workaround is probably required.

Solution for Linux

This is where things really fall apart. It’s just barely possible on Linux, it’s messy, and it’s not portable anywhere else. There’s no way to do this for POSIX in general.

My initial thought was to create a file then unlink it. Unlike the situation on Windows, files can be unlinked while they’re currently open by a process. These files are finally deleted when the last file descriptor (the last reference) is closed. Unfortunately, using unlink(2) to remove the last link to a file prevents that file from being linked again.

Instead, the solution is to use the relatively new (since Linux 3.11), Linux-specific O_TMPFILE flag when creating the file. Instead of a filename, this variation of open(2) takes a directory and creates an unnamed, temporary file in it. These files are special in that they’re permitted to be given a name in the filesystem at some future point.

For this example, I’ll assume the output is relative to the current working directory. If it’s not, you’ll need to open an additional file descriptor for the parent directory, and also use openat(2) to avoid possible race conditions (since paths can change from under you). The number of ways this can fail is already rapidly multiplying.

int fd = open(".", O_TMPFILE|O_WRONLY, 0600);

The catch is that only a handful of filesystems support O_TMPFILE. It’s like the FAT32 problem above, but worse. You could easily end up in a situation where it’s not supported, and will almost certainly require a workaround.

Linking a file from a file descriptor is where things get messier. The file descriptor must be linked with linkat(2) from its name on the /proc virtual filesystem, constructed as a string. The following snippet comes straight from the Linux open(2) manpage.

char buf[64];
sprintf(buf, "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, buf, AT_FDCWD, "out", AT_SYMLINK_FOLLOW);

Even on Linux, /proc isn’t always available, such as within a chroot or a container, so this part can fail as well. In theory there’s a way to do this with the Linux-specific AT_EMPTY_PATH and avoid /proc, but I couldn’t get it to work.

// Note: this doesn't actually work for me.
linkat(fd, "", AT_FDCWD, "out", AT_EMPTY_PATH);

Given the poor portability (even within Linux), the number of ways this can go wrong, and that a workaround is definitely needed anyway, I’d say this technique is worthless. I’m going to stick with the tried-and-true approach for this one.

Have a comment on this article? Start a discussion in my public inbox by sending an email to ~skeeto/public-inbox@lists.sr.ht [mailing list etiquette] , or see existing discussions.