null program

Memory Leaks with XMLHttpRequest Objects

I'm writing this post because I am not aware of any other article that gets it right. All my searches result in misleading and factually incorrect information. If you know of an article that does get it right, please share it.

I really love jQuery. It is by far my favorite JavaScript library -- the only one that's enjoyable to use, really. It cleans up a lot of the Document Object Model's ugliness, along with a few other important browser APIs. One such API is the misnamed HTTP request object, XMLHttpRequest (XHR).

For those who are unfamiliar, this object is used to make HTTP requests after the page has loaded, with direct access to the response data. Here's the simplest use case for asynchronously fetching JSON-encoded data from the server. As written, it will only work in modern browsers. For backwards compatibility, feature sniffing would be required and the onreadystatechange event would be used instead.

var xhr = new XMLHttpRequest();
xhr.addEventListener('load', function() {
    var data = JSON.parse(xhr.responseText);
    // ...
});
xhr.open('GET', '/widgets/id/443424805.json', true);
xhr.send();
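For comparison, here is a rough sketch of the backwards-compatible variant mentioned above, using onreadystatechange instead of the load event. The names getJSON and XHRConstructor are my own, and the optional constructor argument exists only so the sketch can be exercised outside a browser. It leaves out the feature sniffing that very old versions of Internet Explorer would need (they exposed the object through ActiveXObject), and it ignores error handling.

```javascript
// Hypothetical helper; getJSON and XHRConstructor are illustrative names.
function getJSON(url, callback, XHRConstructor) {
    // Fall back to the browser's XMLHttpRequest when no stand-in is given.
    var xhr = new (XHRConstructor || XMLHttpRequest)();
    xhr.onreadystatechange = function() {
        // This handler fires on every state change; 4 means DONE.
        if (xhr.readyState !== 4) return;
        if (xhr.status === 200) {
            callback(JSON.parse(xhr.responseText));
        }
    };
    xhr.open('GET', url, true);
    xhr.send();
}
```

In a browser this would be called as getJSON('/widgets/id/443424805.json', function(data) { ... }).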

Here's what the jQuery version looks like. Notice how it's a lot more functional, passing the server response data as an argument rather than gluing it to a mutable object. Of course, underneath it's just using an XHR object, performing lots of feature sniffing to normalize its behavior across different browsers.

$.getJSON('/widgets/id/443424805.json', function(data) {
    // ...
});

// Or using the generic jQuery AJAX API:

$.ajax({
    dataType: 'json',
    url: '/widgets/id/443424805.json',
    success: function(data) {
        // ...
    }
});

This is what the core AJAX API should have looked like. The XMLHttpRequest API has a critical flaw: it's at odds with garbage collection. It's a strange, special object that gets to survive after all JavaScript references to it are lost. Normal JavaScript objects don't behave like this.

Also strange is that, while many people have observed XHR memory leaks, very few people understand what's going on! Try doing some searches for XHR memory leaks. You'll see answers talking about closures and reference cycles, but they have nothing to do with XHR-related memory leaks. It's the blind leading the blind.

Closures

Let's quickly review what is not the problem. A closure is a function that retains its lexical environment, closing over its non-local variables.

function makeCounter() {
    var x = 0;
    return function() {
        return ++x;
    };
}

When makeCounter() is called, a binding named x is established, initially bound to the value 0 -- an assignment. Then a closure is created by the function expression, capturing this binding, and the closure is returned. This would normally be the end of life for the newly established binding, open for garbage collection, but it was captured by the closure. This entire process happens on each invocation of makeCounter().

var counterA = makeCounter();
var counterB = makeCounter();
counterA();  // => 1
counterA();  // => 2
counterA();  // => 3
counterB();  // => 1

When one of the returned closures is invoked (one is assigned to counterA here and another to counterB), it reassigns its x to a new value, then returns that value. x has become a truly private variable for each closure, completely inaccessible except by calling that closure.

Closures can capture more values than the programmer intended, which will cause the captured values to live longer than expected -- a leak. Fortunately, this is unusual. Consider this function.

function makeGreeter(name) {
    var greeting = "Hello, " + name;
    return function() {
        return greeting;
    };
}

The body of makeGreeter() has two bindings, name and greeting. Theoretically, the closure will capture name as well as greeting because they're both part of its lexical environment. The value assigned to name could live longer than intended. In practice, this is not the case. Compilers are smart enough to see that the closure makes no reference to name -- so long as eval isn't present.
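To see why eval spoils this optimization, here is a small sketch (makeEvalGreeter and peek are names I made up for illustration). A direct call to eval runs in the closure's own scope, so the compiler can't know ahead of time which bindings it will ask for and has to keep all of them alive.

```javascript
function makeEvalGreeter(name) {
    var greeting = "Hello, " + name;
    return function(expression) {
        // Direct eval sees this closure's whole lexical environment,
        // including name, which the closure never mentions by itself.
        return eval(expression);
    };
}

var peek = makeEvalGreeter("Kelly");
peek("greeting");  // => "Hello, Kelly"
peek("name");      // => "Kelly"
```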

Circular References

With closures in mind, consider the typical use case for an XHR.

function getText(url, callback) {
    var xhr = new XMLHttpRequest();
    xhr.onload = function() {
        callback(xhr.responseText);
    };
    xhr.open('get', url, true);
    xhr.send();
}

A binding named xhr is established and a closure is created which references xhr as a free variable, so it gets captured. This closure is assigned to a property on the XHR. This is a circular reference. The XHR references the closure through onload and the closure references the XHR through the closed-over variable xhr.

Under some forms of memory management, such as reference counting, this could be an issue. Fortunately, JavaScript implementations can handle this situation just fine (well, except Internet Explorer before IE8). Garbage collectors operate on reachability. Cycles don't matter; the collector only cares whether any part of the cycle is reachable from a root, a hard reference from which the collector begins its search.

Browser JavaScript can't afford to get hung up on cycles. The DOM is loaded with circularity; parent nodes reference their children and child nodes reference their parents.
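The reachability idea can be sketched with a toy graph walk. This is only a model of the marking step, not how any real engine is implemented, and node() and reachable() are names I invented for it. Each "object" is a node with a list of outgoing references, and anything the walk never visits from a root would be garbage, cycle or not.

```javascript
// Toy model: an "object" is a node with a list of outgoing references.
function node(name) {
    return {name: name, refs: []};
}

// Return the sorted names of every node reachable from the given roots.
function reachable(roots) {
    var seen = new Set();
    var stack = roots.slice();
    while (stack.length > 0) {
        var current = stack.pop();
        if (seen.has(current)) continue;  // already visited, so cycles end here
        seen.add(current);
        current.refs.forEach(function(ref) { stack.push(ref); });
    }
    return Array.from(seen, function(n) { return n.name; }).sort();
}

var xhr = node('xhr');
var closure = node('closure');
xhr.refs.push(closure);  // like xhr.onload pointing at the closure
closure.refs.push(xhr);  // like the closure's captured xhr binding
var page = node('page');

reachable([page]);       // => ['page'], the cycle is unreachable
page.refs.push(xhr);
reachable([page]);       // => ['closure', 'page', 'xhr']
```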

What's Really Going On

Take another close look at getText(). Two objects are created, an XHR and a closure. They're not assigned anywhere outside of the function, and nothing is returned. Under normal circumstances, this means these two objects are free to be garbage collected. A compiler could determine this through escape analysis and perform extra optimizations, such as stack allocation. However, XHR instances are special objects, so these are not normal circumstances.

This is an asynchronous request and JavaScript is single-threaded. When getText() is invoked, no HTTP request is actually made until sometime after getText() exits. What would happen if the XHR and closure were garbage collected before the server responded? Since the callback would no longer exist, at best the response would be lost. If this were the case, users would need to be careful to maintain a reference to the XHR, lest they risk losing data. This is not how it works.

Instead, the browser keeps an inaccessible, internal reference to the XHR (i.e. a garbage collection root). This keeps alive not only the XHR for the duration of the request, but also the closure that it references. After the response comes in, it keeps the possibly-large response data alive as well. This, ladies and gentlemen, is the dreaded XHR leak. It's now completely up to the browser to decide when to free these objects. Older versions of Internet Explorer, all the way up through IE7, appear to keep these references around much, much longer than necessary, possibly forever!
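Here is a toy model of that hidden reference. Everything in it (inFlight, ToyRequest, deliverAll, getTextToy) is invented for illustration and says nothing about how any particular browser is actually written. The Set plays the part of the internal root, and delivery is triggered by hand so the sketch stays deterministic.

```javascript
// Stand-in for the browser's internal, script-inaccessible root.
var inFlight = new Set();

function ToyRequest() {
    this.onload = null;
    this.responseText = null;
}

ToyRequest.prototype.send = function() {
    inFlight.add(this);  // the "browser" now holds its own reference
};

// Simulate the network layer answering every pending request.
function deliverAll(text) {
    inFlight.forEach(function(request) {
        request.responseText = text;
        if (request.onload) request.onload();
        inFlight.delete(request);  // released only after the callback has run
    });
}

function getTextToy(callback) {
    var request = new ToyRequest();
    request.onload = function() {
        callback(request.responseText);
    };
    request.send();
    // No reference to request escapes this function.
}

var received = [];
getTextToy(function(text) { received.push(text); });
inFlight.size;        // => 1
deliverAll('hello');
received;             // => ['hello']
inFlight.size;        // => 0
```

In this model the leak would correspond to a browser that is slow to perform, or never performs, the delete step.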

Experimenting with XHRs

At the time of this writing, the specialness of the XHR object can be demonstrated in current browsers. I'm going to use Chrome/Chromium here since it's got the best tools for observing the internals.

function makeMany(type) {
    for (var i = 0; i < 1000000; i++) {
        new type();
    }
}

The function makeMany() creates a million objects of a given type, but retains no reference to any of them. In theory, they're free for garbage collection as soon as they're created.

function Point(x, y, z) {
    this.x = x;
    this.y = y;
    this.z = z;
}

According to Chrome's heap profiler, an XHR instance is 24 bytes and instances of this Point prototype are each 56 bytes. When I ask makeMany() to generate a million Point objects, the browser does it trivially.

makeMany(Point);
// Chrome easily asks, "Do you even lift?"

Let's try the same thing with XHR, which according to the profiler should use even less memory. If XHR wasn't a special object, the result would be the same. Note that there are no closures or circular references created here.

makeMany(XMLHttpRequest);
// Chrome starts thrashing my craptop now

If I run this a few times in a row, Chromium's memory usage blows up past a gigabyte and my whole computer starts thrashing. I begin to worry about when I last saved this blog post's buffer. If it manages to survive the thrashing, Chromium does eventually free these XHR instances. There's some logic buried in there to determine if they're safe to let go, but it doesn't make this check until very late.

To work around this, the Closure Library pools XHRs. Disciplined re-use of XHR objects can free the developer from relying on the browser to dispose of old XHR objects. Browsers limit the number of simultaneous HTTP requests to somewhere between 2 and 5, so XHR pools should typically stay very small.
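A minimal sketch of the pooling idea might look like the following. This is not the Closure Library's API; XhrPool and its methods are hypothetical names, and createXhr is a caller-supplied factory (in a browser, something like function() { return new XMLHttpRequest(); }). It relies on the fact that calling open() again on a finished XHR resets it for a new request. Errors and aborted requests are ignored here, so a failed request would never return its object to the pool.

```javascript
// Hypothetical pool; createXhr is a caller-supplied factory function.
function XhrPool(createXhr) {
    this.createXhr = createXhr;
    this.idle = [];
    this.created = 0;
}

// Take an idle object if one exists, otherwise create a new one.
XhrPool.prototype.acquire = function() {
    if (this.idle.length > 0) {
        return this.idle.pop();
    }
    this.created++;
    return this.createXhr();
};

// Drop the handler (and whatever it closed over), then shelve the object.
XhrPool.prototype.release = function(xhr) {
    xhr.onload = null;
    this.idle.push(xhr);
};

XhrPool.prototype.getText = function(url, callback) {
    var pool = this;
    var xhr = pool.acquire();
    xhr.onload = function() {
        var text = xhr.responseText;
        pool.release(xhr);  // hand the object back before running user code
        callback(text);
    };
    xhr.open('GET', url, true);
    xhr.send();
};
```

With a pool like this, the number of XHR objects ever created tracks the peak number of simultaneous requests rather than the total number of requests made.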

Conclusion

I hope this clears up some of the confusion on this subject. I'd like to learn a lot more about what's going on, but this all appears to only be documented as source code (if that), which is a slow way to absorb this sort of information. If you know of any informed articles or documentation on the subject, please share!

tags: [ javascript ]