Bargaining with Stack Overflow and other stages of developer grief

February 21, 2015

A few weeks ago, we published a little app that uses unit-level Census data to show you how many single people in your city meet your dating standards. Drawing on about 15 million records from the IPUMS project, the app asked users for their preferences on education, income and other demographic traits, then estimated what percentage of the population matched those criteria.

As a kicker, the app searched the user's preferences across all cities and came up with a recommendation as to which places were the richest in acceptable mates. This was a major challenge because, as always, I prefer to work with flat files instead of live databases whenever possible. But I wasn't about to make 200 AJAX calls to fetch the data files for every city, and it wasn't practical to precompute the top cities for every possible response when the permutations ran to the millions.

At one point, hours before this was scheduled to publish, I considered trying to load the entire dataset in the browser. I got as far as asking Stack Overflow for permission to do this:

I'm building a quiz backed by a lot of Census data, and I would really like to power it with flat files rather than worry about a database and servers. (More on the case for flat files here.) The uncompressed JSON file I need for the final part of the quiz is currently 11MB.

Right now, I'm making an AJAX request for the file right away, but not using it until the user has completed the quiz, which will take about 1 minute.

I realize this is a bit of a subjective question, but I'm wondering how much this is pushing the envelope when it comes to supporting a wide variety of modern phones and computers in 2015. I'm worried less about the bandwidth than about the memory and processing power in the device. The code parses through all the data, matches entries to user responses, and computes a result.


Is it crazy to AJAX an 11MB JSON file if I don't need it for about 60 seconds? (I'll check to make sure I have it when the time comes, of course.)
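
For the curious, the pattern the question describes (start the download right away, touch the result only once the quiz is finished) could be sketched roughly like this. It is a minimal illustration, not the code from the app: `prefetch`, `loadCensusData` and the file path are hypothetical names, and the loader is passed in as a function so the sketch does not assume any particular AJAX library.

```javascript
// Start a slow load immediately and return a getter that hands back
// the same promise whenever the caller finally needs the data.
// `load` is any function that returns a Promise (hypothetical here).
function prefetch(load) {
  // Kick off the request now; the promise holds on to the eventual result.
  const pending = load();
  // Attach a no-op handler so an early failure is not reported as an
  // unhandled rejection before anyone has asked for the data.
  // Callers of the getter still see the original rejection.
  pending.catch(() => {});
  return () => pending;
}

// In a browser, the loader might wrap fetch(). The path is a placeholder.
const loadCensusData = () =>
  fetch("data/all-cities.json").then((response) => {
    if (!response.ok) throw new Error("HTTP " + response.status);
    return response.json();
  });

// Usage sketch:
//   const getData = prefetch(loadCensusData); // on page load
//   ...the user spends about a minute on the quiz...
//   getData().then((records) => { /* match entries and compute a result */ });
```

Because the getter always returns the same promise, "checking that I have it when the time comes" amounts to waiting on that promise: it resolves immediately if the file has already arrived and waits otherwise.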

The solution was to load just the pieces of the full dataset we needed based on the user's responses and compute the optimal cities on the client. I considered deleting this question out of sheer embarrassment, but thought I would leave it up as an artifact of the sort of desperate measures we all take when the clock is ticking.
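
As a rough sketch of that idea, suppose the data were pre-split into one flat file per combination of a couple of coarse answers, with each file holding aggregated rows for every city. That layout is an assumption made for illustration, as are the names `sliceUrl` and `rankCities`, the field names, and the choice to rank by share within the slice; the post does not spell out how the real files were cut.

```javascript
// Hypothetical layout: one file per (sex, age band) combination,
// each containing aggregated rows for all cities.
function sliceUrl(answers) {
  return "data/" + answers.sex + "-" + answers.ageBand + ".json";
}

// Rank cities by the share of people in the loaded slice who also match
// the remaining answers. Each row is assumed to look like
// { city, education, income, count }.
function rankCities(rows, answers, topN) {
  const totals = new Map();  // city -> everyone in the slice
  const matches = new Map(); // city -> people who meet the criteria
  for (const row of rows) {
    totals.set(row.city, (totals.get(row.city) || 0) + row.count);
    const isMatch =
      row.education === answers.education && row.income >= answers.minIncome;
    if (isMatch) {
      matches.set(row.city, (matches.get(row.city) || 0) + row.count);
    }
  }
  // Convert counts to shares, guarding against empty cities,
  // then keep the topN cities with the highest share.
  return Array.from(totals, ([city, total]) => ({
    city,
    share: total > 0 ? (matches.get(city) || 0) / total : 0,
  }))
    .sort((a, b) => b.share - a.share)
    .slice(0, topN);
}
```

Under these assumptions the browser makes a single request to `sliceUrl(answers)` once the coarse answers are known, then calls `rankCities` on the parsed rows, instead of fetching a file per city or the whole dataset.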