How to recreate the Boston Marathon from 230,000 data points

April 7, 2013 | Source Code

While scrambling for something to build around the bombings at the Boston Marathon, I came across a searchable database of all 26,000 participants. Fortunately, it returns every runner if you hit "Enter" with no search terms.

Each runner's page lists up to 10 timestamps marking his or her progress: one every five kilometers and two more at the halfway point and the finish line. It took about an hour to scrape every page and extract this information. The result was about 15MB of data.

Even if it were realistic to load this much information in a browser all at once, the human eye cannot make sense of 26,000 simultaneous animations. (The browser can't handle it either.) So I split the race into 72 five-minute intervals and estimated, for each contestant, which kilometer marker he or she would be closest to in each interval.

For markers not divisible by five, this involved a simple linear interpolation. While a runner's pace almost always slows considerably from the start to the finish, the error in assuming constant velocity for five-kilometer spans is probably negligible.
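For the curious, here is a minimal Python sketch of that kind of interpolation. It is an illustration, not the code behind the project: the `splits` layout (distance in kilometers mapped to elapsed seconds) and the function name `position_at` are assumptions made for the example.

```python
from bisect import bisect_right

def position_at(splits, t):
    """Estimate the distance covered (km) after t elapsed seconds.

    `splits` is a hypothetical record mapping distance in km to elapsed
    seconds. Velocity is assumed constant between consecutive splits.
    """
    points = sorted(splits.items())          # [(km, seconds), ...]
    kms = [km for km, _ in points]
    times = [sec for _, sec in points]

    # Clamp to the first and last recorded splits.
    if t <= times[0]:
        return kms[0]
    if t >= times[-1]:
        return kms[-1]

    # Find the pair of splits that brackets t, then interpolate linearly.
    i = bisect_right(times, t)
    t0, t1 = times[i - 1], times[i]
    k0, k1 = kms[i - 1], kms[i]
    return k0 + (k1 - k0) * (t - t0) / (t1 - t0)


# Made-up splits for one runner: start, 5 km and 10 km.
example = {0.0: 0, 5.0: 1500, 10.0: 3000}
halfway_to_5k = position_at(example, 750)
```

With the made-up splits above, a runner who reaches 5 km in 1,500 seconds is placed at 2.5 km after 750 seconds. Rounding that estimate gives the nearest kilometer marker.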

The result, a matrix of 42 kilometers by 72 intervals, is far easier to handle.
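As a rough sketch of how such a matrix could be assembled in Python, the example below counts runners per kilometer marker per interval. Several things here are assumptions for illustration and not taken from the original source code: that each cell holds a runner count, that every runner's clock starts at zero, that positions are sampled at the start of each interval, that runners are clamped to markers 1 through 42, and that a runner stops being counted after his or her last recorded split. The names `build_matrix` and `interpolate_km` are hypothetical.

```python
N_KM = 42                  # kilometer markers 1..42
N_INTERVALS = 72           # five-minute intervals
INTERVAL_SECONDS = 5 * 60

def interpolate_km(points, t):
    """Linear interpolation over a sorted list of (seconds, km) pairs."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, k0), (t1, k1) in zip(points, points[1:]):
        if t <= t1:
            return k0 + (k1 - k0) * (t - t0) / (t1 - t0)
    return points[-1][1]

def build_matrix(runners):
    """Return counts[km_index][interval]: runners nearest each marker.

    `runners` is a list of per-runner split lists, each a sorted list of
    (elapsed_seconds, km) pairs. Row 0 is the 1 km marker.
    """
    counts = [[0] * N_INTERVALS for _ in range(N_KM)]
    for points in runners:
        last_time = points[-1][0]
        for interval in range(N_INTERVALS):
            t = interval * INTERVAL_SECONDS
            if t > last_time:
                break                       # assumed: no longer on course
            km = interpolate_km(points, t)
            marker = int(km + 0.5)          # nearest marker, halves round up
            marker = min(N_KM, max(1, marker))
            counts[marker - 1][interval] += 1
    return counts


# Two made-up runners with only a couple of splits each.
runners = [
    [(0, 0.0), (1500, 5.0), (3000, 10.0)],
    [(0, 0.0), (3000, 5.0)],
]
counts = build_matrix(runners)
```

However the cells are defined, the payoff is the same: 42 × 72 comes to 3,024 values, a tiny fraction of the 230,000 raw timestamps.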

I wish I had had more time to play with filtering the data by age and gender. Alas, the news moves on.