This Chart Shows Hollywood's Glaring Gender Gap
I often hold this piece up as an example of the “everything is data” principle–that many things we may consider subjective, like gender dynamics in Hollywood, are perfectly quantifiable if one is willing to gather the data from scratch. In many cases, the analysis just involves counting things.
This is a nice example because the results from a simple exploration were clear and clean (if not heartening). And while I always ask permission before crawling sites like IMDb, I’ve found that almost all publishers (except social media sites) are happy to lend their content to aggregate analysis so long as they’re cited.
Figuring out how to quantify cultural objects is great fun. I’m dying to do something with classical music based on analyzing MIDIs, for example–say, plotting the number of bars in piano concertos before the piano plays its first note. Just an idea!
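To show how little arithmetic an idea like this needs, here is a minimal sketch of the bar count itself. It assumes the hard part is already done: that a MIDI parser (not shown) has given you the tick at which the piano's first note starts and the file's ticks-per-quarter-note resolution. The function name and parameters are hypothetical, and the sketch assumes a constant time signature with the quarter note as the beat, which real concertos will often violate.

```python
def bars_before_first_note(first_note_tick: int,
                           ticks_per_beat: int,
                           beats_per_bar: int = 4) -> int:
    """Count the complete bars that elapse before a note starting at first_note_tick.

    Assumptions (all simplifications, not properties of any real file):
    - the time signature never changes,
    - a beat is a quarter note, so ticks_per_beat is the MIDI file's resolution,
    - tick 0 is the start of bar 1.
    """
    if first_note_tick < 0:
        raise ValueError("first_note_tick must be non-negative")
    if ticks_per_beat <= 0 or beats_per_bar <= 0:
        raise ValueError("ticks_per_beat and beats_per_bar must be positive")

    ticks_per_bar = ticks_per_beat * beats_per_bar
    # Integer division drops the partial bar in which the note enters.
    return first_note_tick // ticks_per_bar


# Made-up numbers for illustration: 480 ticks per quarter note, 4/4 time,
# piano entering exactly ten bars in.
print(bars_before_first_note(first_note_tick=19200, ticks_per_beat=480))  # 10
```

Run over a folder of concertos, one number per piece is all the chart would need. Handling tempo-independent details like mid-piece time signature changes would mean walking the file's meta events instead of dividing once.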