Using D3 with native HTML: DIVs as datavis

May 20, 2013

Last week, the White House released 100 pages of printed emails documenting the intelligence community's public response to the Sept. 11, 2012 attacks on the American diplomatic compound in Libya. There are 91 unique messages in the documents and a high level of redundancy due to long reply-chains being printed multiple times.

At Yahoo News, we decided to arrange this information as an interactive inbox, in which readers could view the messages in a basic approximation of a standard email client.

The emails reached us as paper printouts, with many identities of CIA and State Department officials redacted, so the first task was to translate them into machine-readable text. On short notice, the easiest way to do this was to store the metadata about each message (to, from, cc, subject, date) in one JSON object and to store the content of the messages in individual text files. I used Pagedown, the JavaScript Markdown processor, to convert the raw text of the messages into spartan HTML.

The interactive inbox then became a straightforward matter of displaying the JSON file as a table element with a row for each message and making an AJAX call to the text file containing the body of the email when the user clicked on it, mimicking the functionality of an Outlook preview pane. (Very short messages were stored directly in the JSON file.) I wouldn't do it this way again, but such is the nature of deadline-driven development.

This means that when a user clicks on the table row representing a given message as a line in an inbox, the code has to somehow access the original data object used to create that DOM element. It then populates the preview pane with the To and Cc fields and makes an AJAX call for the text file whose name is also stored in the object.
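The click behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code that shipped: the `To`, `Cc`, `Body` and `File` property names, the `fetchText` helper and the `render` callback are all assumptions made for the example. Passing the fetch function in as an argument keeps the sketch independent of any particular AJAX library.

```javascript
// Hypothetical message record; property names and values are placeholders.
var example = {
    id: "msg-01",
    From: "sender@example.com",
    To: "recipient@example.com",
    Cc: "",
    Subject: "Example subject",
    File: "messages/msg-01.txt"   // name of the text file holding the body
};

// Fill the preview pane for one message.
// fetchText(url, callback) stands in for whatever AJAX helper is in use.
// render(view) stands in for the code that writes to the preview pane.
function showMessage(message, fetchText, render) {
    var view = { to: message.To, cc: message.Cc || "", body: "" };

    // Very short messages carry their text inline, so no request is needed.
    if (typeof message.Body === "string") {
        view.body = message.Body;
        render(view);
        return;
    }

    // Otherwise request the text file named in the metadata.
    fetchText(message.File, function (text) {
        view.body = text;
        render(view);
    });
}
```

In a real page, `fetchText` might wrap `d3.text` or `XMLHttpRequest`, and `render` would write the three fields into the preview pane's elements.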

Until recently, my solution would have been to assign each row of this table a unique id, store the metadata about the messages in a dictionary-like object with those ids as the keys, and use that id to get the object back. There is nothing terribly wrong with this strategy, but it's tedious and prone to error. After a few false starts, I realized that this issue of projecting data onto the DOM is quite literally the philosophy behind Mike Bostock's D3 library.
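For comparison, the id-keyed approach might look something like the sketch below. The helper names `indexById` and `messageForRow` are invented for this example, and the clicked row is represented by a plain object with an `id` property rather than a real DOM element.

```javascript
// Build a dictionary-like object keyed by each message's id.
function indexById(messages) {
    var byId = Object.create(null);   // no inherited keys to collide with
    messages.forEach(function (message) {
        byId[message.id] = message;
    });
    return byId;
}

// Given a clicked row (anything with an id property, such as a <tr>),
// recover the message that produced it. Returns undefined if the id
// was mistyped or never registered, which is where the errors creep in.
function messageForRow(row, byId) {
    return byId[row.id];
}
```

The bookkeeping is the tedious part: the ids in the markup and the keys in the dictionary have to be kept in sync by hand.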

Nearly every example of D3 in action on Mike's GitHub page uses the library's abstraction of Scalable Vector Graphics (SVG) to visualize a dataset. This is the intended use of D3, I think. But the library is just as valid a tool for building traditional HTML elements. Here's a simplified demo:

<table id="inbox" class="inbox">
    <thead>
        <tr class="field">
            <td>From</td>
            <td>Subject</td>
            <td>Date</td>
        </tr>
    </thead>
    <tbody id="messages"></tbody>
</table>

<script>
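    // `messages` is assumed to be the array of metadata objects parsed from the JSON file.
    // Bind one <tr> to each message; D3 stores the message object on the row itself.
    var row = d3.select("#messages").selectAll(".message")
        .data(messages)
        .enter()
        .append("tr")
        .attr("id", function(d) { return d.id; })
        .attr("class", "message")
        .on("click", function(d) {
            // we now have access to all the properties of the individual message
            // (D3 v6 and later pass the event first: function(event, d) { ... })
        });
    // Each cell reads its text from the datum bound to its parent row.
    row.append("td").text(function(d) { return d.From; });
    row.append("td").append("div").text(function(d) { return d.Subject; });
    row.append("td").text(function(d) { return d.Date + " " + d.Time; });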
</script>
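What makes the click handler work is that D3 keeps each bound datum on the element itself, in a property named `__data__`, so there is no separate lookup table to maintain. The sketch below imitates that idea with plain objects standing in for DOM nodes; `bindRows` and `handleClick` are hypothetical names, and real D3 selections do considerably more than this.

```javascript
// Create one row object per datum and attach the datum to it,
// loosely mirroring what .data(...).enter().append("tr") does.
function bindRows(data) {
    return data.map(function (d) {
        var row = { tagName: "tr", id: d.id };  // stand-in for a DOM element
        row.__data__ = d;                        // the datum travels with the row
        return row;
    });
}

// Simulate a click: call the listener with the row's own datum,
// with `this` set to the row, as D3 v3 does for event listeners.
function handleClick(row, listener) {
    return listener.call(row, row.__data__);
}
```

Because the datum rides along with the element, the listener never needs to know the row's id in order to find the message it represents.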

It is not quite sufficient to point out that D3 is fully capable of creating and manipulating tables, divs, spans and paragraph tags. We ought to recognize that even the simplest markup is a fully qualified data visualization, working off this horseback definition:

Data visualization is a process whereby some portion of an information set is encoded by position, color, or another graphical property.

It naturally follows that text is data, just like integers and RGB values. By this definition, the browser itself is the greatest engine of data visualization ever invented. And it won that distinction long before it was capable of drawing shapes on a screen.