Guessing the beat from user-entered musical intervals

May 5, 2013

As I mentioned a few days ago, I'm a big fan of Soundslice, a project by Adrian Holovaty and PJ Macklin for creating interactive guitar tabs. One of the coolest features is the ability to tap along on the computer keyboard as a recording of the song you're transcribing plays, marking out measures that you can then fill in with chords.

Right now, these tapped-out measures appear precisely as entered by the user. I suggested to the Soundslice team that they be normalized in some way, since even the most precise human cannot strike a key on the keyboard at exactly the same interval of milliseconds every time. Holovaty politely suggested I figure it out myself and get back to him.

The raw input for this sort of problem is a simple array of timestamps, one created each time the user strikes a key during a recording session. The goal is to come back with a single optimized interval, in milliseconds, that represents the ideal space between measures.
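For example, a user tapping along to a 120 bpm metronome (an ideal interval of 500 ms) might produce something like this (the values here are invented for illustration):

var stamps = [492, 1007, 1498, 2011, 2494];
// What we want back: a single interval close to 500.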

The most obvious approach is to average the intervals between consecutive timestamps and call it a day (a sketch of this appears just after the list below). I don't like it for two reasons:

  • We don't want to assume the human error falls evenly on either side of the ideal value; and
  • When possible, I enjoy making things more difficult than necessary.
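
Here's that baseline as a quick sketch (my own code, not Soundslice's; averageInterval() is a name I made up):

function averageInterval(stamps) {
    // The intervals telescope when summed, so this reduces to the
    // total span divided by the number of intervals. In other words,
    // only the first and last taps affect the result.
    return (stamps[stamps.length - 1] - stamps[0]) / (stamps.length - 1);
}

That telescoping is a third strike against plain averaging: every tap between the first and last keystroke gets thrown away.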

What we're really dealing with here is a line-fitting problem. We have a bunch of points on a graph, with the Nth beat on the x-axis and the timestamp on the y-axis, and we need to fit a line to that data. As such, I thought a simple linear regression might do the trick. Rather than averaging the intervals, it finds the slope, our interval, that minimizes the sum of the squared errors between the fitted timestamps and the ones entered by the user.
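In case you want to follow along at home, here's a sketch of the least-squares fit I mean (again my own code and naming, not something from guess.js): fit a line to the points (i, stamps[i]) and read off its slope.

function regressionInterval(stamps) {
    var n = stamps.length;
    var xMean = (n - 1) / 2; // mean of the beat indices 0, 1, ..., n-1
    var yMean = 0;
    var i;
    for (i = 0; i < n; i++) {
        yMean += stamps[i];
    }
    yMean /= n;
    var num = 0;
    var den = 0;
    for (i = 0; i < n; i++) {
        var dx = i - xMean;
        num += dx * (stamps[i] - yMean);
        den += dx * dx;
    }
    // The slope of the least-squares line through (i, stamps[i]) is
    // our best guess at the interval, in milliseconds.
    return num / den;
}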

To test the theory, I made a little JavaScript metronome and recreated the problem:

After hitting start, press 'T' or 'Y' with each click of the metronome. Feel free to miss a beat, we'll take care of you.
[Interactive metronome demo: tempo setting and tick counter]

To calculate our guess, I'm passing the user-generated timestamps, as an array named stamps, to a function called correct() (stored in the guess.js file if you're looking at the source). The integers in stamps represent the milliseconds that have elapsed since "Start" was pressed. A few things are going on inside correct().

First, we try to detect when the user missed a beat by looking for unusually long intervals:

// average interval so far, treating t=0 (the moment "Start"
// was pressed) as beat zero
var avg = Math.round(stamps[stamps.length - 1] / stamps.length);
// if we discover unusually long beats, they must
// surpass this ratio to be considered a double beat
var MISSED_BEAT_DETECTOR = 1.5;
var c = 1;
while (c < stamps.length) {
    var interval = stamps[c] - stamps[c - 1];
    if (interval / avg > MISSED_BEAT_DETECTOR) {
        console.log("Detected a missed beat at position " + c);
        // patch the gap with a synthetic tap at its midpoint, then
        // recompute the average; we deliberately don't advance c,
        // so the two new half-intervals get checked on later passes
        stamps.splice(c, 0, stamps[c - 1] + Math.round(interval / 2));
        avg = Math.round(stamps[stamps.length - 1] / stamps.length);
    } else {
        c += 1;
    }
}
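
To make that concrete, here's an invented run (the numbers are mine, not real user input). Given taps at roughly 500 ms intervals with the ~1500 ms tap missing:

var stamps = [492, 1007, 2011, 2494];
// avg starts at Math.round(2494 / 4) = 624.
// At position 2, the interval is 2011 - 1007 = 1004, and
// 1004 / 624 ≈ 1.61 > 1.5, so a synthetic tap is spliced in at
// 1007 + 502 = 1509. stamps becomes [492, 1007, 1509, 2011, 2494]
// and avg is recomputed as Math.round(2494 / 5) = 499.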

At this point, avg is the average interval after correcting for missed beats. In many cases, this gets us very close to the actual value of the metronome. (Remember, though, that the blue "perfect" interval marked out in the top bar is purely theoretical. In the wild, we have no idea what the correct answer is.)

After correcting for any missed beats, we can compute both the average and the linear regression. The latter is currently done in-house, but I'm sure there's a plugin for this.
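With the hypothetical helpers sketched above (my names, remember, not anything in guess.js), the comparison amounts to:

var byAverage = averageInterval(stamps);
var byRegression = regressionInterval(stamps);
console.log("average: " + byAverage + " ms, regression: " + byRegression + " ms");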

Here's what's interesting: at even moderate tempos, the average often appears to beat the least-squares estimate. But at tempos below about 75 beats per minute, the linear regression often appears to be right on the money.

This is the more relevant case for Soundslice, I think, because marking out space for chords or measures involves longer rather than shorter intervals. But I'm basing that only on my own informal user testing, so it's an open question.

Got ideas? I'm still working on a comments section here, but you know where to find me.