Try tapping your desk three times a second. Got it? Great.

Now, try tapping your desk four times a second with your other hand. Simple, right?

Now do both at the same time. Not so easy.

When I was in middle school, I started playing Chopin's Nouvelle Étude in F minor.

In technical music-speak, this is called a 4:3 polyrhythm, which is defined as:

...the simultaneous use of two or more conflicting rhythms, that are not readily perceived as deriving from one another, or as simple manifestations of the same meter.

So after a week of going at it by myself with little success, I was given a (pretty standard, I'll admit) mnemonic.

Pass the Salt and Pepper.

So if you're tapping four times a second with your left hand and three times a second with your right, it should look something like this:

Left hand:  x--x--x--x--
Right hand: x---x---x---

where each character is one twelfth of a second, the x's denote taps, and the -'s denote rests. With the mnemonic:

  • On the "Pass", both hands tap together.
  • On the "the", only the left hand taps.
  • On the "Salt", only the right hand taps.
  • On the "and", only the left hand taps.
  • On the "Pep", only the right hand taps.
  • On the "per", only the left hand taps.
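Why twelfths? Splitting one second into lcm(4, 3) = 12 equal slots gives the coarsest grid on which every tap of both hands falls exactly on a slot. Here is a minimal Python sketch that builds that grid and walks it, attaching the mnemonic's syllables in order to every slot where at least one hand taps. The helper name polyrhythm_grid is my own, hypothetical choice, and math.lcm needs Python 3.9 or later.

```python
from math import lcm  # Python 3.9+


def polyrhythm_grid(left_beats, right_beats):
    """Return one cycle of each hand as a string of 'x' (tap) and '-' (rest).

    The cycle is split into lcm(left_beats, right_beats) equal slots: the
    coarsest grid on which every tap of both hands lands exactly on a slot.
    """
    slots = lcm(left_beats, right_beats)
    left = "".join("x" if i % (slots // left_beats) == 0 else "-" for i in range(slots))
    right = "".join("x" if i % (slots // right_beats) == 0 else "-" for i in range(slots))
    return left, right


left, right = polyrhythm_grid(4, 3)  # four taps a second on the left, three on the right
print("Left hand: ", left)
print("Right hand:", right)

# Attach the mnemonic's syllables, in order, to every slot where at least one hand taps.
syllables = iter(["Pass", "the", "Salt", "and", "Pep", "per"])
for slot, (l, r) in enumerate(zip(left, right)):
    if "x" in (l, r):
        hands = "both" if l == r == "x" else ("left" if l == "x" else "right")
        print(f"slot {slot:2d}: {next(syllables):<4} -> {hands}")
```

Running it prints the two rows x--x--x--x-- and x---x---x---, then Pass on slot 0 (both), the on 3 (left), Salt on 4 (right), and on 6 (left), Pep on 8 (right), and per on 9 (left): the same assignment as the list above.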

So, in table format:

Syllable    Pass  the   Salt  and   Pep   per
Left hand   x     x           x           x
Right hand  x           x           x

And so, I took this mnemonic to heart as a middle schooler.

But I had so many questions.

Doesn't everybody say the words "Pass the Salt and Pepper" differently, even if ever so slightly? Is it meant to be an approximation, or is it exact?

So I put it to the test.

Of course, the code I wrote about a decade ago is almost certainly lost, so this is a re-creation, but it went something like this.

1. Getting an audio standard

Here, I used AT&T's text-to-speech (TTS) feature. I typed in the words "Pass the Salt and Pepper" and downloaded the .wav file.

2. Data processing

I used SciPy's built-in WAV file reader, scipy.io.wavfile. Here's a quick and dirty code snippet of the logic:

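from scipy.io import wavfile

STEP_SIZE = 500        # number of samples summed into each bin
THRESHOLD = 2000000    # minimum bin total for a bin to count as a point of emphasis

# Read in audio file: wavfile.read returns (sample_rate, samples).
# Assumes pepper.wav is mono 16-bit PCM, so each sample is a single integer.
sample_rate, samples = wavfile.read("pepper.wav")

# Take absolute values of all amplitudes (int() first, so abs(-32768) can't overflow int16)
audiodata = [abs(int(s)) for s in samples]

# Bin data: total loudness of each STEP_SIZE-sample window
buckets = [sum(audiodata[i:i + STEP_SIZE]) for i in range(0, len(audiodata), STEP_SIZE)]

# Find local maxima of binned data: bins louder than both neighbours
local_max = [1 if buckets[i - 1] < buckets[i] and buckets[i + 1] < buckets[i] else 0
             for i in range(1, len(buckets) - 1)]
local_max = [0] + local_max + [0]  # first and last bins only have one neighbour

# Mark out bins that are both high amplitude and local maximum
res = [1 if local_max[i] == 1 and buckets[i] > THRESHOLD else 0 for i in range(len(buckets))]

print(res)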

Essentially, we are looking for points of strong emphasis in the audio file, defined by two measures: how loud each bin is (its summed absolute amplitude must exceed THRESHOLD), and how quiet the time around it is (it must be louder than the bins on either side, i.e. a local maximum). Hopefully, these will align with the syllables of our favorite mnemonic.
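To experiment without pepper.wav, the same three steps (bin, find local maxima, apply the threshold) can be sketched as a NumPy function and fed synthetic audio. The name find_emphasis and the test signal are my own, hypothetical choices, not real speech; like the script, it assumes mono integer samples such as 16-bit PCM.

```python
import numpy as np


def find_emphasis(samples, step_size=500, threshold=2_000_000):
    """Bin |amplitude|, then flag bins that are local maxima AND above threshold.

    Assumes mono integer samples (e.g. 16-bit PCM); returns one 0/1 per bin.
    """
    # int64 so that abs(-32768) and the per-bin sums cannot overflow int16
    amplitude = np.abs(np.asarray(samples, dtype=np.int64))

    # Zero-pad to a whole number of bins; the last (short) bin keeps the same sum
    pad = -len(amplitude) % step_size
    buckets = np.pad(amplitude, (0, pad)).reshape(-1, step_size).sum(axis=1)

    # Strictly louder than both neighbours; the first and last bins never qualify
    local_max = np.zeros(len(buckets), dtype=bool)
    local_max[1:-1] = (buckets[1:-1] > buckets[:-2]) & (buckets[1:-1] > buckets[2:])

    return (local_max & (buckets > threshold)).astype(int).tolist()


# Synthetic "audio" (made up, not real speech): silence, two loud bursts, one quiet blip
audio = np.zeros(20_000, dtype=np.int16)
audio[5_000:5_500] = 10_000     # loud burst filling bin 10
audio[12_000:12_500] = 1_000    # bin 24: a local maximum, but below the threshold
audio[15_000:15_500] = -10_000  # loud negative swing filling bin 30

res = find_emphasis(audio)
print([i for i, hit in enumerate(res) if hit])  # [10, 30]
```

The quiet blip shows why both measures matter: it stands out from its neighbours, but only the loud bursts clear the threshold.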

3. Data interpretation

After running the Python script above, we get the following output:

[hchen@tahoe wav_read]$ python wave_read.py
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

If we get rid of the leading and trailing silence, it becomes:

[1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
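Trimming is just slicing from the first 1 to the last 1. A small sketch, run on the raw output printed above (trim_silence is a hypothetical helper name):

```python
def trim_silence(res):
    """Drop leading and trailing 0s (silence) from a 0/1 emphasis list."""
    hits = [i for i, hit in enumerate(res) if hit]
    if not hits:
        return []  # nothing emphasized at all
    return res[hits[0]:hits[-1] + 1]


# The raw output printed by wave_read.py above
raw = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

trimmed = trim_silence(raw)
print(trimmed)
print([i for i, hit in enumerate(trimmed) if hit])  # [0, 9, 15, 19, 28, 34]
```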

Good news. We see that there are six points of emphasis in the output, and there are six syllables in "Pass the Salt and Pepper".

But do they line up?

10000000010000000001000000000000001 (Left Hand)
10000000000000010000000000001000000 (Right Hand)
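Getting from six emphasis points to the two rows above means handing each syllable to the hand(s) the mnemonic assigns it. A sketch of that bookkeeping (split_hands and HANDS are hypothetical names of mine):

```python
# Hands that tap on each syllable, read off the mnemonic:
# Pass (both), the (left), Salt (right), and (left), Pep (right), per (left)
HANDS = ["both", "left", "right", "left", "right", "left"]


def split_hands(trimmed):
    """Turn a trimmed 0/1 emphasis list into left- and right-hand rows of '1'/'0'."""
    hits = [i for i, hit in enumerate(trimmed) if hit]
    if len(hits) != len(HANDS):
        raise ValueError(f"expected {len(HANDS)} emphasis points, got {len(hits)}")
    left = ["0"] * len(trimmed)
    right = ["0"] * len(trimmed)
    for pos, hand in zip(hits, HANDS):
        if hand in ("both", "left"):
            left[pos] = "1"
        if hand in ("both", "right"):
            right[pos] = "1"
    return "".join(left), "".join(right)


# The trimmed output from above
trimmed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]

left, right = split_hands(trimmed)
print(left, "(Left Hand)")
print(right, "(Right Hand)")
```

This reproduces the two rows above exactly; the ValueError guards against a recording where the detector finds more or fewer than six syllables.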

Almost. We can add some trailing zeroes to pad the end, but it is pretty clear that for the left hand, the first three beats are a bit fast, and for the right hand, the first two beats are a bit slow.

Oh yeah, the bigger picture.

Why does this all matter? Well, for one, I thought about it for a bit, and this was probably one of the earliest moments when I embraced interdisciplinary thinking.

Practically, though, it matters, because in the grand scheme of things, piano gets much harder than Chopin's Nouvelle Étude in F minor.

The easiest polyrhythm in Barber's 3rd Excursion is 7:8, which you'd have to break down into a 56th-note subdivision to work out precisely. If you tried to work out every polyrhythm in this piece precisely, you'd go mad.
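The 56 is lcm(7, 8): 7 and 8 share no factors, so the smallest grid that holds both hands has 7 × 8 = 56 slots. A sketch comparing 4:3 with 7:8 (polyrhythm_stats is a hypothetical helper; it says nothing about how Barber actually notates the piece) shows what that means in practice: the hands only meet on the downbeat, and their closest near-miss shrinks from 1/12 of a cycle to 1/56.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+


def polyrhythm_stats(a, b):
    """Describe an a:b polyrhythm on its coarsest shared grid.

    Returns (slots, shared, closest): the number of equal slots per cycle,
    the slots where both hands tap together, and the smallest nonzero gap
    between a tap of one hand and a tap of the other, as a fraction of the cycle.
    """
    slots = lcm(a, b)                              # coarsest grid both hands fit on
    taps_a = {i * (slots // a) for i in range(a)}  # a evenly spaced taps
    taps_b = {i * (slots // b) for i in range(b)}  # b evenly spaced taps
    shared = sorted(taps_a & taps_b)
    closest = min(abs(x - y) for x in taps_a for y in taps_b if x != y)
    return slots, shared, Fraction(closest, slots)


for a, b in [(4, 3), (7, 8)]:
    slots, shared, closest = polyrhythm_stats(a, b)
    print(f"{a}:{b} -> {slots} slots, hands together at {shared}, closest near-miss {closest} of a cycle")
```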

And there are plenty of pieces like it. The bigger picture, in essence, is to think about the bigger picture. Instead of getting bogged down in tapping out the specific rhythms in mathematically precise ways, you just practice it over and over until it works.

Spend an hour or so tapping three times a second with your right hand, and the next hour tapping four times a second with your left. When you put them together, stop thinking about how the rhythms should line up.

Instead, when your brain starts to turn to mush, and your fingers start to move on their own...

It just works.

And then I realized the real point of the mnemonic. "Pass the Salt and Pepper" was never meant to be a silver bullet for mastering 4:3 polyrhythms.

No.

But it got me practicing, and that made all the difference.