Today, I graduated from Dartmouth College.

Four years ago, Conan O'Brien gave a wildly acclaimed, viral speech at Dartmouth. Having just graduated from high school at the time, I could feel why.

It was just so magical, so perfect. I laughed through all the funny bits, which, oddly enough, felt like the whole speech. I connected with all the inside jokes about an institution I hadn't even attended. Yet amidst it all, I was left with a deeply emotional and real message of "embrace failure" that did not feel like an artificially injected, out-of-place cliché.

Though it felt right, I tend to find it difficult to leave it at that. There is a whole movement to quantify as much as possible, and I was tempted to figure out the anatomy of a good speech. More importantly, I wanted to figure out what makes a speech feel so magical.

Quantifying the commencement speech

Inspired by an article about how State of the Union addresses have become dumber over time, I wanted to track, on an individual level, the writing ability of some of the greatest commencement speakers of the decade.

Many of these speeches have the same goosebump-inducing factor as Conan's speech. As a point of curiosity, I wondered if there were large differences in linguistic sophistication amongst the writers.

How did Conan rank? Out of everyone, who gave the most linguistically complex speech? Who gave the "dumbest"?

I wanted to know.

Gathering the corpus

I grabbed the text of over 100 of the best-written commencement speeches, described by the website as:

...some of the most profound, inspirational and entertaining speeches, presented by the world’s leading individuals.

The corpus was chosen for its accessibility, quality, and relative consistency among documents. Some of the classic speeches, including Bill Gates' and Steve Jobs', are in there, as well as speeches from JK Rowling to Bono. I spent a good five minutes or so trying to come up with a speech not in the corpus, and failed. So I assumed it was good enough.

Using the following Python code, I grabbed and serialized all of the speeches on the site:

import urllib2
import re
import pickle

# Grab the index page and pull out the speech links and speaker names
rootpage = urllib2.urlopen("http://gradspeeches.com/").read()
links = re.findall('href="(.*?)">Watch', rootpage)
speakers = re.findall('<td style="width: 25%; padding-top: 14px;">(.*?)</td>', rootpage)

documents = dict()

for link, speaker in zip(links, speakers):
    try:
        print speaker, link
        lines = urllib2.urlopen(link).readlines()
        text = ""
        for line in lines:
            # Each paragraph of the transcript sits in its own <p> tag
            match = re.match("<p>(.*?)</p>", line)
            if match:
                text += match.group(1) + " "
        documents[speaker] = text
    except:
        # Skip any speech page that fails to download or parse
        print 'unable to do', speaker

pickle.dump(documents, open("raw_corpus.pkl", "wb"))
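
Once the scrape finishes, the resulting pickle can be reloaded and spot-checked. A quick, purely illustrative sanity check might look like this:

import pickle

documents = pickle.load(open("raw_corpus.pkl"))
# How many speeches made it in, and who are the first few speakers?
print len(documents), "speeches collected"
print documents.keys()[:5]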

The Flesch-Kincaid Score

The next step involves calculating how lexically sophisticated these speeches are, from a K-12 point of view.

There are many ways to accomplish this goal; in the interest of time and ease of computation, I chose two of the Flesch-Kincaid readability tests. These tests are designed to compute, with minimal effort, how difficult an English passage is to understand.

The formula involves crunching the total number of sentences, words, and syllables in a document in the following manner:
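
Reading ease = 206.835 - 1.015 × (total words / total sentences) - 84.6 × (total syllables / total words)

(This is the same computation the script further down performs.)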

Essentially, the longer your sentences and the bigger your words, the more "lexically sophisticated" you are. The following table shows roughly how the score is conventionally interpreted:
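
90-100: very easy, readable by an average fifth grader
60-70: plain English, readable by eighth and ninth graders
30-50: difficult, college-level reading
0-30: very difficult, best understood by college graduates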

A Dr. Seuss book would generally score over 100, and a typical article in The Harvard Law Review would land in the 20s-to-30s range.

For simplicity, the above formula can be converted into a K-12 scale through the Flesch-Kincaid grade-level formula:
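
Grade level = 0.38 × (total words / total sentences) + 11.8 × (total syllables / total words) - 15.59

(The coefficients here mirror the ones used in the script below.)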

Thanks to Python's Natural Language Toolkit and eady.in's list of heuristics for estimating the number of syllables in a word, I wrote a short script to calculate the Flesch-Kincaid scores for all of the speeches in the corpus:

import pickle
import re
import nltk

# sylco(word) is assumed to be defined elsewhere, using the syllable-counting
# heuristics referenced above; it returns an estimated syllable count.

corpus = pickle.load(open("raw_corpus.pkl"))
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')

for speaker, text in corpus.items():
    # Keep only the characters that matter for sentence and word boundaries
    stripped_text = re.sub("[^\.\(\) A-Za-z0-9!,\?]", " ", text)
    sentences = tokenizer.tokenize(stripped_text)
    words = re.sub("[^A-Za-z0-9 ]", "", stripped_text).split()

    total_syllables = sum(map(sylco, words))
    total_words = len(words)
    total_sentences = len(sentences)

    # Flesch reading ease and Flesch-Kincaid grade level
    fk_score = 206.835 - (1.015 * total_words / total_sentences) - (84.6 * total_syllables / total_words)
    fk_grade = (0.38 * total_words / total_sentences) + (11.8 * total_syllables / total_words) - 15.59

    print total_syllables, total_words, total_sentences, speaker, fk_score, fk_grade
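
The sylco helper itself isn't reproduced here. As a rough, hypothetical stand-in (far cruder than the referenced heuristics), counting runs of consecutive vowels and discounting a trailing silent "e" gets within the right ballpark:

def sylco(word):
    # Crude syllable estimate: count groups of consecutive vowels,
    # then discount a trailing silent 'e'. Much less accurate than the
    # heuristics referenced above, but enough to make the script runnable.
    word = word.lower()
    vowels = "aeiouy"
    count = 0
    previous_was_vowel = False
    for char in word:
        is_vowel = char in vowels
        if is_vowel and not previous_was_vowel:
            count += 1
        previous_was_vowel = is_vowel
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)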

How do past commencement speakers size up?

Here are the results, sorted by ascending grade level. Click on the individual columns to sort by a specific feature in the data.

Stay Hungry, Stay Foolish

The Flesch-Kincaid score is hardly perfect. In quantifying a speech, it is but a single feature in what could be a career-long hunt for what makes the perfect speech. The Economist, for one, has called the viral Flesch-Kincaid analysis of State of the Union addresses naïve, both here and here.

Am I bothered by the fact that Conan's speech is at a seventh-grade level, and so is Steve Jobs'? Not in the slightest. If anything, a more accessible, less professorial speech can be seen as a feature, not a bug.

I think it's important to realize that a single quantitative measurement cannot capture the intelligence of a speech, let alone the speechwriter. To believe otherwise would be to concede that Abraham Lincoln delivered the dumbest State of the Union until 1913, or that a student who writes endless run-on sentences is the most intelligent among us.

And so the trek continues, in the search for a quantitative way of measuring what makes a speech feel good.

On second thought, I'll put it off until graduate school.

Addendum: What is David Brooks' Score?

I recorded Dartmouth's 2015 commencement speech, transcribed it with a free speech-to-text service, and (almost) certainly became the first person ever to calculate its Flesch-Kincaid score.

Here we go:

Syllables: 4824
Words: 3213
Sentences: 117
Flesch-Kincaid Score: 51.9430510666
Flesch-Kincaid Grade Level: 12.561911226
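
As a sanity check, plugging those counts back into the two formulas used in the script reproduces the scores:

206.835 - 1.015 × (3213 / 117) - 84.6 × (4824 / 3213) ≈ 51.94
0.38 × (3213 / 117) + 11.8 × (4824 / 3213) - 15.59 ≈ 12.56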