## Rap for math geeks (K-nearest neighbarz)

Here are the first two stanzas:

K-nearest neighbors, surprisingly it can slay
Got these I.I.D women so I call them Naive baes
With conditional vision guessing, I guess I should listen
Their predictions more accurate than anything I could say, but wait

Equal weighter, I’m never a player hater
Cuz Naive Bayes don’t care about the size of my data, now
Take a second and pause, just so you know what I mean
Y’all tryna muster the clusters I developed with K-Means

Below the fold are the rest of the amazing lyrics for “K-nearest neighbarz” by our new favorite rapper, J-Wong:

So you know what I mean, do you know what I meant
While I’m stochastically tilting the gradient with my descent
Yes I, curved the plane, but it looks just fine
Just gotta know the logistic function I usually define

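For anyone who wants to see the descent that verse is tilting, here's a minimal gradient descent sketch in plain Python (the function, learning rate, and step count are all made up for illustration):

```python
# Minimal gradient descent on f(x) = (x - 3)^2, whose minimum is at x = 3.
# The function, learning rate, and iteration count are illustrative choices.
def grad(x):
    return 2 * (x - 3)  # derivative of (x - 3)^2

x = 0.0
lr = 0.1
for _ in range(100):
    x -= lr * grad(x)  # step downhill along the negative gradient
# x is now very close to 3
```

The stochastic version in the verse would estimate the gradient from a random subset of the data at each step, but the update rule is the same.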
It’s sort of distance weighted, the middle is saturated
So the outliers in the data could never degrade it
I made it, simple, cutting dimensions to one
Fishing for data with Fisher’s discriminant till I was done

I’m tryna flatten the data to make some visual sense
Go between all the classes maximizing the variance
Just wanna categorize, supervised if I can
Maximizing it relative to the variance within

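Those two stanzas describe Fisher's linear discriminant: project the data down to one dimension along the direction that maximizes between-class variance relative to within-class variance. A sketch with made-up two-class data:

```python
import numpy as np

# Two-class Fisher's linear discriminant: project onto w = Sw^-1 (mu1 - mu0),
# which maximizes between-class relative to within-class variance.
# The toy data below is made up for illustration.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
X1 = rng.normal(loc=[2, 2], scale=0.5, size=(50, 2))

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)  # within-class scatter
w = np.linalg.solve(Sw, mu1 - mu0)  # Fisher direction

# One-dimensional projections: "cutting dimensions to one"
z0, z1 = X0 @ w, X1 @ w
```

After the projection the two classes sit at clearly separated positions on a single axis, which is the whole point of the cut to one dimension.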
But let’s all go back, for a sec in case you missed it
Take it back to a regression form that was logistic
I keep it saturated so there’s no reason to doubt, just
1 over 1 plus e to the minus alpha

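That last line is the logistic function spelled out in words. In code:

```python
import math

def logistic(alpha):
    """The verse's formula: 1 / (1 + e^(-alpha))."""
    return 1.0 / (1.0 + math.exp(-alpha))

# 0.5 in the middle, saturated at the tails
logistic(0)  # 0.5
```

The saturation the verse brags about is real: far from zero the output is pinned near 0 or 1, so wild inputs can't push the prediction past those bounds.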
I like cool people, but why should I care?
Because I’m busy tryna fit a line with the least squares
So don’t be in my face like you Urkel
With that equal covariance matrix looking like a circle

Homie wanna know, how I flow this free?
I said I estimated matrices with SVD
X to U Sigma V, and with V, just transpose it
I rank-r approximate and everyone knows it

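The SVD verse maps directly onto numpy: factor X into U, Sigma, and V transpose, then keep the top r singular values for a rank-r approximation. A sketch with a made-up matrix:

```python
import numpy as np

# "X to U Sigma V, and with V, just transpose it":
# X = U @ diag(s) @ Vt, then keep the top r terms to rank-r approximate.
# The matrix here is random, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

r = 2
X_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]  # best rank-r approximation of X
```

By the Eckart–Young theorem this truncation is the best rank-r approximation in both the Frobenius and spectral norms, which is presumably why everyone knows it.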
I’m rolling in the whip, ‘cuz a brotha gotta swerve
Jay-Z’s with Roc Nation while I’m on the ROC curve
True positives is good, so y’all don’t wanna stop that
I took the true negatives out and now I’m finna plot that

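For the curious, an ROC curve is exactly that plot: true positive rate against false positive rate as a decision threshold sweeps over the scores. A hand-rolled sketch with made-up labels and scores:

```python
import numpy as np

# Sweep a threshold over classifier scores and record the true positive rate
# (TPR) against the false positive rate (FPR). Labels/scores are made up.
labels = np.array([0, 0, 1, 1, 0, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.3])

thresholds = np.sort(np.unique(scores))[::-1]  # high to low
tpr = [((scores >= t) & (labels == 1)).sum() / (labels == 1).sum() for t in thresholds]
fpr = [((scores >= t) & (labels == 0)).sum() / (labels == 0).sum() for t in thresholds]
# The (fpr, tpr) pairs trace the ROC curve; plot them with matplotlib if you like
```

At the lowest threshold everything is predicted positive, so the curve always ends at the corner (1, 1).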
But what about the case, where the labels isn’t known yet
I guess I gotta analyze the principal components
So if anybody really wanna track this
Compute for greatest variance along the first principal axis

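When the labels aren't known, the verse reaches for PCA: the first principal axis is the direction of greatest variance, i.e. the top eigenvector of the data's covariance matrix. A sketch with illustrative data:

```python
import numpy as np

# First principal component: the eigenvector of the covariance matrix
# with the largest eigenvalue is the direction of greatest variance.
# The toy data is made up, with the first axis stretched to dominate.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 0] *= 5  # stretch the first coordinate so it carries the most variance

Xc = X - X.mean(axis=0)                 # center the data first
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues ascending
first_axis = eigvecs[:, -1]             # eigenvector with the largest eigenvalue
```

Equivalently, the first right singular vector from the SVD verse above the fold would give the same axis.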
Taking losses and insults, yeah I don’t like that burn
I prefer loss functions from models up in scikit learn
And if you didn’t catch my lyrics it was right here in the notes, look
Open it in your terminal and run Jupyter Notebook

Or you can add an import line for multiprocessing pool
Then p.map it to a whole array until it returns
Pyplot your data in a graph and see what scikit learned

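The multiprocessing lines are a real recipe. A minimal sketch, assuming a Unix-like system so the `fork` start method is available (which lets this run at the top level of a script or notebook):

```python
import multiprocessing as mp

def square(x):
    return x * x

# Assumes a Unix-like OS: the "fork" start method avoids re-importing the
# module in the worker processes.
ctx = mp.get_context("fork")
with ctx.Pool(2) as p:
    results = p.map(square, [1, 2, 3, 4])  # map the function over the whole array
# results == [1, 4, 9, 16]
```

From there, handing `results` to `matplotlib.pyplot` is the "Pyplot your data in a graph" step.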
Got a whole data matrix and it’s N by P
If you wanna stretch or compress it well that’s fine by me, but
We’ll see, what the data can reveal real soon
Just compute the singular vectors and their values too

It’s never invertible but that’s not really the worst
I’ll pull an MTM on it and hit that pseudoinverse
Finding a line of best fit with no questions
And no stressin, estimate it with linear regression

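"Pull an MTM on it" is the normal-equations trick: when M itself isn't invertible, the least-squares solution comes from the pseudoinverse. A sketch with points made up to lie exactly on y = 2x + 1:

```python
import numpy as np

# Linear regression via the pseudoinverse, as the verse suggests:
# beta = pinv(M) @ y solves the least-squares problem even when M
# has no ordinary inverse. The data is fabricated to sit on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1

M = np.column_stack([x, np.ones_like(x)])  # design matrix [x, 1]
slope, intercept = np.linalg.pinv(M) @ y   # least-squares line of best fit
```

`np.linalg.lstsq(M, y, rcond=None)` reaches the same answer, and no stressin' either way.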
Smack you backwards when I’m sick of looking at your eigenface
With a big empty matrix of data that wasn’t done
Got 99 columns but you’re still rank one

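The rank-one burn checks out: every column of an outer product is a multiple of the same vector, so a matrix can have 99 columns and still be rank one. A quick illustration:

```python
import numpy as np

# 99 columns but still rank one: each column of the outer product u v^T
# is a scalar multiple of u, so the rank is 1 no matter how wide M gets.
u = np.arange(1, 100, dtype=float)  # 99 entries
M = np.outer(u, np.ones(99))        # 99 x 99 matrix: 99 identical columns
```

This is also why eigenfaces compress so well: a face dataset whose columns are nearly multiples of each other is nearly low rank.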
So, this one goes out, to all of my haters
Overfitting their models, validating on their training data
Cuz my classifier smoking you leaving only the vapors
So don’t be messin’ with me, or my K nearest neighbors

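And if you'd rather mess with those K nearest neighbors yourself, here's a bare-bones sketch in numpy (the four training points and labels are made up for illustration):

```python
import numpy as np

# A minimal k-nearest neighbors classifier: find the k closest training
# points and take a majority vote. The tiny dataset is purely illustrative.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])

def knn_predict(x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances
    nearest = y_train[np.argsort(dists)[:k]]     # labels of the k closest
    return np.bincount(nearest).argmax()         # majority vote

knn_predict(np.array([1.0, 0.9]))  # 1
```

This is the unweighted "equal weighter" vote from the second stanza; a distance-weighted variant would scale each neighbor's vote by the inverse of its distance.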
(Hat tip: Scott Cunningham, @causalinf via Twitter)