Playlist Classification on Spotify using KNN and Naive Bayes Classifier

Nev Acar
Towards Data Science
8 min read · Jan 23, 2019


Photo by Spencer Imbrock on Unsplash.com

One day, I thought it would be cool if Spotify helped me pick a playlist whenever I liked a song. The idea is that I tap the plus button while my phone is locked and Spotify adds the song to one of my playlists rather than to my library, so that I don’t have to open the app and find the playlist it suits best. This way, I wouldn’t have to choose among all of my playlists; I would just leave it to the algorithms. Then I realized it would make a good side project for a machine learning enthusiast. After all, I started this project to avoid unlocking my phone and thinking for three seconds, which is admittedly not the most optimized solution for me as an individual.

You can find the Jupyter Notebook at https://github.com/n0acar/spotify-playlist-selection

1- Scraping Data from Spotify Web API

This is where everything starts. I found the Spotify Web API and the Spotipy library. Combining the two, we can extract official Spotify data with useful features.

Before sending a request to the Spotify Web API, you need to install and import the dependencies below.

import spotipy
import spotipy.util as util
from spotipy.oauth2 import SpotifyClientCredentials

In order to get access to the API, you will need credentials created specifically for your application. Go to https://developer.spotify.com/dashboard/ and click “Create a Client ID” or “Create an App” to get your “Client ID” and “Client Secret”. After that, set the Redirect URI in the settings of your Spotify application to any page you choose.

client_id= "YOUR_CLIENT_ID"
client_secret= "YOUR_CLIENT_SECRET"
redirect_uri='http://google.com/'

Next, choose your scope from the Spotipy documentation. From now on, “sp” will be your access key to Spotify data.

username = 'n.acar'
# Created here but not passed to anything below; the user token is what authenticates.
client_credentials_manager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
# Scopes needed to read saved tracks and private playlists.
scope = 'user-library-read playlist-read-private'
try:
    token = util.prompt_for_user_token(username, scope, client_id=client_id, client_secret=client_secret, redirect_uri=redirect_uri)
    sp = spotipy.Spotify(auth=token)
except Exception:
    print('Token is not accessible for ' + username)

Then, you can do a lot with that sp variable. It is your Spotify object. For example, you can fetch your saved songs, the data of a single playlist or, better, all playlists of a user.

songLibrary = sp.current_user_saved_tracks()
playlist = sp.user_playlist(username, playlist_id='6TXpoloL4A7u7kgqqZk6Lb')
playlists = sp.user_playlists(username)

To find the ID of a playlist or track, use its sharing link and take the code out of it. Everything the API provides comes in JSON format, and you can extract the information you need from that.
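As a rough illustration of pulling the ID out of a sharing link, here is a minimal sketch. The helper name `spotify_id_from_link` is hypothetical, and it assumes links shaped like `https://open.spotify.com/playlist/<id>?...` or URIs shaped like `spotify:playlist:<id>`; the `si=abc123` query value is made up.

```python
from urllib.parse import urlparse

def spotify_id_from_link(link):
    """Return (kind, id) from a Spotify sharing link or URI (assumed shapes)."""
    if link.startswith("spotify:"):
        # URI form, e.g. spotify:playlist:<id>
        parts = link.split(":")
        return parts[-2], parts[-1]
    # URL form, e.g. https://open.spotify.com/playlist/<id>?si=...
    # urlparse keeps the query string out of .path, so it is dropped here.
    path = urlparse(link).path.strip("/").split("/")
    return path[-2], path[-1]

kind, playlist_id = spotify_id_from_link(
    "https://open.spotify.com/playlist/6TXpoloL4A7u7kgqqZk6Lb?si=abc123"
)
print(kind, playlist_id)
```

The returned ID is the kind of value passed as `playlist_id` in the calls above.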

For other uses of Spotify data, go through the documentation of Spotipy and the Spotify Web API. There are plenty of methods and data types that might be of good use for a project.

2- Visualization of Features and Insight

When people listen to a song, they perceive several aspects of it. A person probably has no trouble determining whether a track is acoustic or electronic. Most people with some musical sense can say a lot about music that is hard for a computer to grasp. However, the Spotify API provides some useful features for any track on the platform. Most of them are normalized between 0 and 1, so they are pretty easy to work with.
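To make this concrete, here is a small sketch of turning one track's feature dictionary into a numeric vector. The `FEATURES` list and the `to_vector` helper are illustrative names, not taken from the notebook, and the sample values are made up; it assumes the dictionary keys match the feature names used in this post.

```python
# Features that lie between 0 and 1 (an assumed selection for illustration).
FEATURES = ["danceability", "energy", "speechiness",
            "acousticness", "instrumentalness", "valence"]

def to_vector(track_features, names=FEATURES):
    """Pick the named numeric features from one audio-features dict."""
    return [float(track_features[name]) for name in names]

# Made-up values; a real dict would come from something like
# sp.audio_features([track_id])[0] in Spotipy.
sample = {
    "danceability": 0.81, "energy": 0.72, "speechiness": 0.25,
    "acousticness": 0.05, "instrumentalness": 0.0, "valence": 0.64,
    "tempo": 140.0, "loudness": -5.2,
}
vec = to_vector(sample)
print(vec)
```

Tempo and loudness are not on a 0 to 1 scale, so leaving them out of the vector keeps every dimension comparable.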

Below is a comparison of two Spotify playlists called “Classical Essentials” represented by blue and “Get Turnt” (a rap playlist by Spotify) represented by red.

Get Turnt (RED) — Classical Essentials (BLUE)

From these plots, you can see that these two genres show distinct patterns owing to the nature of the music. While the rap playlist has higher values in terms of danceability, energy, loudness, speechiness, valence and tempo, the classical music playlist has higher acousticness and instrumentalness values, as expected.
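A comparison like this boils down to averaging each feature over a playlist. The sketch below shows one way to do that; `mean_profile` is a hypothetical helper and the two tiny playlists contain made-up numbers, not the real Spotify values.

```python
def mean_profile(tracks, names):
    """Average each named feature over a list of feature dicts."""
    return {name: sum(t[name] for t in tracks) / len(tracks) for name in names}

names = ["energy", "acousticness"]
# Made-up feature values standing in for two playlists.
rap = [{"energy": 0.8, "acousticness": 0.1},
       {"energy": 0.6, "acousticness": 0.3}]
classical = [{"energy": 0.1, "acousticness": 0.9},
             {"energy": 0.3, "acousticness": 1.0}]

rap_profile = mean_profile(rap, names)
classical_profile = mean_profile(classical, names)
print(rap_profile)
print(classical_profile)
```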

3- Applying Machine Learning Algorithms

I used two algorithms: K-Nearest Neighbors and Naive Bayes Classification. I wanted to implement a couple of basic algorithms from scratch, and I realized these are the most convenient ones to start with.

K-Nearest Neighbors Classification: The K-Nearest Neighbors (KNN) algorithm is a classification technique that labels new data by its feature similarity to existing data, measured as a distance.

KNN
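A from-scratch KNN can be sketched in a few lines. This is an illustration under assumptions rather than the notebook's code: it uses Euclidean distance and a majority vote with k=3, and the two-dimensional training points are made up.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort training points by distance to the query and keep the k closest.
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Made-up (energy, acousticness) pairs for illustration.
train_X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1],
           [0.1, 0.9], [0.2, 0.95], [0.15, 0.8]]
train_y = ["rap", "rap", "rap", "classical", "classical", "classical"]

print(knn_predict(train_X, train_y, [0.85, 0.15]))  # prints "rap"
```

Because the vote only looks at the k closest points, a single odd neighbor can flip the result when two genres sit close together in feature space.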

Naive Bayes Classification: Naive Bayes Classification is a method that calculates the probability of each feature as if the features were independent and picks the outcome with the highest probability according to Bayes’ theorem.

Naive Bayes Classification
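One common way to apply this to continuous features is Gaussian Naive Bayes, sketched below. The Gaussian likelihood is an assumption made for this example (the post does not say which variant the notebook uses), the function names are hypothetical, and the data points are made up.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-6
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, query):
    """Return the class with the highest log posterior for `query`."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        # Features are treated as independent, so log-likelihoods add up.
        for x, m, var in zip(query, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(*model[label]))

# Made-up (energy, acousticness) pairs for illustration.
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1],
     [0.1, 0.9], [0.2, 0.95], [0.15, 0.8]]
y = ["rap", "rap", "rap", "classical", "classical", "classical"]

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [0.85, 0.15]))  # prints "rap"
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow as the number of features grows.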

The statistical distinction between the two playlists is pretty obvious. I excluded tempo and loudness from my feature list since they didn’t contain much information about the genre and were very scattered. I used the first 35 songs of each playlist for training and the rest of the songs for testing. The test set might not be the most objective one because of the subjective nature of music genres. Since both playlists are created by Spotify, I thought this was the safest way to test the algorithms. Luckily, since our target is to classify rap and classical music, the statistics don’t make any mistakes and we get a 100% accuracy rate.

Even though all songs ended up in the right category, one classical song almost went to the wrong bucket with KNN. It is called Violin Concerto BWV 1042 in E Major: I. Allegro. It wasn’t even close to being a rap song under Naive Bayes Classification, yet it is still the closest thing to rap in the classical music playlist. I looked up the closest rap song in terms of distance, which is Crushed Up by Future, so it was a bit surprising. Then I checked the features of this classical song, which actually look a bit faulty to me. (Give it a listen! At least the “instrumentalness” feature should be greater than zero.)
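The split and the accuracy figure can be sketched as follows. The helper names are hypothetical, the 50 placeholder items simply stand in for a playlist's feature vectors, and `predict` can be any function that maps a feature vector to a label.

```python
def split_playlist(tracks, n_train=35):
    """Use the first n_train tracks for training and the rest for testing."""
    return tracks[:n_train], tracks[n_train:]

def accuracy(predict, test_X, test_y):
    """Percentage of test items whose predicted label matches the true one."""
    correct = sum(1 for x, label in zip(test_X, test_y) if predict(x) == label)
    return 100.0 * correct / len(test_y)

# Placeholder stand-ins for the feature vectors of a 50-track playlist.
tracks = list(range(50))
train, test = split_playlist(tracks)
print(len(train), len(test))
```

Splitting by position rather than at random keeps the experiment reproducible, at the cost of inheriting whatever ordering the playlist happens to have.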

Nevertheless, this wasn’t a hard task. To make things more complicated and evaluate the algorithms better, I decided to add one more playlist which is “Rock Save the Queen”.

Accuracy Rates (%)

NBC looks more consistent than KNN. The classical song mentioned above is not in the classical music playlist anymore because of the newly added rock songs. Still, NBC doesn’t make any mistakes. However, in the Get Turnt playlist, things are a little different. Deciding the genre based only on the closest neighbors is not a good idea for rap songs this time. The most confused rap song is “EA” by Wifisfuneral and Robb Bank$, which ended up in the rock bucket.

To make things more challenging, I added a new playlist, “Coffee Table Jazz”, again created by Spotify.

Before showing you the results, I’d like to add an updated visualization of the four playlists the algorithms use in the final model. I will refer to it in the final step when I investigate the results. (Rock Save the Queen and Coffee Table Jazz are represented by yellow and orange, respectively.)

Get Turnt (RED) — Classical Essentials (BLUE) — Rock Save the Queen (YELLOW) — Coffee Table Jazz (ORANGE)
Accuracy Rates (%)

From these, we clearly see that the classical and jazz genres are statistically close to each other, and the same is true for rock and rap. This is why, when rock music was introduced, the new songs didn’t cause any confusion among classical songs but damaged the accuracy rate of Get Turnt. The same thing happened when Coffee Table Jazz was introduced to the system. This time, classical music classification by KNN is severely hurt. Knowing that classical music is only confused with jazz, a 51% accuracy rate is almost random. This shows that considering only the closest neighbors doesn’t work for classifying the genre. At least, it is safe to say NBC is much better.

NBC couldn’t work for classical and jazz music as well as it works for rap and rock. I think this is because people distinguish classical songs from jazz by the different instruments traditionally used. Even if there is a bit of difference between them in terms of danceability, that is not enough for some of the songs. The other features are almost the same, which is expected. Instrumentalness, however, is almost exactly the same, as can be seen on the spider graph. Yes, they are obviously both instrumental, yet this is where this feature falls short of human sense. Maybe if Spotify provided another data type describing which instruments or types of instruments are used in a song, that might resolve this problem as well. I think a 98.44% accuracy rate is great for the other two playlists. NBC only missed one song from each playlist. They are STARGAZING by Travis Scott, which is identified as rock, and Distant Past by Everything Everything, identified as rap. (If you listen to them, please share your comments.)
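To see which genres get confused with which, it helps to count (true, predicted) pairs per playlist. The sketch below uses hypothetical helper names and a handful of made-up labels, not the actual results.

```python
from collections import Counter

def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, predicted_labels))

def per_class_accuracy(true_labels, predicted_labels):
    """Accuracy in percent, computed separately for each true label."""
    counts = confusion_counts(true_labels, predicted_labels)
    totals = Counter(true_labels)
    return {label: 100.0 * counts[(label, label)] / totals[label]
            for label in totals}

# Made-up labels for illustration only.
true = ["classical", "classical", "jazz", "jazz", "rap", "rock"]
pred = ["classical", "jazz", "jazz", "jazz", "rap", "rock"]

print(confusion_counts(true, pred)[("classical", "jazz")])
print(per_class_accuracy(true, pred))
```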

4- Conclusion

The aim of this project is to infer the undetermined genres of given playlists and classify new songs accordingly. It is supposed to work for any playlist that has the same type of songs in it. I opted to use four clearly genre-labeled playlists created by Spotify. I used two different algorithms, KNN and NBC, as mentioned in the post. NBC outperformed KNN overall. The challenge this system might encounter is distinguishing two different playlists with very close genres. I also realized that the features Spotify provides might not be enough in some cases.

Don’t hesitate to ask questions and give feedback. Music, genres, and this work overall are subjective. I appreciate any comments on the results since the topic is open for discussion.

Cheers,

Nev

Give me a follow for more content and support!

Connect with me on Twitter and LinkedIn
