Star Wars Analytics using Python

Yes, you read that correctly. In this post I’m going to clean up a survey dataset collected from 835 people, with several questions about Star Wars Episodes 1 to 6. This will let me answer questions like “Does the rest of America realize that ‘The Empire Strikes Back’ is clearly the best of the bunch?”

I’ve used several Python modules in this exercise:

pandas
numpy
matplotlib.pyplot
seaborn

Throughout the exercise, I’ll use different techniques to clean up the survey columns and get them into a usable format. As listed above, I’ve used the seaborn module this time, unlike in my previous posts. With the cleaned data, I was able to plot some statistics and get some interesting findings, like:

Han Solo is the most favourably viewed character
Emperor Palpatine is the most unfavourably viewed character
Revenge of the Sith (Star Wars 3) is the highest ranked
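
To give a flavour of the column clean-up mentioned above, here is a minimal sketch using pandas. It is not the notebook's actual code: the column names (`seen_any`, `seen_ep5`, `rank_ep3`, `rank_ep5`), the helper `clean_survey`, and the four toy rows are all invented for illustration, and the real survey columns will look different.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the survey. Column names and rows are invented for
# illustration only; they are NOT taken from the real dataset.
raw = pd.DataFrame({
    "seen_any": ["Yes", "No", "Yes", "Yes"],
    # Assumed checkbox-style column: holds the film title if ticked, else NaN.
    "seen_ep5": ["Episode V", np.nan, np.nan, "Episode V"],
    # Assumed ranking columns stored as text.
    "rank_ep3": ["6", np.nan, "1", "4"],
    "rank_ep5": ["1", np.nan, "2", "1"],
})


def clean_survey(df):
    """Return a copy with Yes/No and checkbox columns as booleans
    and ranking columns as numbers."""
    out = df.copy()
    # Map Yes/No answers to True/False.
    out["seen_any"] = out["seen_any"].map({"Yes": True, "No": False})
    # A ticked checkbox is any non-missing value.
    out["seen_ep5"] = out["seen_ep5"].notna()
    # Convert text rankings to numeric; missing answers stay NaN.
    rank_cols = [c for c in out.columns if c.startswith("rank_")]
    out[rank_cols] = out[rank_cols].apply(pd.to_numeric)
    return out


clean = clean_survey(raw)
# Mean ranking per film; NaN answers are skipped by default.
mean_ranks = clean[["rank_ep3", "rank_ep5"]].mean()
print(mean_ranks)
```

Once the columns are boolean or numeric like this, summaries such as the mean ranking per film are one-liners and can be passed straight to a plotting library. How a mean ranking should be read (whether a low or a high number is better) depends on how the survey encodes the answers.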

You can get the dataset from here if you wish to practise yourself. To make this more practical and realistic, I wrote my code in a Jupyter notebook and uploaded it to my GitHub repository, where you can also download the dataset.

To jump into the code, analysis and visualizations, click here.

NOTE: GitHub won’t render the notebook properly on mobile. Please use a tablet or computer, or request the “desktop” version of the site on your mobile device.