Data preparation is one of the most important steps in any data analysis or data science project. It is what makes proper analysis and visualization possible, so that we can answer the questions we set out to ask. In this post, I’ll show you how I cleaned and enriched a dataset of NYC high school data. The datasets can be obtained from the City of New York’s online portal.
The analysis and exploration focus on SAT scores. We’ll look at why certain schools score better than others, find correlations, and plot our findings on a map. I’ve divided the exercise into two parts.
I used Python to get the job done, with the pandas, numpy, matplotlib, re, and basemap modules.
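To give a flavor of the kind of correlation check the notebooks build toward, here is a minimal sketch in pandas. The column names (`sat_score`, `total_enrollment`, `ell_percent`) and all the values are made up for illustration; they are assumptions, not figures from the NYC datasets, and the real notebooks may approach this differently.

```python
import pandas as pd

# Toy data invented for illustration only; not taken from the NYC datasets.
schools = pd.DataFrame({
    "sat_score": [1200, 1350, 1100, 1500, 1250],
    "total_enrollment": [400, 900, 300, 1500, 700],
    "ell_percent": [20.0, 8.0, 30.0, 2.0, 15.0],
})

# Pearson correlation of every other numeric column with the SAT score.
correlations = schools.corr()["sat_score"].drop("sat_score")

# Sort so the most negative correlations come first.
print(correlations.sort_values())
```

On real data, a strongly positive or negative value only tells you where to look next; it says nothing on its own about why the relationship exists.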
To make this more practical and realistic, I wrote the code in Jupyter notebooks and uploaded them to my GitHub repository. You can download the datasets from there as well if you’d like to practice. To jump into the notebooks, use the links above.
NOTE: GitHub won’t render the notebooks nicely on mobile. Please use a tablet or computer, or request the “desktop” version of the site on your mobile device.