Hacker News Data Analysis using Python


This is going to be a quick one before the weekend. I’ll be working with a dataset of submissions to Hacker News from 2006 to 2015. Hacker News is a site where “users can submit articles from across the internet (usually about technology and startups), and others can ‘upvote’ the articles, signifying that they like them. The more upvotes a submission gets, the more popular it was in the community. Popular articles get to the ‘front page’ of Hacker News, where they’re more likely to be seen by others.”

In this analysis, I’ve made use of the following modules:

  1. pandas
  2. collections
  3. tldextract
  4. dateutil
  5. matplotlib

Using the above modules saved me a lot of time, and they will save you time too! Some of the interesting findings were:

  • blogspot.com, github.com and techcrunch.com were among the most frequently submitted domains.
  • The best-performing titles were between 33 and 50 characters long.
  • On average, the best hour to submit was 14:00, and the best day was Tuesday.
  • The total number of upvotes has increased steadily over the years.
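
To give a rough idea of how findings like the domain ranking and the best hour and day can be computed, here is a minimal sketch. It is not the notebook's code: the sample rows, the field order and the helper names (naive_domain, top_domains, mean_upvotes_by) are all made up for illustration. It also sticks to the standard library, so a crude "last two labels of the hostname" rule stands in for tldextract, and plain dictionaries stand in for pandas grouping.

```python
from collections import Counter, defaultdict
from datetime import datetime
from urllib.parse import urlparse

# Made-up rows for illustration only: (url, title, upvotes, created_at).
# The real dataset's column names and values may differ.
SAMPLE = [
    ("https://github.com/pandas-dev/pandas", "A data analysis library", 120, "2014-03-04 14:05:00"),
    ("http://blog.example.com/post", "Short post", 10, "2014-03-05 09:30:00"),
    ("https://www.github.com/psf/requests", "HTTP for humans", 80, "2014-03-11 14:45:00"),
    ("http://techcrunch.com/2014/some-story", "A startup story", 40, "2014-03-08 20:10:00"),
]

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def naive_domain(url):
    """Crude stand-in for tldextract: keep the last two labels of the hostname.

    This is wrong for suffixes such as .co.uk, which is the case tldextract handles.
    """
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def top_domains(rows, n=3):
    """Count submissions per domain and return the n most common."""
    return Counter(naive_domain(url) for url, *_ in rows).most_common(n)


def mean_upvotes_by(rows, key):
    """Group upvotes by key(timestamp) and return the mean of each group."""
    groups = defaultdict(list)
    for _, _, upvotes, created in rows:
        ts = datetime.strptime(created, "%Y-%m-%d %H:%M:%S")
        groups[key(ts)].append(upvotes)
    return {k: sum(v) / len(v) for k, v in groups.items()}


domains = top_domains(SAMPLE)
by_hour = mean_upvotes_by(SAMPLE, lambda ts: ts.hour)
by_day = mean_upvotes_by(SAMPLE, lambda ts: DAYS[ts.weekday()])

print(domains)
print(max(by_hour, key=by_hour.get), max(by_day, key=by_day.get))
```

The same grouping idea extends to title length: pass a key based on len(title) instead of the timestamp and compare the group means.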

To make this more practical and realistic, I’ve written my code in a Jupyter notebook and uploaded it to my GitHub repository. The dataset is there as well. To access it, click here.

NOTE: GitHub won’t render the notebook properly on mobile. Please use a tablet or computer, or request the “desktop” version of the page on your mobile device.