Analysis with Python: Exploring Gun Deaths in the US

In this post, I’ll perform some analysis using native Python functions along with some of the modules that you, as a data analyst or data scientist, would work with extensively. In real life, you’ll be working with more advanced modules, advanced in terms of richness and ease of use. However, I personally believe it’s important to learn how to […]

Visualizing with Python: Earnings Based On College Majors

In this post, I’m going to show you a Python script that visualizes a dataset in order to answer specific questions, something that you’d do very often as a data analyst or data scientist. While this is not an advanced topic, it should give you an idea of how to get things done using Python. NOTE: […]

Drilling into Data with Oracle Data Integrator

Apache Drill is “an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets”. Think of it as the one engine for almost all that is relational and non-relational. Drill can be considered part of the “serving layer” in the lambda architecture. It enables you to query data using a highly sophisticated distributed engine that runs […]

Reverse Engineer MapR-DB with ODI

This is going to be a short write-up, a bonus to my previous post “Oracle Data Integrator & MapR Converged Data Platform: CHECK!”. MapR-DB client APIs can access both HBase tables and MapR-DB tables; it all depends on what you pass to their methods. So in case you need to reverse engineer your MapR-DB tables, […]

Hive, Partitions and Oracle Data Integrator

If you are using Oracle Data Integrator (ODI) to load a set of results into a partitioned table and are unable to, you’re in the right place. Partitions are good and needed; there is no need to talk about their benefits here. What I’m going to focus on is how to let ODI use them with a “dirty” […]

Oracle Data Integrator & MapR Converged Data Platform: CHECK!

MapR has its own Hadoop-derived software, a distribution that claims “to provide full data protection, no single points of failure, improved performance, and dramatic ease of use advantages”. For instance, MapR doesn’t rely on the regular HDFS we’re all used to, but came up with MapR-FS, which works differently and provides substantial advantages over regular HDFS, […]