Real-Time Kafka / MapR Streams Data Ingestion into HBase / MapR-DB via PySpark

Streaming data is becoming an essential part of every data integration project nowadays, if not a focus requirement, a second nature. Advantages gained from real-time data streaming are so many. To name a few: real-time analytics and decision making, better resource utilization, data pipelining, facilitation for micro-services and much more. Python has many modules out […]

Perfecting Lambda Architecture with Oracle Data Integrator (and Kafka / MapR Streams)

Republished by: MapR Technologies Datafloq ——- Introduction “Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch– and stream-processing methods. This approach to architecture attempts to balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online […]

Oracle Data Integrator & MapR Converged Data Platform: CHECK!

MapR has their own Hadoop-derived software, a distribution that claims “to provide full data protection, no single points of failure, improved performance, and dramatic ease of use advantages”. For instance, MapR doesn’t rely on regular HDFS we’re all used to, but came up with MapR-FS, which works differently and provides substantial advantages over regular HDFS, […]

MapR-FS Real-Time Transactional Data Ingestion using Oracle GoldenGate

“MapR-FS is a POSIX file system that provides distributed, reliable, high performance, scalable, and full read/write data storage for the MapR Converged Data Platform. MapR-FS supports the HDFS API, fast NFS access, access controls (MapR ACEs), and transparent data compression. MapR-FS includes enterprise-grade features such as block-level mirroring for mission-critical disaster recovery as well as load balancing, […]

Streaming Transactional Data into MapR Streams using Oracle GoldenGate for Big Data

“Oracle GoldenGate for Big Data 12c product streams transactional data into big data systems in real-time, without impacting the performance of source systems. It streamlines real-time data delivery into most popular big data solutions, including Apache Hadoop, Apache HBase, Apache Hive, Apache Flume and Apache Kafka to facilitate improved insight and timely action.” MapR Streams […]