February 26, 2020

Scaling ETL with Scala

When I joined Protenus in 2015, the first version of our ETL “pipeline” was a set of HiveQL scripts executed manually one after another. The company, still a start-up focused on proving out the analytics and UX, had adopted Spark, Hive, and MongoDB as core technologies. With our Series A funding round completed, my first task was to take these scripts and build out an ETL application. The first attempt naturally adopted Spark and Hive as primary technologies and added state management. This version got us through our next few clients.
