Video Resources

Productionizing H2O Models using Sparkling Water by Jakub Háva

Slides can be viewed here:

In this webinar, Jakub Háva, Senior Software Engineer at, will introduce the basic architecture of Sparkling Water, go over different scaling strategies, and explain how Sparkling Water pipelines are structured and how they can be put into production. The talk will finish with a live demo of the pipelines in Python and of pipeline deployment in a real-time streaming application written in Java, giving you a real-time look at running Sparkling Water!

Sparkling Water integrates H2O, the open-source distributed machine learning platform, with the capabilities of Apache Spark. It allows users to leverage H2O's machine learning algorithms within Apache Spark applications via Scala, Python, R, or H2O's Flow GUI, which makes Sparkling Water a great enterprise solution that’s accessible to a wide variety of end users. Sparkling Water 2.2 was released to coincide with Apache Spark 2.2 and introduces several new features. One of these features is the ability to use several H2O models inside PySpark pipelines. This lets a data scientist quickly prepare and iterate on a pipeline that can later be deployed into production.
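To make the pipeline idea more concrete, here is a minimal, dependency-free Python sketch of the fit-once, score-later workflow. It is an illustration under stated assumptions, not the PySpark or Sparkling Water API: `Scaler`, `ThresholdClassifier`, `Pipeline`, and `PipelineModel` are hypothetical toy classes that mimic the Estimator/Transformer pattern of Spark ML, and `ThresholdClassifier` merely stands in for a trained H2O model.

```python
from typing import Dict, List

# Hypothetical toy classes mimicking the Estimator/Transformer pattern of Spark ML.
# They are NOT the PySpark or Sparkling Water APIs.
Rows = List[Dict[str, float]]


class Transformer:
    def transform(self, rows: Rows) -> Rows:
        raise NotImplementedError


class Estimator:
    def fit(self, rows: Rows) -> Transformer:
        raise NotImplementedError


class ScalerModel(Transformer):
    """Fitted stage: standardizes one column with the training mean and std."""
    def __init__(self, col: str, mean: float, std: float):
        self.col, self.mean, self.std = col, mean, std

    def transform(self, rows: Rows) -> Rows:
        return [{**r, self.col: (r[self.col] - self.mean) / self.std} for r in rows]


class Scaler(Estimator):
    def __init__(self, col: str):
        self.col = col

    def fit(self, rows: Rows) -> ScalerModel:
        values = [r[self.col] for r in rows]
        mean = sum(values) / len(values)
        std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
        return ScalerModel(self.col, mean, std or 1.0)  # avoid division by zero


class ThresholdModel(Transformer):
    """Hypothetical stand-in for a trained H2O model: predicts 1 above a threshold."""
    def __init__(self, feature: str, threshold: float):
        self.feature, self.threshold = feature, threshold

    def transform(self, rows: Rows) -> Rows:
        return [{**r, "prediction": int(r[self.feature] > self.threshold)} for r in rows]


class ThresholdClassifier(Estimator):
    """Toy 'training': puts the threshold halfway between the two class means."""
    def __init__(self, feature: str, label: str):
        self.feature, self.label = feature, label

    def fit(self, rows: Rows) -> ThresholdModel:
        pos = [r[self.feature] for r in rows if r[self.label] == 1]
        neg = [r[self.feature] for r in rows if r[self.label] == 0]
        threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return ThresholdModel(self.feature, threshold)


class PipelineModel(Transformer):
    """A fitted pipeline: applies each fitted stage in order."""
    def __init__(self, stages: List[Transformer]):
        self.stages = stages

    def transform(self, rows: Rows) -> Rows:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    """Fits each estimator on the output of the stages before it."""
    def __init__(self, stages: list):
        self.stages = stages

    def fit(self, rows: Rows) -> PipelineModel:
        fitted = []
        for stage in self.stages:
            model = stage.fit(rows) if isinstance(stage, Estimator) else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)


# Data scientist side: prepare and fit the whole pipeline on training data.
train = [{"x": 1.0, "label": 0}, {"x": 2.0, "label": 0},
         {"x": 8.0, "label": 1}, {"x": 9.0, "label": 1}]
fitted = Pipeline([Scaler("x"), ThresholdClassifier("x", "label")]).fit(train)

# Production side: the single fitted object scores new, unlabeled rows.
scored = fitted.transform([{"x": 1.5}, {"x": 8.5}])
print([r["prediction"] for r in scored])  # prints [0, 1]
```

The point of the design is that preprocessing and the model are fitted together and end up in one object, so the scoring side never has to re-implement the feature preparation. In an actual Sparkling Water setup, PySpark's `Pipeline`/`PipelineModel` and the H2O algorithm stages that ship with Sparkling Water would play these roles, with a real export format in place of passing the object around in memory.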

Speaker's Bio
Jakub (or “Kuba” as we call him) completed his Bachelor’s Degree in Computer Science and Master’s Degree in Software Systems at Charles University in Prague. For his bachelor’s thesis, Kuba wrote a small platform for distributed computing of any type of task. During his master’s studies, he developed a cluster monitoring tool for JVM-based languages that makes debugging and reasoning about the performance of distributed systems easier using a concept called distributed stack traces. Kuba enjoys tackling problems and learning new programming languages. At, Kuba works on Sparkling Water.