Joker 2018 (19.10.2018)

How to tune Spark performance for ML needs


Artem will present a set of techniques tried on a real production project that reduced the execution time of some jobs by a factor of 5–20. The talk is aimed at engineers working with big data, and with Spark in particular.

Apache Spark is a popular choice for machine learning on large data volumes. While writing Spark code is fairly easy, getting good performance out of your application requires understanding not only Spark internals, but also what data you are dealing with, and in what volumes.
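The abstract does not enumerate the techniques themselves, but Spark performance work of this kind typically starts with shuffle parallelism and serialization settings. A hypothetical starting point (the property names are real Spark options; the values are illustrative placeholders, not recommendations from the talk):

```
# spark-defaults.conf — illustrative values only, not settings from the talk
# Kryo is usually much faster than the default Java serialization
spark.serializer                org.apache.spark.serializer.KryoSerializer
# size shuffle parallelism to your data volume (Spark's default is 200)
spark.sql.shuffle.partitions    400
spark.executor.memory           8g
spark.executor.cores            4
```

Settings like these only pay off when matched against the actual size and skew of the data, which is exactly the kind of analysis the talk is about.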