jokerconf Joker 2016 (14.10.2016 — 15.10.2016)

How to calculate the CTR of 100M objects in real time and not die


The talk covers the implementation of CTR estimation for the news feed at OK.ru, the technologies that were used and how they had to be adjusted, and the capabilities of streaming data analysis platforms in general.

Estimating CTR does not seem like a difficult task: even if you have terabytes of logs with impressions and clicks, a small Hadoop cluster can cope with them without any problems. If the number of objects is not so large, for example hundreds of thousands of advertising campaigns, it is feasible to estimate CTR in real time. But the situation looks different if you need to estimate CTR in real time for hundreds of millions of objects while processing millions of events per second, and then use the results to evaluate tens of millions of candidates every second.

Dmitriy will talk about how CTR estimation for the news feed at OK.ru was implemented, which technologies were used as a basis, and how they had to be adjusted. He will also discuss the capabilities of streaming data analysis platforms in general, beyond CTR estimation.
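
For readers unfamiliar with the metric, CTR (click-through rate) is the number of clicks on an object divided by the number of times it was shown. The sketch below is only an illustration of that definition applied to a stream of events, not the approach used at OK.ru. The class name `CtrCounter`, the event format, and the object IDs are all hypothetical, and a single in-memory dictionary like this would not handle the volumes described above.

```python
from collections import defaultdict


class CtrCounter:
    """Per-object impression and click counters (illustrative assumption)."""

    def __init__(self):
        self.impressions = defaultdict(int)
        self.clicks = defaultdict(int)

    def record(self, event_type, object_id):
        # Update the matching counter as each event arrives from the stream.
        if event_type == "impression":
            self.impressions[object_id] += 1
        elif event_type == "click":
            self.clicks[object_id] += 1
        else:
            raise ValueError(f"unknown event type: {event_type}")

    def ctr(self, object_id):
        # CTR = clicks / impressions; return 0.0 for objects never shown.
        shown = self.impressions.get(object_id, 0)
        if shown == 0:
            return 0.0
        return self.clicks.get(object_id, 0) / shown


# Hypothetical event stream: (event type, object ID).
events = [
    ("impression", "post-1"),
    ("click", "post-1"),
    ("impression", "post-1"),
    ("impression", "post-2"),
    ("impression", "post-1"),
]

counter = CtrCounter()
for event_type, object_id in events:
    counter.record(event_type, object_id)
```

Here `counter.ctr("post-1")` is one click over three impressions, and `counter.ctr("post-2")` is zero. The estimate is available immediately after each event, which is the property that becomes hard to keep at the scale the talk addresses.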