Using slab allocators in high performance network applications
Nikolay will explain how he and his team solved the problem of memory fragmentation and which limitations they had to face in the process.
Nikolay works for a company that protects businesses from various threats on the Internet: scraping, DDoS attacks, click fraud and much more. Such solutions are high performance network applications that must pass a large volume of traffic per unit of time. Despite all the disadvantages typical of C++ in the enterprise, modern C++ is perhaps the most suitable programming language for such tasks. Not only because it provides the required performance and is a powerful tool in itself, but also because, with the new standards and frameworks such as boost, the learning curve is not as steep as it might seem at first.
Many network applications that work with HTTP traffic have a classic architecture: a server, a client, sessions, pools, parsers and serializers. It is a common set of classes that is very convenient to develop on boost.asio and boost.beast today. But when it comes to serious traffic volumes, the examples that each of us copies from the boost.asio documentation and then extends for our own tasks may not be as good as they seem. They require careful improvement before they fit a serious product.
Nikolay’s team has come a long way in developing such code. They support many features of the RFC for HTTP 1.1. They implemented their own line and stream buffers, compatible with boost.asio.streambuf, which minimized memory copying between server and client sessions while proxying traffic. As a result, they got a network application that works well in production, withstands heavy loads and solves the business task.
But nothing is perfect. At a late stage Nikolay and his team ran into memory fragmentation caused by the constant creation of server sessions to handle incoming connections. The problem, which tcmalloc and jemalloc could not solve, was solved with slab allocators (a C++ wrapper over the small framework from Tarantool): they cut the time the application spent in malloc and new by more than half and eliminated the memory fragmentation. This decision imposed certain restrictions on the application architecture and made asynchronous and multi-threaded code on boost.asio harder to write, but it gave the expected result.
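To make the idea concrete, here is a minimal sketch of a slab-style pool for fixed-size objects. It is not the Tarantool API and not the team's wrapper: `SlabPool`, `Session`, `ObjectsPerSlab` and `reuse_demo` are hypothetical names chosen for illustration. The sketch assumes that all session objects have the same size, so a freed slot can be handed straight to the next session instead of going back to the general-purpose heap.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical fixed-size slab pool (illustration only).
// Deliberately not thread-safe: the assumption is one pool per thread.
template <typename T, std::size_t ObjectsPerSlab = 64>
class SlabPool {
    // A slot holds either a live T or a link to the next free slot.
    union Slot {
        Slot* next;
        alignas(T) unsigned char storage[sizeof(T)];
    };

public:
    SlabPool() = default;
    SlabPool(const SlabPool&) = delete;
    SlabPool& operator=(const SlabPool&) = delete;

    // Construct a T in a free slot, allocating a new slab only when none is left.
    template <typename... Args>
    T* create(Args&&... args) {
        if (free_ == nullptr) grow();
        Slot* slot = free_;
        free_ = slot->next;
        try {
            T* obj = ::new (static_cast<void*>(slot->storage))
                T(std::forward<Args>(args)...);
            ++live_;
            return obj;
        } catch (...) {
            // Constructor threw: return the slot to the free list.
            slot->next = free_;
            free_ = slot;
            throw;
        }
    }

    // Destroy the object and push its slot onto the free list (LIFO reuse).
    void destroy(T* obj) noexcept {
        assert(obj != nullptr);
        obj->~T();
        Slot* slot = reinterpret_cast<Slot*>(obj);
        slot->next = free_;
        free_ = slot;
        --live_;
    }

    std::size_t live() const noexcept { return live_; }
    std::size_t slab_count() const noexcept { return slabs_.size(); }

private:
    // Allocate one slab and thread all of its slots onto the free list.
    void grow() {
        slabs_.push_back(std::make_unique<Slot[]>(ObjectsPerSlab));
        Slot* slab = slabs_.back().get();
        for (std::size_t i = 0; i < ObjectsPerSlab; ++i) {
            slab[i].next = free_;
            free_ = &slab[i];
        }
    }

    // Slabs are released only when the pool dies; objects that are still
    // live at that point are not destroyed by the pool.
    std::vector<std::unique_ptr<Slot[]>> slabs_;
    Slot* free_ = nullptr;
    std::size_t live_ = 0;
};

// Hypothetical stand-in for a server session object.
struct Session {
    int fd;
    char buffer[256];
    explicit Session(int descriptor) : fd(descriptor), buffer{} {}
};

// Usage: a destroyed session's slot is reused by the next one.
inline bool reuse_demo() {
    SlabPool<Session, 4> pool;
    Session* first = pool.create(1);
    pool.destroy(first);
    Session* second = pool.create(2);
    const bool reused = (second == first);
    pool.destroy(second);
    return reused;
}
```

Because the sketch takes no locks, every object has to be returned to the pool it came from, on the thread that owns that pool. This is one plausible source of the architectural restrictions mentioned above, though the abstract does not spell them out.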
This talk covers how Nikolay and his team overcame these difficulties. The speaker will first describe the application architecture that required such demanding optimizations and then proceed to slab allocators and boost.asio. Nikolay expects the topic to be useful for developers of high performance network applications.