Testing Alternative C Memory Allocators Pt 2: The MUSL mystery

Emerson Gomes

Original https://www.linkedin.com/pulse/testing-alternative-c-memory-allocators-pt-2-musl-mystery-gomes

A few months ago I wrote an article comparing the performance of different memory allocators on Linux.

However, one popular component was missing from my previous testing: musl, the libc used as a replacement for glibc in some distros. The best-known example is Alpine Linux, the rock star distro of the container world, thanks to the tiny images it produces.

I mean, what is there NOT to love about musl? It is small, clean and (supposedly) super-efficient, a drop-in replacement for granny glibc. Everyone tends to fall in love with it after reading this comparison table. Including me.

Well. Turns out there is some trouble in paradise.

I started noticing that some of my applications were running slower than expected under Alpine, sometimes by a factor of 10 to 20. Googling around, I also found similar complaints, like this, this and this.

Ok, it was time to perform an actual benchmark to understand what was going wrong.

Using a setup very similar to the one from my first test, I was shocked to see how badly things went when I moved an Apache httpd instance from glibc to musl: slowdowns of more than 10x!

musl performing about 15x slower than glibc!


My first reaction was: "Ok, there's something very wrong here".

Trying to understand what was going on, I collected a strace summary of both scenarios.

The first eye-popping thing was the massive number of futex waits on musl (right) compared to glibc (left), which clearly points to contention issues. But this still doesn't tell us where the contention is happening.

Another surprising discovery was that musl is kind of a one-man project. The main (almost only) contributor openly acknowledges issues with musl's memory allocator, and he is currently working on a complete redesign of it, called malloc-ng, which I decided to give a try:

Well. Bearing in mind that malloc-ng is still under development, things did improve compared to the original malloc, but... still very, very far from glibc's results.

Next, I tried to replace musl's malloc with one of the allocators I tested earlier, and those attempts failed. jemalloc would strangely segfault (no wonder Alpine removed such packages from its repos), while tcmalloc wouldn't even build.

Hope finally came with Microsoft's mimalloc. Not only does it build and run perfectly with musl, it actually pushed musl's performance above glibc's.

Here are the final (amazing!) results:

To validate the hypothesis and the results, I also ran a completely different test using cpuminer-opt and the m7m algorithm:

While single-threaded performance was pretty much the same, things are very different with more threads!

TL;DR Conclusion: musl has huge issues with multithreading. Either move to a glibc-based distro (do you have a minute to listen to the word of Clear Linux?) or resort to mimalloc as an alternative memory allocator.

Also, it seems Microsoft has really nailed it with mimalloc. Its innovative malloc design is not seen in any of its competitors, and it consistently performed better across a wide range of scenarios.

Glibc + mimalloc showed mixed results and requires more investigation.

Since Alpine lacks packages for mimalloc, I have prepared a Docker repo with Alpine preloaded with mimalloc, in case you want to try it right away!

docker pull emerzon/alpine-mimalloc

That's all for today!


