Friday, November 16, 2018

The performance impact of zeroing raw memory

When you create a new variable (in C, C++ and other languages) or allocate a block of memory the value is undefined. That is, whatever bit pattern happened to be in the raw memory location at the time. This is faster than initialising all memory (which languages such as Java do) but it is also unsafe and can lead to bugs, such as use-after-free issues.

There have been several attempts to change this behaviour and require that compilers would initialize all memory to a known value, usually zero. This is always rejected with a statement like "that would cause a performance degradation fo unknown size" and the issue is dropped. This is not very scientific so let's see if we could get at least some sort of a measurement for this.

The method

The overhead for uninitialized variables is actually fairly difficult to measure. Compilers don't provide a flag to initialize all variables to zero. Thus measuring this would require compiler hacking, which is a ton of work. An alternative would be to write a clang-tidy plugin and add a default initialization to zero for all variables that don't have a initialization clause already. This is also fairly involved, so let's not do this.

The impact of dynamic memory turns out to be fairly straightforward to measure. All we need to do is to build a shared library with custom overrides for malloc, free and memalign, and LD_PRELOAD it to any process we want to measure. The sample code can be found in this Github repo.

Measurements

We did two measurements. The first one was running Python's pystone benchmark. There was no noticeable difference between zero initialization and no initialization.

The second measurement consisted of compiling a simple C++ iostream helloworld application with optimizations enabled. The results for this experiment were a lot more interesting. Zeroing all memory on malloc made the program 2% slower. Zeroing the memory on both allocation and free (to catch use-after-free bugs) made the program 3.6% slower.

A memory zeroing implementation inside malloc would probably have a smaller overhead, because there are cases where you don't need to explicitly overwrite the memory, for example when the allocation is done behind the scenes via mmap/munmap.

No comments:

Post a Comment