Multithreaded Memory Allocators for C/C++

I've used tcmalloc and read about Hoard. Both have similar implementations and both achieve roughly linear performance scaling with respect to the number of threads/CPUs (according to the graphs on their respective sites).

So: if performance is really that crucial, then do performance/load testing. Otherwise, just roll a die and pick one of those listed (weighted by ease of use on your target platform).
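If you do go the load-testing route, a small harness along these lines can be a starting point. It is only a sketch: `time_malloc_free` is a made-up helper name, and the thread count, iteration count and block size are placeholders that you would replace with numbers resembling your real workload.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical helper: each of `threads` workers performs `iterations`
// malloc/free pairs of `size` bytes. Returns the elapsed wall-clock seconds.
double time_malloc_free(unsigned threads, std::size_t iterations, std::size_t size) {
    auto worker = [=] {
        for (std::size_t i = 0; i < iterations; ++i) {
            void* p = std::malloc(size);
            // Write through a volatile pointer so the pair is not optimized away.
            if (p) *static_cast<volatile char*>(p) = 1;
            std::free(p);
        }
    };
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Running the same binary once with the default allocator and once with a candidate allocator linked in or preloaded gives a first rough comparison. A tight malloc/free loop is a best case for any allocator, so a real test should replay something closer to your application's allocation pattern.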

And from trshiv's link, it looks like Hoard, tcmalloc, and ptmalloc are all roughly comparable for speed. Overall, it looks like ptmalloc is optimized for taking as little room as possible, Hoard is optimized for a trade-off of speed + memory usage, and tcmalloc is optimized for pure speed.

What is the performance of C/C++ allocator in multithread context?

Google perf tools provide an allocator named TCMalloc. This allocator uses a pool of memory for each thread (hence "thread-caching malloc"). Its documentation shows measured performance improvements over glibc 2.3.

glibc has used a per-thread cache of memory since 2.26.

As a result, the performance difference has largely disappeared:

Fedora [we] used to use tcmalloc for QEMU for a while. Then we checked performance
again and found that the delta to glibc's native malloc had essentially gone

Also note that the C++ new operator typically calls the malloc function provided by libc (glibc's malloc in most cases on Linux).

So:

No, this behavior is not standardized.
A per-thread pool is used if (and only if) you use glibc >= 2.26; otherwise you can try linking against TCMalloc.

C++: Allocating memory within multiple threads

Memory allocation (via malloc/new/HeapAlloc and such) is thread-safe by default as long as you've compiled your application against thread-safe runtimes (which you will, unless you explicitly change that).

Each vector will get its own slice of memory whenever it resizes, but once a slice is freed, any (other) thread could end up getting it the next time an allocation occurs.

You could, however, mess things up if you replace your allocators, for example if you overload operator new and are no longer getting memory from a thread-safe source. https://en.cppreference.com/w/cpp/memory/new/operator_new

You could also end up with a non-thread-safe version of malloc if you replace it via some sort of library preload on Linux (Overriding 'malloc' using the LD_PRELOAD mechanism).

But assuming you're not doing any of that, the default implementations of the user-space allocators (new/malloc) are thread-safe, and OS-level allocators (VirtualAlloc/mmap) are always thread-safe.
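To make the "each vector gets its own slice" point concrete, here is a minimal sketch with the default allocator. The helper name `fill_vectors_concurrently` is invented for this example. Each thread grows only its own vector, so the only shared thing is the allocator itself, and no extra locking is written by hand.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: start `threads` workers, each of which grows its own
// std::vector. Returns the total number of elements stored by all workers.
std::size_t fill_vectors_concurrently(unsigned threads, std::size_t per_thread) {
    // One vector per worker. The outer vector is never resized after this line.
    std::vector<std::vector<int>> data(threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&data, t, per_thread] {
            for (std::size_t i = 0; i < per_thread; ++i)
                data[t].push_back(static_cast<int>(i));  // reallocates as it grows
        });
    }
    for (auto& w : workers) w.join();
    std::size_t total = 0;
    for (const auto& v : data) total += v.size();
    return total;
}
```

Note that this only shows that concurrent allocation is safe. Two threads pushing into the same vector would still be a data race, because the container is not synchronized even though the allocator is.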

Can multithreading speed up memory allocation?

Dynamic memory allocation uses the heap of the application/module/process (not of the thread). A heap guarded by a single lock can only handle one allocation request at a time. If you try to allocate memory in "parallel" threads, the requests will be handled one after another by the heap. You will not get behaviour like: one thread is waiting to get its memory while another can ask for some, while a third one is getting some. The threads will have to line up in a queue to get their chunk of memory.

What you would need is a pool of heaps: use whichever heap is not busy at the moment to allocate the memory. But then you have to make sure, throughout the lifetime of the allocation, that it does not get deallocated on another heap (that would cause a crash).

I know that the Win32 API has functions such as GetProcessHeap(), HeapCreate(), HeapAlloc() and HeapFree(), which allow you to create a new heap and allocate/deallocate memory from a specific heap HANDLE. I don't know of an equivalent in other operating systems (I have looked for one, but to no avail).

You should, of course, try to avoid doing frequent dynamic allocations. But if you can't, you might consider (for portability) creating your own "heap" class (it doesn't have to be a heap per se, just a very efficient allocator) that can manage a large chunk of memory, and probably a smart pointer class that holds a reference to the heap from which the memory came. This would enable you to use multiple heaps (make sure they are thread-safe).
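A portable version of that idea might look roughly like the sketch below. Everything here is hypothetical (`Heap`, `HeapPool`, `HeapDeleter` and `HeapPtr` are names made up for illustration), and the "heap" is deliberately simple: a fixed number of equally sized blocks behind one mutex. The pool hands out a block from whichever heap is not locked at the moment, and the returned smart pointer remembers its heap of origin so the block cannot be freed on the wrong one.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical heap: a fixed number of equally sized blocks behind one mutex.
class Heap {
public:
    Heap(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size)), storage_(block_size_ * block_count) {
        for (std::size_t i = 0; i < block_count; ++i)
            free_.push_back(storage_.data() + i * block_size_);
    }
    std::mutex& mutex() { return mutex_; }
    // Caller must already hold mutex(). Returns nullptr when no block is left.
    void* allocate_locked() {
        if (free_.empty()) return nullptr;
        void* p = free_.back();
        free_.pop_back();
        return p;
    }
    void deallocate(void* p) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(static_cast<unsigned char*>(p));
    }
private:
    // Round the block size up so every block keeps maximal alignment.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return n == 0 ? a : (n + a - 1) / a * a;
    }
    std::size_t block_size_;
    std::vector<unsigned char> storage_;
    std::vector<unsigned char*> free_;
    std::mutex mutex_;
};

// Deleter that returns a block to the heap it came from.
struct HeapDeleter {
    Heap* origin = nullptr;
    void operator()(void* p) const { if (p) origin->deallocate(p); }
};
using HeapPtr = std::unique_ptr<void, HeapDeleter>;

class HeapPool {
public:
    HeapPool(std::size_t heaps, std::size_t block_size, std::size_t blocks_per_heap) {
        for (std::size_t i = 0; i < heaps; ++i)
            heaps_.push_back(std::make_unique<Heap>(block_size, blocks_per_heap));
    }
    HeapPtr allocate() {
        // First pass: use any heap whose lock is free right now.
        for (auto& h : heaps_) {
            std::unique_lock<std::mutex> lock(h->mutex(), std::try_to_lock);
            if (!lock.owns_lock()) continue;  // busy, try the next heap
            if (void* p = h->allocate_locked()) return HeapPtr(p, HeapDeleter{h.get()});
        }
        // Second pass: every heap was busy or empty, so wait for each in turn.
        for (auto& h : heaps_) {
            std::lock_guard<std::mutex> lock(h->mutex());
            if (void* p = h->allocate_locked()) return HeapPtr(p, HeapDeleter{h.get()});
        }
        return HeapPtr{};  // all heaps exhausted
    }
private:
    std::vector<std::unique_ptr<Heap>> heaps_;
};
```

The pointers must not outlive the `HeapPool`, since each deleter holds a raw pointer to its heap. A production allocator would also need variable block sizes and some growth strategy, which this sketch leaves out on purpose.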

Multithreaded (de)allocation on the heap

(updating comment to answer)

It could be because with one thread all the allocations are sequential, so the frees are as well. With the multithreaded allocations, they are more intermixed so free needs to do more work to clean up after each deallocation.

In multithreaded C/C++, does malloc/new lock the heap when allocating memory

There could be improvements in certain implementations, such as creating a thread-specific cache (in this case allocations of small blocks will be lock-free). For instance, Google's TCMalloc does this. But in general, yes, there is a lock on memory allocations.
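As a rough illustration of the thread-specific cache idea (not how any particular allocator actually implements it), the sketch below keeps a per-thread list of small blocks and only takes a lock when that list runs dry or grows too large. All names in the `sketch` namespace, the 64-byte block size and the batch size of 32 are assumptions made up for the example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

namespace sketch {

constexpr std::size_t kBlockSize = 64;  // assumed size of a "small" block
constexpr std::size_t kBatch = 32;      // blocks moved per trip to the shared pool

// Shared pool: every access takes the lock.
struct CentralPool {
    std::mutex mutex;
    std::vector<void*> blocks;
    std::size_t lock_count = 0;  // instrumentation only
    ~CentralPool() { for (void* p : blocks) std::free(p); }
};
CentralPool& central() { static CentralPool pool; return pool; }

// Per-thread cache: only its owning thread touches it, so no lock is needed.
struct ThreadCache {
    std::vector<void*> blocks;
    ThreadCache() { blocks.reserve(3 * kBatch); }
    ~ThreadCache() {  // hand cached blocks back when the thread exits
        if (blocks.empty()) return;
        CentralPool& c = central();
        std::lock_guard<std::mutex> lock(c.mutex);
        c.blocks.insert(c.blocks.end(), blocks.begin(), blocks.end());
    }
};
ThreadCache& cache() { thread_local ThreadCache c; return c; }

void* allocate_small() {
    std::vector<void*>& local = cache().blocks;
    if (local.empty()) {
        // Slow path: refill a whole batch under the shared lock.
        CentralPool& c = central();
        std::lock_guard<std::mutex> lock(c.mutex);
        ++c.lock_count;
        while (local.size() < kBatch) {
            void* p = nullptr;
            if (!c.blocks.empty()) { p = c.blocks.back(); c.blocks.pop_back(); }
            else p = std::malloc(kBlockSize);
            if (!p) break;
            local.push_back(p);
        }
        if (local.empty()) return nullptr;
    }
    // Fast path: this code takes no lock.
    void* p = local.back();
    local.pop_back();
    return p;
}

void free_small(void* p) {
    if (!p) return;
    std::vector<void*>& local = cache().blocks;
    local.push_back(p);  // fast path: this code takes no lock
    if (local.size() >= 2 * kBatch) {
        // Cache is getting large: return one batch to the shared pool.
        CentralPool& c = central();
        std::lock_guard<std::mutex> lock(c.mutex);
        ++c.lock_count;
        for (std::size_t i = 0; i < kBatch; ++i) {
            c.blocks.push_back(local.back());
            local.pop_back();
        }
    }
}

std::size_t central_lock_count() {
    CentralPool& c = central();
    std::lock_guard<std::mutex> lock(c.mutex);
    return c.lock_count;
}

}  // namespace sketch
```

With these numbers the shared lock is taken about once per 32 allocations instead of once per allocation. The refill still calls `std::malloc`, which may lock internally, so it stands in here for "the slow, shared heap".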

Implementing a memory manager in multithreaded C/C++ with dynamically sized memory pool?

Prepare more than one solution and let the user of the framework adopt any particular one. Policy classes to the generic allocator you develop would do this nicely.
A nice way to get around this is to wrap pointers in a class with an overloaded * operator. Make the internal data of that class only an index into the memory pool. Now you can just change the index quickly after a background thread copies the data over.
Most good C++ libraries support allocators and you should implement one. You can also overload the global new so your version gets used. And keep in mind that you generally won't need to think about a library allocating or deallocating a large amount of data, which is generally a responsibility of client code.
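The pointer-wrapping idea from the second point could be sketched like this. `Pool` and `Handle` are hypothetical names, the handle also keeps a pointer to its pool for simplicity (a single global pool would let it store the index alone), and all synchronization with a background thread is left out. The point is only that a handle stays valid when the pool's storage moves, where a raw pointer would not.

```cpp
#include <cstddef>
#include <vector>

template <typename T> class Pool;

// Hypothetical handle: stores an index into the pool, not an address.
template <typename T>
class Handle {
public:
    Handle(Pool<T>* pool, std::size_t index) : pool_(pool), index_(index) {}
    // The address is looked up on every access, so it is always current.
    T& operator*() const { return pool_->at(index_); }
    T* operator->() const { return &pool_->at(index_); }
    std::size_t index() const { return index_; }
private:
    Pool<T>* pool_;
    std::size_t index_;
};

// Hypothetical dynamically sized pool backed by a std::vector.
template <typename T>
class Pool {
public:
    Handle<T> create(const T& value) {
        slots_.push_back(value);  // may move every element to a new buffer
        return Handle<T>(this, slots_.size() - 1);
    }
    T& at(std::size_t i) { return slots_[i]; }
    std::size_t size() const { return slots_.size(); }
private:
    std::vector<T> slots_;
};
```

For example, a `Handle<int>` obtained from `pool.create(7)` still dereferences to 7 after a thousand further `create` calls have forced the vector to reallocate. The cost is one extra indirection on every access.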

Multithreaded Memory Allocators for C/C++