Heapcreate, Heapalloc in Linux, Private Allocator for Linux

What are the Windows and Linux native OS/system calls made from malloc()?

In Windows, in recent versions of MSVC, malloc (and C++ new, as it is implemented using the same fundamentals for the actual memory allocation part of new) calls HeapAlloc(). In other versions, such as g++ mingw, the C runtime is an older version, which doesn't call quite as directly to HeapAlloc, but at the base of it, it still goes to HeapAlloc - to find something different, we need to go back to Windows pre-95, which did have a GlobalAlloc and LocalAlloc set of functions - but I don't think people use 16-bit compilers these days - at least not for Windows programming.

In Linux, if you are using glibc, it depends on the size of the allocation whether it calls sbrk or mmap - mmap (with MAP_ANONYMOUS in the flags) is used for larger allocations (over a threshold, which I believe is 2MB in the typical implementation)

In linux (or POSIX) function similar to win32 mem api

Read Advanced Linux Programming. Don't seek an exact equivalent in Linux for each functionality of Win32 that you know or want. Learn to natively think in Linux terms. Study free software similar to yours (see freecode or sourceforge to find some).

^{And yes, Posix or Linux vs Windows is very different, notably for their notion of processes, etc...}

You probably want mmap(2) and mprotect(2); I don't know at all Windows (so I have no idea of what HeapCreate does).

Maybe using the lower layer of cross-platform toolkits like Qt (i.e. QtCore...) or Glib (from Gtk ...) might help you.

Linux C standard library is often GNU libc (but you could use some other, e.g. MUSL libc, which is very readable IMHO). It use syscalls listed in syscalls(2) and implemented by the Linux kernel (in particular, malloc(3) is generally built above mmap(2)...).

Take the habit of studying the source code of free software if that helps you.

BTW, for an interpreter, you could consider using Boehm's conservative garbage collector...

How are malloc and free implemented?

On linux, malloc and free are not system calls. malloc/free obtains memory from the kernel by extending and shrinking(if it can) the data segment using the brk system calls as well as obtaining anonymous memory with mmap - and malloc manages memory within those regions. Some basic information any many great references can be found here

malloc() vs. HeapAlloc()

You are right that they both allocate memory from a heap. But there are differences:

malloc() is portable, part of the standard.
HeapAlloc() is not portable, it's a Windows API function.

It's quite possible that, on Windows, malloc would be implemented on top of HeapAlloc. I would expect malloc to be faster than HeapAlloc.

HeapAlloc has more flexibility than malloc. In particular it allows you to specify which heap you wish to allocate from. This caters for multiple heaps per process.

For almost all coding scenarios you would use malloc rather than HeapAlloc. Although since you tagged your question C++, I would expect you to be using new!

What's the differences between VirtualAlloc and HeapAlloc?

Each API is for different uses. Each one also requires that you use the correct deallocation/freeing function when you're done with the memory.

VirtualAlloc

A low-level, Windows API that provides lots of options, but is mainly useful for people in fairly specific situations. Can only allocate memory in (edit: not 4KB) larger chunks. There are situations where you need it, but you'll know when you're in one of these situations. One of the most common is if you have to share memory directly with another process. Don't use it for general-purpose memory allocation. Use VirtualFree to deallocate.

HeapAlloc

Allocates whatever size of memory you ask for, not in big chunks than VirtualAlloc. HeapAlloc knows when it needs to call VirtualAlloc and does so for you automatically. Like malloc, but is Windows-only, and provides a couple more options. Suitable for allocating general chunks of memory. Some Windows APIs may require that you use this to allocate memory that you pass to them, or use its companion HeapFree to free memory that they return to you.

malloc

The C way of allocating memory. Prefer this if you are writing in C rather than C++, and you want your code to work on e.g. Unix computers too, or someone specifically says that you need to use it. Doesn't initialise the memory. Suitable for allocating general chunks of memory, like HeapAlloc. A simple API. Use free to deallocate. Visual C++'s malloc calls HeapAlloc.

new

The C++ way of allocating memory. Prefer this if you are writing in C++. It puts an object or objects into the allocated memory, too. Use delete to deallocate (or delete[] for arrays). Visual studio's new calls HeapAlloc, and then maybe initialises the objects, depending on how you call it.

In recent C++ standards (C++11 and above), if you have to manually use delete, you're doing it wrong and should use a smart pointer like unique_ptr instead. From C++14 onwards, the same can be said of new (replaced with functions such as make_unique()).

There are also a couple of other similar functions like SysAllocString that you may be told you have to use in specific circumstances.

Integrating C++ custom memory allocators across shared/static libraries

How memory allocation is done heavily depends on the operating system. You need to understand how shared libraries work in those operating systems, how C language relates to those operating systems and to the concept of shared libraries.

C, C++ and modular programming

First of all, I want to mention that C language is not a modular language e.g. it has no support for modules or modular programming. For languages like C and C++ implementation of modular programming is left to the underlying operating system. Shared libraries is an example of mechanism that is used to implement modular programming with C and C++, therefore I will refer to them as modules.

Module = shared library and executable

Linux and Unix-like systems

Initially everything on Unix systems was statically linked. Shared libraries came later. And as Unix was a starting point for the C langauge, those systems try to provide shared library programming interface that is close to what programming in C feels like.

The idea is that C code written originally without shared libraries in mind should be build and should work without changes made to the source code. As the result, provided environment usually has single process-wide symbol namespace shared by all loaded modules e.g. there can only be a single function with a name foo in the whole process, except for static functions (and some functions that are hidden in moduels using OS-specific mechanisms). Basically it is the same as with static linking where you are not allowed to have duplicate symbols.

What this means for your case is that there is always a single function named malloc in use in the whole process and every module is using it e.g. all modules share the same memory allocator.

Now if process happens to have multiple malloc functions, only a single one is picked and will be used by all modules. Mechanism here is very simple - as shared libraries do not know location of every referenced function, they will usually call them though some table (GOT, PLT) that will be filled with required addresses lazily on the first call or at load time. The same rule is applied to the module that provides original function - even internally this function will be called though the same table, making it possible to override that function even in the original module that provides it (which is the source of many ineffeciencies related to usage of shared libraries on Linux, search for -fno-semantic-interposition, -fno-plt to overcome this).

The general rule here is that the first module to introduce symbol will be the one providing it. Therefore original process executable has the highest priority here and if it defines malloc function, that malloc function will be used everywhere in the process. The same applies to functions calloc, realloc, free and others. Using this trick and tricks like LD_PRELOAD allow you to override "default memory allocator" of your application. This is not guaranteed to work thought as there are some corner cases. You should consult documentation for your library before doing this.

I want to specifically note that this means there is a single heap in the process shared by all modules and there is a good reason for that. Unix-like systems usually provide two ways of allocating memory in a process:

brk, sbrk syscalls
mmap syscall

The first one provides you an access to a single per-process memory region usually allocated directly after the executable image. Because of the fact that there is only one such region, this way of memory allocation can only be used by a single allocator in a process (and it is usually already used by your C library).

This is important to understand before you throw any custom memory allocator into your process - it either should not use brk, sbrk, or should override existing allocator of your C library.

The second one can be used to request chunk of memory directly from the underlying kernel. As kernel know the structure of your process virtual memory, it is able to allocate pages of memory without interfering with any user-space allocator. This is also the only way to have multiple fully independent memory allocators (heaps) in the process.

Windows

Windows does not rely on C runtime the same way Unix-like systems do. Instead it provides its own runtime - Windows API.

There are two ways of allocating memory with Windows API:

Using functions like VirtualAlloc, MapViewOfFile.
And heap allocation functions - HeapCreate, HeapAlloc.

The first one is an equivalent to mmap, while the second one is a more advanced version of malloc which is based internally (as I believe) on VirtualAlloc.

Now because Windows does not have the same relation to C language as Unix-likes have, it does not provide you with malloc and free functions. Instead, those are provided by C runtime library which is implemented on top of Windows API.

Another thing about Windows - it does not have a concept of single per process symbol namespace e.g. you cannot override function here the same way you do on Unix-like systems. This allows you to have multiple C runtimes co-existing in the same process, and every of those runtimes can provide its independent implementation of malloc, free etc, each operating on a separate heap.

Therefore on Windows all libraries will share a single process Windows API-specific heap (can be obtained through GetProcessHeap), at the same time they will share heap of one of C runtimes in the process.

So how do you integrate memory allocator into your program?

It depends. You need understand what you are trying to achieve.

Do you need to replace memory allocator used by everyone in your process e.g. the default allocator? This is only possible on Unix-like system.

The only portable solition here is to use your specific allocator interface explicitly. It doesn't really matter how you do this, you just need to make sure the same heap is shared by all libraries on Windows.

The general rule here is that either everything should be statically linked or everything should be dynamically linked. Having some sort of mix between the two might be really complicated and requires you to keep the whole architecture in your head to avoid mixing heaps or other data structures in your program (which is not a big problem if you don't have many modules). If you need to mix static and dynamic linking, you should build you allocator library as a shared library to make it easier having single implementation of it in a process.

Another difference between Unix-alikes and Windows is that Windows does not have a concept of "statically linked executable". On Windows every executable has dependencies on Windows-specific dynamic libraries like ntdll.dll. While with ELF executables have separate types for "statically linked" and "dynamically linked" executables.

This is mostly due to single per-process symbol namespace which makes it dangerous to mix shared and static linking on Unix-alikes, but allows Windows to mix static and dynamic linking just fine (almost, not really).

If you use one of your libraries, you should make sure you link it dynamically with dynamically linked executables. Imagine if you link your allocator statically into your shared library, but another library in your process uses the same library too - you might be using another allocator by accident, not the one you were expecting.