How are sbrk/brk implemented in Linux?
In a very high level view, the Linux kernel tracks the memory visible to a process as several "memory areas" (struct vm_area_struct
). There is also a structure which represents (again in a very high level view) a process' whole address space (struct mm_struct
). Each process (except some kernel threads) has exactly one struct mm_struct
, which in turn points to all the struct vm_area_struct
for the memory it can accesss.
The sys_brk
system call (found in mm/mmap.c
) simply adjusts some of these memory areas. (sbrk
is a glibc wrapper around brk
). It does so by comparing the old value of the brk
address (found inside struct mm_struct
) and the requested value.
It would be simpler to look at the mmap
family of functions first, since brk
is a special case of it.
does brk and sbrk round the program break to the nearest page boundary?
brk
allocates/deallocates pages. That implementation detail based on the fact that the smallest unit of data for memory management in a virtual memory operating system is a page is transparent to the caller, however.
In the Linux kernel, brk
saves the unaligned value and uses the aligned value to determine if pages need to be allocated/deallocated:
asmlinkage unsigned long sys_brk(unsigned long brk)
{
[...]
newbrk = PAGE_ALIGN(brk);
oldbrk = PAGE_ALIGN(mm->brk);
if (oldbrk == newbrk)
goto set_brk;
[...]
if (do_brk(oldbrk, newbrk-oldbrk) != oldbrk)
goto out;
set_brk:
mm->brk = brk;
[...]
}
As for sbrk
: glibc calls brk
and maintains the (unaligned) value of the current program break (__curbrk
) in userspace:
void *__curbrk;
[...]
void *
__sbrk (intptr_t increment)
{
void *oldbrk;
if (__curbrk == NULL || __libc_multiple_libcs)
if (__brk (0) < 0) /* Initialize the break. */
return (void *) -1;
if (increment == 0)
return __curbrk;
oldbrk = __curbrk;
[...]
if (__brk (oldbrk + increment) < 0)
return (void *) -1;
return oldbrk;
}
Consequently, the return value of sbrk
does not reflect the page alignment that happens in the Linux kernel.
What does the brk() system call do?
In the diagram you posted, the "break"—the address manipulated by brk
and sbrk
—is the dotted line at the top of the heap.
The documentation you've read describes this as the end of the "data segment" because in traditional (pre-shared-libraries, pre-mmap
) Unix the data segment was continuous with the heap; before program start, the kernel would load the "text" and "data" blocks into RAM starting at address zero (actually a little above address zero, so that the NULL pointer genuinely didn't point to anything) and set the break address to the end of the data segment. The first call to malloc
would then use sbrk
to move the break up and create the heap in between the top of the data segment and the new, higher break address, as shown in the diagram, and subsequent use of malloc
would use it to make the heap bigger as necessary.
Meantime, the stack starts at the top of memory and grows down. The stack doesn't need explicit system calls to make it bigger; either it starts off with as much RAM allocated to it as it can ever have (this was the traditional approach) or there is a region of reserved addresses below the stack, to which the kernel automatically allocates RAM when it notices an attempt to write there (this is the modern approach). Either way, there may or may not be a "guard" region at the bottom of the address space that can be used for stack. If this region exists (all modern systems do this) it is permanently unmapped; if either the stack or the heap tries to grow into it, you get a segmentation fault. Traditionally, though, the kernel made no attempt to enforce a boundary; the stack could grow into the heap, or the heap could grow into the stack, and either way they would scribble over each other's data and the program would crash. If you were very lucky it would crash immediately.
I'm not sure where the number 512GB in this diagram comes from. It implies a 64-bit virtual address space, which is inconsistent with the very simple memory map you have there. A real 64-bit address space looks more like this:
Legend: t: text, d: data, b: BSS
This is not remotely to scale, and it shouldn't be interpreted as exactly how any given OS does stuff (after I drew it I discovered that Linux actually puts the executable much closer to address zero than I thought it did, and the shared libraries at surprisingly high addresses). The black regions of this diagram are unmapped -- any access causes an immediate segfault -- and they are gigantic relative to the gray areas. The light-gray regions are the program and its shared libraries (there can be dozens of shared libraries); each has an independent text and data segment (and "bss" segment, which also contains global data but is initialized to all-bits-zero rather than taking up space in the executable or library on disk). The heap is no longer necessarily continous with the executable's data segment -- I drew it that way, but it looks like Linux, at least, doesn't do that. The stack is no longer pegged to the top of the virtual address space, and the distance between the heap and the stack is so enormous that you don't have to worry about crossing it.
The break is still the upper limit of the heap. However, what I didn't show is that there could be dozens of independent allocations of memory off there in the black somewhere, made with mmap
instead of brk
. (The OS will try to keep these far away from the brk
area so they don't collide.)
How malloc() and sbrk() works in unix?
why malloc reduces the number of sbrk() system calls that the program
must perform?
say, if you call malloc() to request 10 bytes memory, the implementation may use sbrk (or other system call like mmap) to request 4K bytes from OS. Then when you call malloc() next time to request another 10 bytes, it doesn't have to issue system call; it may just return some memory allocated by system call of the last time 4K.
What's unsafe/legacy about brk/sbrk?
The reality highly depends on the implementation but here are some elements:
unsafe
brk
/sbrk
were invented to allow a process to request more memory from the system, and release it in a single contiguous segment. As such, they were used by many malloc
and free
implementations. The problem was that, as it returned a unique segment, things would go wrong as multiple modules (of the same process) use it directly. It became even worse in a multi-threaded process because of race conditions. Suppose 2 threads want to add new memory. They will look at the current top address with sbrk(0)
, see the same address, request new memory with either brk
or sbrk
, and because of the race condition, will both use the same memory.
Even in a single threaded process, some malloc
and free
implementations assume that they only are allowed to use the low level s/brk
interface, and that any other code should use them. In that case things will go wrong if the image of the break segment that they internally maintain is no longer the assumed value. They should have to guess that some parts of the segment are "reserved" for other uses, possibly breaking the ability to release any memory.
For that reason, user code should never directly use brk
/sbrk
and only rely on malloc
/free
. If, and only if, you are writing an implementation of the standard library including malloc
/realloc
/calloc
/free
, you can safely use brk
/sbrk
legacy
On modern system, mmap
can make a much nicer usage of virtual memory management. You can use as many dynamic memory segments as you need with no interaction between them. So, on a modern system, unless you have a specific need for memory allocation using brk
/sbrk
, you should use mmap
.
portability
The FreeBSD reference for brk
and sbrk
states this:
The brk() and sbrk() functions are legacy interfaces from before the
advent of modern virtual memory management.
and later:
BUGS:
Mixing brk() or sbrk() with malloc(3), free(3), or similar functions will
result in non-portable program behavior.
Why does malloc() call mmap() and brk() interchangeably?
so why malloc calls mmap when it comes to allocate a large size of memory?
The short answer is for improved efficiency on newer implementations of Linux, and the updated memory allocation algorithms that come with them. But keep in mind that this is a very implementation dependent topic, and the whys and wherefores would vary greatly for differing vintages and flavors of the specific Linux OS being discussed.
Here is fairly recent write-up regarding the low-level parts mmap()
and brk()
play in Linux memory allocation. And, a not so recent, but still relevant Linux Journal article that includes some content that is very on-point for the topic here, including this:
For very large requests, malloc() uses the mmap() system call to find
addressable memory space. This process helps reduce the negative
effects of memory fragmentation when large blocks of memory are freed
but locked by smaller, more recently allocated blocks lying between
them and the end of the allocated space. In this case, in fact, had
the block been allocated with brk(), it would have remained unusable
by the system even if the process freed it.
(emphasis mine)
Regarding brk()
:
incidentally, "...mmap() didn't exist in the early versions of Unix. brk()
was the only way to increase the size of the data segment of the process at that time. The first version of Unix with mmap() was SunOS in the mid 80's, the first open-source version was BSD-Reno in 1990.". Since that time, modern implementation of memory allocation algorithms have been refactored with many improvements, greatly reducing the need for them to include using brk()
.
Kernel Source -- In which file is brk() defined
It's in mmap.c
. Look for:
SYSCALL_DEFINE1(brk, unsigned long, brk)
The manual page says:
On Linux, sbrk() is implemented as a library function that uses the
brk() system call, and does some internal bookkeeping so that it can
return the old break value.
Related Topics
Convert Utf8 to Utf16 Using Iconv
Sighup for Reloading Configuration
Google-Chrome Failed to Move to New Namespace
Add User to Group But Not Reflected When Run "Id"
How to Find Out Mount/Partition a Directory or File Is On? (Linux Server)
How to Get the Contents of a Webpage in a Shell Variable
How to List (Ls) the 5 Last Modified Files in a Directory
Get Final Url After Curl Is Redirected
Linux Kernel System Call Returns -1 Instead of {-1, -256}
The Irq in Kernel Function Asm_Do_Irq() Is Different from the One I Request in Module
Moving Multiple Files Having Spaces in Name (Linux)
Differencebetween './Example.Sh' and 'Sh Example.Sh'
How to Look Up a Variable by Name with #!/Bin/Sh (Posix Sh)
How to Run Dos2Unix on an Entire Directory