What Are the Alignment Requirements for sys_brk

Does the initial call to sbrk(0) on Linux always return a value aligned to 8 bytes (or 4 bytes on 32-bit systems)?

The standard for brk and sbrk explicitly does not specify whether the returned address is aligned in any way. On Mac OS X (and maybe other BSD systems) the sizes/addresses are page-aligned, but on Linux no such rounding takes place as can easily be tested with this little program:

#include <unistd.h>
#include <stdio.h>

int main() {
    void *p;
    p = sbrk(0);
    printf("Initial brk: %p\n", p);
    p = sbrk(1); // Increase the brk (returns OLD brk!)
    p = sbrk(0); // Get the new brk
    printf("New brk: %p\n", p);

    return 0;
}

On one of my systems, the output was:

Initial brk: 0x602000
New brk: 0x602001

But you asked for the initial call. The Linux man-page states:

brk() and sbrk() change the location of the program break, which defines the end of the process's data segment (i.e., the program break is the first location after the end of the uninitialized data segment). Increasing the program break has the effect of allocating memory to the process; decreasing the break deallocates memory.

The uninitialized data segment is also known as BSS. The keyword here is segment: segments are mapped in whole pages, so it's very likely that the initial value is always page-aligned.

If you want to be on the safe side and check, you can verify the initial address by taking the modulo with the page size (which you can query via getpagesize).


Update: So I was curious and dug around a bit more. In the man-page, I already read that brk and sbrk are implemented atop the kernel's sys_brk. Its implementation in the kernel source can be found in mm/mmap.c (or mm/nommu.c for systems without a Memory Management Unit; we'll ignore this one). In the brk implementation in mm/mmap.c, we find this line:

newbrk = PAGE_ALIGN(brk);

("brk" here is the argument, not the function.) So the kernel does do page aligning… sort of: while the calculations are done with the page-aligned values and any necessary memory allocation is page-aligned, the value stored for the brk is actually the pointer value you passed:

mm->brk = brk;

So from user space it doesn't look like any page-aligning took place, even though the kernel did it. I looked at versions 3.17.5 and 2.4.37; the behaviour is the same.

Regarding the initial value, in fs/binfmt_elf.c (which implements loading of ELF binaries) we find a function set_brk which sets the initial "brk" value (mm->start_brk). This value is explicitly page-aligned. The same is true for fs/binfmt_aout.c, which handles the old a.out format, and fs/binfmt_som.c, which handles the HP-UX SOM format (never heard of it before). There's also fs/binfmt_flat.c, which sets the initial brk value but doesn't align it explicitly; the value is implicitly aligned here. So it looks like the initial value is always page-aligned. At least it's guaranteed to be page-aligned for ELF files, which is what we care about for "normal" systems.

glibc simply wraps sys_brk and adds the bookkeeping needed to implement sbrk correctly. So glibc's brk behaves exactly as the kernel's does; the return value of sys_brk is stored in an internal hidden variable, __curbrk, so that sbrk can calculate the new address correctly.

Do brk and sbrk round the program break to the nearest page boundary?

brk allocates and deallocates whole pages, because a page is the smallest unit of memory management in a virtual-memory operating system. That implementation detail is transparent to the caller, however.

In the Linux kernel, brk saves the unaligned value and uses the aligned value to determine if pages need to be allocated/deallocated:

asmlinkage unsigned long sys_brk(unsigned long brk)
{
    [...]
    newbrk = PAGE_ALIGN(brk);
    oldbrk = PAGE_ALIGN(mm->brk);
    if (oldbrk == newbrk)
        goto set_brk;
    [...]
    if (do_brk(oldbrk, newbrk-oldbrk) != oldbrk)
        goto out;
set_brk:
    mm->brk = brk;
    [...]
}

As for sbrk: glibc calls brk and maintains the (unaligned) value of the current program break (__curbrk) in userspace:

void *__curbrk;
[...]
void *
__sbrk (intptr_t increment)
{
  void *oldbrk;
  if (__curbrk == NULL || __libc_multiple_libcs)
    if (__brk (0) < 0) /* Initialize the break. */
      return (void *) -1;
  if (increment == 0)
    return __curbrk;
  oldbrk = __curbrk;
  [...]
  if (__brk (oldbrk + increment) < 0)
    return (void *) -1;
  return oldbrk;
}

Consequently, the return value of sbrk does not reflect the page alignment that happens in the Linux kernel.

Concept of allocating free memory

Memory can be allocated in a number of ways; the two broad categories are static and dynamic allocation.

Static allocation means that all the memory a program can use is reserved at once, with a fixed size, and the program is able to use up to that amount.

Dynamic allocation means that when the program needs memory, it goes to the heap and obtains a pointer to the first suitable free chunk of memory (the size and placement depend on the dynamic allocation algorithm in use). Then, as it needs more (for a growing array, say), it requests more, keeping a pointer to the start of the block so it knows where the beginning of the array is. Modern operating systems usually do a good job of allocating resources, including memory, to applications, which reduces the chance of deadlock. On a higher level, garbage collection reclaims this memory once the object, array, or other value is no longer needed.

The problem here is that when memory is handed out and freed at will, different programs can grab different chunks, which aren't necessarily in order. This is what we call fragmentation (the same effect is why you defragment your disk drive every now and then). When memory is allocated contiguously, it can be read more efficiently.

There's a huge amount of information on memory, so here is a minute amount of data for you to allocate in your own memory ;)

Wiki Link to OSDev on Dynamic Allocation

Dynamic allocation in C++

Memory in C (This is kind of low level yet easy to understand)

Happy reading!

x86 memory access segmentation fault

On Linux (x86), although your process has a virtual address range of 4 GB, not all of it is accessible. The upper 1 GB is where the kernel resides, and there are areas of low memory that can't be used. Virtual address 0xfff lies in the first page, which is not mapped by default, so it can't be written to or read from and your program crashes with a segfault.

In a follow-up comment you suggested you were intending to create a heap in assembler. That can be done, and one method is to use the sys_brk system call. It is accessed via int 0x80 with EAX=45. It takes a pointer in EBX representing the new top of the heap. Generally the bottom of the heap area is initialized to the area just beyond your program's data segment (above your program in memory). To get the address of the initial heap location you can call sys_brk with EBX set to 0. After the system call, EAX will hold the current program break, which at that point is the base of the heap. You can save that value for when you need to access your heap memory or allocate more heap space.

This code provides an example for purposes of clarity (not performance), but might be a starting point to understanding how you can manipulate the heap area:

SECTION .data
heap_base:  dd 0                ; Memory address of the base of our heap

SECTION .text
global _start
_start:
    ; Use the `brk` syscall to get the current program break, which
    ; serves as the bottom of our heap. This is achieved by calling
    ; brk with an address (EBX) of 0.
    mov eax, 45                 ; brk system call
    xor ebx, ebx                ; don't request additional space, we just want
                                ; the address of the base of our process's heap area
    int 0x80
    mov [heap_base], eax        ; Save the heap base

    ; Now allocate some space (8192 bytes)
    mov eax, 45                 ; brk system call
    mov ebx, [heap_base]        ; ebx = address of the base of the heap
    add ebx, 0x2000             ; increase the heap by 8192 bytes
    int 0x80

    ; Example usage
    mov eax, [heap_base]        ; Get pointer to the heap's base
    mov dword [eax+0xFFF], 25   ; store value 25 in the DWORD at heap_base+0xFFF

    ; Exit the program
    mov eax, 1                  ; exit system call
    xor ebx, ebx                ; exit status 0
    int 0x80

