How Is RAM Allocated

From where does the program allocate memory?

It's actually much more complicated than you'd think. The OS thinks of everything in "pages": it splits the RAM into pages, and the hard drive into pages. When your program starts, the OS checks how much memory your executable takes, chooses some RAM pages for it, and assigns those pages to your program. If there are no usable pages left in RAM, it takes some older pages, saves their contents to a tucked-away spot on the hard drive, and then gives those pages to you.

When you allocate memory in your program, your program's memory manager tries to find a free spot in the pages the operating system has assigned to it. If there isn't enough room, it asks the operating system for more pages, and the operating system makes more room and gives your application more pages.

If your program has a page that it hasn't used in a while (sometimes even a page of code), the operating system may save that page to the hard drive. When your program tries to use that page again, the operating system pauses your program, reloads the page into RAM, and then resumes your program.
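You can even ask the OS how big its pages are. Here's a minimal sketch, assuming a POSIX system (Linux, macOS):

#include <unistd.h>   // sysconf (POSIX)
#include <cstdio>

int main()
{
    // Ask the OS for the size of one page, in bytes (commonly 4096).
    long page_size = sysconf(_SC_PAGESIZE);
    std::printf("page size: %ld bytes\n", page_size);
}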

Here's a rough diagram:

C++ addresses           RAM              hard drive
+------------+        +------------+    +------------+
| 0x00010000 |--\ /-->| 0x00010000 |    | 0x00010000 |
+------------+   X    +------------+    +------------+
| 0x00020000 |--/ \-->| 0x00020000 |    | 0x00020000 |
+------------+        +------------+    +------------+
| 0x00030000 |-->?       /------------->| 0x00030000 |
+------------+          /               +------------+
| 0x00040000 |---------/                | 0x00040000 |
+------------+                          +------------+
|    etc     |
+------------+

So in this diagram, your code has stack memory at 0x00010000-0x0002FFFF, and you've allocated some dynamic memory, which lives at 0x00040000. AS FAR AS YOU KNOW! In reality, when you access 0x00020000, the operating system says "oh, I've stored that page of yours at RAM address 0x00010000" and reads those values for you. You haven't touched the page at 0x00040000 in a while, so the operating system saved it to the hard drive at hard drive location 0x00030000, but it will bring it back into RAM if you try to use it. The operating system hasn't given you the address 0x00030000 yet, so if you try to use it, the operating system will tell you that address doesn't have any actual pages behind it, and you get a segmentation fault (segfault).

What makes this interesting is that when you ask for a large contiguous chunk, like a vector's storage, the operating system can give you any old pages it finds lying around; it doesn't have to worry about whether they're physically contiguous. They look contiguous to your program, which is all that matters.
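To see the "looks contiguous" part from the program's side, here's a small sketch: a std::vector's storage spans several pages, yet its elements occupy one unbroken range of virtual addresses, regardless of which physical pages back them:

#include <vector>
#include <cstdio>

int main()
{
    std::vector<char> v(4 * 4096);  // storage spanning several 4 KiB pages
    // Virtually contiguous: the last element sits exactly v.size()-1
    // bytes after the first, even if the physical pages are scattered.
    std::printf("first element: %p\n", static_cast<void*>(&v.front()));
    std::printf("last  element: %p\n", static_cast<void*>(&v.back()));
}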

This also allows the operating system to hide one program's memory from another, which keeps programs from reading or modifying each other's memory space. They're safe! Except... there are ways to ask the operating system to share a page between two programs (though the shared page may appear at different addresses in each program). DLLs do this.
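As a sketch of how two programs might share a page on a POSIX system (the name "/demo_page" is made up, and error handling is omitted):

#include <sys/mman.h>  // shm_open, mmap
#include <fcntl.h>     // O_CREAT, O_RDWR
#include <unistd.h>    // ftruncate
#include <cstring>

int main()
{
    // Create (or open) a named shared-memory object. A second process
    // that opens "/demo_page" sees the very same physical page, though
    // possibly mapped at a different virtual address.
    int fd = shm_open("/demo_page", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, 4096);  // make it one page long
    char* p = static_cast<char*>(mmap(nullptr, 4096,
                                      PROT_READ | PROT_WRITE,
                                      MAP_SHARED, fd, 0));
    std::strcpy(p, "hello from process A");
    // ... another process can now shm_open + mmap the same object
    // and read this string.
}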

In reality, it's far more complicated than this.

What and where are the stack and heap?

The stack is the memory set aside as scratch space for a thread of execution. When a function is called, a block is reserved on the top of the stack for local variables and some bookkeeping data. When that function returns, the block becomes unused and can be used the next time a function is called. The stack is always reserved in a LIFO (last in first out) order; the most recently reserved block is always the next block to be freed. This makes it really simple to keep track of the stack; freeing a block from the stack is nothing more than adjusting one pointer.
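A tiny sketch to see this: locals in a nested call live in a newer frame, which is released simply by moving the stack pointer when the function returns (the exact addresses and growth direction are platform details):

#include <cstdio>

void inner()
{
    int y = 0;
    // y lives in a newer stack frame, at a nearby (typically lower) address.
    std::printf("inner local at %p\n", static_cast<void*>(&y));
}   // inner's frame is released here, just by adjusting the stack pointer

int main()
{
    int x = 0;
    std::printf("main  local at %p\n", static_cast<void*>(&x));
    inner();
}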

The heap is memory set aside for dynamic allocation. Unlike the stack, there's no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time; there are many custom heap allocators available to tune heap performance for different usage patterns.
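By contrast, heap blocks can be allocated and freed in any order; a sketch (whether c reuses a's old slot is entirely up to the allocator):

#include <cstdio>

int main()
{
    // Allocate and free in an arbitrary order; the heap allows this.
    int* a = new int(1);
    int* b = new int(2);
    delete a;            // freed before b, so not LIFO
    int* c = new int(3); // may or may not reuse a's old slot
    std::printf("b=%p c=%p\n", static_cast<void*>(b), static_cast<void*>(c));
    delete b;
    delete c;
}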

Each thread gets a stack, while there's typically only one heap for the application (although it isn't uncommon to have multiple heaps for different types of allocation).

To answer your questions directly:

To what extent are they controlled by the OS or language runtime?

The OS allocates the stack for each system-level thread when the thread is created. Typically the OS is called by the language runtime to allocate the heap for the application.

What is their scope?

The stack is attached to a thread, so when the thread exits the stack is reclaimed. The heap is typically allocated at application startup by the runtime, and is reclaimed when the application (technically process) exits.

What determines the size of each of them?

The size of the stack is set when a thread is created. The size of the heap is set on application startup, but can grow as space is needed (the allocator requests more memory from the operating system).
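For example, with POSIX threads the stack size is picked at thread creation via a thread attribute; a minimal sketch (std::thread does not expose this knob, but the underlying pthreads API does):

#include <pthread.h>
#include <cstdio>

void* work(void*) { return nullptr; }

int main()
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    // Request a 1 MiB stack for the new thread (must be at least
    // PTHREAD_STACK_MIN; the default size is platform specific).
    pthread_attr_setstacksize(&attr, 1024 * 1024);

    pthread_t t;
    pthread_create(&t, &attr, work, nullptr);
    pthread_join(t, nullptr);
    pthread_attr_destroy(&attr);
    std::puts("thread ran with a 1 MiB stack");
}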

What makes one faster?

The stack is faster because the access pattern makes it trivial to allocate and deallocate memory from it (a pointer/integer is simply incremented or decremented), while the heap has much more complex bookkeeping involved in an allocation or deallocation. Also, each byte in the stack tends to be reused very frequently, which means it tends to be mapped to the processor's cache, making it very fast. Another performance hit for the heap is that the heap, being mostly a global resource, typically has to be thread-safe, i.e. each allocation and deallocation needs to be synchronized with "all" other heap accesses in the program.

A clear demonstration:

[Image: stack vs. heap illustration. Source: vikashazrati.wordpress.com]

How does memory allocation work at the extremes?

A 64-bit process can allocate all of the machine's memory. It doesn't even need to be root, unless the system has defined a ulimit setting for non-root users. Try ulimit -v to see whether a limit is set.

Under Linux's default settings, a process can ask for nearly any amount of memory and it will be granted (this is known as overcommit). The memory is actually assigned only as it is used, and it comes from physical RAM or from disk swap as needed.

A memory allocation resize is normally done in the C library by allocating a new, larger block and copying the old data into the new allocation; it is not usually done by expanding the existing allocation. Allocation addresses are chosen so they don't conflict with other allocations, such as the program stack.
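A sketch of that behaviour with the C library's realloc (note that the old pointer must not be used after a successful resize, because the block may have moved):

#include <cstdlib>
#include <cstring>
#include <cstdio>

int main()
{
    // Grow a buffer: realloc may extend it in place, but it is free to
    // allocate a new, larger block and copy the old contents over.
    char* buf = static_cast<char*>(std::malloc(16));
    std::strcpy(buf, "hello");
    char* bigger = static_cast<char*>(std::realloc(buf, 1024));
    // buf may now be invalid; only `bigger` is safe to use.
    std::printf("%s (possibly moved)\n", bigger);
    std::free(bigger);
}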

What does Memory allocated at compile time really mean?

Memory allocated at compile-time means the compiler resolves at compile-time where certain things will be allocated inside the process memory map.

For example, consider a global array:

int array[100];

The compiler knows at compile-time the size of the array and the size of an int, so it knows the entire size of the array at compile-time. Also, a global variable has static storage duration by default: it is allocated in the static memory area of the process memory space (the .data/.bss sections). Given that information, the compiler decides during compilation at what address of that static memory area the array will be.

Of course, those memory addresses are virtual addresses. The program assumes that it has its own entire memory space (from 0x00000000 to 0xFFFFFFFF, for example). That's why the compiler can make assumptions like "okay, the array will be at address 0x00A33211". At runtime those addresses are translated to real/hardware addresses by the MMU and the OS.
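A quick sketch of that: print the address of a global array and note that it's a virtual address fixed into the executable (with ASLR/PIE, which many systems enable, the whole image may be shifted between runs):

#include <cstdio>

int array[100];  // static storage; its (virtual) address is fixed
                 // by the compiler/linker, not chosen at runtime

int main()
{
    // Prints the same virtual address on every run, unless the OS
    // relocates the whole image (e.g. ASLR/PIE).
    std::printf("array lives at %p\n", static_cast<void*>(array));
}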

Value-initialized static-storage things are a bit different. For example:

int array[] = { 1 , 2 , 3 , 4 };

In our first example, the compiler only decided where the array will be allocated, storing that information in the executable.

In the case of value-initialized things, the compiler also injects the initial values of the array into the executable and adds code which tells the program loader that, after the array is allocated at program start, it should be filled with these values.

Here are two examples of the assembly generated by the compiler (GCC 4.8.1 targeting x86-64):

C++ code:

int a[4];
int b[] = { 1 , 2 , 3 , 4 };

int main()
{}

Output assembly:

a:
    .zero 16
b:
    .long 1
    .long 2
    .long 3
    .long 4
main:
    pushq %rbp
    movq %rsp, %rbp
    movl $0, %eax
    popq %rbp
    ret

As you can see, the values are directly injected into the assembly. For the array a, the compiler generates a zero-initialization of 16 bytes, because the Standard says that objects with static storage duration should be zero-initialized by default:

8.5.9 (Initializers) [Note]:

Every object of static storage duration is zero-initialized at
program startup before any other initialization takes place. In some
cases, additional initialization is done later.

I always suggest that people disassemble their code to see what the compiler really does with the C++ code. This applies to everything from storage classes/duration (like this question) to advanced compiler optimizations. You can instruct your compiler to generate the assembly (for example, g++ -S), but there are wonderful tools to do this on the Internet in a friendly manner. My favourite is GCC Explorer.

Memory allocation - How can 15 GB be equal to 2 GB?

Try storing data in your big arrays; memset would do fine. You are probably looking at actual (resident) memory: if you don't touch the allocated pages, they may still exist only in virtual memory.
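A sketch of the experiment (the size is arbitrary; watch resident memory in top or ps before and after the memset):

#include <cstdlib>
#include <cstring>

int main()
{
    const std::size_t size = std::size_t(1) << 30;  // ask for 1 GiB
    char* big = static_cast<char*>(std::malloc(size));
    if (!big) return 1;
    // At this point most OSes have only reserved virtual address space;
    // resident ("actual") memory barely changes.
    std::memset(big, 0xAB, size);
    // Now every page has been written, so the OS must back it with
    // physical RAM (or swap); resident memory jumps accordingly.
    std::free(big);
}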

How is memory allocated for variables in Python?

Python Memory Management

Python does a lot of allocations and deallocations. All objects,
including "simple" types like integers and floats, are stored on the
heap. Calling malloc and free for each variable would be very slow.
Hence, the Python interpreter uses a variety of optimized memory
allocation schemes. The most important one is a malloc implementation
called pymalloc, designed specifically to handle large numbers of
small allocations. Any object that is smaller than 256 bytes uses this
allocator, while anything larger uses the system's malloc.

Why doesn't Python release the memory when I delete a large object?

Memory allocation works at several levels in Python. There’s the
system’s own allocator, which is what shows up when you check the
memory use using the Windows Task Manager or ps. Then there’s the C
runtime’s memory allocator (malloc), which gets memory from the system
allocator, and hands it out in smaller chunks to the application.
Finally, there’s Python’s own object allocator, which is used for
objects up to 256 bytes. This allocator grabs large chunks of memory
from the C allocator, and chops them up in smaller pieces using an
algorithm carefully tuned for Python.

and specifically for floats:

floats also use an immortal & unbounded free list.

So no, new memory would only be allocated if there are no more free float spots in Python's free list, which depends on previous float usage in your program.

Ultimately, Python is doing the memory management for you, so even a solid answer to your question won't give you much insight.

Additional discussion can be found at Python: garbage collection fails?

What decides where on the heap memory is allocated?

Memory is managed by the OS, so the answer depends on the OS/platform in use. The C++ specification does not say how memory is allocated or freed at a lower level; it only specifies behavior in terms of object lifetime.

While multi-user desktop/server/phone OSes (like Windows, Linux, macOS, Android, …) manage memory in broadly similar ways, it could be completely different on embedded systems.

What is in control of choosing where on the heap memory is stored and how does it choose that?

It's the OS that is responsible for that. How exactly depends, as already said, on the OS. The OS could also be a thin layer, in the form of a combination of the runtime library and a minimal OS like IncludeOS.

Does this mean that the heap is shared between all processes?

Depends on the point of view. The address space is, for multi-user systems, in general not shared between processes. The OS ensures that one process cannot access the memory of another process, which is enforced through virtual address spaces. But the OS can distribute the whole of physical RAM among all processes.

For embedded systems, it could even be the case that each process has a fixed amount of preallocated memory, not shared between processes, with no way to allocate new memory or free it. It is then up to the developer to manage that preallocated memory themselves, by providing custom allocators to the objects of the stdlib and by constructing objects in the allocated storage.
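A minimal sketch of the "construct in allocated storage" part, using placement new into a fixed pool (the Sensor type and pool size are made-up examples):

#include <new>      // placement new
#include <cstdint>

struct Sensor { std::uint32_t id; float value; };

// A fixed, preallocated region: a stand-in for the memory an embedded
// OS might hand a process once, at startup (hypothetical layout).
alignas(Sensor) static unsigned char pool[sizeof(Sensor) * 8];

int main()
{
    // Construct an object inside the preallocated storage instead of
    // asking the OS for heap memory.
    Sensor* s = new (pool) Sensor{42, 3.14f};
    // ... use *s ...
    s->~Sensor();   // manual destruction; the pool itself is never freed
}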

I want to learn more about memory fragmentation

There are two kinds of fragmentation: fragmentation of the memory addresses the OS exposes to the C++ runtime, and fragmentation on the hardware/OS side (which could be the same thing on an embedded system). How and in what form memory is fragmented or organized by the OS can't be determined using the functions provided by the stdlib. And how the fragmentation of a process's address space behaves depends, again, on the OS and also on the stdlib used.


