Graphics Card Memory and Virtual Address Space of a Process

What does my game application's virtual address space look like?

Impossible to tell. OpenGL leaves this detail completely open to the vendor implementation. Anything that satisfies the specification is allowed.

Is graphics card memory mapped in this virtual address space?

Maybe, maybe not. That depends on the actual implementation.

Also, is there some relation between RAM and graphics card memory?

Usually, yes. As far as the majority of OpenGL implementations are concerned, the graphics card's RAM is essentially a cache for things that actually live in system memory (CPU RAM + swap space + stuff memory-mapped from storage). However, this is not pinned down by the specification, and anything that satisfies the OpenGL specification is allowed.

Does Linux allocate equal RAM for graphics card which can not be used by any process?

No, because Linux (the kernel) is not concerned with these things. Your graphics card's driver is, though, and the driver may do it any way it sees fit. It can map OpenGL context data into a separate address space through Physical Address Extension (PAE), place it in a different process, keep it in your game's address space, or…, or…, or…. There's no written-down scheme for this.

So does that leave only 3GB of RAM available to my game process?

If so, then it is more like (3GB - 1GB) - x where 0 < x, because the top 1GB of your process' address space is reserved for the kernel, and of course your program's text (the binary executed by the CPU) and the text of the libraries it uses take up some address space as well.

What is the maximum addressable space of virtual memory?

Unfortunately, the answer is "it depends". You didn't mention an operating system, but you implied Linux when you mentioned GDB. I will try to be completely general in my answer.

There are basically three different "address spaces".

The first is logical address space. This is the range of a pointer. Modern CPUs (386 or better) have memory management units that allow an operating system to make your actual (physical) memory appear at arbitrary addresses. For a typical desktop machine, this is done in 4KB chunks. When a program accesses memory at some address, the CPU looks up what physical address corresponds to that logical address and caches the translation in a TLB (translation lookaside buffer). This allows three things: first, it allows an operating system to give each process as much address space as it likes (up to the entire range of a pointer - or beyond, if there are APIs to allow programs to map/unmap sections of their address space). Second, it allows the OS to isolate different programs entirely by switching to a different memory mapping, making it impossible for one program to corrupt the memory of another program. Third, it provides developers with a debugging aid - random corrupt pointers may point to some address that hasn't been mapped at all, leading to a "segmentation fault" or "invalid page fault" or whatever; terminology varies by OS.
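
For instance, here is a minimal sketch (assuming a Linux/POSIX system) that asks the kernel to map one page of anonymous memory and prints the logical address it ends up at; the MMU then translates that address to whichever physical page backs it:

    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        long page = sysconf(_SC_PAGESIZE);           /* typically 4096 bytes */
        printf("page size: %ld bytes\n", page);

        /* MAP_ANONYMOUS: backed by zero-filled pages, no file involved. */
        void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        printf("one page mapped at logical address %p\n", p);

        munmap(p, (size_t)page);
        return 0;
    }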

The second address space is physical memory. It is simply your RAM - you have a finite quantity of RAM. There may also be hardware that uses memory-mapped I/O - devices that LOOK like RAM but are really some hardware device, like a PCI card, or perhaps memory on a video card, etc.

The third type of address is virtual address space. If you have less physical memory (RAM) than the programs need, the operating system can simulate having more RAM by giving each program the illusion of a large amount of RAM, with only a portion of it actually being RAM and the rest being in a "swap file". For example, say your machine has 2MB of RAM and a program allocates 4MB. The operating system would reserve 4MB of address space and try to keep the most recently/frequently accessed pieces of that 4MB in actual RAM. Any sections that are not frequently/recently accessed are copied to the "swap file". Now, if the program touches a part of that 4MB that isn't actually in memory, the CPU generates a "page fault". The operating system finds some physical memory that hasn't been accessed recently and "pages in" that page. It might have to write the contents of that memory page out to the page file before it can page in the data being accessed. This is why it is called a swap file - typically, when something is read in from the swap file, something else probably has to be written out first, effectively swapping something in memory with something on disk.
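
A small sketch of demand paging in action (again assuming Linux/POSIX): reserve a large anonymous mapping and touch it one page at a time. Each first touch raises a page fault that the kernel services transparently, which you can watch via the minor-fault counter reported by getrusage():

    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/resource.h>
    #include <unistd.h>

    static long minor_faults(void)
    {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_minflt;
    }

    int main(void)
    {
        const size_t size = 64u * 1024 * 1024;      /* 64 MB of address space */
        long page = sysconf(_SC_PAGESIZE);

        char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        long before = minor_faults();
        for (size_t off = 0; off < size; off += (size_t)page)
            p[off] = 1;                             /* first touch => page fault */
        long after = minor_faults();

        printf("page faults taken while touching the region: %ld\n", after - before);
        munmap(p, size);
        return 0;
    }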

Typical MMU (memory management unit) hardware keeps track of what addresses are accessed (i.e. read), and modified (i.e. written). Typical paging implementations will often leave the data on disk when it is paged in. This allows it to "discard" a page if it hasn't been modified, avoiding writing out the page when swapping. Typical operating systems will periodically scan the page tables and keep some kind of data structure that allows it to intelligently and quickly choose what piece of physical memory has not been modified, and over time builds up information about what parts of memory change often and what parts don't.

Typical operating systems will often gently page out pages that don't change often (gently because they don't want to generate too much disk I/O which would interfere with your actual work). This allows it to instantly discard a page when a swapping operation needs memory.

Typical operating systems will try to use all the "unused" memory space to "cache" (keep a copy of) pieces of files that are accessed. Memory is thousands of times faster than disk, so if something gets read often, having it in RAM is drastically faster. Typically, a virtual memory implementation will be coupled with this "disk cache" as a source of memory that can be quickly reclaimed for a swapping operation.

Writing an effective virtual memory manager is extremely difficult. It needs to dynamically adapt to changing needs.

Beyond a certain point, typical virtual memory implementations feel awfully slow: when a machine starts to use far more memory than it has RAM, overall performance gets really, really bad.

How does Windows give 4GB of address space each to multiple processes when the total memory it can access is also limited to 4GB?

The basic idea is that you have limited physical RAM. Once it fills up, you start storing stuff on the hard disk instead. When a process requests data that is currently on disk, or asks for new memory, you kick out a page from RAM by transferring it to the disk, and then page in the data you actually need.

The OS maintains a data structure called a page table to keep track of which logical addresses correspond to the data currently in physical memory and where stuff is on the disk.
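
As a toy model (deliberately simplified, not any real OS's layout), a single-level page table that maps 4 KB virtual pages to physical frames looks roughly like this; real hardware uses multi-level tables plus a TLB, but the address split is the same idea:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12u                          /* 4 KB pages */
    #define PAGE_SIZE  (1u << PAGE_SHIFT)
    #define NUM_PAGES  16u                          /* a tiny toy address space */

    /* page_table[vpn] holds a physical frame number, or -1 if the page is "on disk". */
    static int page_table[NUM_PAGES];

    static long translate(uint32_t vaddr)
    {
        uint32_t vpn    = vaddr >> PAGE_SHIFT;      /* virtual page number */
        uint32_t offset = vaddr & (PAGE_SIZE - 1);  /* offset within the page */

        if (vpn >= NUM_PAGES || page_table[vpn] < 0)
            return -1;                              /* page fault: the OS must page it in */

        return ((long)page_table[vpn] << PAGE_SHIFT) | offset;
    }

    int main(void)
    {
        for (unsigned i = 0; i < NUM_PAGES; i++)
            page_table[i] = -1;                     /* nothing resident yet */
        page_table[0] = 3;                          /* virtual page 0 -> physical frame 3 */
        page_table[1] = 7;                          /* virtual page 1 -> physical frame 7 */

        uint32_t vaddr = 0x1234;                    /* vpn 1, offset 0x234 */
        long paddr = translate(vaddr);
        if (paddr < 0)
            printf("0x%x -> page fault\n", (unsigned)vaddr);
        else
            printf("0x%x -> physical 0x%lx\n", (unsigned)vaddr, paddr);
        return 0;
    }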

Each process has its own virtual address space, and operates using logical addresses within this space. The OS is responsible for translating requests for a given process and logical address into a physical address/location on disk. It is also responsible for preventing processes from accessing memory that belongs to other processes.

When a process asks for data that is not currently in physical memory, a page fault is triggered. When this occurs, the OS selects a page to move to disk (if physical memory is full). There are several page replacement algorithms for selecting the page to kick out.

Address of Video memory

What happens when you write to the address:

That area of the address space is not mapped to RAM; instead, the writes are sent across the system bus to your VGA card. The BIOS set this up with your VGA card at boot time (lots of address ranges are memory-mapped to various devices). No code is executed on the CPU to plot the pixels when you write to this area of the address space: the VGA card receives the data instead of your RAM and does the work itself.

If you wanted, you could look up BIOS function calls and have the BIOS reconfigure the hardware so you could plot pixels at the video address instead of placing characters. You could even probe it to see if it supports VESA and switch to a nice 1280*768 32bpp resolution. The BIOS would then map an area of the address space of your choosing to the VGA card for you.
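
As an illustration, a sketch like the following only makes sense in a freestanding environment (your own bootloader/kernel, or DOS) where the VGA text buffer at physical address 0xB8000 is reachable. "Printing" a character is then a plain store; under a normal OS that address is not mapped into your process, and the code would simply fault:

    #include <stdint.h>

    #define VGA_TEXT_BASE ((volatile uint16_t *)0xB8000)
    #define VGA_COLS 80

    static void put_char_at(int row, int col, char c, uint8_t attr)
    {
        /* Each cell is 16 bits: low byte = character, high byte = colour attribute. */
        VGA_TEXT_BASE[row * VGA_COLS + col] = (uint16_t)((attr << 8) | (uint8_t)c);
    }

    void demo(void)
    {
        put_char_at(0, 0, 'A', 0x0F);   /* white 'A' on black, top-left corner */
    }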

More about the BIOS:

The BIOS is a program that comes with your motherboard that your CPU executes when it first powers up. It sets up all the hardware, maps all the memory mapped devices, creates various useful tables, assigns IO ports, hooks interrupts up to a bunch of routines it leaves in memory. It then loads your bootsector from a device and jumps to your OS code.

The left-behind routines and data structures enable you to get your OS off the ground. You can load sectors off a disk, write text to the screen, and get information about the system (memory maps, ACPI tables, MP tables, etc.). Without these routines and data structures, it would be a lot harder, if not impossible, to write an acceptable bootsector and gather all the information about the system needed to build a functional kernel.

However, the routines are dated, slow and have very restrictive limitations. For one, the routines left in memory are 16-bit real mode code, so as soon as you switch to 32-bit protected mode you have to constantly switch back, or use VM86 mode, to access them (they are completely inaccessible in 64-bit mode, although emulating the instructions with a modified Linux x86emu library is apparently an option). The routines are also generally very slow, so you will need to write your own drivers from scratch once you move away from real-mode programming.

Are memory mapped I/O address and RAM address related?

You tagged this ARM, but other processors are not necessarily different.

Normally some percentage of the address space is carved out for memory-mapped I/O: GPIO, UART, NVIC, etc. With ARM you will also have some internal address space that doesn't make it out to the AXI/AMBA bus(ses) for the chip vendor.
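
To make that concrete, here is a sketch with a made-up peripheral layout (the base address, register offsets and status bit are hypothetical; real values come from your chip vendor's memory map). The point is that a memory-mapped UART is driven with ordinary loads and stores through volatile pointers:

    #include <stdint.h>

    #define UART0_BASE 0x40001000u                 /* hypothetical base address */
    #define UART0_DATA (*(volatile uint32_t *)(UART0_BASE + 0x00))
    #define UART0_STAT (*(volatile uint32_t *)(UART0_BASE + 0x04))
    #define UART_TX_READY (1u << 0)                /* hypothetical status bit */

    static void uart_putc(char c)
    {
        while ((UART0_STAT & UART_TX_READY) == 0)
            ;                                      /* spin until the transmitter is free */
        UART0_DATA = (uint32_t)c;                  /* a plain store sends the byte */
    }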

So if you wanted to use an ARM with a 32-bit address bus, then hooking up a flat 4GB is a waste of time. You can certainly hook up more (I have an arm11 with 24GB), but it is not linear; you have to have an addressing scheme like PCIe has, where you point a window of the address space you can reach into an address space beyond it (again, think of PCIe as it really is, not the illusion it tries to present on x86).

But you are overcomplicating this, particularly with an ARM, where it is all documented. You are a chip vendor: you buy this core, it has an address bus (see the AMBA/AXI documentation), and you hook up to that address bus. If it is a Cortex-M, there are guidelines as to where to put RAM and ROM and what areas to keep out of the way of. For the full-sized ARMs it is mostly fair game: you provide the base address to the core where certain peripherals are mapped (think NVIC, timers, etc.). Instead of the Cortex-M approach, where the SysTick timer base address is hardcoded in the design, you feed the core the base address in your address space where the internal items live (PERIPHBASE or some such signal/bus name). Beyond that it is the whim of the chip vendor how to divide up that address space. The ARM can typically boot at one of two addresses, but of course you can have as many address layers as you want, and for each layer a conversion/translation to that address space. This includes peripherals, memory (RAM/ROM/flash), USB, PCIe address spaces, etc.

So it could be like a PC, where the PCIe window takes away the one or two gigabytes of RAM at that same space and you simply lose that memory; but in that case you are thinking about it a bit wrong, because those are different address spaces/layers. Once 64-bit dominated over 32-bit (and even though 32-bit isn't completely dead), we could have BIOSes that default to 64 bit, allowing the PCIe window to sit above the memory instead of cutting a hole in it.

The nice thing about buying a core like ARM or MIPS is that you can, to some extent if not completely, design the address space however you like; you don't have to conform to anything, etc.

There is no one answer to your question; you need to specify the specific chip and board (and version of that system) to have this conversation, and if it were a real, available product, they wouldn't have bothered unless there was a windowed address scheme. Folks like to think the segment:offset thing was bad, but it still exists in most usable systems; we just can't use those terms any more, and we don't always have segment registers, but we still have the address space carved out and windowed. MMUs make it much easier to segment the address spaces while making them look linear.

What is meant by memory-mapped video?

For example, the Intel 8086 CPU has 20 address lines.

Some background first

(Unlike most other CPUs) x86 CPUs have two address ranges:

  • The actual memory address range (which is accessed by "mov al, [ds:di]" for example)

    This memory range is intended for memory.

  • The I/O address range which is accessed using the in and out instructions

    This "memory" range is intended for I/O.

The 8086 actually has 21 address lines: A19-A0 and "M/nIO". The A19-A0 lines contain the actual address, and the "M/nIO" line indicates whether the "regular" memory range or the I/O range is being accessed.

In an x86 PC (*) address 0x00021 in the memory area (M/nIO = 1) is RAM while address 0x0021 in the I/O range (M/nIO = 0) is the interrupt controller.

What does memory-mapped video mean?

"Memory-mapped I/O" means that some device is addressed using the "memory" address space and not using the "I/O" address space:

The video adapter is addressed by reading and writing to the memory addresses 0xA0000-0xBFFFF (depending on the video mode), not by using in and out instructions.
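
A sketch of the contrast (x86, GCC inline assembly, freestanding environment assumed): port I/O uses the dedicated out instruction and the I/O address space, while memory-mapped I/O is just an ordinary store to a "memory" address that the hardware routes to the device instead of RAM:

    #include <stdint.h>

    /* Port I/O: talk to the legacy interrupt controller's port 0x21. */
    static inline void outb(uint16_t port, uint8_t value)
    {
        __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
    }

    /* Memory-mapped I/O: write a pixel into the VGA graphics window at 0xA0000
     * (mode 13h, 320x200, one byte per pixel). */
    static void put_pixel_mode13h(int x, int y, uint8_t color)
    {
        volatile uint8_t *vram = (volatile uint8_t *)0xA0000;
        vram[y * 320 + x] = color;     /* plain store, no special instruction */
    }

    void demo(void)
    {
        outb(0x21, 0xFF);              /* mask all IRQs on the primary PIC */
        put_pixel_mode13h(10, 10, 4);  /* a pixel in the default mode 13h palette */
    }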

(For CPUs not having the concept of two address ranges at all - like ARM - it is sometimes also said that the CPU uses memory-mapped I/O. This means that all I/O devices are addressed like memory and there are no special instructions for I/O addressing.)

In the case of video the word might have a more special meaning:

There are systems where you have to access the video memory using multiple I/O accesses:

If you wanted to write data to video memory using the TMS9918 video chip (which was popular in the 1980s), you first had to write a value specifying the address in video RAM, and then you had to write the actual data. The CPU always wrote both values (video RAM address and data) to the same addresses, independent of which address in video RAM was being written.
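
A sketch of that indirect access pattern, with port numbers that are assumptions here and an assumed io_write() helper standing in for the platform's port-write instruction; every VRAM location is reached through the same two CPU-visible addresses:

    #include <stdint.h>

    #define VDP_CONTROL_PORT 0x99   /* hypothetical I/O port for address setup */
    #define VDP_DATA_PORT    0x98   /* hypothetical I/O port for data */

    /* Platform-specific port write (e.g. an "out" instruction); assumed to exist. */
    extern void io_write(uint8_t port, uint8_t value);

    static void vdp_write_vram(uint16_t vram_addr, uint8_t data)
    {
        io_write(VDP_CONTROL_PORT, vram_addr & 0xFF);                 /* low address byte */
        io_write(VDP_CONTROL_PORT, ((vram_addr >> 8) & 0x3F) | 0x40); /* high bits + write flag */
        io_write(VDP_DATA_PORT, data);                                /* the actual byte */
    }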

If someone says that the video is not memory-mapped I would understand that the system has this behavior.

On x86 PCs (*) this is not the case: A certain address in the video RAM corresponds to a certain CPU address. So if you want to write to 10 different addresses in video RAM the CPU simply has to write data to 10 different addresses.

(*) I don't write x86 "systems" because x86 CPUs are also used in mobile and embedded devices. In such systems the memory layout (which depends on the circuit outside the CPU) may differ.

glMapBuffer() and glBuffers, how does the access with a (void*) work with hardware?

This is really a hardware question, actually...

No it's not. You'll see why in a moment.

I come to a section on OpenGL buffers, and as far as I understand they are memory spaces allocated in graphics card memory, is this correct?

Not quite. You must understand that while OpenGL gets you really close to the actual hardware, you're still very far from touching it directly. What glMapBuffer does is set up a virtual address range mapping. On modern computer systems, software doesn't operate on physical addresses; instead, a virtual address space (of some size) is used. This virtual address space looks like one large contiguous block of memory to the software, while in fact it's backed by a patchwork of physical pages. Those pages can be implemented anyhow: they can be actual physical memory, they can be I/O memory, they can even be created in-situ by another program. The mechanism for that is provided by the CPU's Memory Management Unit in collaboration with the OS.

So for each process the OS manages a table of which part of the process's virtual address space maps to which page handler. If you're running Linux, have a look at /proc/$PID/maps. If you have a program that uses glMapBuffer, read /proc/self/maps (from within your program, don't call system) before and after mapping the buffer and look for the differences.
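
A minimal sketch of that experiment (Linux assumed, an OpenGL context already current, and proper extension/function-pointer loading omitted for brevity):

    #define GL_GLEXT_PROTOTYPES        /* expose glMapBuffer & friends on Mesa headers */
    #include <GL/gl.h>
    #include <stdio.h>

    static void dump_maps(const char *tag)
    {
        char line[512];
        FILE *f = fopen("/proc/self/maps", "r");
        if (!f) return;
        printf("---- %s ----\n", tag);
        while (fgets(line, sizeof line, f))
            fputs(line, stdout);
        fclose(f);
    }

    void inspect_buffer_mapping(GLuint buffer)
    {
        glBindBuffer(GL_ARRAY_BUFFER, buffer);

        dump_maps("before glMapBuffer");
        void *ptr = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
        dump_maps("after glMapBuffer");

        printf("glMapBuffer returned %p\n", ptr);
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }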

As far as I was aware, all possible memory addresses (e.g. on a 64-bit system there are uint64_t num = 0x0; num = ~num; possible addresses) were used for system memory, as in RAM / CPU-side memory.

What makes you think that? Whoever told you that (if somebody told you that) should be slapped in the face… hard.

What you have is a virtual address space, and this address space is completely different from the physical address space on the hardware side. In fact the size of the virtual address space and the size of the physical address space can differ largely. For example, for a long time there were 32-bit CPUs and 32-bit operating systems around, but already then it was desirable to have more than 4 GiB of system memory. So while the CPU would support only 32 bits of address space for a process (the maximum size of a pointer), it may have provided 36 bits of physical address lines to memory, to support some 64 GiB of system RAM; it would then be the OS's job to manually switch those extra bits, so that while each individual process sees only some 3 GiB of RAM (max.), the processes in total could spread across the full amount. A technique like that has become known as Physical Address Extension (PAE).

Furthermore, not all of the address space in a process is backed by RAM. Like I already explained, address space mappings can be backed by anything. Often the memory pagefault handler will also implement swapping, i.e. if there's not enough free RAM around it will use HDD storage (in fact, on Linux all userspace requests for memory are backed by the disk I/O cache handler). Also, since the address space mappings are per process, some part of the address space is mapped kernel memory, which is (physically) the same for all processes and also resides at the same place in all processes. From user space this address space mapping is not accessible, but as soon as a syscall makes a transition into kernel space it becomes accessible; yes, the OS kernel uses virtual memory internally, too. It just can't choose as broadly from the available backings (for example, it would be very difficult for a network driver to operate if its memory was backed by the network itself).

Anyway: on modern 64-bit systems you get a 64-bit pointer size, while current hardware provides 48 bits of physical RAM address lines. That leaves plenty of space, namely 2^16 - 1 times a 48-bit address space, for virtual mappings where there's no RAM around. And because there's so much to go around, each and every PCI card gets its very own address space that behaves a little bit like RAM to the CPU (remember the PAE I mentioned earlier? Well, in the good old 32-bit times something like that already had to be done to talk to extension cards).

Now here comes the OpenGL driver. It simply provides a new address-mapping handler, which usually just builds on top of the PCI address space handler and maps a portion of the virtual address space of a process. Whatever happens in that address space is reflected by that mapping handler into a buffer ultimately accessed by the GPU. However, the GPU itself may also access CPU memory directly. And what AMD plans is that GPU and CPU will live on the same die and access the same memory, so there's no longer a physical distinction there.
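
For completeness, a usage sketch from the application's side (OpenGL 1.5+ context assumed current, extension loading omitted): whatever the CPU writes through the returned pointer is what the mapping handler reflects into the buffer the GPU ultimately reads:

    #define GL_GLEXT_PROTOTYPES
    #include <GL/gl.h>
    #include <string.h>

    void upload_vertices(GLuint buffer, const float *vertices, size_t bytes)
    {
        glBindBuffer(GL_ARRAY_BUFFER, buffer);
        glBufferData(GL_ARRAY_BUFFER, (GLsizeiptr)bytes, NULL, GL_STATIC_DRAW);

        void *dst = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
        if (dst) {
            memcpy(dst, vertices, bytes);   /* plain CPU writes into the mapping */
            glUnmapBuffer(GL_ARRAY_BUFFER);
        }
    }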


