Linux Memory Segmentation

Yes, Linux uses paging so all addresses are always virtual. (To access memory at a known physical address, Linux keeps all physical memory 1:1 mapped to a range of kernel virtual address space, so it can simply index into that "array" using the physical address as the offset. Modulo complications for 32-bit kernels on systems with more physical RAM than kernel address space.)
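The "array" is just a constant offset. Here is a minimal sketch, assuming x86-64 with 4-level paging and a direct map starting at 0xffff888000000000 (the base shown in the kernel's x86-64 mm.txt; KASLR can move it on real kernels). The function names are hypothetical. The kernel's real helpers are __va() / phys_to_virt(), and they handle more cases than this.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86-64, 4-level paging, no KASLR randomization of the base.
 * All physical RAM is mapped 1:1 starting at this kernel virtual address. */
#define DIRECT_MAP_BASE 0xffff888000000000ULL

/* Hypothetical helper: the physical address is simply an index
 * into the direct-map "array". */
static uint64_t sketch_phys_to_virt(uint64_t phys) {
    return DIRECT_MAP_BASE + phys;
}

/* The reverse, valid only for addresses inside the direct map. */
static uint64_t sketch_virt_to_phys(uint64_t virt) {
    return virt - DIRECT_MAP_BASE;
}
```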

This linear address space constituted of pages, is split into four segments

No, Linux uses a flat memory model. The base and limit for all 4 of those segment descriptors are 0 and -1 (unlimited). i.e. they all fully overlap, covering the entire 32-bit virtual linear address space.
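Segmentation still happens on every access; with a flat descriptor it's just an identity map. A minimal sketch of the translation step, simplified to expand-up segments and byte-granular limits, with hypothetical names. The 0xFFFFFFFF limit is what a descriptor limit of 0xFFFFF with 4 KiB granularity works out to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified 32-bit protected-mode segment: just base and effective limit. */
struct segment {
    uint32_t base;
    uint32_t limit;   /* highest valid offset; 0xFFFFFFFF = whole 4 GiB */
};

/* Segmentation step: linear = base + offset, after a limit check.
 * Returns false where the CPU would fault instead. */
static bool seg_translate(struct segment s, uint32_t offset, uint32_t *linear) {
    if (offset > s.limit)
        return false;              /* #GP (or #SS for the stack segment) */
    *linear = s.base + offset;     /* wraps modulo 2^32, like the hardware */
    return true;
}

/* The flat model Linux uses: base 0, limit "-1", so linear == offset. */
static const struct segment flat = { 0, 0xFFFFFFFFu };
```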

So the red part consists of two segments __KERNEL_CS and __KERNEL_DS

No, this is where you went wrong. x86 segment registers are not used for segmentation; they're x86 legacy baggage that's only used for CPU mode and privilege-level selection on x86-64. Instead of adding new mechanisms for that and dropping segments entirely for long mode, AMD just neutered segmentation in long mode (base fixed at 0 like everyone used in 32-bit mode anyway) and kept using segments only for machine-config purposes that are not particularly interesting unless you're actually writing code that switches to 32-bit mode or whatever.
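You can see the privilege-level role of CS from user-space: the low 2 bits of the CS selector are the current privilege level. A minimal sketch using GCC/Clang inline asm, x86 only. On Linux user-space it should report 3. The full selector value (typically 0x33 for 64-bit code) is an implementation detail.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
/* Read the CS segment selector. */
static uint16_t read_cs(void) {
    uint16_t cs;
    __asm__ volatile("mov %%cs, %0" : "=r"(cs));
    return cs;
}

/* The low 2 bits of CS are the CPL: 0 for the kernel, 3 for user-space. */
static unsigned current_privilege_level(void) {
    return read_cs() & 3u;
}
#endif
```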

(Except you can set a non-zero base for FS and/or GS, and Linux does so for thread-local storage. But this has nothing to do with how copy_from_user() is implemented, or anything. It only has to check the pointer value itself, without reference to any segment or to the CPL / RPL of a segment selector.)
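Because the address space is flat, that check is plain integer arithmetic on the pointer. A minimal sketch in the spirit of the kernel's access_ok(), assuming x86-64 with 4-level paging, where user space ends one page below 2^47 (Linux's TASK_SIZE_MAX). The names are hypothetical, and the real macro differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86-64, 4-level paging, 4 KiB pages. */
#define SKETCH_TASK_SIZE_MAX ((UINT64_C(1) << 47) - 4096)

/* True if [addr, addr + size) lies entirely inside user space.
 * No segment, CPL or RPL is consulted: only the pointer value. */
static bool sketch_user_range_ok(uint64_t addr, uint64_t size) {
    if (size > SKETCH_TASK_SIZE_MAX)
        return false;
    /* Written as a subtraction so addr + size cannot overflow. */
    return addr <= SKETCH_TASK_SIZE_MAX - size;
}
```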

In 32-bit legacy mode, it is possible to write a kernel that uses a segmented memory model, but none of the mainstream OSes actually did that. Some people wish that had become a thing, though, e.g. see this answer lamenting x86-64 making a Multics-style OS impossible. But this is not how Linux works.

Linux is a higher-half kernel (https://wiki.osdev.org/Higher_Half_Kernel): kernel pointers have one range of values (the red part) and user-space addresses are in the green part. The kernel can simply dereference user-space addresses if the right user-space page tables are mapped. It doesn't need to translate them or do anything with segments; this is what it means to have a flat memory model. (The kernel can use "user" page-table entries, but not vice versa.) For x86-64 specifically, see https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt for the actual memory map.
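Telling the two halves apart is also just a look at the pointer's top bits. A minimal sketch, assuming x86-64 with 48-bit virtual addresses (4-level paging), where bits 63..47 must all be equal ("canonical"). The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum addr_kind { ADDR_USER, ADDR_KERNEL, ADDR_NONCANONICAL };

/* Assumption: 48-bit virtual addresses. The low half (top bits all 0) is
 * user space; the high half (top bits all 1) is the kernel. */
static enum addr_kind classify(uint64_t va) {
    uint64_t top = va >> 47;          /* bits 63..47: 17 bits */
    if (top == 0)
        return ADDR_USER;
    if (top == 0x1FFFF)
        return ADDR_KERNEL;
    return ADDR_NONCANONICAL;         /* dereferencing this would #GP */
}
```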

The only reasons those 4 GDT entries need to be separate are privilege levels and the fact that code and data segment descriptors have different formats. (A GDT entry contains more than just the base and limit, and it's those other fields that need to differ. See https://wiki.osdev.org/Global_Descriptor_Table.)

See especially https://wiki.osdev.org/Segmentation#Notes_Regarding_C, which describes how and why a "normal" OS typically uses the GDT to create a flat memory model, with a pair of code and data descriptors for each privilege level.

In 32-bit Linux user-space, gs is the segment that gets a non-zero base, for thread-local storage (so addressing modes like [gs: 0x10] access a linear address that depends on the thread executing them). In 64-bit user-space (on a 64-bit kernel), Linux uses fs instead. That's because x86-64 made GS special with the swapgs instruction, intended for use with syscall so the kernel can find its kernel stack.

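But anyway, in 64-bit mode the non-zero base for FS or GS doesn't come from a GDT entry. It's set with the wrfsbase / wrgsbase instructions (or, on CPUs that don't support them, with a write to an MSR). In 32-bit mode a segment base can only come from a descriptor, which is why Linux keeps a few per-thread TLS entries in the GDT.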
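You can observe the FS base from 64-bit Linux user-space. A minimal sketch, assuming x86-64 Linux with glibc or musl. arch_prctl(ARCH_GET_FS) asks the kernel for the base, and the inline asm reads memory through an fs-relative addressing mode. Both libcs store a pointer to the thread's own control block at %fs:0, so the two values should match. That's a libc convention, not a hardware requirement.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__linux__)
#include <asm/prctl.h>     /* ARCH_GET_FS */
#include <sys/syscall.h>   /* SYS_arch_prctl */
#include <unistd.h>        /* syscall() */

/* Ask the kernel for this thread's FS base (its TLS pointer). */
static uint64_t fs_base_via_syscall(void) {
    unsigned long base = 0;
    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &base) != 0)
        return 0;
    return base;
}

/* Load 8 bytes from linear address FS.base + 0 using a segment override. */
static uint64_t fs_zero_via_segment_override(void) {
    uint64_t v;
    __asm__ volatile("mov %%fs:0, %0" : "=r"(v));
    return v;
}
#endif
```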

but what are those flags, namely 0xc09b, 0xa09b and so on ? I tend to believe they are the segments selectors

No, segment selectors are indices into the GDT. The kernel is defining the GDT as a C array, using designated-initializer syntax like [GDT_ENTRY_KERNEL32_CS] = initializer_for_that_selector.

(Actually the low 2 bits of a selector, i.e. of the segment register value, are the requested privilege level, which for CS is the current privilege level. Bit 2 selects the GDT or the LDT. So GDT_ENTRY_DEFAULT_USER_CS should be __USER_CS >> 3.)
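A small encode/decode sketch of the selector layout. The example indices (kernel CS at GDT entry 2, 64-bit user CS at entry 6) are what I believe x86-64 Linux uses, giving the familiar __KERNEL_CS = 0x10 and __USER_CS = 0x33. Treat them as illustrative. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Selector layout: bits 15..3 = GDT/LDT index, bit 2 = TI (0 = GDT,
 * 1 = LDT), bits 1..0 = RPL (for CS, this is the CPL). */
static uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl) {
    return (uint16_t)((index << 3) | ((ti & 1u) << 2) | (rpl & 3u));
}

static unsigned selector_index(uint16_t sel) { return sel >> 3; }
static unsigned selector_ti(uint16_t sel)    { return (sel >> 2) & 1u; }
static unsigned selector_rpl(uint16_t sel)   { return sel & 3u; }
```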

mov ds, eax makes the hardware index the GDT, not linearly search it for matching data in memory!
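Putting the two together: a GDT written as a C array with designated initializers, and a "segment load" that is nothing more than an array index. This is a hypothetical sketch. The entry numbers are made up, and the raw descriptors are what Linux's GDT_ENTRY_INIT(0xa09b, 0, 0xfffff) and GDT_ENTRY_INIT(0xc093, 0, 0xfffff) should expand to. The real CPU also performs type, privilege and present checks that are omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entry numbers for illustration. */
enum { ENTRY_NULL = 0, ENTRY_KERNEL_CS = 2, ENTRY_KERNEL_DS = 3, NUM_ENTRIES = 4 };

/* Designated initializers: the array index is the selector's index field. */
static const uint64_t gdt[NUM_ENTRIES] = {
    [ENTRY_KERNEL_CS] = 0x00af9b000000ffffULL,   /* 64-bit kernel code */
    [ENTRY_KERNEL_DS] = 0x00cf93000000ffffULL,   /* kernel data */
};

/* What "mov ds, ax" conceptually does: one index, no searching.
 * In hardware the address is GDTR.base + index * 8. */
static uint64_t load_descriptor(uint16_t selector) {
    unsigned index = selector >> 3;    /* drop RPL (bits 1..0) and TI (bit 2) */
    if (index >= NUM_ENTRIES)
        return 0;                      /* the real CPU raises #GP instead */
    return gdt[index];
}
```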

GDT data format:

You're looking at x86-64 Linux source code, so the kernel will be in long mode, not protected mode. We can tell because there are separate entries for USER_CS and USER32_CS. The 32-bit code segment descriptor will have its L bit cleared. The current CS segment descriptor is what selects between 64-bit mode and 32-bit compat mode (both sub-modes of long mode) on an x86-64 CPU. To enter 32-bit user-space, an iret or sysret loads CS with a 32-bit user-mode code segment selector (and RIP with the 32-bit entry point).

The CPU can also run in 16-bit compat mode (still compat mode, not real mode, but with a default operand size and address size of 16 bits). Linux doesn't do this, though.
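The L and D bits of the CS descriptor are what pick among these modes. A minimal sketch, assuming the CPU is already in long mode (IA32_EFER.LMA = 1); the function name is hypothetical. L is descriptor bit 53, D/B is bit 54, and L=1 with D=1 is reserved.

```c
#include <assert.h>
#include <stdint.h>

/* Code size (64, 32 or 16) selected by a code-segment descriptor in long
 * mode, or 0 for the reserved L=1, D=1 combination. */
static int long_mode_code_size(uint64_t desc) {
    unsigned l = (desc >> 53) & 1u;   /* L: 64-bit code segment */
    unsigned d = (desc >> 54) & 1u;   /* D/B: default operand size */
    if (l && d)
        return 0;                     /* reserved: loading it faults */
    if (l)
        return 64;                    /* 64-bit mode */
    return d ? 32 : 16;               /* compatibility mode */
}
```

For example, 0x00af9b000000ffff (flags 0xa09b) gives 64, 0x00cf9b000000ffff (flags 0xc09b) gives 32, and clearing both bits would give 16-bit compat mode.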

Anyway, as explained in https://wiki.osdev.org/Global_Descriptor_Table and Segmentation,

Each segment descriptor contains the following information:

The base address of the segment

The default operation size in the segment (16-bit/32-bit)

The privilege level of the descriptor (Ring 0 -> Ring 3)

The granularity (Segment limit is in byte/4kb units)

The segment limit (The maximum legal offset within the segment)

The segment presence (Is it present or not)

The descriptor type (0 = system; 1 = code/data)

The segment type (Code/Data/Read/Write/Accessed/Conforming/Non-Conforming/Expand-Up/Expand-Down)
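These fields are packed into 8 bytes, with the base and limit split into pieces (a leftover from the 286 descriptor format). A minimal packing/unpacking sketch with hypothetical helper names. The "access byte" holds present, DPL, descriptor type and segment type; the top nibble holds G, D/B, L and AVL. With G=1, a 20-bit limit of 0xFFFFF covers the full 4 GiB.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t pack_descriptor(uint32_t base, uint32_t limit,
                                uint8_t access, uint8_t flags4) {
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);             /* limit 15..0  -> bits 15..0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;      /* base 23..0   -> bits 39..16 */
    d |= (uint64_t)access << 40;                  /* P DPL S Type -> bits 47..40 */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;  /* limit 19..16 -> bits 51..48 */
    d |= (uint64_t)(flags4 & 0xFu) << 52;         /* G D/B L AVL  -> bits 55..52 */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;  /* base 31..24  -> bits 63..56 */
    return d;
}

static uint32_t descriptor_base(uint64_t d) {
    return (uint32_t)(((d >> 16) & 0xFFFFFFu) | (((d >> 56) & 0xFFu) << 24));
}

static uint32_t descriptor_limit(uint64_t d) {
    return (uint32_t)((d & 0xFFFFu) | (((d >> 48) & 0xFu) << 16));
}

/* Limit in bytes: with G=1 the limit counts 4 KiB pages. */
static uint32_t effective_limit(uint64_t d) {
    uint32_t lim = descriptor_limit(d);
    return ((d >> 55) & 1u) ? (lim << 12) | 0xFFFu : lim;
}
```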

These are the extra bits. I'm not particularly interested in which bits are which because I (think I) understand the high level picture of what different GDT entries are for and what they do, without getting into the details of how that's actually encoded.

But if you check the x86 manuals or the osdev wiki, and the definitions for those init macros, you should find that they result in a GDT entry with the L bit set for 64-bit code segments, cleared for 32-bit code segments. And obviously the type (code vs. data) and privilege level differ.
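For reference, here is my reading of the 16-bit flags argument to Linux's GDT_ENTRY_INIT(flags, base, limit) macro. The low byte is the access byte. Bits 12..15 are AVL, L, D/B and G. Bits 8..11 are where the limit's high nibble goes in the real descriptor, so they're 0 in these constants. A decoding sketch with hypothetical struct and function names:

```c
#include <assert.h>
#include <stdint.h>

struct seg_flags {
    unsigned type;  /* code: 0xB = execute/read, accessed; data: 0x3 = read/write, accessed */
    unsigned s;     /* 1 = code/data, 0 = system descriptor */
    unsigned dpl;   /* descriptor privilege level, 0..3 */
    unsigned p;     /* present */
    unsigned avl;   /* available for OS use */
    unsigned l;     /* 1 = 64-bit code segment */
    unsigned d;     /* default operand size: 1 = 32-bit */
    unsigned g;     /* limit counted in 4 KiB units */
};

static struct seg_flags decode_flags(uint16_t f) {
    struct seg_flags r = {
        .type = f & 0xFu,
        .s    = (f >> 4) & 1u,
        .dpl  = (f >> 5) & 3u,
        .p    = (f >> 7) & 1u,
        .avl  = (f >> 12) & 1u,
        .l    = (f >> 13) & 1u,
        .d    = (f >> 14) & 1u,
        .g    = (f >> 15) & 1u,
    };
    return r;
}
```

So 0xa09b decodes to a present, ring-0, 64-bit code segment (L=1, D=0), and 0xc09b to the same thing with L=0, D=1, i.e. 32-bit code.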

Segmentation in Linux : Segmentation & Paging are redundant?

The 80x86 family of CPUs generates an address by adding a segment base to an offset, such as the program counter for instruction fetches. In real mode the base is simply the segment register's value times 16; in protected mode the segment register selects a descriptor that holds the base. Thus by changing the segment register contents you can change the physical addresses that the program accesses. Paging does something similar by mapping the same virtual address to different physical addresses. Linux uses the latter: the segment registers for Linux processes will always have the same unchanging contents.
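The real-mode case is the simplest to show. A minimal sketch of 8086 address formation with a hypothetical helper name: the physical address is segment * 16 + offset, wrapped to 20 bits because the 8086 had 20 address lines.

```c
#include <assert.h>
#include <stdint.h>

/* 8086 real mode: physical = (segment << 4) + offset, modulo 1 MiB. */
static uint32_t real_mode_addr_8086(uint16_t seg, uint16_t off) {
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}
```

For example, 0x1234:0x0010 is 0x12350, and the reset vector F000:FFF0 is 0xFFFF0. 0xFFFF:0x0010 wraps to 0 on an 8086; later CPUs with the A20 line enabled reach 0x100000 instead.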

Linux: what prevents us from reading the memory from code segment?

I think your key misunderstanding is that you're assuming the 8086 hardware feature called "the data segment" is the same as the executable image subdivision also called "the data segment." Xenix may have used that hardware feature that way, but no modern x86 Unix does. On a modern Unix, %ds:0 always points to linear address zero, not to the beginning of the executable's data segment. (And similarly %cs:0 points to linear address zero, not to the executable's text segment.)

All of an executable's segments will be loaded into linear address space somewhere well above linear address 0, and on current-generation OSes the load addresses will be randomized on each run.

There's no standard way to get a pointer to the beginning of the executable's code or data segment. On GNU systems you can use dl_iterate_phdr, and other OSes may have similar functionality under a different name.
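With glibc, the first object dl_iterate_phdr() reports is the main program, so finding where its executable PT_LOAD segment was loaded looks roughly like this. This sketch assumes the common layout with a single executable PT_LOAD, and the struct and helper names are hypothetical. For a PIE with ASLR enabled, the start address should differ between runs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>      /* dl_iterate_phdr, struct dl_phdr_info, ElfW, PT_LOAD, PF_X */
#include <stddef.h>
#include <stdint.h>

struct seg_range { uintptr_t start, end; int found; };

/* Callback: record the first executable PT_LOAD segment of the first object. */
static int find_main_text(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    struct seg_range *r = data;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_LOAD && (ph->p_flags & PF_X)) {
            r->start = info->dlpi_addr + ph->p_vaddr;   /* load bias + vaddr */
            r->end = r->start + ph->p_memsz;
            r->found = 1;
            break;
        }
    }
    return 1;   /* non-zero stops after the first object (the main program) */
}

static struct seg_range main_text_segment(void) {
    struct seg_range r = { 0, 0, 0 };
    dl_iterate_phdr(find_main_text, &r);
    return r;
}
```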

Memory segmentation in modern operating systems

The "segment" in "data segment" has nothing to do with hardware segmentation, which is a feature of little relevance to modern operating systems (i.e. redundant with respect to paging), since they rely on paging to implement virtual memory. Segments also have severe drawbacks compared to paging (e.g. without paging, memory that is contiguous in a segment must be physically contiguous) without any benefit. When talking about user-space programs, "segment" literally means a contiguous section of the process's virtual address space.
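The contrast with paging is easy to see in a toy model. Here is a hypothetical single-level page table (real x86 tables are multi-level). Four virtually contiguous pages map to scattered physical frames, so the last byte of virtual page 0 and the first byte of page 1 are adjacent virtually but land in frames 7 and 2.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1u << TOY_PAGE_SHIFT)
#define TOY_NUM_PAGES  4

/* Toy page table: virtual page number -> physical frame number. */
static const uint32_t page_table[TOY_NUM_PAGES] = { 7, 2, 9, 4 };

/* Translate a virtual address; UINT32_MAX stands in for a page fault. */
static uint32_t translate(uint32_t va) {
    uint32_t vpn = va >> TOY_PAGE_SHIFT;
    uint32_t off = va & (TOY_PAGE_SIZE - 1);
    if (vpn >= TOY_NUM_PAGES)
        return UINT32_MAX;
    return (page_table[vpn] << TOY_PAGE_SHIFT) | off;
}
```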

Many architectures do not have segmentation at all anymore. On x86, segmentation is just a historical leftover. Because segmentation cannot be bypassed, OSes set it up with a code segment and a data segment that each cover the entire address space.

Your question about freeing memory obtained through sbrk is answered here: How do I free memory obtained by sbrk()?

How does ARM Linux maintain segments?

The GDT/LDT are an x86-family feature. Kernel space is translated via the kernel part of the page tables, and user space via the user-space part. The page tables are in main memory. mm_struct is the structure the Linux kernel uses to describe a memory layout; there is one per process:

User stack
User heap
Bss segment
Data segment
Text segment
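The real mm_struct records these boundaries in fields such as start_code, end_code, start_data, end_data, start_brk and brk. Below is a simplified user-space mirror that classifies an address by region. It is purely hypothetical: the struct, the enum and the example addresses are made up, and treating the gap between end_data and start_brk as BSS is a simplification, since the kernel may randomize start_brk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of a few mm_struct boundary fields. */
struct mm_layout_sketch {
    uint64_t start_code, end_code;   /* text */
    uint64_t start_data, end_data;   /* initialized data */
    uint64_t start_brk, brk;         /* heap grows up from start_brk to brk */
};

enum region { R_TEXT, R_DATA, R_BSS, R_HEAP, R_OTHER };

static enum region classify_region(const struct mm_layout_sketch *mm, uint64_t a) {
    if (a >= mm->start_code && a < mm->end_code) return R_TEXT;
    if (a >= mm->start_data && a < mm->end_data) return R_DATA;
    if (a >= mm->end_data && a < mm->start_brk)  return R_BSS;   /* simplification */
    if (a >= mm->start_brk && a < mm->brk)       return R_HEAP;
    return R_OTHER;   /* stack, mmap area, or unmapped */
}
```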

This layout is described in mm_struct. mm_struct also contains the ->pgd field, which is the root page table pointer (loaded into TTBR0/TTBR1 on ARM).
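On ARM64 Linux, a process's mm->pgd goes into TTBR0 while the kernel's own tables stay in TTBR1. The hardware picks the register from the top bits of each address, so no segment-like mechanism is involved. A minimal sketch, assuming 48-bit virtual addresses in both halves and top-byte-ignore disabled; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 0 for a TTBR0 (user) address, 1 for a TTBR1 (kernel) address,
 * or -1 if the address is in neither range (a translation fault). */
static int which_ttbr(uint64_t va) {
    uint64_t top = va >> 48;   /* bits 63..48 */
    if (top == 0)
        return 0;
    if (top == 0xFFFF)
        return 1;
    return -1;
}
```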
