Linux Ia-32 Memory Model

Linux IA-32 memory model

I think this is more accurate:
alt text

Addressing mode in IA-32

Found this image from this power point presentation.

Addressing modes

This means that you can have addresses like [eax + ecx * 2 + 100]. You don't necessarily have to use all of these fields.

See also Referencing the contents of a memory location. (x86 addressing modes)

The scale factor is encoded into machine code as a 2-bit shift count. ESP can't be an index because of special cases for indicating the presence of a SIB byte and for a SIB byte with no index. See rbp not allowed as SIB base? for a rundown on the special cases.

Segmentation can be ignored in 32/64-bit mode under normal OSes like Linux.

The segment register is selected automatically depending on the base register in the addressing mode, or with segment override prefix (e.g. ds:, cs:).

But Linux uses a flat memory model so the segment base is always 0 for all segments (other than fs or gs, used for thread-local storage). The segment base is added to the "offset" calculated from base, index, scale and displacement to get the final linear address. So normally the "offset" part is the whole linear address.

That linear address is a virtual address, which the hardware translates to physical via the page tables / TLB (managed by the kernel).

Linux memory segmentation

Yes, Linux uses paging so all addresses are always virtual. (To access memory at a known physical address, Linux keeps all physical memory 1:1 mapped to a range of kernel virtual address space, so it can simply index into that "array" using the physical address as the offset. Modulo complications for 32-bit kernels on systems with more physical RAM than kernel address space.)

This linear address space constituted of pages, is split into four segments

No, Linux uses a flat memory model. The base and limit for all 4 of those segment descriptors are 0 and -1 (unlimited). i.e. they all fully overlap, covering the entire 32-bit virtual linear address space.

So the red part consists of two segments __KERNEL_CS and __KERNEL_DS

No, this is where you went wrong. x86 segment registers are not used for segmentation; they're x86 legacy baggage that's only used for CPU mode and privilege-level selection on x86-64. Instead of adding new mechanisms for that and dropping segments entirely for long mode, AMD just neutered segmentation in long mode (base fixed at 0 like everyone used in 32-bit mode anyway) and kept using segments only for machine-config purposes that are not particularly interesting unless you're actually writing code that switches to 32-bit mode or whatever.

(Except you can set a non-zero base for FS and/or GS, and Linux does so for thread-local storage. But this has nothing to do with how copy_from_user() is implemented, or anything. It only has to check that pointer value, not with reference to any segment or the CPL / RPL of a segment descriptor.)

In 32-bit legacy mode, it is possible to write a kernel that uses a segmented memory model, but none of the mainstream OSes actually did that. Some people wish that had become a thing, though, e.g. see this answer lamenting x86-64 making a Multics-style OS impossible. But this is not how Linux works.

Linux is a https://wiki.osdev.org/Higher_Half_Kernel, where kernel pointers have one range of values (the red part) and user-space addresses are in the green part. The kernel can simple dereference user-space addresses if the right user-space page-tables are mapped, it doesn't need to translate them or do anything with segments; this is what it means to have a flat memory model. (The kernel can use "user" page-table entries, but not vice versa). For x86-64 specifically, see https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt for the actual memory map.

The only reason those 4 GDT entries all need to be separate is for privilege-level reasons, and that the data vs. code segments descriptors have different formats. (A GDT entry contains more than just the base/limit; those are the parts that need to be different. See https://wiki.osdev.org/Global_Descriptor_Table)

And especially https://wiki.osdev.org/Segmentation#Notes_Regarding_C which describes how and why the GDT is typically used by a "normal" OS to create a flat memory model, with a pair of code and data descriptors for each privilege level.

For a 32-bit Linux kernel, only gs gets a non-zero base for thread-local storage (so addressing modes like [gs: 0x10] will access a linear address that depends on the thread that executes it). Or in a 64-bit kernel (and 64-bit user-space), Linux uses fs. (Because x86-64 made GS special with the swapgs instruction, intended for use with syscall for the kernel to find the kernel stack.)

But anyway, the non-zero base for FS or GS are not from a GDT entry, they're set with the wrgsbase instruction. (Or on CPUs that don't support that, with a write to an MSR).

but what are those flags, namely 0xc09b, 0xa09b and so on ? I tend to believe they are the segments selectors

No, segment selectors are indices into the GDT. The kernel is defining the GDT as a C array, using designated-initializer syntax like [GDT_ENTRY_KERNEL32_CS] = initializer_for_that_selector.

(Actually the low 2 bits of a selector, i.e. segment register value, are the current privilege level. So GDT_ENTRY_DEFAULT_USER_CS should be `__USER_CS >> 2.)

mov ds, eax triggers the hardware to index the GDT, not linear search it for matching data in memory!

GDT data format:

You're looking at x86-64 Linux source code, so the kernel will be in long mode, not protected mode. We can tell because there are separate entries for USER_CS and USER32_CS. The 32-bit code segment descriptor will have its L bit cleared. The current CS segment description is what puts an x86-64 CPU into 32-bit compat mode vs. 64-bit long mode. To enter 32-bit user-space, an iret or sysret will set CS:RIP to a user-mode 32-bit segment selector.

I think you can also have the CPU in 16-bit compat mode (like compat mode not real mode, but the default operand-size and address size are 16). Linux doesn't do this, though.

Anyway, as explained in https://wiki.osdev.org/Global_Descriptor_Table and Segmentation,

Each segment descriptor contains the following information:

The base address of the segment

The default operation size in the segment (16-bit/32-bit)

The privilege level of the descriptor (Ring 0 -> Ring 3)

The granularity (Segment limit is in byte/4kb units)

The segment limit (The maximum legal offset within the segment)

The segment presence (Is it present or not)

The descriptor type (0 = system; 1 = code/data)

The segment type (Code/Data/Read/Write/Accessed/Conforming/Non-Conforming/Expand-Up/Expand-Down)

These are the extra bits. I'm not particularly interested in which bits are which because I (think I) understand the high level picture of what different GDT entries are for and what they do, without getting into the details of how that's actually encoded.

But if you check the x86 manuals or the osdev wiki, and the definitions for those init macros, you should find that they result in a GDT entry with the L bit set for 64-bit code segments, cleared for 32-bit code segments. And obviously the type (code vs. data) and privilege level differ.

Assembly Segmented Model 32bit Memory Limit

Edit: My answer assumes that by "4GB limit" you are referring to the maximum size of linear (virtual) address space, rather than of physical address space. As explained in the comments below, the latter is not actually limited to 4GB at all - even when using a flat memory model.

Repeating your quote, with emphasis:

the logical address space consists
of as many as 16,383 segments of up to
4 gigabytes each

Now, quoting from "Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture" (PDF available here):

Internally, all the segments that are
defined for a system are mapped into
the processor’s linear address space.

It is this linear address space which (on 32-bit processor) is limited to 4GB. So, a segmented memory model would still be subject to the limit.

Access to operands and operations from IA32 in Linux

Seems that you need to look up the AT&T syntax. Here is a short description of how to interpret AT&T syntax memory operands (under "AT&T Style Syntax").

Why is the page size of Linux (x86) 4 KB, how is that calculated?

The default page size is fixed by what the MMU (memory management unit) of the CPU supports. In 32-bit protected mode x86 supports two kinds of pages:

normal ones, 4 KiB
huge ones, 4 MiB

Not all x86 processors support large pages. One needs to have a CPU with Page Size Extension (PSE) capabilities. This excludes pre-Pentium processors. Virtually all current-generation x86 CPUs implements it.

4 KiB is widely popuplar page granularity in other architectures too. One could argue that this size comes from the division of a 32-bit virutal address into two 10-bit indexes in page directories/tables and the remaining 12 bits give the 4 KiB page size.

Linux Ia-32 Memory Model