How Are the Fs/Gs Registers Used in Linux Amd64

How are the fs/gs registers used in Linux AMD64?

In x86-64 there are 3 TLS entries, two of them accesible via FS and GS, FS is used internally by glibc (in IA32 apparently FS is used by Wine and GS by glibc).

Glibc makes its TLS entry point to a struct pthread that contains some internal structures for threading. Glibc usually refers to a struct pthread variable as pd, presumably for pthread descriptor.

On x86-64, struct pthread starts with a tcbhead_t (this depends on the architecture, see the macros TLS_DTV_AT_TP and TLS_TCB_AT_TP). This Thread Control Block Header, AFAIU, contains some fields that are needed even when there is a single thread. The DTV is the Dynamic Thread Vector, and contains pointers to TLS blocks for DSOs loaded via dlopen(). Before or after the TCB there is a static TLS block for the executable and DSOs linked at (program's) load time. The TCB and DTV are explained pretty well in Ulrich Drepper's TLS document (look for the diagrams in chapter 3).

Where do FS and GS registers get mapped to in the linear address space?

In 64-bit mode, the FS and GS "segment registers" aren't really used, per se. But using an FS or GS segment override for a memory access causes the address to be offset by the value contained in a hidden FSBASE/GSBASE register which can be set by a special instruction (possibly privileged, in which case a system call can ask the kernel to do it). So for instance, if FSBASE = 0x12340000 and RDI = 0x56789, then MOV AL, FS:[RDI] will load from linear address 0x12396789.

This is still just a linear address - it's not separate from the rest of the process's address space in any way, and it's subject to all the same paging and memory protection as any other memory access. The program could get the exact same effect with MOV AL, [0x12396789] (since DS has a base of 0). It is up to the program's usual memory allocation mechanisms to allocate an appropriate block of memory and set FSBASE to point to that block, if it intends to use FS in this way. There are no special protections needed to avoid "corrupting" this memory, any more than they are needed to prevent a program from corrupting any other part of its own memory.

So it doesn't really provide new functionality - it's more a convenience for programmers and compilers. As you say, it's nice for something like a pointer to thread-local storage, but if we didn't have FS/GS, there are plenty of other ways we could keep such a pointer in the thread's state (say, reserving R15). It's true that there's not an obvious general need for two such registers; for the most part, the other one is just there in case someone comes up with a good way to use it in a particular program.

What is the FS/GS register intended for?

There is what they were intended for, and what they are used for by Windows and Linux.

The original intention behind the segment registers was to allow a program to access many different (large) segments of memory that were intended to be independent and part of a persistent virtual store. The idea was taken from the 1966 Multics operating system, that treated files as simply addressable memory segments. No BS "Open file, write record, close file", just "Store this value into that virtual data segment" with dirty page flushing.

Our current 2010 operating systems are a giant step backwards, which is why they are called "Eunuchs". You can only address your process space's single segment, giving a so-called "flat (IMHO dull) address space". The segment registers on the x86-32 machine can still be used for real segment registers, but nobody has bothered (Andy Grove, former Intel president, had a rather famous public fit last century when he figured out after all those Intel engineers spent energy and his money to implement this feature, that nobody was going to use it. Go, Andy!)

AMD in going to 64 bits decided they didn't care if they eliminated Multics as a choice (that's the charitable interpretation; the uncharitable one is they were clueless about Multics) and so disabled the general capability of segment registers in 64 bit mode. There was still a need for threads to access thread local store, and each thread needed a a pointer ... somewhere in the immediately accessible thread state (e.g, in the registers) ... to thread local store. Since Windows and Linux both used FS and GS (thanks Nick for the clarification) for this purpose in the 32 bit version, AMD decided to let the 64 bit segment registers (GS and FS) be used essentially only for this purpose (I think you can make them point anywhere in your process space; I don't know if the application code can load them or not). Intel in their panic to not lose market share to AMD on 64 bits, and Andy being retired, decided to just copy AMD's scheme.

It would have been architecturally prettier IMHO to make each thread's memory map have an absolute virtual address (e.g, 0-FFF say) that was its thread local storage (no [segment] register pointer needed!); I did this in an 8 bit OS back in the 1970s and it was extremely handy, like having another big stack of registers to work in.

So, the segment registers are now kind of like your appendix. They serve a vestigial purpose. To our collective loss.

Those that don't know history aren't doomed to repeat it; they're doomed to doing something dumber.

What does fs:0x30 provide in Linux?

The fs segment register is used in x86-64 Linux to point to thread-local storage. See How are the fs/gs registers used in Linux AMD64? So this instruction will xor the rdx register with the value found at offset 0x30 in the thread-local storage block.

This code is part of a pointer encryption mechanism in glibc to help harden against certain exploits. There is some explanation of it at https://sourceware.org/glibc/wiki/PointerEncryption. The value at fs:0x30 is an "key" for a trivial "encryption" algorithm; pointers are xor'ed with this value (and then rotated) when they are stored, and rotated back and xor'ed again when they are retrieved from memory, which recovers the original pointer.

There is no particular significance to the number 0x30; it just happens to be the offset where that value is stored. You can see in the inline assembly that this number comes from offsetof (tcbhead_t, pointer_guard); so the storage at the fs base address is laid out as a tcbhead_t struct, and given the other members that it contains, the pointer_guard member has ended up at offset 0x30. So looking at the name pointer_guard for the member is more informative than its numerical offset.

Use of gs register on a 32 bit program over a 64 bit linux

After the comment of @PeterCordes I searched in the "AMD64 Architecture Programmer's Manual, vol. 2", where in page 27 says:

Compatibility mode ignores the high 32 bits of base address in the FS and GS segment descriptors when calculating an effective address.

This implies that a 64-bit kernel managing a 32-bit process uses the MSR_*S_BASE registers as it would for a 64-bit process. The kernel can set the segment bases normally while in 64-bit long mode, so it doesn't matter whether or not those MSRs are available in 32-bit compatibility sub-mode of long mode, or in pure 32-bit protected mode (legacy mode, 32-bit kernel). A 64-bit Linux kernel only uses compat mode for ring 3 (user-space) where wrmsr and rdmsr aren't available because of privileges. As always, segment-base settings persist across changes to privilege level, like returning to user-space with sysret or iret.

Another thing that made me think that this registers weren't used for compatibility-mode processes was GDB. This is what happens when trying to print this register while debugging a 32-bit program.:

(gdb) i r $gs_base
Invalid register `gs_base'

Debugging a 64-bit program it works fine.

(gdb) i r $fs_base
fs_base        0x7ffff7d00c00      0x7ffff7d00c00

Since the instruction rdgsbase is a 64-bit instruction (trying to execute that opcode in a program 32-bit yields a SIGILL signal), it is a bit tricky to obtain the value of this registers within a 32-bit program.

The first solution I thought was to read it from a kernel module:

unsigned long gs_base = 0xdeadbeefc0ffee13;
asm("swapgs;"
    "rdgsbase %0;"    // maybe unsafe if an interrupt happens here
                      // be careful if using this for anything more than toy experiments.
    "swapgs;"
    : "=r"(gs_base));
printk("gs_base: 0x%016lx", gs_base);

So I created a driver for a device in /dev, so when a program open()s that file the code above is executed. After compiling and running a 32-bit program that opens this file I got this

[10793.682033] gs_base: 0x00000000f7f9f040

And using gdb to inspect 0xf7f9f040+0x14 I saw the canary, meaning that it was the TLS.

(gdb) x/wx 0xf7f9f040+0x14
0xf7f9f054: 0x21f03c00
(gdb) x/wx $ebp-0xc
0xbffff60c: 0x21f03c00

The other way I can think of is to perform a far call to change from 32-bit to 64-bit, execute rdgsbase and then return to 64-bit. Probably this is a better solution since it doesn't need a kernel module. (As long as you can assume you're running on a CPU that supports the FSGSBASE extension, and a new enough kernel to enable it.)

Something like this:

#include <stdio.h>

__attribute__((naked))   // or define the function in an asm statement at global scope
extern void rdgsbase()
{
    asm("rdgsbase %eax; retf");
}

int main()
{
    unsigned int* gs_base = NULL;
    unsigned int canary;
    // would be unsafe in a leaf function: clobbers the red zone
    asm("lcall $0x33, $rdgsbase; mov %%eax, %0" : "=m"(gs_base) : : "eax");
    asm("mov %%gs:0x14, %%eax ; mov %%eax, %0" : "=m"(canary) : : "eax");
    printf("gs_base = %p\n", gs_base);
    printf("canary: 0x%08x\n", canary);
    printf("canary: 0x%08x\n", gs_base[5]);
}

I know it is very very dirty and ugly, but it works.

$ gcc gs_base.c -o gs_base -m32
/usr/bin/ld: /tmp/ccAPoxwj.o: warning: relocation against `rdgsbase' in read-only section `.text'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE

$ ./gs_base 
gs_base = 0xf7f80040
canary: 0x59511d00
canary: 0x59511d00

In a 32-bit system the gs segment selector has the value 0x33, this points to the 7th entry in the GDT (index 6). So let's see what is in there.

Using the same module I shown in the OP (with only minor modifications) I printed the GDT used during the execution of a specific process. This is the entry with index 6:

[ 3579.535005] Entry #6:
[ 3579.535007]  Raw: 0xd100ffff
[ 3579.535009]  Base: 0xb7fcd100
[ 3579.535011]  Limit: 0xfffff
[ 3579.535013]  Flags: 0xd0f3
[ 3579.535018]      Type = 0x3 (Data, expand down, writable, accessed)
[ 3579.535019]      S    = 0 (user)
[ 3579.535021]      DPL  = 3
[ 3579.535023]      P    = 1 (present)
[ 3579.535025]      AVL  = 1
[ 3579.535027]      L    = 0
[ 3579.535028]      D/B  = 1
[ 3579.535030]      G    = 1 (KiB)

In gdb we can verify that it coincides with the TLS of said process:

(gdb) x/wx $ebp-0xc
0xbffff60c: 0xa6e29800
(gdb) x/wx 0xb7fcd100+0x14
0xb7fcd114: 0xa6e29800

Using strace we can see how the 32-bit glibc sets the gs on a 64-bit system:

set_thread_area({entry_number=-1, base_addr=0xf7ebb040, limit=0x0fffff, seg_32bit=1, contents=0, read_exec_only=0, limit_in_pages=1, seg_not_present=0, useable=1}) = 0 (entry_number=12)

This syscall performs in the kernel the setup of the MSR_GS_BASE with the value specified in the argument base_addr. The kernel also places the value 0x63 in the gs register, which points to the entry with index 12, a NULL entry.

On a 32-bit system the syscall is exactly the same

set_thread_area({entry_number=-1, base_addr=0xb7f66100, limit=0x0fffff, seg_32bit=1, contents=0, read_exec_only=0, limit_in_pages=1, seg_not_present=0, useable=1}) = 0 (entry_number=6)

But here, on a 32-bit kernel (which doesn't know anything about MSR_GS_BASE) the gs register gets the value 0x33, pointing to the index 6 in the GDT. Since there is no MSR_GS_BASE now is the GDT entry the one that is setup, with base address and limit fields (and rest of fields) equal to the ones specified in the arguments.

On the other hand, the 64-bit glibc uses the syscall arch_prctl(ARCH_SET_FS, 0x...) to set the value of MSR_FS_BASE. This syscall is only available for 64-bit programs.

The only thing that I don't quite understand yet is why set gs=0x63 instead of 0 or 0x2b (the value of ss, ds and es)...

How Are the Fs/Gs Registers Used in Linux Amd64