How to Set Control Register 0 (Cr0) Bits in X86-64 Using Gcc Assembly on Linux

OK, so I finally wrote the following kernel module. I'm not sure it is right, since I don't observe the drastic slowdown that should accompany disabling the cache. But it compiles and inserts properly.

Any pointers will be helpful.

Thanks!

#include <linux/init.h>
#include <linux/module.h>
MODULE_LICENSE("Dual BSD/GPL");
static int hello_init(void)
{
        printk(KERN_ALERT "Hello, world\n");
        /* Set CR0.CD (bit 30) to disable caching, then write back and invalidate the caches. */
        __asm__ volatile("mov    %%cr0, %%rax\n\t"
                         "or     $(1 << 30), %%rax\n\t"
                         "mov    %%rax, %%cr0\n\t"
                         "wbinvd"
                         : : : "rax", "cc", "memory");
        return 0;
}
static void hello_exit(void)
{
        printk(KERN_ALERT "Goodbye, cruel world\n");
        /* Clear CR0.CD (bit 30) to re-enable caching. */
        __asm__ volatile("mov    %%cr0, %%rax\n\t"
                         "and    $~(1 << 30), %%rax\n\t"
                         "mov    %%rax, %%cr0\n\t"
                         "wbinvd"
                         : : : "rax", "cc", "memory");
}
module_init(hello_init);
module_exit(hello_exit);
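
For reference, here is a minimal sketch of the bit arithmetic involved, written as plain C so it can be checked outside the kernel. The helper names cr0_cache_off and cr0_cache_on are made up for illustration. It assumes the layout from the Intel manual, CD = bit 30 and NW = bit 29, where the no-fill cache mode wants CD set and NW clear. Keep in mind that CR0 is per logical CPU, so a module that changes it only on the CPU that ran insmod may not show a system-wide slowdown.

```c
#include <stdint.h>

#define CR0_NW (UINT64_C(1) << 29)  /* not write-through */
#define CR0_CD (UINT64_C(1) << 30)  /* cache disable */

/* Value to write to CR0 to turn caching off: CD = 1, NW = 0. */
static inline uint64_t cr0_cache_off(uint64_t cr0)
{
    return (cr0 | CR0_CD) & ~CR0_NW;
}

/* Value to write to CR0 to turn caching back on: CD = 0, NW = 0. */
static inline uint64_t cr0_cache_on(uint64_t cr0)
{
    return cr0 & ~(CR0_CD | CR0_NW);
}
```

These helpers only compute the value; the actual read and write of CR0 still has to happen in ring 0, as in the module above.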

Trying to disable paging through cr0 register

TL;DR: This can't work, but your attempt didn't disable paging because you cleared bit 32 instead of bit 31. IDK why that would result in a SIGSEGV for any user-space process, though.

Any badness you get from this is from clobbering RAX + RBX without telling the compiler.

You're obviously building a module for x86-64 Linux which runs in long mode. But long mode requires paging to be enabled.

According to an osdev forum thread, "x86_64 - disabling paging?":

If you disable paging in long mode, you will no longer be in long mode.

If that's actually true (rather than just trapping with a #GP exception or something), then obviously it's a complete disaster!!

Code fetch from EIP instead of RIP is extremely unlikely to fetch anything, and REX prefixes would decode as inc/dec if you do happen to end up with EIP pointing at some 64-bit code somewhere in the low 4GiB of physical address space. (Kernel addresses are in the upper canonical range, but it's remotely possible that the low 32 bits of RIP could be the physical address of some code.)

Also related: Why does long mode require paging - probably because supporting unpaged 64-bit mode is an unnecessary hardware expense that would never get much real use.

I'm not sure why you'd get a segfault. That's what I'd expect if you tried to run this code in user-space, where mov %cr0, %rax faults because it's privileged, and the kernel delivers SIGSEGV in response to that user-space #GP exception.

If you are running this function from an LKM's init function, then as Brendan says, the expected result would be crashing the kernel on that core. Or possibly the kernel would catch that and deliver SIGSEGV to modprobe(1).

Also, you're using GNU C Basic asm (without any clobbers), so GCC's code-gen assumes that registers (including RAX and RBX) aren't modified. Of course disabling paging is also a jump when your code isn't in an identity-mapped page, so it doesn't really matter whether you tell other small lies to the compiler or not. If this function doesn't inline into anything, then in practice clobbering RAX won't hurt. But clobbering RBX definitely can; it's call-preserved in the x86-64 System V calling convention.
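
A minimal sketch of what telling the compiler looks like with GNU C Extended asm. The function name add_via_scratch is hypothetical and the arithmetic is only a stand-in; the point is the clobber list, which lets GCC or clang keep live values out of eax instead of having them silently overwritten. The non-x86 fallback is there only so the snippet builds elsewhere.

```c
#include <stdint.h>

/* Adds two numbers through a hard-coded scratch register (eax). */
static uint32_t add_via_scratch(uint32_t a, uint32_t b)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t out;
    __asm__("movl %1, %%eax\n\t"
            "addl %2, %%eax\n\t"
            "movl %%eax, %0"
            : "=r"(out)          /* output: any general-purpose register */
            : "r"(a), "r"(b)     /* inputs: any general-purpose registers */
            : "eax", "cc");      /* clobbers: eax and the flags are modified */
    return out;
#else
    return a + b;                /* portable fallback, no inline asm */
#endif
}
```

Because eax is listed as a clobber, the compiler will not place any of the operands in it, so no manual save and restore is needed.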

And BTW, CR0 only has 32 significant bits. You could use and $0x7fffffff, %eax to clear the paging bit, or btr $31, %rax if you'd rather clear bit 31 in a 64-bit register. https://wiki.osdev.org/CPU_Registers_x86

According to Section 2.5 of the Intel manual Volume 3 (January 2019):

Bits 63:32 of CR0 and CR4 are reserved and must be written with zeros.
Writing a nonzero value to any of the upper 32 bits results in a
general-protection exception, #GP(0).

According to Section 3.1.1 of the AMD manual Volume 2 (December 2017):

In long mode, bits 63:32 are reserved and must be written with zero,
otherwise a #GP occurs.

So it would be fine to truncate RAX to EAX, at least for the foreseeable future. New stuff tends to get added to MSRs, not CR bits. Since there's no way to do this in Linux without crashing, you might as well just keep it simple for silly computer tricks.
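
To make the reserved-bit rule concrete, here is a small sketch of a check one could run on a value before writing it to CR0. cr0_write_is_legal is a hypothetical helper, not a kernel API. Besides the upper-half rule quoted above, it also encodes two other combinations that the Intel manual documents as causing #GP: PG set with PE clear, and NW set with CD clear. It deliberately ignores mode-dependent rules.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE (UINT64_C(1) << 0)   /* protection enable */
#define CR0_NW (UINT64_C(1) << 29)  /* not write-through */
#define CR0_CD (UINT64_C(1) << 30)  /* cache disable */
#define CR0_PG (UINT64_C(1) << 31)  /* paging */

/* Returns false for values that would raise #GP(0) on a MOV to CR0
   regardless of the current operating mode. */
static bool cr0_write_is_legal(uint64_t v)
{
    if (v >> 32)
        return false;               /* bits 63:32 must be zero */
    if ((v & CR0_PG) && !(v & CR0_PE))
        return false;               /* paging without protected mode */
    if ((v & CR0_NW) && !(v & CR0_CD))
        return false;               /* NW = 1 with CD = 0 is invalid */
    return true;
}
```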

0xFFFFFFFEFFFFFFFF clears bit 32, not bit 31

All of the above is predicated on the assumption that you were actually clearing the paging-enable bit. So maybe SIGSEGV is simply due to corrupting registers with GNU C basic asm without actually changing the control register at all.

https://wiki.osdev.org/CPU_Registers_x86 shows that Paging is bit 31 of CR0, and that there are no real bits in the high half. https://en.wikipedia.org/wiki/Control_register#CR0 says CR0 is a 64-bit register in long mode. (But there still aren't any bits that do anything in the high half.)

Your mask actually clears bit 32, the low bit of the high half. The right AND mask is 0x7FFFFFFF. Or btr $31, %eax. Truncating RAX to EAX is fine.
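
A quick way to sanity-check a mask like this is to ask which bit it actually clears. The sketch below does that with the GCC/clang builtin __builtin_ctzll; the helper names are made up for illustration.

```c
#include <stdint.h>

/* Index of the lowest bit that `mask` clears when ANDed with a value.
   The mask must contain at least one zero bit (ctz of 0 is undefined). */
static int lowest_cleared_bit(uint64_t mask)
{
    return __builtin_ctzll(~mask);
}

/* AND mask that clears exactly one bit (0..63) of a 64-bit value. */
static uint64_t clear_mask(unsigned bit)
{
    return ~(UINT64_C(1) << bit);
}
```

Calling lowest_cleared_bit on the mask from the question and on clear_mask(31) shows the off-by-one directly.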

This will actually crash your kernel in long mode like you were trying to:

// disable paging, should crash
    asm volatile(
        "mov  %%cr0, %%rax        \n\t"   // assembles with no REX prefix, same as mov %cr0,%eax
        "btr  $31, %%eax          \n\t"   // reset (clear) bit 31
        "mov  %%rax, %%cr0        \n\t"
        ::
        : "rax", "memory"
     );

GCC inline assembly to manipulate all registers?

You can't load an immediate value directly into the segment registers ds, es, fs, gs and ss (cs can't be written with mov at all, see below). To zero them out, move a zero from a general-purpose register such as eax, ebx, ecx, edx, esi or edi:

asm("xor %eax, %eax");
asm("mov %eax, %ds");

Also, if you compiled your binary as 64-bit, the e-prefixed names access only the lower 32 bits of your registers. Writing a 32-bit register zero-extends into the upper half, so any 64-bit value that was held there (a pointer, for example) is lost. To write 64-bit assembly use the r-prefixed names, for example rax. If you wrote your own C-style function purely in assembly with its own call stack, used 32-bit registers for it and ran it in 64-bit mode, it would crash.
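
A small sketch to see what a 32-bit write does to the full 64-bit register. The function name write_low32 is hypothetical; %k0 is the GCC/clang operand modifier that prints the 32-bit name of operand 0. On x86-64 the 32-bit write zero-extends, so the old upper half is discarded. The other branch only mimics that result so the snippet builds on non-x86-64 targets.

```c
#include <stdint.h>

/* Starts with `initial` in a 64-bit register, then writes `low`
   to the 32-bit name of that same register and returns the result. */
static uint64_t write_low32(uint64_t initial, uint32_t low)
{
#if defined(__x86_64__)
    uint64_t reg = initial;
    __asm__("movl %1, %k0" : "+r"(reg) : "r"(low));
    return reg;
#else
    (void)initial;               /* assumption: emulate the x86-64 behaviour */
    return (uint64_t)low;
#endif
}
```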

You should definitely learn about the push and pop instructions, and you should back up every register you change before any inline assembly and restore it afterwards. If you don't, your assembly will interfere with the compiled C code and it will crash more or less randomly. For example:

// pack them up
asm("push %eax");
asm("push %ds");
// do something
asm("xor %eax, %eax");
asm("mov %eax, %ds");
// restore them back in opposite order (it's a stack)
asm("pop %ds");
asm("pop %eax");

See here

Next, you can't move a value into the CS:IP registers at all. You can modify them indirectly with the jump, call and ret families of instructions, but you can't do that here: your code would jump to an arbitrary address (address zero if you zeroed it out) and crash.

See this link for more info on CS:IP:
Change CS:IP in Assembly Language

Fail to change CS register value from kernel mode. invalid opcode: 0000

There is no mov instruction to write to cs. From Intel® 64 and IA-32 architectures software developer’s manual, MOV spec:

The MOV instruction cannot be used to load the CS register. Attempting to do so results in an invalid opcode exception (#UD).

You need to do a far jump to change cs, check restrictions in ch.5.8 for changing cs.

Accessing values of process independent registers in c

The x86-64 architecture has quite a few such control registers. Most of them cannot be read without elevated privileges, those that can are marked. You might want to read this article for a detailed description on the bits in each register.

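On Linux, the iopl system call raises the I/O privilege level, and iopl(3) is enough for port I/O. Reading the control registers and model-specific registers, however, requires ring 0 regardless of IOPL, so those reads have to run in kernel code such as a module.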

The flags register contains information about recent arithmetic operations as well as some configuration. It can be read with the pushf instruction without special permissions. Read this article for more details.
The segment registers cs, ds, es, ss, fs, and gs contain segment selectors. On modern operating systems, these are usually fixed for all processes and can be read using mov r16,segr without elevated privileges.
The cr0 register contains configuration pertaining to memory protection. Its low 16 bits can be read by any process using the smsw instruction; the remaining bits can be read with elevated privileges using mov r32,cr0, as can all other control registers.
The cr2 register contains the address of the last page fault.
The cr3 register contains the address of the page directory.
The cr4 register contains additional CPU configuration.
The cr8 register contains information about task priority.

There are also a bunch of model-specific registers which can be read with the rdmsr instruction.

To read these registers, use inline assembly. Here is inline assembly for all registers previously mentioned. For reading the rflags register, also look at this question for some caveats.

/* read rflags; "=r" avoids an rsp-relative memory operand, which the push would invalidate */
uint64_t rflags;
asm volatile("pushf; pop %0" : "=r"(rflags));

/* read segment register, replace sr with the segment you want */
uint16_t seg;
asm("mov %%sr,%0" : "=rm"(seg));

/* read low bits of cr0 */
/* on some CPUs, only the low 16 bits are correct, */
/* on others all 32 bits are correct */
uint32_t cr0;
asm("smsw %0" : "=r"(cr0));

/* everything below here requires elevated privileges */

/* read control register, replace cr with register name */
/* (mov from a control register only accepts a register destination) */
uint64_t cr;
asm("mov %%cr,%0" : "=r"(cr));

/* read model specific register. msr_no contains register number */
uint32_t msr_no = 0xC0000080, msr_hi, msr_lo;
asm("rdmsr" : "=a"(msr_lo), "=d"(msr_hi) : "c"(msr_no));
uint64_t msr_val = (uint64_t)msr_hi << 32 | msr_lo;
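
The MSR number used above, 0xC0000080, is EFER. As a sketch of what one might do with the two halves once rdmsr has returned them, the helpers below combine them and pick out the long-mode bits. The helper and macro names are hypothetical; the bit positions (SCE 0, LME 8, LMA 10, NXE 11) follow the AMD and Intel manuals. This part is plain C and needs no privileges.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_SCE (UINT64_C(1) << 0)   /* syscall/sysret enable */
#define EFER_LME (UINT64_C(1) << 8)   /* long mode enable */
#define EFER_LMA (UINT64_C(1) << 10)  /* long mode active */
#define EFER_NXE (UINT64_C(1) << 11)  /* no-execute enable */

/* Combine the edx:eax pair returned by rdmsr into one 64-bit value. */
static uint64_t msr_combine(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* True if the EFER value says the CPU is currently in long mode. */
static bool efer_long_mode_active(uint64_t efer)
{
    return (efer & EFER_LMA) != 0;
}
```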

Linker error setting loading GDT register with LGDT instruction using Inline assembly

Your error:

kc.o: In function `k_enter_protected_mode':
kernel.c:(.text+0x1e1): undefined reference to `gdtr'

Is being generated because of this line of assembly code:

"lgdt (gdtr);"

gdtr is a memory operand and represents a label to a memory address where a GDT record can be found. You don't have such a structure defined with that name. That causes the undefined reference.

You need to create a GDT record that contains the size and base address of a GDT. This record is what gets loaded into the GDT register by the LGDT instruction. You also haven't created a GDT. gdtr should be a 6-byte structure consisting of the length of the GDT minus 1 (stored in a 16-bit word) and a 32-bit linear address where the GDT can be found.
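
Should you want to describe that record from C after all, a sketch might look like this. The struct and function names are hypothetical. The packed attribute is a GCC/clang extension and is needed here because the compiler would otherwise insert padding after the 16-bit limit to align the 32-bit base.

```c
#include <stddef.h>
#include <stdint.h>

/* 6-byte operand for LGDT in 32-bit protected mode. */
struct gdtr32 {
    uint16_t limit;   /* size of the GDT in bytes, minus 1 */
    uint32_t base;    /* linear address of the first descriptor */
} __attribute__((packed));

/* Build the record from the GDT's address and its size in bytes. */
static struct gdtr32 make_gdtr32(uint32_t gdt_base, uint16_t gdt_bytes)
{
    struct gdtr32 r = { (uint16_t)(gdt_bytes - 1), gdt_base };
    return r;
}
```

For a GDT with three 8-byte descriptors, make_gdtr32(base, 24) yields a limit of 23.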

Rather than doing what you want in C, I recommend just doing this in your assembly code prior to calling k_main but after paging is set up.

Remove your k_enter_protected_mode function altogether in the C code. Then in the assembly file loader.asm place this code to load a new GDT at the start of your StartInHigherHalf code. So it would look like:

StartInHigherHalf:
    ; Set our own GDT; we can't rely on the GDT register being valid after the bootloader
    ; transfers control to our entry point
    lgdt [gdtr]         ; Load GDT Register with GDT record
    mov eax, DATA_SEG
    mov ds, eax         ; Reload all the data descriptors with Data selector (2nd argument)
    mov es, eax
    mov gs, eax
    mov fs, eax
    mov ss, eax

    jmp CODE_SEG:.setcs
                        ; Do the FAR JMP to next instruction to set CS with Code selector, and
                        ;    set the EIP (instruction pointer) to offset of setcs
.setcs:

The only thing left is to define the GDT table. A simple one with a required NULL descriptor and a flat 32-bit code and data descriptor can be placed in your .data section by changing it to this:

section .data
align 0x1000
BootPageDirectory:
    ; This page directory entry identity-maps the first 4MB of the 32-bit physical address space.
    ; All bits are clear except the following:
    ; bit 7: PS The kernel page is 4MB.
    ; bit 1: RW The kernel page is read/write.
    ; bit 0: P  The kernel page is present.
    ; This entry must be here -- otherwise the kernel will crash immediately after paging is
    ; enabled because it can't fetch the next instruction! It's ok to unmap this page later.
    dd 0x00000083
    times (KERNEL_PAGE_NUMBER - 1) dd 0                 ; Pages before kernel space.
    ; This page directory entry defines a 4MB page containing the kernel.
    dd 0x00000083
    times (1024 - KERNEL_PAGE_NUMBER - 1) dd 0  ; Pages after the kernel image.

; 32-bit GDT to replace one created by multiboot loader
; Per the Multiboot specification we can't rely on GDTR
; being valid so we need our own if we ever intend to
; reload any of the segment registers (this may be an
; issue with protected mode interrupts).
align 8
gdt_start:
    dd 0                ; null descriptor
    dd 0

gdt32_code:
    dw 0FFFFh           ; limit low
    dw 0                ; base low
    db 0                ; base middle
    db 10011010b        ; access
    db 11001111b        ; 32-bit size, 4kb granularity, limit 0xfffff pages
    db 0                ; base high

gdt32_data:
    dw 0FFFFh           ; limit low (Same as code)
    dw 0                ; base low
    db 0                ; base middle
    db 10010010b        ; access
    db 11001111b        ; 32-bit size, 4kb granularity, limit 0xfffff pages
    db 0                ; base high
end_of_gdt:

gdtr:
    dw end_of_gdt - gdt_start - 1
                        ; limit (Size of GDT - 1)
    dd gdt_start        ; base of GDT

CODE_SEG equ gdt32_code - gdt_start
DATA_SEG equ gdt32_data - gdt_start

We've now added the required GDT structure and created a record called gdtr that can be loaded with the LGDT instruction.

Since you are using OSDev as a resource, I recommend looking at the GDT tutorial for information on creating a GDT. The Intel manuals are also an excellent source of information.

Other Observations

Your loader.asm sets up a Multiboot header, so it is a good bet you are using a Multiboot-compliant bootloader. When you use a Multiboot-compliant bootloader, your CPU will be placed into 32-bit protected mode before it starts running your code at _loader. Your question suggests that you think you are in real mode, but you are actually already in protected mode. With a Multiboot loader it isn't necessary to set CR0 bit 0 to 1; it is guaranteed to already be set. In my code above I have removed that step and only set up the GDT.

How to get bits of specific xmm registers?

If you really want to know about register values, rather than __m128i C variable values, I'd suggest using a debugger like GDB. print /x $xmm0.v2_int64 when stopped at a breakpoint.

Capturing a register at the top of a function is a pretty flaky and unreliable thing to try to attempt (smells like you've already gone down the wrong design path)¹. But you're on the right track with a register-asm local var. However, xmm0 can't match an "=r" constraint, only "=x". See Reading a register value into a C variable for more about using an empty asm template to tell the compiler you want a C variable to be what was in a register.

You do need the asm volatile("" : "=x"(var)); statement, though; GNU C register-asm local vars have no guarantees whatsoever except when used as operands to asm statements. (GCC will often keep your var in that register anyway, but IIRC clang won't.)

There's not a lot of guarantee about where this will be ordered wrt. other code (asm volatile may help some, or for stronger ordering also use a "memory" clobber). Also no guarantee that GCC won't use the register for something else first. (Especially a call-clobbered register like any xmm reg.) But it does at least happen to work in the version I tested.

print a __m128i variable shows how to print a __m128i as two 64-bit halves once you have it, or as other element sizes. The compiler will often optimize _mm_store_si128 / reload into shuffles, and this is for printing anyway so keep it simple.

Using an unsigned __int128 tmp; would also be an option in GNU C on x86-64.

#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>
#ifndef __cplusplus
#include <stdalign.h>
#endif

// If you need this, you're probably doing something wrong.
// There's no guarantee about what a compiler will have in XMM0 at any point
void foo() {
    register __m128i xmm0 __asm__("xmm0");
    __asm__ volatile ("" :"=x"(xmm0));

    alignas(16) uint64_t buf[2];
    _mm_store_si128((__m128i*)buf, xmm0);
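    // uint64_t is unsigned long on x86-64 Linux, so cast to match %llu
    printf("%llu %llu\n", (unsigned long long)buf[1], (unsigned long long)buf[0]);   // I'd normally use hex, like %#llx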
}

This prints the high half first (most significant), so reading left to right across both elements we get each byte in descending order of memory address within buf.

It compiles to the asm we want with both GCC and clang (Godbolt), not stepping on xmm0 before reading it.

# GCC10.2 -O3
foo:
        movhlps xmm1, xmm0
        movq    rdx, xmm0                 # low half -> RDX
        mov     edi, OFFSET FLAT:.LC0
        xor     eax, eax
        movq    rsi, xmm1                 # high half -> RSI
        jmp     printf
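
For the unsigned __int128 alternative mentioned earlier, a sketch could look like the following. The helper names are hypothetical. It copies 16 bytes with memcpy, which is also how one could move a __m128i into the integer (memcpy(&tmp, &vec, 16)). The hi/lo order assumes a little-endian target such as x86-64, and unsigned __int128 is a GNU C extension available on 64-bit targets.

```c
#include <stdint.h>
#include <string.h>

/* Load 16 bytes from memory into a 128-bit integer. */
static unsigned __int128 u128_load(const void *p)
{
    unsigned __int128 v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Most-significant and least-significant 64-bit halves. */
static uint64_t u128_hi(unsigned __int128 v) { return (uint64_t)(v >> 64); }
static uint64_t u128_lo(unsigned __int128 v) { return (uint64_t)v; }
```

Printing then works the same way as with the two-element buffer: print u128_hi first, then u128_lo.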

Footnote 1:

If you make sure your function doesn't inline, you could take advantage of the calling convention to get the incoming values of xmm0..7 (for x86-64 System V), or xmm0..3 if you have no integer args (Windows x64).

__attribute__((noinline))
void foo(__m128i xmm0, __m128i xmm1, __m128i xmm2 /* , ... up to xmm7 */) {
  // do whatever you want with the xmm0..7 args
}

If you want to provide a different prototype for the function for callers to use (which omits the __m128i args), that can maybe work. It's of course Undefined Behaviour in ISO C, but if you truly stop inlining, the effects depend on the calling convention. As long as you make sure it's noinline so link-time optimization doesn't do cross-file inlining.

Of course, the mere fact of inserting a function call will change register allocation in the caller, so this only helps for a function you were going to call anyway.
