Which stack does the page fault handler in an operating system run on
Linux doesn't care whether you're running a C or C++ program, really.
When the CPU detects a faulting address, it raises a page fault exception. There's no reasonable way to handle it on the user stack, as that may be in a totally corrupt state. The kernel has its own private stack for this kind of serious fault.
This isn't universally the case. If you normally call the kernel to do things for you, the kernel may assume that you have a reasonable stack available.
Assembly page fault handler cannot be called due to invalid stack pointer
How should I resolve this situation?
I'd resolve the situation by using avoidance - don't let the kernel have a dodgy stack pointer in the first place (and don't let the kernel stack be sent to swap space, don't use page faults for "auto-growing kernel stack", etc). Note that the CPU will automatically switch to the kernel stack if a page fault happens in user-space (at CPL=3), so it doesn't matter if user-space has a dodgy stack pointer.
Alternatives are:
force a kernel stack switch when kernel code (CPL=0) causes a page fault. This can be done using hardware task switch (protected mode) or the IST mechanism (long mode) for the page fault exception handler. This would be the best option for recovery (e.g. makes it easier to figure out what the problem was, fix it, then return).
force a kernel stack switch when kernel code (CPL=0) causes a double fault. This can be done using hardware task switch (protected mode) or the IST mechanism (long mode) for the double fault exception handler. This would be the best option for performance (no added overhead for normal page faults).
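To make the IST option more concrete, here is a minimal sketch of how a long mode IDT gate carries an IST index. The field layout follows the 64-bit gate descriptor format in the Intel and AMD manuals; make_gate, gate_handler and the constant names are hypothetical, chosen for illustration only and not taken from any particular kernel.

```c
#include <assert.h>
#include <stdint.h>

#define VEC_DOUBLE_FAULT 8    /* #DF vector number */
#define VEC_PAGE_FAULT   14   /* #PF vector number */
#define GATE_INTERRUPT   0x8E /* present, DPL=0, 64-bit interrupt gate */

/* 64-bit IDT gate descriptor: 16 bytes, no padding with natural alignment. */
struct idt_gate {
    uint16_t offset_low;   /* handler address bits 0..15 */
    uint16_t selector;     /* kernel code segment selector */
    uint8_t  ist;          /* bits 0..2: IST index, 0 = no forced stack switch */
    uint8_t  type_attr;    /* gate type, DPL and present bit */
    uint16_t offset_mid;   /* handler address bits 16..31 */
    uint32_t offset_high;  /* handler address bits 32..63 */
    uint32_t reserved;
};

/* Hypothetical helper: build an interrupt gate that uses IST slot `ist`. */
static struct idt_gate make_gate(uint64_t handler, uint16_t selector, uint8_t ist)
{
    struct idt_gate g;

    assert(ist <= 7); /* IST1..IST7 exist; 0 disables the mechanism */
    g.offset_low  = (uint16_t)(handler & 0xFFFF);
    g.selector    = selector;
    g.ist         = ist;
    g.type_attr   = GATE_INTERRUPT;
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_high = (uint32_t)(handler >> 32);
    g.reserved    = 0;
    return g;
}

/* Reassemble the handler address from the three offset fields. */
static uint64_t gate_handler(const struct idt_gate *g)
{
    return (uint64_t)g->offset_low
         | ((uint64_t)g->offset_mid << 16)
         | ((uint64_t)g->offset_high << 32);
}
```

Giving a non-zero IST index only to the gate for vector 8 corresponds to the "double fault" variant; giving one to the gate for vector 14 corresponds to the "page fault" variant.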
Note 1: Be warned that neither hardware task switching/task gates nor IST are re-entrant. For hardware task switching, if a second page fault occurs while you're handling the first page fault you'll get a general protection fault (because the "page fault task" is busy); and for IST, if a second page fault occurs while you're handling the first page fault the second page fault will trash/overwrite the first page fault's stack and make it impossible to recover. In theory, you can mitigate these problems by switching to a different task or different stack as soon as possible, but that's complicated/messy and likely to cause even more problems.
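The IST half of this warning can be illustrated with a toy model. This is only a user-space simulation under simplifying assumptions (one word per "exception frame", a tiny array as the stack); struct toy_cpu and the deliver_* functions are made-up names, not real CPU or kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IST_STACK_WORDS 32

/* Toy model: one stack plus the index of the most recently pushed word. */
struct toy_cpu {
    uint64_t stack[IST_STACK_WORDS];
    size_t   sp; /* index into stack; the stack grows towards index 0 */
};

static void toy_cpu_init(struct toy_cpu *cpu)
{
    for (size_t i = 0; i < IST_STACK_WORDS; i++)
        cpu->stack[i] = 0;
    cpu->sp = IST_STACK_WORDS; /* empty stack */
}

/* IST delivery: the stack pointer is reloaded from the same fixed
 * TSS slot on every delivery, no matter what is already on the stack. */
static size_t deliver_ist(struct toy_cpu *cpu, uint64_t frame)
{
    cpu->sp = IST_STACK_WORDS;
    cpu->stack[--cpu->sp] = frame;
    return cpu->sp;
}

/* Delivery without IST at CPL=0: push below whatever is in use. */
static size_t deliver_nested(struct toy_cpu *cpu, uint64_t frame)
{
    assert(cpu->sp > 0); /* the toy stack must not overflow */
    cpu->stack[--cpu->sp] = frame;
    return cpu->sp;
}
```

Two calls to deliver_ist land in the same slot, so the second frame replaces the first; two calls to deliver_nested land in different slots, so the first frame survives.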
Note 2: You'll probably end up with a combination of avoidance and double fault using hardware task switch or IST; with the double fault handler doing "freeze system and dump info/panic" as a generic fallback for catastrophic kernel failures (that were supposed to be avoided but weren't).
Note 3: If you want to support "auto-growing kernel stacks", you can use "stack probes" instead - basically, just do dummy reads (in function prologues) from the "future stack" before using that memory as stack, so that the page fault occurs while there's still enough kernel stack left for the page fault handler.
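A stack probe loop might look roughly like the sketch below. It is written as plain C that runs in user space against an ordinary buffer; probe_future_stack and its parameters are hypothetical names, and a real kernel would more likely emit the probes from the compiler or from assembly in the function entry code.

```c
#include <assert.h>
#include <stddef.h>

/* Touch the `bytes` below `top` with one dummy read per page, working
 * downwards the way a stack grows. Returns the number of reads done. */
static size_t probe_future_stack(const volatile unsigned char *top,
                                 size_t bytes, size_t page_size)
{
    size_t probes = 0;
    size_t off = 0;

    assert(page_size > 0);
    while (off < bytes) {
        size_t left = bytes - off;
        size_t step = left < page_size ? left : page_size;
        unsigned char sink;

        off += step;
        sink = *(top - off); /* dummy read: any page fault happens here */
        (void)sink;
        probes++;
    }
    return probes;
}
```

The pointer is volatile so the compiler does not optimize the dummy reads away. The last probe lands exactly at the lowest address the function will use, so a fault cannot be deferred until the memory is actually used as stack.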
What causes x86-64 Page Fault with only the Write bit set when a hardware interrupt happens while CPL=3
I have fixed the problem! As @sj95126 suggested, the problem was with my TSS, which had an IST entry set up only for the double fault handler (which makes sense looking back: previously any interrupt or exception that happened in user mode caused a double fault because, as far as I can tell, the CPU could not know which stack to activate). I fixed it by setting the IST index of every interrupt handler's IDT entry to that TSS entry, and now my code works flawlessly.
IDT:
IDT.breakpoint.set_handler_fn(interrupts::breakpoint::breakpoint_handler).set_stack_index(INTERRUPT_IST_INDEX);
IDT.double_fault.set_handler_fn(interrupts::double_fault::double_fault_handler).set_stack_index(INTERRUPT_IST_INDEX);
IDT.page_fault.set_handler_fn(interrupts::page_fault::page_fault_handler).set_stack_index(INTERRUPT_IST_INDEX);
IDT.general_protection_fault.set_handler_fn(interrupts::general_protection_fault::general_protection_fault_handler).set_stack_index(INTERRUPT_IST_INDEX);
IDT.stack_segment_fault.set_handler_fn(interrupts::stack_segment_fault::stack_segment_fault_handler).set_stack_index(INTERRUPT_IST_INDEX);
IDT.segment_not_present.set_handler_fn(interrupts::segment_not_present::segment_not_present_handler).set_stack_index(INTERRUPT_IST_INDEX);
IDT.debug.set_handler_fn(interrupts::debug::debug_handler).set_stack_index(INTERRUPT_IST_INDEX);
IDT[interrupts::HardwareInterrupt::Timer.as_usize()].set_handler_fn(interrupts::timer::timer_handler).set_stack_index(INTERRUPT_IST_INDEX);
IDT.load();
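For reference, the TSS side of this can be sketched in C as below. The layout is the 104-byte 64-bit TSS from the Intel and AMD manuals; tss_set_ist and tss_set_rsp0 are hypothetical helper names, and `__attribute__((packed))` assumes GCC or Clang. Note that when a gate's IST index is 0 and the interrupt arrives at CPL=3, the CPU loads the stack pointer from RSP0 instead, so a valid RSP0 may be an alternative to putting every handler on an IST stack; I have not checked that against the code in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit TSS layout (104 bytes). Packed because the 64-bit fields
 * are not naturally aligned. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];     /* RSP0..RSP2: loaded on a privilege level change */
    uint64_t reserved1;
    uint64_t ist[7];     /* IST1..IST7: selected by an IDT gate's IST index */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base; /* offset of the I/O permission bitmap */
} __attribute__((packed));

/* Hypothetical helper: ist_number is the hardware numbering, 1..7. */
static void tss_set_ist(struct tss64 *tss, unsigned ist_number,
                        uint64_t stack_top)
{
    assert(ist_number >= 1 && ist_number <= 7);
    tss->ist[ist_number - 1] = stack_top;
}

/* Hypothetical helper: stack used for CPL=3 to CPL=0 transitions
 * through gates whose IST index is 0. */
static void tss_set_rsp0(struct tss64 *tss, uint64_t stack_top)
{
    tss->rsp[0] = stack_top;
}
```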
What happens when a mov instruction causes a page fault with interrupts disabled on x86?
I've found the answer. My #2 suggestion was correct and the mechanism was right in front of my face. The page fault does happen, but the fixup_exception mechanism is used to provide an exception/continue mechanism. This section adds entries to the exception handler table:
".section __ex_table,\"a\"\n" \
" .align 4\n" \
" .long 4b,5b\n" \
" .long 0b,3b\n" \
" .long 1b,6b\n" \
".previous" \
Each pair says: if an exception is raised while the instruction pointer is at the first address, the fault handler sets the instruction pointer to the second address and execution continues from there.
So if the exception happens at "4:", jump to "5:". If the exception happens at "0:" then jump to "3:" and if the exception happens at "1:" jump to "6:".
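The lookup can be sketched as a simple table search. This is a simplified user-space illustration, not the kernel's implementation (which, as far as I know, keeps the table sorted and uses a binary search); struct ex_entry, find_fixup, demo_table and the addresses in it are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* One exception table entry: faulting instruction -> fixup address. */
struct ex_entry {
    unsigned long insn;
    unsigned long fixup;
};

/* Made-up addresses standing in for the labels in the asm above:
 * "4:" -> "5:", "0:" -> "3:", "1:" -> "6:". */
static const struct ex_entry demo_table[] = {
    { 0x1040, 0x1050 },
    { 0x1000, 0x1030 },
    { 0x1010, 0x1060 },
};

/* Return the fixup address for a fault at `ip`, or 0 if the faulting
 * instruction has no entry (the fault is then a genuine bug). */
static unsigned long find_fixup(const struct ex_entry *table, size_t count,
                                unsigned long ip)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (table[i].insn == ip)
            return table[i].fixup;
    }
    return 0;
}
```

A fault handler built on this would write the returned address into the saved instruction pointer and return, so execution resumes at the fixup code instead of retrying the faulting instruction.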
The missing piece is in do_page_fault() in arch/x86/mm/fault.c:
/*
 * If we're in an interrupt, have no user context or are running
 * in an atomic region then we must not take the fault:
 */
if (unlikely(in_atomic() || !mm)) {
        bad_area_nosemaphore(regs, error_code, address);
        return;
}
in_atomic returned true because we are in a write_lock_bh() lock! bad_area_nosemaphore eventually does the fixup.
If a page fault did occur (which was unlikely, because of the concept of the working set), the copy would fail and jump out of the __copy_user macro through the fixup, with the number of uncopied bytes set to size, because preemption was disabled.