Page Fault in Interrupt Context

Which stack does the page fault handler in an operating system run on

Linux doesn't care whether you're running a C or C++ program, really.

When the CPU detects a faulty address, it raises an interrupt. There's no reasonable way to use the user stack, as it may be in a totally corrupt state. The kernel has its own private stack for this kind of serious faults.

This isn't universally the case. If you normally call the kernel to do things for you, the kernel may assume that you have a reasonable stack available.

Assembly page fault handler cannot be called due to invalid stack pointer

How should I resolve this situation?

I'd resolve the situation by using avoidance - don't let kernel have a dodgy stack pointer in the first place (and don't let kernel stack be sent to swap space, don't use page fault for "auto-growing kernel stack", etc). Note that CPU will automatically switch to kernel stack if a page fault happens in user-space (at CPL=3) so it doesn't matter if user-space has a dodgy stack pointer.

Alternatives are:

  • force a kernel stack switch when kernel code (CPL=0) causes a page fault. This can be done using hardware task switch (protected mode) or the IST mechanism (long mode) for the page fault exception handler. This would be the best option for recovery (e.g. makes it easier to figure out what the problem was, fix it, then return).

  • force a kernel stack switch when kernel code (CPL=0) causes a double fault. This can be done using hardware task switch (protected mode) or the IST mechanism (long mode) for the double fault exception handler. This would be the best option for performance (no added overhead for normal page faults).

Note 1: Be warned that neither hardware task switching/task gates nor IST are re-entrant. For hardware task switching, if a second page fault occurs while you're handling the first page fault you'll get a general protection fault (because the "page fault task" is busy); and for IST, if a second page fault occurs while you're handling the first page fault the second page fault will trash/overwrite the first page fault's stack and make it impossible to recover. In theory, you can mitigate these problems by switching to a different task or different stack as soon as possible, but that's complicated/messy and likely to cause even more problems.

Note 2: You'll probably end up with a combination of avoidance and double fault using hardware task switch or IST; with the double fault handler doing "freeze system and dump info/panic" as a generic fallback for catastrophic kernel failures (that were supposed to be avoided but weren't).

Note 3: If you want to support "auto-growing kernel stacks"; you can use "stack probes" instead - basically, just do dummy read/s (in function epilogues) from "future stack" before using the memory for stack, so that the page fault occurs when there's still enough kernel stack left for the page fault handler.

What causes x86-64 Page Fault with only the Write bit set when a hardware interrupt happens while CPL=3

I have fixed the problem! As @sj95126 suggested, the problem was with my TSS, which had the IST entry set up only for the double fault handler (which makes sense looking back, because previously any interrupt or exception that happened in user mode caused a double fault, because as far as I can tell the CPU could not know which stack to activate). I have fixed it by setting the IST bits to that TSS offset for every interrupt handler, and now my code works flawlessly.

IDT:

IDT.breakpoint.set_handler_fn(interrupts::breakpoint::breakpoint_handler).set_stack_index(INTERRUPT_IST_INDEX);
IDT.double_fault.set_handler_fn(interrupts::double_fault::double_fault_handler).set_stack_index(INTERRUPT_IST_INDEX);
IDT.page_fault.set_handler_fn(interrupts::page_fault::page_fault_handler).set_stack_index(INTERRUPT_IST_INDEX);
IDT.general_protection_fault.set_handler_fn(interrupts::general_protection_fault::general_protection_fault_handler).set_stack_index(INTERRUPT_IST_INDEX);
IDT.stack_segment_fault.set_handler_fn(interrupts::stack_segment_fault::stack_segment_fault_handler).set_stack_index(INTERRUPT_IST_INDEX);
IDT.segment_not_present.set_handler_fn(interrupts::segment_not_present::segment_not_present_handler).set_stack_index(INTERRUPT_IST_INDEX);
IDT.debug.set_handler_fn(interrupts::debug::debug_handler).set_stack_index(INTERRUPT_IST_INDEX);
IDT[interrupts::HardwareInterrupt::Timer.as_usize()].set_handler_fn(interrupts::timer::timer_handler).set_stack_index(INTERRUPT_IST_INDEX);
IDT.load();

What happens when a mov instruction causes a page fault with interrupts disabled on x86?

I've found the answer. My #2 suggestion was correct and the mechanism was right in front of my face. The page fault does happen, but the fixup_exception mechanism is used to provide a exception/continue mechanism. This section adds entries to the exception handler table:

    ".section __ex_table,\"a\"\n"               \
" .align 4\n" \
" .long 4b,5b\n" \
" .long 0b,3b\n" \
" .long 1b,6b\n" \
".previous" \

This says: if the IP address is the first entry and an exception is encountered in a fault handler, then set the IP address to the second address and continue.

So if the exception happens at "4:", jump to "5:". If the exception happens at "0:" then jump to "3:" and if the exception happens at "1:" jump to "6:".

The missing piece is in do_page_fault() in arch/x86/mm/fault.c:

/*
* If we're in an interrupt, have no user context or are running
* in an atomic region then we must not take the fault:
*/
if (unlikely(in_atomic() || !mm)) {
bad_area_nosemaphore(regs, error_code, address);
return;
}

in_atomic returned true because we are in a write_lock_bh() lock! bad_area_nosemaphore eventually does the fixup.

If a page_fault would occur (which was unlikely, because of the concept of the working space) then the function call would fail and jump out of the __copy_user macro, with the uncopied bytes set to size because preemption was disabled.



Related Topics



Leave a reply



Submit