Where is the Linux ISR Entry Point
CONFIG_X86_32
- arch/x86/kernel/entry_32.S:system_call (INT $0x80)
- arch/x86/kernel/entry_32.S:ia32_sysenter_target (SYSENTER)
CONFIG_X86_64
- arch/x86/kernel/entry_64.S:system_call (SYSCALL, 64bit)
CONFIG_X86_64 and CONFIG_IA32_EMULATION
- arch/x86/ia32/ia32entry.S:ia32_sysenter_target (SYSENTER)
- arch/x86/ia32/ia32entry.S:ia32_cstar_target (SYSCALL, 32bit)
- arch/x86/ia32/ia32entry.S:ia32_syscall (INT $0x80)
Location of interrupt handling code in Linux kernel for x86 architecture
I am looking for the code that pushes all of the general purpose registers on the stack
Hardware stores part of the current state before executing an interrupt handler, and when the interrupt exits, the hardware reads that state back from where it was stored.
Now, code inside the interrupt handler may read and write the saved copies of registers, causing different values to be restored as the interrupt exits. That's how a context switch works.
On x86, the hardware only saves the registers that it must change before the interrupt handler starts running (EFLAGS, CS and EIP, plus SS and ESP on a privilege change). On some architectures the hardware saves all registers automatically; x86 leaves most of the job to software, since saving and restoring registers the handler never modifies would be a waste. So the interrupt handler is responsible for saving and restoring any registers it voluntarily uses — in the Linux kernel, the general-purpose registers are pushed by the SAVE_ALL macro used by the entry code in entry_32.S.
See the Intel® 64 and IA-32 Architectures Software Developer's Manual, starting on page 6-15.
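The mechanism described above can be sketched in a few lines of C. This is only a hand-rolled miniature with invented names (saved_regs, schedule_next, interrupt_return) to show the idea — entry code saves state per task, the handler picks which saved state the exit path will restore — not actual kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Per-task save area. Real kernels save all general-purpose
 * registers; two fields are enough to show the idea. */
struct saved_regs {
    long ax;
    long ip;    /* where execution resumes on "interrupt return" */
};

static struct saved_regs task_a = { .ax = 1, .ip = 0x1000 };
static struct saved_regs task_b = { .ax = 2, .ip = 0x2000 };

/* The "interrupt handler": by choosing a different saved context
 * for the exit path to restore, it performs a context switch. */
static struct saved_regs *schedule_next(struct saved_regs *cur)
{
    return (cur == &task_a) ? &task_b : &task_a;
}

/* The "interrupt exit": blindly restores whatever the handler left. */
static long interrupt_return(const struct saved_regs *next)
{
    return next->ip;   /* pretend we resume execution here */
}
```

If the handler never touches the save area, the interrupted task resumes unchanged; if it swaps the pointer, a different task runs next.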
kernel entry points on ARM
Inside each of these vector_* jump tables, there is exactly one DWORD with the address of a label with a _usr suffix.
This is correct. The table is indexed by the current mode. For instance, irq only has three entries: irq_usr, irq_svc, and irq_invalid. IRQs should be disabled during data aborts, FIQ and other modes. Linux will always transfer to SVC mode after this brief 'vector stub' code. It is accomplished with,
@
@ Prepare for SVC32 mode. IRQs remain disabled.
@
mrs r0, cpsr
eor r0, r0, #(\mode ^ SVC_MODE | PSR_ISETSTATE)
msr spsr_cxsf, r0
@@@ ... other unrelated code
movs pc, lr @ branch to handler in SVC mode
This is why irq_invalid is used for all other modes. Exceptions should never happen while this vector stub code is executing.
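The eor in the stub above is a small bit trick: each stub is assembled for a fixed \mode, so the constant (\mode ^ SVC_MODE) flips exactly the bits that differ between that mode and SVC mode. A sketch in C, using the ARM mode encodings (USR 0x10, IRQ 0x12, SVC 0x13, ABT 0x17) and ignoring PSR_ISETSTATE (the Thumb bit) for simplicity:

```c
#include <assert.h>

#define USR_MODE 0x10u
#define IRQ_MODE 0x12u
#define SVC_MODE 0x13u
#define ABT_MODE 0x17u

/* Model of: eor r0, r0, #(\mode ^ SVC_MODE)
 * XOR-ing the saved cpsr with (stub_mode ^ SVC_MODE) rewrites the
 * mode bits from the stub's mode to SVC, leaving all other cpsr
 * bits (condition flags, IRQ/FIQ masks) untouched. */
static unsigned to_svc(unsigned cpsr, unsigned stub_mode)
{
    return cpsr ^ (stub_mode ^ SVC_MODE);
}
```

The result is then written to spsr so that the final movs pc, lr lands in SVC mode.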
Does this mean that labels with the _usr suffix are executed only if the interrupt arises when the kernel thread executing on that CPU is in userspace context? For instance, irq_usr is executed if the interrupt occurs when the kernel thread is in userspace context, dabt_usr is executed if the interrupt occurs when the kernel thread is in userspace context, and so on.
Yes, the spsr holds the interrupted mode, and the table is indexed by these mode bits.
If 1 is true, then which kernel threads are responsible for handling, say, irqs with a different suffix such as irq_svc? I am assuming that this is the handler for an interrupt request that happens in SVC mode. If so, which kernel thread handles this? The kernel thread currently in SVC mode, on whichever CPU receives the interrupt?
I think you have some misunderstanding here. Every user-space process also has a kernel-mode context. irq_usr is responsible for storing the user mode registers, as a reschedule might take place. The context is different for irq_svc, as a kernel stack was already in use, and it is the same one the IRQ code will use. What happens when a user task calls read()? It uses a system call, and code executes in a kernel context. Each process has both a user and an svc/kernel stack (and thread_info). A kernel thread is a process without any user space stack.
If 2 is true, then at what point does the kernel thread finish processing the second interrupt and return to where it had left off (also in SVC mode)? Is it ret_from_intr?
Generally Linux returns to the kernel thread that was interrupted so it can finish its work. However, there is a configuration option for pre-empting SVC threads/contexts. If the interrupt resulted in a reschedule event, then a process/context switch may result if CONFIG_PREEMPT is active. See svc_preempt for this code.
See also:
- Linux kernel arm exception stack init
- Arm specific irq initialization
Linux Device Driver Program, where the program starts?
"Linux Device Driver" is a good book but it's old!
Basic example:
#include <linux/module.h>
#include <linux/version.h>
#include <linux/kernel.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Name and e-mail");
MODULE_DESCRIPTION("my_first_driver");

static int __init insert_mod(void)
{
	printk(KERN_INFO "Module constructor\n");
	return 0;
}

static void __exit remove_mod(void)
{
	printk(KERN_INFO "Module destructor\n");
}

module_init(insert_mod);
module_exit(remove_mod);
An up-to-date tutorial, really well written, is "Linux Device Drivers Series"
Where do the function pointers point to in struct proto_ops ?
For an IPv4 socket, the ops will be one of the structures inet_stream_ops, inet_dgram_ops or inet_sockraw_ops defined in net/ipv4/af_inet.c (unless you're using something weird like SCTP, which has its own ops defined elsewhere). To see how those pointers get there, look at inet_create in the same file, which is called from __sock_create as pf->create().
To see how that pointer gets there, note that inet_create is the create member of a struct called inet_family_ops, which is passed to sock_register(), causing it to be placed in the net_families array that is consulted when a socket is created.
Functions with register in their names are a good thing to look for when you want to see how different kernel components are connected.
Now back to getsockopt. If you look at where the getsockopt pointer actually goes, in all three of the inet_*_ops structs it points to sock_common_getsockopt.
sock_common_getsockopt in net/core/sock.c isn't very interesting. It just extracts a struct sock from a struct socket and then calls sk->sk_prot->getsockopt. So next you'll be wondering where that comes from.
It comes from the next layer up. I'm going to assume TCP as an example.
In net/ipv4/tcp_ipv4.c there's a struct proto tcp_prot in which the getsockopt member points to tcp_getsockopt, a function defined in net/ipv4/tcp.c. It implements SOL_TCP options itself, and for others, it calls the getsockopt method from another ops structure called icsk_af_ops. That brings us back to net/ipv4/tcp_ipv4.c, where struct inet_connection_sock_af_ops ipv4_specific has getsockopt pointing to ip_getsockopt.
ip_getsockopt is in net/ipv4/ip_sockglue.c and it calls do_ip_getsockopt in the same file, which is where the SOL_IP options are actually handled.
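The dispatch chain traced above is just layered function pointers. A hand-rolled miniature — every name here (mini_proto, mini_sock, mini_common_getsockopt, ...) is invented for illustration and is not the real kernel structure — shows the pattern:

```c
#include <assert.h>

struct mini_sock;
struct mini_socket;

/* Plays the role of struct proto (sk->sk_prot). */
struct mini_proto {
    int (*getsockopt)(struct mini_sock *sk, int optname);
};

struct mini_sock {
    struct mini_proto *sk_prot;
};

/* Plays the role of struct proto_ops (socket->ops). */
struct mini_proto_ops {
    int (*getsockopt)(struct mini_socket *sock, int optname);
};

struct mini_socket {
    struct mini_sock *sk;
    const struct mini_proto_ops *ops;
};

/* Plays the role of ip_getsockopt: the layer that finally answers. */
static int mini_ip_getsockopt(struct mini_sock *sk, int optname)
{
    (void)sk;
    return 1000 + optname;      /* pretend option value */
}

/* Plays the role of sock_common_getsockopt: forwards to sk_prot. */
static int mini_common_getsockopt(struct mini_socket *sock, int optname)
{
    return sock->sk->sk_prot->getsockopt(sock->sk, optname);
}

static const struct mini_proto_ops mini_stream_ops = { mini_common_getsockopt };
static struct mini_proto mini_tcp_prot = { mini_ip_getsockopt };

/* What the syscall layer ultimately does: indirect through ops. */
static int mini_getsockopt(struct mini_socket *sock, int optname)
{
    return sock->ops->getsockopt(sock, optname);
}
```

Each layer only knows the next table of pointers, which is why following a getsockopt call means hopping from proto_ops to proto to icsk_af_ops in the real kernel.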
How does the Linux kernel enter supervisor mode in x86?
It's not that simple. In x86, there are 4 different privilege levels: 0 (operating system kernel), 1, 2, and 3 (applications). Privilege levels 1 and 2 aren't used in Linux: the kernel runs at privilege level 0 while user space code runs at privilege level 3. The current privilege level (CPL) is stored in bits 0 and 1 of the CS (code segment) register.
There are multiple ways in which the transition from user to kernel can happen:
- Through hardware interrupts: page faults, general protection faults, devices, hardware timer, and so on.
- Through software interrupts: the int instruction raises a software interrupt. The most common in Linux is int 0x80, which is configured to be used for system calls from user space to kernel space.
- Through specialized instructions like sysenter and syscall.
In any case, there is no actual code that does the transition: it is done by the processor itself, which switches from one privilege level to the other, and sets up segment selectors, instruction pointer, stack pointer and more according to the information that was set up by the kernel right after booting.
In the case of interrupts, the entries of the Interrupt Descriptor Table (IDT) are used. See this useful documentation page about interrupts in Linux which explains more about the IDT. If you want to get into the details, check out Chapter 5 of the Intel 64 and IA-32 architectures software developer's manual, Volume 3.
In short, each IDT entry specifies a descriptor privilege level (DPL) and a new code segment and offset. In the case of software interrupts, some privilege level checks are made by the processor (one of which is CPL <= DPL) to determine whether the code that issued the interrupt has the privilege to do so. Then, the interrupt handler is executed, which implicitly sets the new CS register with the privilege level bits set to 0. This is how the canonical int 0x80 syscall for 32-bit x86 is made.
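That CPL <= DPL check is easy to model. A simplified sketch (int_n_allowed is an invented helper, not a real API; the CPU also performs other checks not shown here):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the privilege check the CPU performs for a
 * software interrupt (int n): the gate's DPL must be numerically
 * greater than or equal to the current privilege level, i.e.
 * CPL <= DPL, otherwise a general protection fault is raised.
 * Hardware interrupts skip this particular check. */
static bool int_n_allowed(unsigned cpl, unsigned gate_dpl)
{
    return cpl <= gate_dpl;
}
```

This is why the int 0x80 gate is installed with DPL 3 (reachable from user space) while most other IDT entries have DPL 0.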
In the case of specialized instructions like sysenter and syscall, the details differ, but the concept is similar: the CPU checks privileges and then retrieves the information from dedicated Model Specific Registers (MSRs) that were previously set up by the kernel after boot.
For system calls the result is always the same: user code switches to privilege level 0 and starts executing kernel code, ending up right at the beginning of one of the different syscall entry points defined by the kernel.
Possible syscall entry points are:
- entry_INT80_32 for 32-bit int 0x80
- entry_INT80_compat for 32-bit int 0x80 on a 64-bit kernel
- entry_SYSENTER_32 for 32-bit sysenter
- entry_SYSENTER_compat for 32-bit sysenter on a 64-bit kernel
- entry_SYSCALL_64 for 64-bit syscall
- entry_SYSCALL_compat for 32-bit syscall on a 64-bit kernel (a special entry point which is not used by user code; in theory syscall is also a valid 32-bit instruction on AMD CPUs, but Linux only uses it for 64-bit because of its weird semantics)
How to access physical address during interrupt handler linux
Setting up a new memory mapping is an expensive operation, which typically requires calls to potentially blocking functions (e.g. grabbing locks). So your strategy has two problems:
- Calling a blocking function is not possible in your context (there is no kernel thread associated with your interrupt handler, so there is no way for the kernel to resume it if it had to be put to sleep).
- Setting up/tearing down a mapping per IRQ would be a bad idea performance-wise (even if we ignore the fact that it can't be done).
Typically, you would set up any mappings you need in a driver's probe() function (or in the module's init() if it's more of a singleton thing). This mapping is then kept in some private device data structure, which is passed as the last argument to some variant of request_irq(), so that the kernel then passes it back as the second argument to the IRQ handler.
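A user-space model of that pattern, with invented stand-ins (fake_request_irq for request_irq(), my_dev for the driver's private structure, and a plain buffer in place of an ioremap() result), may make the data flow clearer:

```c
#include <assert.h>
#include <stddef.h>

/* Private device data: in a real driver, regs would be filled in
 * once by ioremap() in probe(), never at interrupt time. */
struct my_dev {
    void *regs;
};

typedef void (*irq_handler_fn)(int irq, void *dev_id);

static irq_handler_fn registered_handler;
static void *registered_dev_id;

/* Stand-in for request_irq(): the kernel remembers dev_id for us. */
static int fake_request_irq(int irq, irq_handler_fn fn, void *dev_id)
{
    (void)irq;
    registered_handler = fn;
    registered_dev_id = dev_id;
    return 0;
}

/* Stand-in for the hardware raising the interrupt: the stored
 * dev_id is handed back to the handler. */
static void fake_raise_irq(int irq)
{
    registered_handler(irq, registered_dev_id);
}

static void *seen_regs;

static void my_handler(int irq, void *dev_id)
{
    struct my_dev *dev = dev_id;   /* mapping already exists: no
                                      blocking work needed here */
    (void)irq;
    seen_regs = dev->regs;
}
```

The handler never maps anything itself; it only dereferences a pointer that probe() prepared long before the first interrupt.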
Not sure what you mean by "need to map and cache a lot of unused area".
Depending on your particular system, you may end up consuming an entry in your CPU's MMU, or you may just re-use a broader mapping that was set up by whoever wrote the BSP. That's just the cost of doing business on a virtual memory system.
Caching is typically not enabled on I/O memory because of the many side effects of both reads and writes. For the odd cases when you need it, you have to use ioremap_cached().