Where is the Linux ISR Entry Point
CONFIG_X86_32
- arch/x86/kernel/entry_32.S:system_call (INT $0x80)
- arch/x86/kernel/entry_32.S:ia32_sysenter_target (SYSENTER)
CONFIG_X86_64
- arch/x86/kernel/entry_64.S:system_call (SYSCALL, 64bit)
CONFIG_X86_64 and CONFIG_IA32_EMULATION
- arch/x86/ia32/ia32entry.S:ia32_sysenter_target (SYSENTER)
- arch/x86/ia32/ia32entry.S:ia32_cstar_target (SYSCALL, 32bit)
- arch/x86/ia32/ia32entry.S:ia32_syscall (INT $0x80)
Location of interrupt handling code in Linux kernel for x86 architecture
I am looking for the code that pushes all of the general purpose registers on the stack
Hardware stores part of the current state before executing an interrupt handler, and when the interrupt exits, the hardware reads that state back from where it was stored.
Now, code inside the interrupt handler may read and write the saved copies of registers, causing different values to be restored as the interrupt exits. That's how a context switch works.
On x86, the hardware only saves the registers that it must change before the interrupt handler starts running (EFLAGS, CS and EIP, plus SS and ESP on a privilege change). On some architectures the hardware saves all registers automatically; x86 leaves most of the job to software, since saving and restoring registers the handler never modifies would be a waste. So the interrupt handler is responsible for saving and restoring any registers it voluntarily uses — in the Linux kernel, the general-purpose registers are pushed by the SAVE_ALL macro used by the entry code in entry_32.S.
See the Intel® 64 and IA-32 Architectures Software Developer's Manual, starting on page 6-15.
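The mechanism described above can be sketched in a few lines of C. This is only a hand-rolled miniature with invented names (saved_regs, schedule_next, interrupt_return) to show the idea — entry code saves state per task, the handler picks which saved state the exit path will restore — not actual kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Per-task save area. Real kernels save all general-purpose
 * registers; two fields are enough to show the idea. */
struct saved_regs {
    long ax;
    long ip;    /* where execution resumes on "interrupt return" */
};

static struct saved_regs task_a = { .ax = 1, .ip = 0x1000 };
static struct saved_regs task_b = { .ax = 2, .ip = 0x2000 };

/* The "interrupt handler": by choosing a different saved context
 * for the exit path to restore, it performs a context switch. */
static struct saved_regs *schedule_next(struct saved_regs *cur)
{
    return (cur == &task_a) ? &task_b : &task_a;
}

/* The "interrupt exit": blindly restores whatever the handler left. */
static long interrupt_return(const struct saved_regs *next)
{
    return next->ip;   /* pretend we resume execution here */
}
```

If the handler never touches the save area, the interrupted task resumes unchanged; if it swaps the pointer, a different task runs next.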
kernel entry points on ARM
Inside each of these vector_* jump tables, there is exactly one DWORD with the address of a label with a _usr suffix.
This is correct. The table is indexed by the current mode. For instance, irq only has three entries: irq_usr, irq_svc, and irq_invalid. IRQs should be disabled during data aborts, FIQ and other modes. Linux will always transfer to SVC mode after this brief 'vector stub' code. It is accomplished with,
@
@ Prepare for SVC32 mode. IRQs remain disabled.
@
mrs r0, cpsr
eor r0, r0, #(\mode ^ SVC_MODE | PSR_ISETSTATE)
msr spsr_cxsf, r0
@@@ ... other unrelated code
movs pc, lr @ branch to handler in SVC mode
This is why irq_invalid is used for all other modes. Exceptions should never happen while this vector stub code is executing.
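The eor in the stub above is a small bit trick: each stub is assembled for a fixed \mode, so the constant (\mode ^ SVC_MODE) flips exactly the bits that differ between that mode and SVC mode. A sketch in C, using the ARM mode encodings (USR 0x10, IRQ 0x12, SVC 0x13, ABT 0x17) and ignoring PSR_ISETSTATE (the Thumb bit) for simplicity:

```c
#include <assert.h>

#define USR_MODE 0x10u
#define IRQ_MODE 0x12u
#define SVC_MODE 0x13u
#define ABT_MODE 0x17u

/* Model of: eor r0, r0, #(\mode ^ SVC_MODE)
 * XOR-ing the saved cpsr with (stub_mode ^ SVC_MODE) rewrites the
 * mode bits from the stub's mode to SVC, leaving all other cpsr
 * bits (condition flags, IRQ/FIQ masks) untouched. */
static unsigned to_svc(unsigned cpsr, unsigned stub_mode)
{
    return cpsr ^ (stub_mode ^ SVC_MODE);
}
```

The result is then written to spsr so that the final movs pc, lr lands in SVC mode.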
Does this mean that labels with the _usr suffix are executed only if the interrupt arises when the kernel thread executing on that CPU is in userspace context? For instance, irq_usr is executed if the interrupt occurs when the kernel thread is in userspace context, dabt_usr is executed if the interrupt occurs when the kernel thread is in userspace context, and so on.
Yes, the spsr holds the interrupted mode, and the table is indexed by these mode bits.
If 1 is true, then which kernel threads are responsible for handling, say, irqs with a different suffix such as irq_svc? I am assuming that this is the handler for an interrupt request that happens in SVC mode. If so, which kernel thread handles this? The kernel thread currently in SVC mode, on whichever CPU receives the interrupt?
I think you have some misunderstanding here. Every user-space process also has a kernel-mode context. irq_usr is responsible for storing the user mode registers, as a reschedule might take place. The context is different for irq_svc, as a kernel stack was already in use, and it is the same one the IRQ code will use. What happens when a user task calls read()? It uses a system call, and code executes in a kernel context. Each process has both a user and an svc/kernel stack (and thread_info). A kernel thread is a process without any user space stack.
If 2 is true, then at what point does the kernel thread finish processing the second interrupt and return to where it had left off (also in SVC mode)? Is it ret_from_intr?
Generally Linux returns to the kernel thread that was interrupted so it can finish its work. However, there is a configuration option for pre-empting SVC threads/contexts. If the interrupt resulted in a reschedule event, then a process/context switch may result if CONFIG_PREEMPT is active. See svc_preempt for this code.
See also:
- Linux kernel arm exception stack init
- Arm specific irq initialization
Linux Device Driver Program, where the program starts?
"Linux Device Driver" is a good book but it's old!
Basic example:
#include <linux/module.h>
#include <linux/version.h>
#include <linux/kernel.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Name and e-mail");
MODULE_DESCRIPTION("my_first_driver");

static int __init insert_mod(void)
{
	printk(KERN_INFO "Module constructor\n");
	return 0;
}

static void __exit remove_mod(void)
{
	printk(KERN_INFO "Module destructor\n");
}

module_init(insert_mod);
module_exit(remove_mod);
An up-to-date tutorial, really well written, is "Linux Device Drivers Series"
Where do the function pointers point to in struct proto_ops ?
For an IPv4 socket, the ops will be one of the structures inet_stream_ops, inet_dgram_ops or inet_sockraw_ops defined in net/ipv4/af_inet.c (unless you're using something weird like SCTP, which has its own ops defined elsewhere). To see how those pointers get there, look at inet_create in the same file, which is called from __sock_create as pf->create().
To see how that pointer gets there, note that inet_create is the create member of a struct called inet_family_ops, which is passed to sock_register(), causing it to be placed in the net_families array that is consulted when a socket is created.
Functions with register in their names are a good thing to look for when you want to see how different kernel components are connected.
Now back to getsockopt. If you look at where the getsockopt pointer actually goes, in all three of the inet_*_ops structs it points to sock_common_getsockopt.
sock_common_getsockopt in net/core/sock.c isn't very interesting. It just extracts a struct sock from a struct socket and then calls sk->sk_prot->getsockopt. So next you'll be wondering where that comes from.
It comes from the next layer up. I'm going to assume TCP as an example.
In net/ipv4/tcp_ipv4.c there's a struct proto tcp_prot in which the getsockopt member points to tcp_getsockopt, a function defined in net/ipv4/tcp.c. It implements SOL_TCP options itself, and for others, it calls the getsockopt method from another ops structure called icsk_af_ops. That brings us back to net/ipv4/tcp_ipv4.c, where struct inet_connection_sock_af_ops ipv4_specific has getsockopt pointing to ip_getsockopt.
ip_getsockopt is in net/ipv4/ip_sockglue.c and it calls do_ip_getsockopt in the same file, which is where the SOL_IP options are actually handled.
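The dispatch chain traced above is just layered function pointers. A hand-rolled miniature — every name here (mini_proto, mini_sock, mini_common_getsockopt, ...) is invented for illustration and is not the real kernel structure — shows the pattern:

```c
#include <assert.h>

struct mini_sock;
struct mini_socket;

/* Plays the role of struct proto (sk->sk_prot). */
struct mini_proto {
    int (*getsockopt)(struct mini_sock *sk, int optname);
};

struct mini_sock {
    struct mini_proto *sk_prot;
};

/* Plays the role of struct proto_ops (socket->ops). */
struct mini_proto_ops {
    int (*getsockopt)(struct mini_socket *sock, int optname);
};

struct mini_socket {
    struct mini_sock *sk;
    const struct mini_proto_ops *ops;
};

/* Plays the role of ip_getsockopt: the layer that finally answers. */
static int mini_ip_getsockopt(struct mini_sock *sk, int optname)
{
    (void)sk;
    return 1000 + optname;      /* pretend option value */
}

/* Plays the role of sock_common_getsockopt: forwards to sk_prot. */
static int mini_common_getsockopt(struct mini_socket *sock, int optname)
{
    return sock->sk->sk_prot->getsockopt(sock->sk, optname);
}

static const struct mini_proto_ops mini_stream_ops = { mini_common_getsockopt };
static struct mini_proto mini_tcp_prot = { mini_ip_getsockopt };

/* What the syscall layer ultimately does: indirect through ops. */
static int mini_getsockopt(struct mini_socket *sock, int optname)
{
    return sock->ops->getsockopt(sock, optname);
}
```

Each layer only knows the next table of pointers, which is why following a getsockopt call means hopping from proto_ops to proto to icsk_af_ops in the real kernel.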
How does the Linux kernel enter supervisor mode in x86?
It's not that simple. In x86, there are 4 different privilege levels: 0 (operating system kernel), 1, 2, and 3 (applications). Privilege levels 1 and 2 aren't used in Linux: the kernel runs at privilege level 0 while user space code runs at privilege level 3. The current privilege level (CPL) is stored in bits 0 and 1 of the CS (code segment) register.
There are multiple ways in which the transition from user to kernel can happen:
- Through hardware interrupts: page faults, general protection faults, devices, hardware timer, and so on.
- Through software interrupts: the int instruction raises a software interrupt. The most common in Linux is int 0x80, which is configured to be used for system calls from user space to kernel space.
- Through specialized instructions like sysenter and syscall.
In any case, there is no actual code that does the transition: it is done by the processor itself, which switches from one privilege level to the other, and sets up segment selectors, instruction pointer, stack pointer and more according to the information that was set up by the kernel right after booting.
In the case of interrupts, the entries of the Interrupt Descriptor Table (IDT) are used. See this useful documentation page about interrupts in Linux which explains more about the IDT. If you want to get into the details, check out Chapter 5 of the Intel 64 and IA-32 architectures software developer's manual, Volume 3.
In short, each IDT entry specifies a descriptor privilege level (DPL) and a new code segment and offset. In the case of software interrupts, some privilege level checks are made by the processor (one of which is CPL <= DPL) to determine whether the code that issued the interrupt has the privilege to do so. Then, the interrupt handler is executed, which implicitly sets the new CS register with the privilege level bits set to 0. This is how the canonical int 0x80 syscall for 32-bit x86 is made.
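That CPL <= DPL check is easy to model. A simplified sketch (int_n_allowed is an invented helper, not a real API; the CPU also performs other checks not shown here):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the privilege check the CPU performs for a
 * software interrupt (int n): the gate's DPL must be numerically
 * greater than or equal to the current privilege level, i.e.
 * CPL <= DPL, otherwise a general protection fault is raised.
 * Hardware interrupts skip this particular check. */
static bool int_n_allowed(unsigned cpl, unsigned gate_dpl)
{
    return cpl <= gate_dpl;
}
```

This is why the int 0x80 gate is installed with DPL 3 (reachable from user space) while most other IDT entries have DPL 0.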
In the case of specialized instructions like sysenter and syscall, the details differ, but the concept is similar: the CPU checks privileges and then retrieves the information from dedicated Model Specific Registers (MSRs) that were previously set up by the kernel after boot.
For system calls the result is always the same: user code switches to privilege level 0 and starts executing kernel code, ending up right at the beginning of one of the different syscall entry points defined by the kernel.
Possible syscall entry points are:
- entry_INT80_32 for 32-bit int 0x80
- entry_INT80_compat for 32-bit int 0x80 on a 64-bit kernel
- entry_SYSENTER_32 for 32-bit sysenter
- entry_SYSENTER_compat for 32-bit sysenter on a 64-bit kernel
- entry_SYSCALL_64 for 64-bit syscall
- entry_SYSCALL_compat for 32-bit syscall on a 64-bit kernel (a special entry point which is not used by user code; in theory syscall is also a valid 32-bit instruction on AMD CPUs, but Linux only uses it for 64-bit because of its weird semantics)
How to access physical address during interrupt handler linux
Setting up a new memory mapping is an expensive operation, which typically requires calls to potentially blocking functions (e.g. grabbing locks). So your strategy has two problems:
- Calling a blocking function is not possible in your context (there is no kernel thread associated with your interrupt handler, so there is no way for the kernel to resume it if it had to be put to sleep).
- Setting up/tearing down a mapping per IRQ would be a bad idea performance-wise (even if we ignore the fact that it can't be done).
Typically, you would set up any mappings you need in a driver's probe() function (or in the module's init() if it's more of a singleton thing). This mapping is then kept in some private device data structure, which is passed as the last argument to some variant of request_irq(), so that the kernel then passes it back as the second argument to the IRQ handler.
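A user-space model of that pattern, with invented stand-ins (fake_request_irq for request_irq(), my_dev for the driver's private structure, and a plain buffer in place of an ioremap() result), may make the data flow clearer:

```c
#include <assert.h>
#include <stddef.h>

/* Private device data: in a real driver, regs would be filled in
 * once by ioremap() in probe(), never at interrupt time. */
struct my_dev {
    void *regs;
};

typedef void (*irq_handler_fn)(int irq, void *dev_id);

static irq_handler_fn registered_handler;
static void *registered_dev_id;

/* Stand-in for request_irq(): the kernel remembers dev_id for us. */
static int fake_request_irq(int irq, irq_handler_fn fn, void *dev_id)
{
    (void)irq;
    registered_handler = fn;
    registered_dev_id = dev_id;
    return 0;
}

/* Stand-in for the hardware raising the interrupt: the stored
 * dev_id is handed back to the handler. */
static void fake_raise_irq(int irq)
{
    registered_handler(irq, registered_dev_id);
}

static void *seen_regs;

static void my_handler(int irq, void *dev_id)
{
    struct my_dev *dev = dev_id;   /* mapping already exists: no
                                      blocking work needed here */
    (void)irq;
    seen_regs = dev->regs;
}
```

The handler never maps anything itself; it only dereferences a pointer that probe() prepared long before the first interrupt.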
Not sure what you mean by "need to map and cache a lot of unused area".
Depending on your particular system, you may end up consuming an entry in your CPU's MMU, or you may just re-use a broader mapping that was set up by whoever wrote the BSP. That's just the cost of doing business on a virtual memory system.
Caching is typically not enabled on I/O memory because of the many side effects of both reads and writes. For the odd cases when you need it, you have to use ioremap_cached().