Why Kernel Needs Virtual Addressing

Why does the kernel have a separate virtual address for a user page?

To access a page, it needs to be mapped in your current virtual address space.

So if the kernel wants to access a user page, there are two solutions:

1. Map the page into the current address space (the kernel's address space) and make sure the two page table entries stay consistent. You don't strictly have to keep them consistent, but you really want to.
2. Switch to an address space where that page is already mapped: the user's own address space.

Your kernel seems to be picking option 1, which is good for performance: switching to another address space and back takes quite a lot of time.
It could pick option 2 instead and switch to the user's address space every time it wants to access a user page. That might make the code simpler by avoiding some bookkeeping, but it would be awfully slow.
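
A toy model of option 1 may make the "separate virtual address" concrete. It treats an address space as a plain dictionary from virtual page number to physical frame number. The class name, the page numbers and the frame number are all made up for illustration; no real kernel stores its page tables this way.

```python
class AddressSpace:
    """Toy address space: virtual page number -> physical frame number."""

    def __init__(self, name):
        self.name = name
        self.page_table = {}

    def map(self, vpage, frame):
        self.page_table[vpage] = frame

    def translate(self, vpage):
        # A missing entry models a page fault.
        if vpage not in self.page_table:
            raise LookupError(f"page fault in {self.name}: vpage {vpage:#x}")
        return self.page_table[vpage]


# Hypothetical numbers: the user page lives in physical frame 0x1234.
user = AddressSpace("user")
kernel = AddressSpace("kernel")
user.map(0x400, 0x1234)

# Option 1: the kernel adds its own mapping for the same frame,
# at a different virtual page number of its choosing.
kernel.map(0xFFFF8, user.translate(0x400))

# Two virtual addresses, one physical frame.
same_frame = kernel.translate(0xFFFF8) == user.translate(0x400)
print(same_frame)  # True
```

Note that the kernel cannot simply reuse the user's virtual page number: in this sketch, `kernel.translate(0x400)` faults because that page is only mapped in the user's table.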

Why is kernel said to be in process address space?

When the process makes a system call, we don't need to switch page tables (from the process's page table to a kernel page table) to service the system call, even though the servicing itself must happen in kernel mode. This is what it means to say that the kernel is running in process context.

Some kernel events that don't run in process context load page tables that map only the kernel.

Got it?

Why is kernel mapped to the same address space as processes?

A process "owns" the entire virtual address space here, the kernel and the user portions of it.

Its inability to peek and poke the kernel code and data is not due to different address spaces, it's due to different access rights/permissions set in the page tables. Kernel pages are set up in such a way that regular applications can't access them.
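
As a rough sketch of the idea, the snippet below keeps user and kernel pages in one page table and decides access purely from a per-entry permission flag (x86 has such a flag, the U/S bit). The `PageTableEntry` type, the `access` helper and all numbers are hypothetical and only illustrate the principle.

```python
from dataclasses import dataclass


@dataclass
class PageTableEntry:
    frame: int
    user_accessible: bool  # models a user/supervisor permission bit


def access(page_table, vpage, in_kernel_mode):
    """Return the frame for vpage, or raise if the access is not allowed."""
    entry = page_table.get(vpage)
    if entry is None:
        raise LookupError("page not mapped")
    if not entry.user_accessible and not in_kernel_mode:
        raise PermissionError("user mode touched a kernel-only page")
    return entry.frame


# One page table for one process; it holds both halves.
page_table = {
    0x00400: PageTableEntry(frame=0x10, user_accessible=True),   # user code
    0xFFFF8: PageTableEntry(frame=0x20, user_accessible=False),  # kernel data
}

print(access(page_table, 0xFFFF8, in_kernel_mode=True))  # 32
```

The same lookup with `in_kernel_mode=False` raises `PermissionError`, even though the page is mapped in the very same table.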

It is, however, customary to refer to the two parts of one whole thing as the kernel space and the user space and that can be confusing.

How are virtual addresses corresponding to kernel stack mapped?

Note: This is the OS-agnostic answer. Details vary slightly with the OS in question (e.g. Darwin and continuations), and possibly with the architecture (ARMv8, x86, etc.).

When a process performs a system call, the user mode state (registers) is saved, including the user mode stack pointer. At that point, a kernel mode stack pointer is loaded, which is usually maintained somewhere in the thread control block.
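
The stack switch described above can be sketched as a small simulation. Everything here is an assumption made for illustration: the `Thread` and `Cpu` types, the register set reduced to `sp` and `pc`, and the addresses. A real kernel does this in architecture-specific entry code, partly with hardware help.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    """Toy thread control block."""
    kernel_sp: int  # top of this thread's kernel stack
    saved_user_regs: dict = field(default_factory=dict)


@dataclass
class Cpu:
    sp: int
    pc: int
    kernel_mode: bool = False


def enter_syscall(cpu, thread, handler_pc):
    # Save the user-mode state, including the user stack pointer.
    thread.saved_user_regs = {"sp": cpu.sp, "pc": cpu.pc}
    # Load the kernel stack pointer kept in the thread control block.
    cpu.sp = thread.kernel_sp
    cpu.pc = handler_pc
    cpu.kernel_mode = True


def return_to_user(cpu, thread):
    # Restore the state saved on entry and drop back to user mode.
    cpu.sp = thread.saved_user_regs["sp"]
    cpu.pc = thread.saved_user_regs["pc"]
    cpu.kernel_mode = False


cpu = Cpu(sp=0x7FFF_F000, pc=0x0040_1000)
thread = Thread(kernel_sp=0xFFFF_C000)

enter_syscall(cpu, thread, handler_pc=0xFFFF_8000)
sp_in_kernel = cpu.sp  # now the thread's kernel stack
return_to_user(cpu, thread)
```

Because each thread carries its own `kernel_sp`, two threads inside the kernel at the same time use different stacks even though they share one kernel space.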

You are correct in saying that there is only one kernel space. It follows that, in theory, one thread in kernel space could easily see or tamper with any other thread in kernel space (just as threads of the same process can "see" each other in user space). This is almost always theoretical only, since kernel code presumably respects memory boundaries (as user-mode code is assumed to, with thread-local storage and so on). "Almost always", because if the kernel code can be exploited, all of kernel memory is laid bare to the exploiter and can potentially be read or compromised.

Why are there such large virtual addresses in an x86_64 kernel's memory layout?

Your PC has not only DRAM but also ROM (nowadays, flash) and I/O memory. For backwards compatibility, parts of those must be mapped into the 20-bit and 32-bit address spaces, so the last part of the RAM ends up at some address above 0x100000000 (the 4 GiB boundary where the 32-bit address space ends).
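
A little arithmetic shows the effect. The function below is only a sketch: it assumes a single reserved hole just below the 4 GiB boundary and that the firmware relocates the displaced RAM directly above it. The 8 GiB machine and the 3 GiB to 4 GiB hole are made-up numbers; real memory maps have several holes.

```python
GIB = 1 << 30


def top_of_ram(installed_ram, hole_start, hole_end):
    """Address just past the last byte of RAM when the range
    [hole_start, hole_end) is reserved for ROM and I/O memory."""
    if installed_ram <= hole_start:
        return installed_ram  # all RAM fits below the hole
    remapped = installed_ram - hole_start  # RAM displaced by the hole
    return hole_end + remapped


# Hypothetical machine: 8 GiB of RAM, hole from 3 GiB to 4 GiB.
end = top_of_ram(8 * GIB, 3 * GIB, 4 * GIB)
print(hex(end))  # 0x240000000
```

So a machine with 8 GiB installed ends up with RAM reaching 9 GiB in this model, past what the amount of installed memory alone would suggest.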

Process virtual address space and kernel address space? How?

Most systems define logical address ranges for kernel and user address spaces. On some systems the range is entirely up to the operating system (how it sets up the page tables); on others it is done in hardware.

For the former, page tables are usually nested, in which case multiple page tables share identical entries.

For the latter, there are usually separate page tables for the user and kernel address spaces.
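
The nested case can be pictured with a small sketch in which every process gets its own top-level table, but the slots covering the kernel range all refer to the same lower-level tables. The four-slot layout and the split into two user and two kernel slots are hypothetical, chosen only to keep the example short.

```python
# Lower-level tables for the kernel range, created once at boot.
kernel_tables = [{}, {}]


def new_top_level_table():
    """Top-level table with 4 slots: 0-1 user, 2-3 kernel (made-up split)."""
    user_slots = [{}, {}]              # private to each process
    return user_slots + kernel_tables  # kernel slots reference shared tables


proc_a = new_top_level_table()
proc_b = new_top_level_table()

# A kernel mapping added once is visible through every process's table.
kernel_tables[0][0x10] = 0xABC
print(proc_a[2][0x10] == proc_b[2][0x10])  # True
print(proc_a[0] is proc_b[0])              # False
```

The kernel half is therefore set up once and shared, while the user half differs from process to process.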

If ELF defines the virtual address space, does ELF also define the kernel virtual address space? How? [I assume the kernel virtual address space is dynamically mapped at run time?]

The executable file only defines the initial layout of the user address space.

If the kernel address space is mapped into the process address space, why doesn't the process's (virtual) size include the kernel's size as well?

That would depend upon the system and how it does the counting.

When and how is this kernel address space mapped/linked? For example, in the case of a shared library, the particular file is pointed to by a vm struct, etc.

The kernel address space exists independent of any processes. As mentioned above, it is mapped to a process either by having a system page table shared by all processes or a set of nested page table entries shared by all processes.

Does the executable's size completely determine the process's (virtual) size? In what contexts do the sizes differ, or are they completely different?

Not really. Large executables are indicative of larger ranges of logical addresses required. However, a small EXE can easily describe a large number of demand zero pages. In addition, the application can map logical pages as it executes. The EXE only defines the initial state of the logical address space.
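
To put numbers on the demand-zero point, the sketch below compares the bytes a file stores with the virtual memory its segments ask for. In ELF the two per-segment sizes are the `p_filesz` and `p_memsz` program header fields; the segment list, the sizes and the 4096-byte page size here are invented for illustration.

```python
PAGE = 4096  # assumed page size

# Hypothetical loadable segments:
# (name, bytes stored in the file, bytes occupied in memory)
segments = [
    ("text", 8 * 1024, 8 * 1024),
    ("data+bss", 4 * 1024, 256 * 1024 * 1024),  # mostly demand-zero
]


def round_up(n, align=PAGE):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


file_bytes = sum(filesz for _, filesz, _ in segments)
initial_virtual = sum(round_up(memsz) for _, _, memsz in segments)
demand_zero = sum(
    round_up(memsz) - round_up(filesz) for _, filesz, memsz in segments
)

print(file_bytes)       # 12288
print(initial_virtual)  # 268443648
```

A 12 KiB file thus describes roughly 256 MiB of initial address space in this example, and the process can still grow beyond that by mapping more pages as it runs.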