Linux - Mapping User Space Memory in Kernel Code

Linux - Mapping user space memory in kernel code

Found the answer

the key is to use the vmap function which create a mapping for a given page table. the problem was how to initialize a page table structure to a certain physical address but it appears there exists an API for that as well

here is an example to allocate a single page

void *virt_addr_ptr
struct page **my_page = kmalloc(sizeof (*my_page), GFP_KERNEL);
my_page = phys_to_page(phys_addr_ptr);
virt_addr_ptr = vmap(my_page, 1, VM_MAP, PAGE_KERNEL);

/*now its possible to access this space */
memcpy(store_buffer, virt_addr_ptr, store_size);

Is kernel space mapped into user space on Linux x86?

Actually, on 32-bit Windows, without the /3G boot option, the kernel is mapped at the top 2GB of linear address space, leaving 2GB for user process.

Linux does a similar thing, but it maps the kernel in the top 1GB of linear space, thus leaving 3GB for user process.

I don't know if you can peek the entire memory layout by just using the /proc filesystem. For a lab I designed for my students, I created a tiny device driver that allows a user to peek at an physical memory address, and get the contents of several control registers, such as CR3 (directory page base address).

By using these two operations, one can walk through the directory page of the current process (the one which is doing this operation) and see which pages are present, which ones are owned by the user and the kernel, or just by the kernel, which ones are read/write or read only, etc. With that information, they have to display a map showing memory usage, including kernel space.

Take a look at this PDF. It's the compiled version of all the labs we did in my course.
http://www.atc.us.es/asignaturas/tpbn/PracticasTPBN2011.pdf

On page 36 of PDF (page 30 of the document) you will see how a memory map looks like. This is the result of doing exercise #3.2 from lab #3.

The text is in spanish, but I'm sure you can use a translator or something like that if there are things you cannot understand. This labs assumes the student has previously read about how the paging system works and how to interpret the layout of the directory and page entries.

The map is like this. A 16x64 block. Each cell in the block represents 4MB of the current process virtual address space. The map should be tridimensional, as there are 4MB regions that are described by a page table with 1024 entries (pages), and not all pages may be present, but to keep the map clear, the exercise requires the user to collapse these regions, showing the contents of the first page entry that describes a present page, in the hope that all subsequents pages in that page table share the same attributes (which may or may not be actually true).

This map is used with kernels 2.6.X. in which PAE is not used, and PSE is used (PAE and PSE being two bit fields from control register CR4). PAE enables 2MB pages and PSE enables 4MB pages. 4KB pages are always available.

. : PDE not present, or page table empty.
X : 4MB page, supervisor.
R : 4MB page, user, read only.
* : 4MB page, user, read/write.
x : Page table with at least one entry describing a supervisor page.
r : Page table with at least one entry describing an user page, read only.
+ : Page table with at least one entry describing an user page, read/write.

................................r...............................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
...............................+..............................+.
xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXxX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX..x...........................xx

You can see there is a vast space of 3GB of memory, almost empty in this case (the process is just a little C application, and uses less than 4MB, all contained in a page table, whose first present page is a read only page, assumed to be part of the program code, or maybe static strings).

Near the 3GB border there are two small regions read/write, which may belong to shared libraries loaded by the user program.

The last 4 rows (256 directory entries) belong almost all of them to the kernel. There are 224 entries which are actually being present and used. These maps the first 896MB of physical memory and it's the space in where the kernel lives. The last 32 entries are used by the kernel to access physical memory beyond the 896MB mark in systems with more than 896MB RAM.

How do you register a region of user space memory in a kernel module?

You can tell the kernel to restrict the address for kernel space by setting the mem parameter in the bootargs :

mem=1M@0x900000 --> instructs to use 1M starting from 0x900000

you can have multiple mem in boot args
example: mem=1M@0x900000 mem=1M@0xA00000

Following command should tell you the memory region allocated to the kernel:

cat /proc/iomem | grep System

How to mmap a Linux kernel buffer to user space?

The simplest way to map a set of pages from the kernel in your mmap method is to use the fault handler to map the pages. Basically you end up with something like:

static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
    vma->vm_ops = &my_vm_ops;
    return 0;
}

static const struct file_operations my_fops = {
    .owner  = THIS_MODULE,
    .open   = nonseekable_open,
    .mmap   = my_mmap,
    .llseek = no_llseek,
};

(where the other file operations are whatever your module needs). Also in my_mmap you do whatever range checking etc. is needed to validate the mmap parameters.

Then the vm_ops look like:

static int my_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
    vmf->page = my_page_at_index(vmf->pgoff);
    get_page(vmf->page);

    return 0;
} 

static const struct vm_operations_struct my_vm_ops = {
    .fault      = my_fault
}

where you just need to figure out for a given vma / vmf passed to your fault function which page to map into userspace. This depends on exactly how your module works. For example, if you did

my_buf = vmalloc_user(MY_BUF_SIZE);

then the page you use would be something like

vmalloc_to_page(my_buf + (vmf->pgoff << PAGE_SHIFT));

But you could easily create an array and allocate a page for each entry, use kmalloc, whatever.

[just noticed that my_fault is a slightly amusing name for a function]

How to access(if possible) kernel space from user space?

What are the different ways I can write in kernel address space from
user space?

I'm not sure if there're other methods, but you can access physical memory using /dev/mem & system call mmap().

/dev/mem is a character device file that is an image of the main
memory of the computer. It may be used, for example, to examine (and
even patch) the system.
Byte addresses in mem are interpreted as physical memory addresses.

more on /dev/mem: http://linux.about.com/library/cmd/blcmdl4_mem.htm

more on mmap(): http://linux.die.net/man/2/mmap

You can use the mmap() to map a section of /dev/mem and use in your user program. A brief example code:

#define MAPPED_SIZE //place the size here
#define DDR_RAM_PHYS  //place the physical address here

int _fdmem;
int *map = NULL;
const char memDevice[] = "/dev/mem";

/* open /dev/mem and error checking */
_fdmem = open( memDevice, O_RDWR | O_SYNC );

if (_fdmem < 0){
printf("Failed to open the /dev/mem !\n");
return 0;
}
else{
printf("open /dev/mem successfully !\n");
}

/* mmap() the opened /dev/mem */
map= (int *)(mmap(0,MAPPED_SIZE,PROT_READ|PROT_WRITE,MAP_SHARED,_fdmem,DDR_RAM_PHYS));

/* use 'map' pointer to access the mapped area! */
for (i=0,i<100;i++)
printf("content: 0x%x\n",*(map+i));

/* unmap the area & error checking */
if (munmap(map,MAPPED_SIZE)==-1){
perror("Error un-mmapping the file");
}

/* close the character device */
close(_fdmem);

However, please make sure the area you are mapping is not used, for example by the kernel, or it will make your system crash/hang, and you will be forced to reboot using hardware power button.

Hope it helps.

Allocating user space memory in Linux Kernel

is that allowed to allocate user space memory from kernel space? I know that process in Linux uses virtual memory and virtual addresses. And there is a protection which can`t allow using of memory of different process (it raises segmentation fault). So, there is no way to allocate a buffer and return a valid pointer to it to user space process?

Memory allocation routines have normally a return value, which is the pointer (in virtual memory coordinates) that the kernel has allocated the value. If the user is not asking for new memory, it's not normall to allocate memory for him in user space without telling where it has been allocated....

But you can do it. I think the best way to know how to do it is to study how the mmap(2) system call works (as it maps allocated memory to user space, and returns a user space pointer for the user to know where it has been allocated), and use the internal kernel function used to allocate user memory. The kernel malloc group of routines need to know how to map (even for the kernel virtual space, as the kernel also runs in a virtual address memory space) memory pages into virtual memory addresses. Not doing that leads to unaccessable memory.

Linux kernel space and user space

You've got the general idea mostly right, but make this adjustment: there's only one "kernelspace" for the whole machine, and all processes share it.

When a process is active, it can either be running in "user mode" or "kernel mode".

In user mode, the instructions being executed by the CPU are in the userspace side of the memory map. The program is running its own code, or code from a userspace library. In user mode, a process has limited abilities. There is a flag in the CPU which tells it not to allow the use of privileged instructions, and kernel memory, although it exists in the process's memory map, is inaccessible. (You wouldn't want let any program just read and write the kernel's memory - all security would be gone.)

When a process wants to do something other than move data around in its own (userspace) virtual memory, like open a file for example, it must make a syscall. Each CPU architecture has its own unique quirky method of making syscalls, but they all boil down to this: a magic instruction is executed, the CPU turns on the "privileged mode" flag, and jumps to a special address in kernelspace, the "syscall entry point".

Now the process is running in kernel mode. Instructions being executed are located in kernel memory, and they can read and write any memory they want to. The kernel examines the request that the process just made and decides what to do with it.

In the open example, the kernel receives 2 or 3 parameters corresponding to the arguments of int open(const char *filename, int flags[, int mode]). The first argument provides an example of when kernelspace needs access to userspace. You said open("foo", O_RDONLY) so the string "foo" is part of your program in userspace. The syscall mechanism only passed a pointer, not a string, so the kernel must read the string from user memory.

To find the requested file, the kernel may consult with filesystem drivers (to figure out where the file is) and block device drivers (to load the necessary blocks from disk) or network device drivers and protocols (to load the file from a remote source). All of those things are part of the kernel, i.e. in kernelspace, regardless of whether they are built-in or were loaded as modules.

If the request can't be satisfied immediately, the kernel may put the process to sleep. That means the process will be taken off the CPU until a response is received from the disk or network. Another process may get a chance to run now. Later, when the response comes in, your process starts running again (still in kernel mode). Now that it's found the file, the open syscall can finish up (check the permissions, create a file descriptor) and return to userspace.

Returning to userspace is a simple matter of putting the CPU back in non-privileged mode and restoring the registers to what they were before the user->kernel transition, with the instruction pointer pointing at the instruction after the magic syscall instruction.

Besides syscalls, there are other things that can cause a transition from user mode to kernel mode, including:

page faults - if your process accesses a virtual memory address that doesn't have a physical address assigned to it, the CPU enters kernel mode and jumps to the page fault handler. The kernel then decides whether the virtual address is valid or not, and it either creates a physical page and resumes the process in userspace where it left off, or sends a SIGSEGV.
interrupts - some hardware (network, disk, serial port, etc.) notifies the CPU that it requires attention. The CPU enters kernel mode and jumps to a handler, the kernel responds to it and then resumes the userspace process that was running before the interrupt.

Loading a module is done with a syscall that asks the kernel to copy the module's code and data into kernelspace and run its initialization code in kernel mode.

This is pretty long, so I'm stopping. I hope the walk-through focusing on user-kernel transitions has provided enough examples to solidify the idea.

Linux - Mapping User Space Memory in Kernel Code