Linux Allocates Memory at Specific Physical Address

Linux allocates memory at specific physical address

If you can remap the translation on the fly, then you should work like any driver that uses DMA. Your basic reference for this is Chapter 15 of LDD3, plus the Linux DMA API.

What you are allocating is a DMA coherent buffer, via dma_alloc_coherent. On most platforms you should be able to pass in a null struct device pointer and get a generic DMA address. This will give you both a kernel virtual address to access the data, and a dma address which is the CPU physical address to map through your translation layer.

If your address translation is not very flexible, you may need to modify the platform code for your endpoint to reserve this buffer early on, in order to meet address alignment requirements. This is a bit more complicated, but there is an update of the bigphysarea patch to recent kernels that may help as a starting point.

Physical Memory Allocation in Kernel

You don't choose a memory location and put your data there. Instead, you ask the kernel to tell you the location of your data in physical memory, and tell the board to read that location. Each page of memory (4KB) will be at a different physical location, so if you are sending more data than that, your device likely supports "scatter gather" DMA, so it can read a sequence of pages at different locations in memory.

The API is this: dma_map_page() to return a value of type dma_addr_t, which you can give to the board. Then dma_unmap_page() when the transfer is finished. If you're doing scatter-gather, you'll put that value instead in the list of descriptors that you feed to the board. Again if scatter-gather is supported, dma_map_sg() and friends will help with this mapping of a large buffer into a set of pages. It's still your responsibility to set up the page descriptors in the format expected by your device.

This is all very well written up in Linux Device Drivers (Chapter 15), which is required reading. http://lwn.net/images/pdf/LDD3/ch15.pdf. Some of the APIs have changed from when the book was written, but the concepts remain the same.

Finally, mmap(): Sure, you can allocate a kernel buffer, mmap() it out to user space and fill it there, then dma_map that buffer for transmission to the device. This is in fact probably the cleanest way to avoid copy_from_user().

How does Linux allocate memory for its physical allocator?

What I don't get is how the kernel stores these structures, or how it allocates space for them. Since for order 0 pages you would need 1M entries (each is a 4KiB page), does it mean that the kernel allocates 1MiB * sizeof(struct page)?

Initialization of zones is done by calling paging_init() (arch/x86/mm/init_32.c; some descriptions - https://www.kernel.org/doc/gorman/html/understand/understand005.html 2.3 Zone Initialisation and http://repo.hackerzvoice.net/depot_madchat/ebooks/Mem_virtuelle/linux-mm/vminit.html Initializing the Kernel Page Tables) from setup_arch() via (native_pagetable_init() and indirect call 1166 x86_init.paging.pagetable_init();):

690 /*
691  * paging_init() sets up the page tables - note that the first 8MB are
692  * already mapped by head.S.
...*/
697 void __init paging_init(void)
698 {
699         pagetable_init();
...
711         zone_sizes_init();
712 }

pagetable_init() creates kernel page tables in swapper_pg_dir array of 1024 pgd_ts.

zone_sizes_init() actually defines zones of physical memory and calls free_area_init_nodes() to initialize them with actual work done (for each NUMA node for_each_online_node(nid) {...}) in free_area_init_node() which calls three functions:

calculate_node_totalpages() prints page counts for every node in dmesg
alloc_node_mem_map() does actual job of allocating struct page for every physical page in this node; memory for them is allocated by bootmem allocator doc1 doc2 (you can see its debug with bootmem_debug=1 kernel boot option):

4936 size = (end - start) * sizeof(struct page);

4937 map = alloc_remap(pgdat->node_id, size);

if (!map) map = memblock_virt_alloc_node_nopanic(size, pgdat->node_id);

free_area_init_core() (with filling of bitmaps in struct zone). Functionality of free_area_init_core described for older kernels in http://repo.hackerzvoice.net/depot_madchat/ebooks/Mem_virtuelle/linux-mm/zonealloc.html#INITIALIZE as:

free_area_init_core() The memory map is built, and the freelists and buddy bitmaps initialized, in free_area_init_core().

Free lists of orders in each zone are initialized and orders are marked as having no any free page: free_area_init_core() -> init_currently_empty_zone() -> zone_init_free_lists:

4147 static void __meminit zone_init_free_lists(struct zone *zone)
4148 {
4149         unsigned int order, t;
4150         for_each_migratetype_order(order, t) {
4151                 INIT_LIST_HEAD(&zone->free_area[order].free_list[t]);
4152                 zone->free_area[order].nr_free = 0;
4153         }
4154 }

PS: There is init() in kernel, it is called start_kernel(), and LXR (Linux cross-reference) will help you to navigate between functions (I posted links to lxr.free-electrons.com, but there are several online LXRs):

501 asmlinkage __visible void __init start_kernel(void)
...
528         boot_cpu_init();
529         page_address_init();
530         pr_notice("%s", linux_banner);
531         setup_arch(&command_line);

Linux Allocates Memory at Specific Physical Address