Accessing Any Memory Locations Under Linux 2.6.X

Accessing any memory locations under Linux 2.6.x

If your program is running in user-mode, then memory outside of your process memory won't be accessible, by hook or by crook. Using asm will not help, nor will any other method. This is simply impossible, and is a core security/stability feature of any OS that runs in protected mode (i.e. all of them, for the past 20+ years). Here's a brief overview of Linux kernel memory management.

The only way you can explore the entire memory space of your computer is by using a kernel debugger, which will allow you to access any physical address. However, even that won't let you look at the memory of every process at the same time, since some processes will have been swapped out of main memory. Furthermore, even in kernel mode, physical addresses are not necessarily the same as the addresses visible to the process.

Kernel API to get Physical RAM Offset

Is there a function, macro, or constant which I can use from within my device driver that gives me the physical address of the first byte of System RAM?

It doesn't matter, because you're asking an XY question.

You should not be looking for or trying to use the "first byte of System RAM" in a device driver.

The driver only needs knowledge of the address (and length) of its register block (that is what this "memory" is for, isn't it?).

In 2.6 kernels (i.e. before Device Tree), this information was typically passed to drivers through struct resource and struct platform_device definitions in a board_devices.c file.

The IORESOURCE_MEM property in the struct resource is the mechanism to pass the device's memory block start and end addresses to the device driver.

The start address is typically hardcoded, and taken straight from the SoC datasheet or the board's memory map.

If you change the SoC, then you need new board file(s).

As an example, here's code from arch/arm/mach-at91/at91rm9200_devices.c to configure and set up the MMC devices for an eval board (AT91RM9200_BASE_MCI is the physical memory address of this device's register block):

#if defined(CONFIG_MMC_AT91) || defined(CONFIG_MMC_AT91_MODULE)
static u64 mmc_dmamask = DMA_BIT_MASK(32);
static struct at91_mmc_data mmc_data;

static struct resource mmc_resources[] = {
    [0] = {
        .start  = AT91RM9200_BASE_MCI,
        .end    = AT91RM9200_BASE_MCI + SZ_16K - 1,
        .flags  = IORESOURCE_MEM,
    },
    [1] = {
        .start  = AT91RM9200_ID_MCI,
        .end    = AT91RM9200_ID_MCI,
        .flags  = IORESOURCE_IRQ,
    },
};

static struct platform_device at91rm9200_mmc_device = {
    .name           = "at91_mci",
    .id             = -1,
    .dev            = {
        .dma_mask           = &mmc_dmamask,
        .coherent_dma_mask  = DMA_BIT_MASK(32),
        .platform_data      = &mmc_data,
    },
    .resource       = mmc_resources,
    .num_resources  = ARRAY_SIZE(mmc_resources),
};

void __init at91_add_device_mmc(short mmc_id, struct at91_mmc_data *data)
{
    if (!data)
        return;

    /* input/irq */
    if (data->det_pin) {
        at91_set_gpio_input(data->det_pin, 1);
        at91_set_deglitch(data->det_pin, 1);
    }
    if (data->wp_pin)
        at91_set_gpio_input(data->wp_pin, 1);
    if (data->vcc_pin)
        at91_set_gpio_output(data->vcc_pin, 0);

    /* CLK */
    at91_set_A_periph(AT91_PIN_PA27, 0);

    if (data->slot_b) {
        /* CMD */
        at91_set_B_periph(AT91_PIN_PA8, 1);

        /* DAT0, maybe DAT1..DAT3 */
        at91_set_B_periph(AT91_PIN_PA9, 1);
        if (data->wire4) {
            at91_set_B_periph(AT91_PIN_PA10, 1);
            at91_set_B_periph(AT91_PIN_PA11, 1);
            at91_set_B_periph(AT91_PIN_PA12, 1);
        }
    } else {
        /* CMD */
        at91_set_A_periph(AT91_PIN_PA28, 1);

        /* DAT0, maybe DAT1..DAT3 */
        at91_set_A_periph(AT91_PIN_PA29, 1);
        if (data->wire4) {
            at91_set_B_periph(AT91_PIN_PB3, 1);
            at91_set_B_periph(AT91_PIN_PB4, 1);
            at91_set_B_periph(AT91_PIN_PB5, 1);
        }
    }

    mmc_data = *data;
    platform_device_register(&at91rm9200_mmc_device);
}
#else
void __init at91_add_device_mmc(short mmc_id, struct at91_mmc_data *data) {}
#endif

ADDENDUM

I'm still not seeing how this is an XY question.

I consider it an XY question because:

  • You conflate "System RAM" with physical memory address space.

    "RAM" would be actual (readable/writable) memory that exists in the address space.

    "System memory" is the RAM that the Linux kernel manages (refer to your previous question).

    Peripherals can have registers and/or device memory in (physical) memory address space, but this should not be called "System RAM".

  • You have not provided any background on how or why your driver "interacts directly with physical RAM using physical addresses" in a manner that is different from other Linux drivers.

  • You presume that a certain function is the solution for your driver, but you don't know the name of that function. That's the prototype of an XY question.


Can't I call a function like get_platform_device (which I just made up) to get the struct platform_device and then find the struct resource that represents System RAM?

The device driver would call platform_get_resource() (in its probe function) to retrieve its struct resource that was defined in the board file.

To continue the example started above, the driver's probe routine has:

static int __init at91_mci_probe(struct platform_device *pdev)
{
    struct mmc_host *mmc;
    struct at91mci_host *host;
    struct resource *res;
    int ret;

    res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
    if (!res)
        return -ENXIO;

    if (!request_mem_region(res->start, resource_size(res), DRIVER_NAME))
        return -EBUSY;

    ...

    /*
     * Map I/O region
     */
    host->baseaddr = ioremap(res->start, resource_size(res));
    if (!host->baseaddr) {
        ret = -ENOMEM;
        goto fail1;
    }

That would allow me to write code that can always access the nth byte of RAM, without assumptions of how RAM is arranged in relation to other parts of memory.

That reads like a security hole or a potential bug.

I challenge you to find a driver in the mainline Linux kernel that uses the "physical address of the first byte of System RAM".

Your title is "Kernel API to get Physical RAM Offset".

The API you are looking for would seem to be struct resource.

What you want to do seems to fly in the face of Linux kernel conventions. For the integrity and security of the system, drivers do not try to access any/every part of memory.

The driver will request and can be given exclusive access to the address space of its registers and/or device memory.

All RAM under kernel management is only accessed through well-defined conventions, such as buffer addresses for copy_to_user() or the DMA API.
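For instance, here is a minimal sketch of that convention (the names are illustrative, not from any real driver): a character-device read() handler hands data to user space only through the caller-supplied buffer, via copy_to_user():

#include <linux/fs.h>
#include <linux/uaccess.h>      /* <asm/uaccess.h> on older 2.6 kernels */

static char drv_buf[64] = "driver data\n";

static ssize_t drv_read(struct file *file, char __user *ubuf,
                        size_t count, loff_t *ppos)
{
    size_t avail = sizeof(drv_buf);

    if (*ppos >= avail)
        return 0;
    if (count > avail - *ppos)
        count = avail - *ppos;

    /* The user supplies the destination buffer; the driver never
     * dereferences arbitrary RAM addresses of its own choosing. */
    if (copy_to_user(ubuf, drv_buf + *ppos, count))
        return -EFAULT;

    *ppos += count;
    return count;
}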

A device driver simply does not have free rein to access any part of memory it chooses.

Once a driver is started by the kernel, there is absolutely no way it can disregard "assumptions of how RAM is arranged".

Cross memory attach: how do I get the remote address from a child process to a parent process?

The system calls process_vm_readv and process_vm_writev are meant for fast data transfer between processes. They are supposed to be used in addition to some traditional way of interprocess communication.

For example, you may use a regular pipe or FIFO to transfer the required addresses between your processes. Then you may use those addresses to establish faster process_vm_ communication. The simplest way to transfer something between forked processes is probably the pipe() function (man 2 pipe has a good example of its usage). There are many other ways to do so, of course, like using sockets or messages. You can even write an address to a file and let the other process read it.
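As a hedged sketch of that combination (the buffer names and sizes are mine; process_vm_readv needs Linux 3.2+ and ptrace-level permission, which a parent normally has over its own child): the child sends the address of a buffer through a pipe, and the parent then reads that buffer directly:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/uio.h>
#include <sys/wait.h>

int main(void)
{
    int pfd[2];
    pid_t pid;

    if (pipe(pfd) == -1) {
        perror("pipe");
        return 1;
    }

    pid = fork();
    if (pid == 0) {                             /* child */
        static char secret[64] = "hello from the child";
        void *addr = secret;

        close(pfd[0]);
        write(pfd[1], &addr, sizeof(addr));     /* send the remote address */
        close(pfd[1]);
        pause();                                /* keep the mapping alive */
        _exit(0);
    }

    /* parent: receive the child's address over the pipe ... */
    void *remote_addr;
    close(pfd[1]);
    read(pfd[0], &remote_addr, sizeof(remote_addr));

    /* ... then pull the data directly out of the child's address space */
    char buf[64] = { 0 };
    struct iovec local  = { .iov_base = buf,         .iov_len = sizeof(buf) };
    struct iovec remote = { .iov_base = remote_addr, .iov_len = sizeof(buf) };

    if (process_vm_readv(pid, &local, 1, &remote, 1, 0) == -1)
        perror("process_vm_readv");
    else
        printf("read from child: %s\n", buf);

    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return 0;
}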

Simulating memfd_create on Linux 2.6

I considered two approaches:

  1. Creating my temporary file under /dev/shm/ rather than /tmp/
  2. Using shm_open to get a file descriptor.

Although irrelevant to the specific problem at hand, /dev/shm/ is not guaranteed to exist on all distributions, so #2 felt more correct to me.

To avoid having to worry about unique names for the shared memory objects, I just generate UUIDs.
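A minimal sketch of approach #2 (the helper name memfd_create_compat is made up, and I substitute a PID-plus-counter for the UUIDs; link with -lrt on older glibc):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

static int memfd_create_compat(size_t size)
{
    static int counter;
    char name[64];
    int fd;

    /* A unique object name; the original approach generates a UUID here. */
    snprintf(name, sizeof(name), "/memfd-compat-%ld-%d",
             (long)getpid(), counter++);

    fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return -1;

    /* Unlink immediately: the object lives on through the fd,
     * which is what makes it behave like a memfd. */
    shm_unlink(name);

    if (ftruncate(fd, size) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}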

I think I'm happy with this.

Shout out to @NominalAnimal.

Purgeable Memory Regions on Linux

There is a similar system in 2.6.39-rc1. It is called "transcendent memory":

  • Transcendent memory, 2009
  • Transcendent memory in a nutshell, 2011
  • Slides from 2011: Transcendent Memory and Friends, TmemNotVirt-Linuxcon2011-Final.pdf (it also lists some similar terms)
  • API specification from oss.oracle.com: Transcendent Memory Interface Specification, Version 0.0.1 - 081202

Update: There is also a short intro at Wikipedia: https://en.wikipedia.org/wiki/Transcendent_memory

In computing, transcendent memory (aka "tmem") is a concept explored by Dan Magenheimer. Transcendent memory is a class of memory that is of unknown and dynamically variable size, is addressable only indirectly by the kernel, can be configured either as persistent or as "ephemeral" (meaning it will be around for a while, but might disappear without warning), and is still fast enough to be synchronously accessible.

One can think of transcendent memory as a sort of RAM disk with some interesting characteristics: nobody knows how big it is, writes to the disk may not succeed, and, potentially, data written to the disk may vanish before being read back again. At a first blush, it may seem like a relatively useless sort of device, but it is hoped that transcendent memory will be able to improve performance in a few situations.

Introduced in Linux kernel 2.6.39. Implementation by Dan Magenheimer of Oracle Corporation. Xen 4.0 supports tmem in the hypervisor.

Is kernel space mapped into user space on Linux x86?

Actually, on 32-bit Windows, without the /3GB boot option, the kernel is mapped at the top 2GB of linear address space, leaving 2GB for the user process.

Linux does a similar thing, but it maps the kernel in the top 1GB of linear space, thus leaving 3GB for the user process.

I don't know if you can peek at the entire memory layout just by using the /proc filesystem. For a lab I designed for my students, I created a tiny device driver that allows a user to peek at a physical memory address, and to get the contents of several control registers, such as CR3 (the page directory base address).

By using these two operations, one can walk through the page directory of the current process (the one performing the operation) and see which pages are present, which ones are owned by the user and the kernel, or just by the kernel, which ones are read/write or read only, etc. With that information, the students have to display a map showing memory usage, including kernel space.
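The lab's driver isn't reproduced here, but a hedged sketch of the idea (a 32-bit, non-PAE x86 kernel module that reads CR3 with inline assembly and scans the 1024 page-directory entries; all names are illustrative) could look like this:

#include <linux/module.h>
#include <linux/kernel.h>
#include <asm/page.h>

static int __init pdwalk_init(void)
{
    unsigned long cr3, pde;
    unsigned long *pgdir;
    int i, present = 0;

    asm volatile("mov %%cr3, %0" : "=r"(cr3));

    /* On non-PAE x86 the page directory sits in lowmem, so __va() works. */
    pgdir = (unsigned long *)__va(cr3 & PAGE_MASK);

    for (i = 0; i < 1024; i++) {
        pde = pgdir[i];
        if (!(pde & 0x1))                       /* present bit */
            continue;
        present++;
        printk(KERN_INFO "PDE %4d: %s, %s, %s\n", i,
               (pde & 0x80) ? "4MB page" : "page table",    /* PS bit  */
               (pde & 0x4) ? "user" : "supervisor",         /* U/S bit */
               (pde & 0x2) ? "rw" : "ro");                  /* R/W bit */
    }
    printk(KERN_INFO "%d of 1024 PDEs present\n", present);
    return 0;
}

static void __exit pdwalk_exit(void)
{
}

module_init(pdwalk_init);
module_exit(pdwalk_exit);
MODULE_LICENSE("GPL");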

Take a look at this PDF. It's the compiled version of all the labs we did in my course.
http://www.atc.us.es/asignaturas/tpbn/PracticasTPBN2011.pdf

On page 36 of the PDF (page 30 of the document) you will see what a memory map looks like. This is the result of doing exercise #3.2 from lab #3.

The text is in Spanish, but I'm sure you can use a translator if there are things you cannot understand. This lab assumes the student has previously read about how the paging system works and how to interpret the layout of the directory and page entries.

The map is a 16×64 block, where each cell represents 4MB of the current process's virtual address space. The map should really be three-dimensional, as a 4MB region is described by a page table with 1024 entries (pages), and not all pages need be present; but to keep the map readable, the exercise requires the user to collapse these regions, showing the attributes of the first page entry that describes a present page, in the hope that all subsequent pages in that page table share the same attributes (which may or may not actually be true).

This map is for 2.6.x kernels in which PAE is not used and PSE is used (PAE and PSE being two bit fields in control register CR4). PAE enables 2MB pages and PSE enables 4MB pages. 4KB pages are always available.

. : PDE not present, or page table empty.
X : 4MB page, supervisor.
R : 4MB page, user, read only.
* : 4MB page, user, read/write.
x : Page table with at least one entry describing a supervisor page.
r : Page table with at least one entry describing a user page, read only.
+ : Page table with at least one entry describing a user page, read/write.

................................r...............................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
...............................+..............................+.
xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXxX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX..x...........................xx

You can see there is a vast space of 3GB of memory, almost empty in this case (the process is just a little C application that uses less than 4MB, all contained in one page table, whose first present page is a read-only page, assumed to be part of the program code, or maybe static strings).

Near the 3GB border there are two small regions read/write, which may belong to shared libraries loaded by the user program.

The last 4 rows (256 directory entries) belong almost entirely to the kernel. 224 of those entries are actually present and used; they map the first 896MB of physical memory, the space in which the kernel lives. The last 32 entries are used by the kernel to access physical memory beyond the 896MB mark on systems with more than 896MB of RAM.

Direct Memory Access in Linux

I think you can find a lot of documentation about the kmalloc + mmap part.
However, I am not sure that you can kmalloc so much memory in a contiguous way, and have it always at the same place. Sure, if everything is always the same, then you might get a constant address. However, each time you change the kernel code, you will get a different address, so I would not go with the kmalloc solution.

I think you should reserve some memory at boot time, i.e. reserve some physical memory so that it is not touched by the kernel. Then you can ioremap this memory, which will give you a kernel virtual address, and then you can mmap it and write a nice device driver.

This takes us back to Linux Device Drivers in PDF format. Have a look at chapter 15, which describes this technique on page 443.

Edit: ioremap and mmap.
I think this might be easier to debug by doing things in two steps: first get the ioremap right, and test it using a character device operation, i.e. read/write. Once you know you can safely access the whole ioremapped memory using read/write, then try to mmap the whole ioremapped range.

And if you get in trouble, maybe post another question about mmapping.
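A hedged sketch of step one (foo_base, FOO_MEM_SIZE, and the bounce buffer are my own illustration, not code from the question): a read() handler that copies out of the ioremapped region:

#include <linux/fs.h>
#include <linux/io.h>
#include <linux/slab.h>
#include <linux/uaccess.h>

static void __iomem *foo_base;  /* set by ioremap(FOO_MEM_OFFSET, FOO_MEM_SIZE) */

static ssize_t foo_read(struct file *file, char __user *ubuf,
                        size_t count, loff_t *ppos)
{
    char *tmp;

    if (*ppos >= FOO_MEM_SIZE)
        return 0;
    if (count > FOO_MEM_SIZE - *ppos)
        count = FOO_MEM_SIZE - *ppos;

    tmp = kmalloc(count, GFP_KERNEL);
    if (!tmp)
        return -ENOMEM;

    /* ioremapped memory should go through the io accessors,
     * not a plain memcpy */
    memcpy_fromio(tmp, foo_base + *ppos, count);

    if (copy_to_user(ubuf, tmp, count)) {
        kfree(tmp);
        return -EFAULT;
    }

    kfree(tmp);
    *ppos += count;
    return count;
}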

Edit: remap_pfn_range.
ioremap returns a virtual address, which you must convert to a PFN for remap_pfn_range.
Now, I don't understand exactly what a PFN (Page Frame Number) is, but I think you can get one by calling

virt_to_phys(pt) >> PAGE_SHIFT

This probably is not the Right Way (tm) to do it, but you should try it.

You should also check that FOO_MEM_OFFSET is the physical address of your RAM block; i.e., before anything happens with the MMU, your memory is available at 0 in the memory map of your processor.
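Putting it together, here is a hedged sketch of the driver's mmap file operation. FOO_MEM_OFFSET and FOO_MEM_SIZE are the question's own placeholders; since FOO_MEM_OFFSET is already a physical address, it can be shifted directly, with no detour through virt_to_phys():

#include <linux/fs.h>
#include <linux/mm.h>

static int foo_mmap(struct file *file, struct vm_area_struct *vma)
{
    unsigned long size = vma->vm_end - vma->vm_start;

    /* Offset handling omitted: this always maps the region from its start. */
    if (size > FOO_MEM_SIZE)
        return -EINVAL;

    /* remap_pfn_range() wants a page frame number: the physical
     * address shifted right by PAGE_SHIFT. */
    if (remap_pfn_range(vma, vma->vm_start,
                        FOO_MEM_OFFSET >> PAGE_SHIFT,
                        size, vma->vm_page_prot))
        return -EAGAIN;

    return 0;
}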


