Linux Kernel Device Driver to Dma from a Device into User-Space Memory

Linux kernel device driver to DMA from a device into user-space memory

I'm actually working on exactly the same thing right now and I'm going the ioctl() route. The general idea is for user space to allocate the buffer which will be used for the DMA transfer and an ioctl() will be used to pass the size and address of this buffer to the device driver. The driver will then use scatter-gather lists along with the streaming DMA API to transfer data directly to and from the device and user-space buffer.

The implementation strategy I'm using is that the ioctl() in the driver enters a loop that DMA's the userspace buffer in chunks of 256k (which is the hardware imposed limit for how many scatter/gather entries it can handle). This is isolated inside a function that blocks until each transfer is complete (see below). When all bytes are transfered or the incremental transfer function returns an error the ioctl() exits and returns to userspace

Pseudo code for the ioctl()

/*serialize all DMA transfers to/from the device*/
if (mutex_lock_interruptible( &device_ptr->mtx ) )
    return -EINTR;

chunk_data = (unsigned long) user_space_addr;
while( *transferred < total_bytes && !ret ) {
    chunk_bytes = total_bytes - *transferred;
    if (chunk_bytes > HW_DMA_MAX)
        chunk_bytes = HW_DMA_MAX; /* 256kb limit imposed by my device */
    ret = transfer_chunk(device_ptr, chunk_data, chunk_bytes, transferred);
    chunk_data += chunk_bytes;
    chunk_offset += chunk_bytes;
}

mutex_unlock(&device_ptr->mtx);

Pseudo code for incremental transfer function:

/*Assuming the userspace pointer is passed as an unsigned long, */
/*calculate the first,last, and number of pages being transferred via*/

first_page = (udata & PAGE_MASK) >> PAGE_SHIFT;
last_page = ((udata+nbytes-1) & PAGE_MASK) >> PAGE_SHIFT;
first_page_offset = udata & PAGE_MASK;
npages = last_page - first_page + 1;

/* Ensure that all userspace pages are locked in memory for the */
/* duration of the DMA transfer */

down_read(¤t->mm->mmap_sem);
ret = get_user_pages(current,
                     current->mm,
                     udata,
                     npages,
                     is_writing_to_userspace,
                     0,
                     &pages_array,
                     NULL);
up_read(¤t->mm->mmap_sem);

/* Map a scatter-gather list to point at the userspace pages */

/*first*/
sg_set_page(&sglist[0], pages_array[0], PAGE_SIZE - fp_offset, fp_offset);

/*middle*/
for(i=1; i < npages-1; i++)
    sg_set_page(&sglist[i], pages_array[i], PAGE_SIZE, 0);

/*last*/
if (npages > 1) {
    sg_set_page(&sglist[npages-1], pages_array[npages-1],
        nbytes - (PAGE_SIZE - fp_offset) - ((npages-2)*PAGE_SIZE), 0);
}

/* Do the hardware specific thing to give it the scatter-gather list
   and tell it to start the DMA transfer */

/* Wait for the DMA transfer to complete */
ret = wait_event_interruptible_timeout( &device_ptr->dma_wait, 
         &device_ptr->flag_dma_done, HZ*2 );

if (ret == 0)
    /* DMA operation timed out */
else if (ret == -ERESTARTSYS )
    /* DMA operation interrupted by signal */
else {
    /* DMA success */
    *transferred += nbytes;
    return 0;
}

The interrupt handler is exceptionally brief:

/* Do hardware specific thing to make the device happy */

/* Wake the thread waiting for this DMA operation to complete */
device_ptr->flag_dma_done = 1;
wake_up_interruptible(device_ptr->dma_wait);

Please note that this is just a general approach, I've been working on this driver for the last few weeks and have yet to actually test it... So please, don't treat this pseudo code as gospel and be sure to double check all logic and parameters ;-).

DMA transfer form kernel to user space

You probably want to implement mmap method of struct file_operations. Consider:

static int
sample_drv_mem_mmap(struct file *filep, struct vm_area_struct *vma)
{
    /*
     * Set your "dev" pointer here (the one you used
     * for dma_alloc_coherent() invocation)
     */
    struct device   *dev;

    /*
     * Set DMA address here (the one you obtained with
     * dma_alloc_coherent() via its third argument)
     */
    dma_addr_t  dma_addr;

    /* Set your DMA buffer size here */
    size_t      dma_size;

    /* Physical page frame number to be derived from "dma_addr" */
    unsigned long   pfn;

    /* Check the buffer size requested by the user */
    if (vma->vm_end - vma->vm_start > dma_size)
        return -EINVAL;

    /*
     * For the sake of simplicity, do not let the user specify an offset;
     * you may want to take care of that in later versions of your code
     */
    if (vma->vm_pgoff != 0)
        return -EINVAL;

    pfn = PHYS_PFN(dma_to_phys(dev, dma_addr));

    return remap_pfn_range(vma, vma->vm_start, pfn,
                   vma->vm_end - vma->vm_start,
                   vma->vm_page_prot);
}

/* ... */

static const struct file_operations sample_drv_fops = {
    /* ... */

    .mmap =     sample_drv_mem_mmap,

    /* ... */
};

Long story short, the idea is to convert the DMA (bus) address you have to a kernel physical address and then use remap_pfn_range() to do the actual mapping between the kernel and the userland.

In the user application, one should invoke mmap() to request the mapping (instead of the read / write approach) For more information on that, please refer to man 2 mmap on your system.

Linux kernel device driver to DMA into kernel space

kmalloc is indeed one source to get the buffer. Another can be alloc_page with the GFP_DMA flag.
The meaning is that the memory that kmalloc returns is guaranteed to be contiguous in physical memory, not just virtual memory, so you can give the bus address of that pointer to your hardware. You do need to use dma_map_single() on the address returned which depending on exact platform might be no more then wrapper around virt_to_bus or might do more then do (set up IOMMU or GART tables)
Correct, just make sure to follow cache coherency guidelines as the DMA guide explains.
copy_to_user will work fine and is the easiest answer. Depending on your specific case it might be enough or you might need something with better performance. You cannot normaly map kmalloced addresses to user space, but you can DMA into user provided address (some caveats apply) or allocate user pages (alloc_page with GFP_USER)

Good luck!

Mapping device memory into user process address space

The same chapter you are referring to, has the answer to your question.

A definitive example of mmap usage can be seen by looking at a subset of the virtual memory areas for the X Window System server. Whenever the program reads or writes in the assigned address range, it is actually accessing the device. In the X server example, using mmap allows quick and easy access to the video card’s memory. For a performance-critical application like this, direct access makes a large difference.

...

Another typical example is a program controlling a PCI device. Most PCI peripherals map their control registers to a memory address, and a high-performance application might prefer to have direct access to the registers instead of repeatedly having to call ioctl to get its work
done.

But you are correct that usually kernel drivers handle devices without revealing device memory to user space:

As you might suspect, not every device lends itself to the mmap abstraction; it makes no sense, for instance, for serial ports and other stream-oriented devices. Another limitation of mmap is that mapping is PAGE_SIZE grained.

In the end, it all depends on how you want your device to be used from user space:

which interfaces you want to provide from driver to user space
what are performance requirements

Usually you hide device memory from user, but sometimes it's needed to give user a direct access to device memory (when alternative is bad performance or ugly interface). Only you, as an engineer, can decide which way is the best, in each particular case.

Linux Kernel Device Driver to Dma from a Device into User-Space Memory