Pci-E Memory Space Access with Mmap

PCI-e memory space access with mmap

mmap() is a very useful but casual way to access PCIe devices from user space.

I notice that you pass 0 as the first argument to mmap. In my case of an FPGA card plugged into an x86 computer I make a call to lspci to get the physical address of the card in the pcie slot. Then I use that physical address as the first argument to mmap. I know you are writing the BAR's in config space of the device but maybe double check with lspci.

$ sudo lspci -s 02:00 -v
02:00.0 Memory controller: Xilinx Corporation Device 8028
    Subsystem: Xilinx Corporation Device 0007
    Flags: bus master, fast devsel, latency 0, IRQ 11
    Memory at f7e00000 (32-bit, non-prefetchable) [size=1M]
    Capabilities: [80] Power Management version 3
    Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [c0] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting

mmap to overlay VME bus into user space memory over a PCI?

Where does /dev/vme_m0 come from and what does it represent? It is hard to tell what opening and accessing it will do without knowing more.

You need to look at the bridge chip manual to figure out how a read/write to Region 1 will translate to a read/write on the VME bus. The bridge chip should have a set of registers that define PCI -> VME address translations. The VME address generated by accessing 0x80020000 would depend on the VME address specified in one of those registers.

mmap() device memory into user space

For a normal device what you have said is correct. If the GPU memory behaves differently for reads/write, they might do this. We should look at some documentation of cudaMemcpy().

From Nvidia's basics of CUDA page 22,

direction specifies locations (host or device) of src and dst
Blocks CPU thread: returns after the copy is complete.
Doesn't start copying until previous CUDA calls complete

It seems pretty clear that the cudaMemcpy() is synchronized to prior GPU registers writes, which may have caused the mmap() memory to be updated. As the GPU pipeline is a pipeline, prior command issues may not have completed when cudaMemcpy() is issued from the CPU.

Pci-E Memory Space Access with Mmap