PCI-e memory space access with mmap
mmap() is a very useful but casual way to access PCIe devices from user space.
I notice that you pass 0 as the first argument to mmap. In my case of an FPGA card plugged into an x86 computer I make a call to lspci to get the physical address of the card in the pcie slot. Then I use that physical address as the first argument to mmap. I know you are writing the BAR's in config space of the device but maybe double check with lspci.
$ sudo lspci -s 02:00 -v
02:00.0 Memory controller: Xilinx Corporation Device 8028
Subsystem: Xilinx Corporation Device 0007
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at f7e00000 (32-bit, non-prefetchable) [size=1M]
Capabilities: [80] Power Management version 3
Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [c0] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
mmap to overlay VME bus into user space memory over a PCI?
Where does /dev/vme_m0 come from and what does it represent? It is hard to tell what opening and accessing it will do without knowing more.
You need to look at the bridge chip manual to figure out how a read/write to Region 1 will translate to a read/write on the VME bus. The bridge chip should have a set of registers that define PCI -> VME address translations. The VME address generated by accessing 0x80020000 would depend on the VME address specified in one of those registers.
mmap() device memory into user space
For a normal device what you have said is correct. If the GPU memory behaves differently for reads/write, they might do this. We should look at some documentation of cudaMemcpy()
.
From Nvidia's basics of CUDA page 22,
direction specifies locations (host or device) of src and dst
Blocks CPU thread: returns after the copy is complete.
Doesn't start copying until previous CUDA calls complete
It seems pretty clear that the cudaMemcpy()
is synchronized to prior GPU registers writes, which may have caused the mmap()
memory to be updated. As the GPU pipeline is a pipeline, prior command issues may not have completed when cudaMemcpy()
is issued from the CPU.
Related Topics
Why Disable One Local Interrupt or Preemption Can Cause The Whole System with 4 Cpus Unresponsive
How to Measure Net Used Disk Space Change Due to Activity by a Given Process in Linux
Mongodb (Result= Signal, Code = Killed, Signal = Ill
Nvcc Cuda Cross Compiling Cannot Find "-Lcudart"
Compiling and Linking a 32 Bit Application on Debian 64 Bit
Git Clone Using Ssh Failed in Windows Due to Permission Issue
Linux Support 802.1Ag and Y1731
Avoid Daemon Running in Dedicated CPU Cores
Using Winscp to Grab a File Through a Tunnel
How to Make Linux Ignore a Keyboard While Keeping It Available for My Program to Read
How to Get The Output of at Command in Current or Another Terminal Window
Logging Memory Access Footprint
Run Shell Script After Xserver Is Started
Combine Two Audio Files with a Command Line Tool
System Calls: Difference Between Sys_Exit(), Sys_Exit and Exit()