How to Disable or Change the Timeout Limit for the GPU Under Linux

How to disable or change the timeout limit for the GPU under Linux?

You can disable the watchdog by setting Option "Interactive" "0" in your Xorg config. An example is available in the answer to this question: CUDA Visual Profiler 'Interactive' X config option?

OpenGL: render time limit on linux

I'm afraid this is not possible. After a lot of scouring through the documentation of both X and Wayland, I could not find anything mentioning GPU watchdog timer settings, so I believe this is driver-specific and likely inaccessible to the user (that or I am terrible at searching).

It is, however, possible to disable this watchdog under X on NVIDIA hardware by adding the following option to your xorg.conf, which is then passed on to the graphics driver:

Option "Interactive" "boolean"

This option controls the behavior of the driver's watchdog, which attempts to detect and terminate GPU programs that get stuck, in order to ensure that the GPU remains available for other processes. GPU compute applications, however, often have long-running GPU programs, and killing them would be undesirable. If you are using GPU compute applications and they are getting prematurely terminated, try turning this option off.

Note that even the NVIDIA docs don't mention a numeric quantity for the timeout.
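As a rough sketch of how the option above might look in practice, the script below prints a Device section with the watchdog turned off. The identifier "nvidia-gpu0" is made up for this example, and the right place for the option depends on your setup: if your xorg.conf already has a Device section for the NVIDIA driver, you would most likely add only the Option line to that section rather than pasting in a second one.

```shell
#!/bin/sh
# Print an example xorg.conf Device section that disables the watchdog.
# Assumptions: "nvidia-gpu0" is a hypothetical identifier; an existing
# config will normally already have its own Device section and name.
print_device_section() {
    cat <<'EOF'
Section "Device"
    Identifier "nvidia-gpu0"
    Driver     "nvidia"
    Option     "Interactive" "0"
EndSection
EOF
}

print_device_section
```

The X server has to be restarted before a change to xorg.conf takes effect.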

Disabling TDR for CUDA in Windows 8

The Windows WDDM driver's Timeout Detection and Recovery (TDR) mechanism can be disabled, or its timeout can be extended beyond the default 2 seconds. Timeout Detection and Recovery is documented on MSDN.

(Edited: The above link is dead. The information that it provided might now be available at https://docs.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys)

In Nsight Visual Studio Edition, Nsight.Monitor has settings to disable TDR or increase the timeout. Otherwise, you can use the registry keys described in the MSDN article. Make sure to restart the computer after making changes.

I recommend that you increase TdrDelay before completely disabling TDR.

Tesla GPUs can use the Tesla Compute Cluster (TCC) driver, which does not have a timeout watchdog.

How do I select which GPU to run a job on?

The problem was caused by not setting the CUDA_VISIBLE_DEVICES variable within the shell correctly.

To specify CUDA device 1, for example, you would set CUDA_VISIBLE_DEVICES using

export CUDA_VISIBLE_DEVICES=1

or

CUDA_VISIBLE_DEVICES=1 ./cuda_executable

The former sets the variable for the life of the current shell; the latter sets it only for that particular executable invocation.

If you want to specify more than one device, use

export CUDA_VISIBLE_DEVICES=0,1

or

CUDA_VISIBLE_DEVICES=0,1 ./cuda_executable
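To see the difference in scope between the two forms without needing a GPU, here is a small sketch that uses a child shell as a stand-in for ./cuda_executable. The child only prints the value it inherited; the variable names report, per_command, after_command, and after_export are made up for this example.

```shell
#!/bin/sh
# Stand-in for ./cuda_executable: a child shell that prints what it inherited.
report='echo "${CUDA_VISIBLE_DEVICES:-unset}"'

# Start from a clean state so the results do not depend on the caller's shell.
unset CUDA_VISIBLE_DEVICES

# Per-invocation form: the variable exists only for this one child process.
per_command=$(CUDA_VISIBLE_DEVICES=1 sh -c "$report")

# The next child started from the same shell no longer sees it.
after_command=$(sh -c "$report")

# Exported form: every later child of this shell inherits the value.
export CUDA_VISIBLE_DEVICES=0,1
after_export=$(sh -c "$report")

echo "per-command:  $per_command"    # prints "per-command:  1"
echo "next command: $after_command"  # prints "next command: unset"
echo "after export: $after_export"   # prints "after export: 0,1"
```

Note that a CUDA program numbers the devices it can see starting from 0, so with CUDA_VISIBLE_DEVICES=1 the selected GPU shows up inside the program as device 0.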

