"The Launch Timed Out and Was Terminated" Error with Bumblebee on Linux

The solution (found here) was to use the --no-xorg option for optirun, i.e.:

optirun --no-xorg [cuda-memcheck or cuda-gdb] ./my_program program_options

Indeed, the default behavior of optirun is to create a secondary X server, which is then subject to the driver's watchdog. With the --no-xorg option, this extra X server is never started, so the watchdog does not apply. The option is available since Bumblebee 3.2.

It also provides a way to use cuda-gdb and avoid the following error:

fatal: All CUDA devices are used for display and cannot be used while
debugging. (error code = 24)

cudaMemcpy error: the launch timed out and was terminated

This is exactly the same problem you asked about in this question. The kernel is getting terminated early by the driver because it is taking too long to finish. If you read the documentation for any of these runtime API functions you will see the following note:

Note:
Note that this function may also return error codes from previous,
asynchronous launches.

All that is happening is that the first API call after the kernel launch is returning the error incurred while the kernel was running - in this case the cudaMemcpy call. The way you can confirm this for yourself is to do something like this directly after the kernel launch:

// launch kernel
digi_calc<<<dimGrid, dimBlock>>>(sdev, avdev, adev, N, n, i);

// check the launch itself (invalid configuration, etc.)
std::string error = cudaGetErrorString(cudaPeekAtLastError());
printf("%s\n", error.c_str());

// check for errors raised while the kernel was running
// (cudaThreadSynchronize() is deprecated; cudaDeviceSynchronize() is equivalent here)
error = cudaGetErrorString(cudaThreadSynchronize());
printf("%s\n", error.c_str());

The cudaPeekAtLastError() call will show you if there are any errors in the kernel launch, and the error code returned by the cudaThreadSynchronize() call will show whether any errors were generated while the kernel was executing.
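If you want this check in more than one place, it is common to wrap it in a small helper. The following is only a minimal sketch; the gpuErrchk name and the wrapper are illustrative and not part of the original answer:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Print the error string and abort if a CUDA runtime call failed.
inline void gpuAssert(cudaError_t code, const char *file, int line)
{
    if (code != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s at %s:%d\n",
                cudaGetErrorString(code), file, line);
        exit(EXIT_FAILURE);
    }
}

#define gpuErrchk(call) gpuAssert((call), __FILE__, __LINE__)

// Usage after a kernel launch:
//   digi_calc<<<dimGrid, dimBlock>>>(sdev, avdev, adev, N, n, i);
//   gpuErrchk(cudaPeekAtLastError());    // errors in the launch itself
//   gpuErrchk(cudaDeviceSynchronize());  // errors raised while the kernel ran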

The solution is exactly as outlined in the previous question: probably the simplest way is to redesign the code so it is "re-entrant", letting you split the work over several kernel launches, with each launch safely under the display driver's watchdog timer limit.
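As a rough illustration of that redesign (the process_chunk kernel, the chunk size, and the per-element work below are hypothetical, not taken from the question), the idea is to cover the data in slices so that no single launch comes anywhere near the watchdog limit:

#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel that processes elements [offset, offset + count) of the data.
__global__ void process_chunk(float *data, size_t offset, size_t count)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        data[offset + i] *= 2.0f;   // placeholder for the real per-element work
    }
}

void run_in_chunks(float *d_data, size_t n)
{
    const size_t chunk = 1 << 20;   // elements per launch; tune so each launch stays short
    const int block = 256;

    for (size_t offset = 0; offset < n; offset += chunk) {
        size_t count = (n - offset < chunk) ? (n - offset) : chunk;
        int grid = (int)((count + block - 1) / block);

        process_chunk<<<grid, block>>>(d_data, offset, count);
        // Wait for (and be able to error-check) each slice; every individual
        // launch finishes well under the display driver's watchdog limit.
        cudaDeviceSynchronize();
    }
}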

the launch timed out and was terminated

I assume you are running on Windows. If so, put the Tesla C2075 in TCC mode. This allows compute access while Windows no longer manages the card like a display device, which gets rid of the watchdog timer. If you're having trouble locating nvidia-smi, just do a Windows search for nvidia-smi.exe (it should have been installed with the display driver). If the C2075 is the only CUDA GPU in the system, the command looks like this:

nvidia-smi -g 0 -dm 1

You can also run nvidia-smi --help to get command-line help for the tool. A reboot of the system will probably be required after this change to get the card into TCC mode.

If on the other hand you are running Linux and X on this machine, the solution is a little different. One approach is simply to disable X, e.g. by setting the runlevel to 3 and rebooting, but there are other ways to do this. However, you'll lose your X GUI on the other GPU (I assume you have another GPU, since you said this is a non-display GPU). To preserve X and the GUI on the other GPU, you need to modify your xorg.conf file to force X onto your display GPU and off of your compute (Tesla) GPU. The methods vary, but if you have two NVIDIA GPUs (one for display), the X display should be forced onto a single GPU using the BusID parameter in the relevant "Device" section of the xorg.conf file. In addition, any other "Device" sections should be deleted. For example:

    BusID "PCI:34:0:0"

The PCI IDs of the GPUs may be determined from the lspci command or from the nvidia-smi -a command.

You may also wish to refer to the X configuration options appendix of the NVIDIA driver README file.

PyCUDA clean-up error, CUDA launch timed out error, on some machines only

It was talonmies' comment to the question that led me to the answer.

The issue was that one of the cards (the GTX 970) was at the same time being used for the graphical output of the system. As explained here and here, this implies that there is a "watchdog" preventing CUDA kernels from running longer than some maximum time before they are stopped.

The solution for me was to stop the X server with sudo service lightdm stop. Then the program ran on both cards without error.
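As a side note (not part of the original answer), you can verify from code which devices are subject to this watchdog by querying the kernelExecTimeoutEnabled field of cudaDeviceProp; a minimal sketch:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // kernelExecTimeoutEnabled is non-zero when the display watchdog applies to this device.
        printf("Device %d (%s): watchdog %s\n", dev, prop.name,
               prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
    }
    return 0;
}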

Debian bumblebee problems

After reading this Debian bug report I realized that I ought to install the package libgl1-nvidia-glx. This fixed my problems, but I have to criticize Debian for this, because their official Bumblebee page says to use this command to install:

sudo dpkg --add-architecture i386 && sudo apt-get update && sudo apt-get install bumblebee-nvidia primus primus-libs:i386 libgl1-nvidia-glx:i386

But this command is missing the crucial libgl1-nvidia-glx package, which needs to be installed alongside its i386 counterpart in order for all apps to work!

So, a note to Debian:

While you are the best distribution on this planet and you seem to be the most stable, make sure to keep your official wiki updated like Arch does! Only then will your users avoid negative experiences like mine, and more of them will choose your distribution.

Xubuntu 12.04 display (with bumblebee) screwed up after upgrade

I had been getting the exact same issue and managed to get this working after ignoring it for a while. I'm also running the 12.04-based Mint 13 64-bit LTS with backported packages and the ubuntu-x-swat PPA. My laptop uses NVIDIA 310M Optimus hybrid graphics (ASUS U43jc). Here's what I did to resolve it:

sudo apt-get build-dep bumblebee bumblebee-nvidia
sudo apt-get build-dep nvidia-current
sudo apt-get install bumblebee --reinstall

After that I checked the service status and it was finally running! I then tried running:

optirun glxspheres

which ran successfully averaging ~106 frames/sec and ~99 Mpixels/sec at 1920x1080 on my external display.

Good luck!

Why the CUDA kernel does not launch in VS 2013 with CUDA 9.0

The problem was the CUDA Toolkit version. The GeForce GT 720M has compute capability 2.1, which is not supported by CUDA 9.0; CUDA 8.0 is the last toolkit release that supports it.
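To check which compute capability the active GPU reports, a small sketch like the following can help (added here for illustration, not part of the original answer):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int dev = 0;
    cudaGetDevice(&dev);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);

    // CUDA 9.0 requires compute capability 3.0 or higher.
    if (prop.major < 3) {
        printf("This device is too old for CUDA 9.0; use CUDA 8.0 or earlier.\n");
    }
    return 0;
}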

How can I set one NVIDIA graphics card for display and other for computing(in Linux)?

A general description of how to do this is given here. You want to use option 1, which is excerpted below:

Option 1: Use Two GPUs (RECOMMENDED)

If two GPUs can be made available in the system, then X processing can be handled on one GPU while CUDA tasks are executed on the other. This allows full interactivity and no disturbance of X while simultaneously allowing unhindered CUDA execution.

In order to accomplish this:

• The X display should be forced onto a single GPU using the BusID parameter in the relevant "Device" section of the xorg.conf file. In addition, any other "Device" sections should be deleted. For example:

    BusID "PCI:34:0:0"

The PCI IDs of the GPUs may be determined from the lspci command or from the nvidia-smi -a command.

• CUDA processing should be forced onto the other GPU, for example by using the CUDA_VISIBLE_DEVICES environment variable before any CUDA applications are launched. For example:

    export CUDA_VISIBLE_DEVICES="1" 

(Choose the numerical parameter to select the GPU that is not the X GPU)
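To confirm what the CUDA application actually sees after setting CUDA_VISIBLE_DEVICES (a sketch added here for illustration, not part of the excerpt), you can enumerate the visible devices; note that the remaining GPU is renumbered inside the process:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // With CUDA_VISIBLE_DEVICES="1" exported before launch, only that GPU is
    // visible here, and it is renumbered as device 0 inside this process.
    int count = 0;
    cudaGetDeviceCount(&count);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("Visible devices: %d, device 0 is %s\n", count, prop.name);
    return 0;
}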


