How to Get Currently Available GPUs in TensorFlow

How to get currently available GPUs in TensorFlow?

There is an undocumented method called device_lib.list_local_devices() that enables you to list the devices available in the local process. (N.B. As an undocumented method, this is subject to backwards incompatible changes.) The function returns a list of DeviceAttributes protocol buffer objects. You can extract a list of string device names for the GPU devices as follows:

from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

Note that (at least up to TensorFlow 1.4), calling device_lib.list_local_devices() will run some initialization code that, by default, will allocate all of the GPU memory on all of the devices (GitHub issue). To avoid this, first create a session with an explicitly small per_process_gpu_memory_fraction, or with allow_growth=True, to prevent all of the memory being allocated. See this question for more details.
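As a minimal sketch of such a session (TF 1.x-style API; the 0.1 fraction below is an arbitrary illustration):

import tensorflow as tf

# Let the session grow GPU memory on demand instead of grabbing it all up front.
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
# Alternatively, cap each process to a small fraction of GPU memory:
# config.gpu_options.per_process_gpu_memory_fraction = 0.1
sess = tf.compat.v1.Session(config=config)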

How to programmatically determine available GPU memory with TensorFlow?

This code will return the free GPU memory, in MiB, for each GPU:

import subprocess as sp

def get_gpu_memory():
    command = "nvidia-smi --query-gpu=memory.free --format=csv"
    # Drop the trailing empty line, then the CSV header row.
    memory_free_info = sp.check_output(command.split()).decode('ascii').split('\n')[:-1][1:]
    memory_free_values = [int(x.split()[0]) for x in memory_free_info]
    return memory_free_values

get_gpu_memory()

This answer relies on nvidia-smi being installed (which is almost always the case for NVIDIA GPUs) and is therefore limited to NVIDIA GPUs.

Tensorflow identifying GPUs, but not recognizing them under the list of devices

The recommended way to check whether TensorFlow is using the GPU is the following:

tf.config.list_physical_devices('GPU') 

Output:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

The following will also return the name of your GPU device:

import tensorflow as tf
tf.test.gpu_device_name()

If a CPU-only build of the package is installed, the function will return an empty string. Use tf.test.is_built_with_cuda to validate whether TensorFlow was built with CUDA support.
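For example, to check the build:

import tensorflow as tf
print(tf.test.is_built_with_cuda())  # True if this TensorFlow build includes CUDA support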

Note: tf.test.is_gpu_available is deprecated; its documentation carries the following warning:

Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use tf.config.list_physical_devices('GPU') instead.

The best way to test is to run your code and check GPU utilization with nvidia-smi, as mentioned by Matias Valdenegro, or to run a simple snippet such as the one below:

import tensorflow as tf

# In TF 2.x, graph-mode sessions require eager execution to be disabled first.
tf.compat.v1.disable_eager_execution()

with tf.device('/GPU:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.compat.v1.Session() as sess:
    print(sess.run(c))

Output:

[[22. 28.]
 [49. 64.]]

How to force tensorflow to use all available GPUs?

TL;DR: Use tf.distribute.MirroredStrategy() as a scope, like

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    [...create model as you would otherwise...]

If you do not specify any arguments, tf.distribute.MirroredStrategy() will use all available GPUs. You can also specify which ones to use if you want, like this: mirrored_strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"]).

Refer to this Distributed training with TensorFlow guide for implementation details and other strategies.

Earlier answer (now outdated: multi_gpu_model() was deprecated and removed as of April 1, 2020):
Use multi_gpu_model() from Keras.


TS;WM:

TensorFlow 2.0 now has the tf.distribute module, "a library for running a computation across multiple devices". It builds on the concept of "distribution strategies": you specify a distribution strategy and then use it as a scope. TensorFlow splits the input, parallelizes the calculations, and joins the outputs for you, essentially transparently; backpropagation is handled the same way. Since all processing is now done behind the scenes, you should familiarize yourself with the available strategies and their parameters, as they can affect your training speed significantly. For example, do you want variables to reside on the CPU? Then use tf.distribute.experimental.CentralStorageStrategy(). Refer to the Distributed training with TensorFlow guide for more info.
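As a minimal sketch (the toy model, layer sizes, and random data below are illustrative assumptions, not part of the original answer):

import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

# Build and compile the model inside the strategy's scope so its variables
# are mirrored across all available GPUs.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer='adam', loss='mse')

# fit() can be called outside the scope; each batch is split across replicas.
x = np.random.rand(256, 10).astype('float32')
y = np.random.rand(256, 1).astype('float32')
model.fit(x, y, epochs=2, batch_size=64)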

Earlier answer (now outdated, leaving it here for reference):

From the Tensorflow Guide:

If you have more than one GPU in your system, the GPU with the lowest ID will be selected by default.

If you want to use multiple GPUs, unfortunately you have to manually specify which tensors to put on each GPU, like

with tf.device('/device:GPU:2'):

More info in the Tensorflow Guide Using Multiple GPUs.
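A hypothetical sketch of such manual placement (assumes at least two visible GPUs; the shapes and ops are illustrative):

import tensorflow as tf

# Pin each computation to a specific device. This fails if the device
# does not exist, unless soft device placement is enabled.
with tf.device('/device:GPU:0'):
    a = tf.random.uniform([1000, 1000])
    c = tf.matmul(a, a)

with tf.device('/device:GPU:1'):
    b = tf.random.uniform([1000, 1000])
    d = tf.matmul(b, b)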

In terms of how to distribute your network over the multiple GPUs, there are two main ways of doing that.

  1. You distribute your network layer-wise over the GPUs. This is easier to implement but will not yield a lot of performance benefit because the GPUs will wait for each other to complete the operation.

  2. You create separate copies of your network, called "towers", one on each GPU. For example, when you feed an eight-tower network, you break up your input batch into 8 parts and distribute them. Let each tower forward-propagate, then average the gradients, and do the backward propagation. This will result in an almost-linear speedup with the number of GPUs. It's much more difficult to implement, however, because you also have to deal with complexities related to batch normalization, and it is very advisable to make sure you randomize your batches properly. There is a nice tutorial here. You should also review the Inception V3 code referenced there for ideas on how to structure such a thing, especially _tower_loss(), _average_gradients(), and the part of train() starting with for i in range(FLAGS.num_gpus):. A sketch of tower-style gradient averaging follows below.
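A hypothetical TF 1.x-style sketch of the gradient-averaging step, in the spirit of the _average_gradients() helper mentioned above (the names and structure here are assumptions, not the referenced code verbatim):

import tensorflow as tf

def average_gradients(tower_grads):
    # tower_grads: one list per GPU, each a list of (gradient, variable)
    # pairs as returned by optimizer.compute_gradients() on that tower's loss.
    average_grads = []
    for grad_and_vars in zip(*tower_grads):
        # All entries in grad_and_vars refer to the same variable.
        grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
        mean_grad = tf.reduce_mean(tf.concat(grads, axis=0), axis=0)
        average_grads.append((mean_grad, grad_and_vars[0][1]))
    return average_grads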

In case you want to give Keras a try, it has now simplified multi-GPU training significantly with multi_gpu_model(), which can do all the heavy lifting for you.

TensorFlow doesn't seem to see my GPU

When I look up your GPU, I see that it only supports CUDA Compute Capability 2.1 (this can be checked at https://developer.nvidia.com/cuda-gpus). Unfortunately, TensorFlow needs a GPU with a minimum CUDA Compute Capability of 3.0.
https://www.tensorflow.org/get_started/os_setup#optional_install_cuda_gpus_on_linux

You might see some logs from TensorFlow checking your GPU, but ultimately the library will avoid using an unsupported GPU.
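If TensorFlow does detect a GPU, recent TF 2.x releases can report its compute capability directly (a small sketch; get_device_details lives under tf.config.experimental):

import tensorflow as tf

for gpu in tf.config.list_physical_devices('GPU'):
    details = tf.config.experimental.get_device_details(gpu)
    # 'compute_capability' is reported as a (major, minor) tuple, e.g. (3, 0).
    print(gpu.name, details.get('compute_capability'))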

How to tell if TensorFlow is using GPU acceleration from inside the Python shell?

No, I don't think "open CUDA library" is enough to tell, because different nodes of the graph may be on different devices.

When using TensorFlow 2:

import tensorflow as tf
print("Num GPUs Available:", len(tf.config.list_physical_devices('GPU')))

For TensorFlow 1.x, to find out which device is used, you can enable device placement logging like this:

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

Check your console for log lines showing which device each operation was placed on.

How to set a specific GPU in TensorFlow?

I believe that you need to set CUDA_VISIBLE_DEVICES=1 (or whichever GPU you want to use). If you make only one GPU visible, you will refer to it as /gpu:0 in TensorFlow regardless of what you set the environment variable to.

More info on that environment variable: https://devblogs.nvidia.com/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/
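For example, from Python (set the variable before TensorFlow initializes the GPUs; the index 1 is just an illustration):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # expose only the second physical GPU

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))   # the visible GPU shows up as GPU:0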

How to enable GPU in TensorFlow 2

Could you please install TensorFlow using the command pip install tf-nightly-gpu and let us know if it works? It worked for me.

Try the code snippet below; it will return True if a GPU is available:

import tensorflow as tf
tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)


