Meaning of inter_op_parallelism_threads and intra_op_parallelism_threads

Understanding TensorFlow inter/intra parallelism threads

  1. When both parameters are set to 1, there will be one thread running on one of the available cores (four, in the question's example). Which core it runs on might change, but there will only ever be one thread at a time.

  2. When running something in parallel, there is always a trade-off between time lost to communication and time gained through parallelization. Depending on the hardware and the specific task (such as the size of the matrices), the speedup will vary. Sometimes running in parallel is even slower than using one core.

  3. For example, when using floats on a CPU, (a + b) + c will not always equal a + (b + c) because of floating-point rounding. Using multiple parallel threads means that operations like a + b + c will not always be computed in the same order, leading to slightly different results on each run. However, those differences are extremely small and will not affect the overall result in most cases. Completely reproducible results are usually only needed for debugging, and enforcing complete reproducibility would slow multi-threading down considerably. The short demo after this list illustrates the non-associativity.
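A quick demonstration of point 3 in plain Python (the same IEEE 754 doubles TensorFlow uses on the CPU):

a, b, c = 0.1, 0.2, 0.3

# Floating-point addition is not associative: the grouping (and hence
# the order in which threads combine partial sums) changes the rounded
# result slightly.
print((a + b) + c)                 # 0.6000000000000001
print(a + (b + c))                 # 0.6
print((a + b) + c == a + (b + c))  # False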

Should I set `inter_op_parallelism_threads` and `intra_op_parallelism_threads` to 1 when I use Ray to create an actor?

It depends on how many resources you want the actor to use. If there is a dedicated machine for a given actor, and it's fine for the actor to use all of the resources on that machine, then use TensorFlow's default settings. If you are creating more like one actor per core, then setting `inter_op_parallelism_threads` and `intra_op_parallelism_threads` to small values like 1 or 2 is a good idea.

In general, you can try both approaches and see which is faster.
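As a sketch of the one-actor-per-core case (assuming Ray and TensorFlow 2.x are installed; the TFActor class, its method, and the sizes below are made up for illustration):

import ray
import tensorflow as tf

@ray.remote(num_cpus=1)
class TFActor:
    def __init__(self):
        # Applied in the worker process before any op runs, so the
        # settings take effect for this actor's TensorFlow runtime.
        tf.config.threading.set_intra_op_parallelism_threads(1)
        tf.config.threading.set_inter_op_parallelism_threads(1)

    def work(self, n):
        x = tf.random.uniform((n, n))
        return float(tf.reduce_sum(tf.matmul(x, x)))

ray.init()
actors = [TFActor.remote() for _ in range(4)]
print(ray.get([actor.work.remote(256) for actor in actors]))

With four actors each limited to a single thread, the four matrix multiplications run side by side without the actors' thread pools competing for the same cores.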

What is the difference between Keras backend + TensorFlow and Keras from TensorFlow when using the CPU (in TensorFlow 2.x)?

Not exactly; it's not quite that simple. Per the official documentation:

intra_op_parallelism_threads - Certain operations, such as matrix multiplication and reductions, can use multiple parallel threads for speedups. A value of 0 means the system picks an appropriate number.

inter_op_parallelism_threads - Determines the number of parallel threads used to run independent, non-blocking operations. A value of 0 means the system picks an appropriate number.
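To make the distinction concrete, here is a minimal sketch (TensorFlow 2.x assumed): intra-op threads split a single large op such as one matmul, while inter-op threads can run independent ops concurrently.

import tensorflow as tf

# Must run before any op executes (see the second approach below).
tf.config.threading.set_intra_op_parallelism_threads(2)
tf.config.threading.set_inter_op_parallelism_threads(4)

@tf.function
def independent_work(a, b):
    # x and y have no data dependency, so the runtime may schedule them
    # on different inter-op threads; each matmul can itself be split
    # across the intra-op threads.
    x = tf.matmul(a, a)
    y = tf.matmul(b, b)
    return x + y

a = tf.random.uniform((512, 512))
b = tf.random.uniform((512, 512))
print(independent_work(a, b).shape)  # (512, 512)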

So technically you cannot limit the number of CPUs, only the number of parallel threads, which is sufficient for the purpose of limiting resource consumption.
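For contrast, actually restricting a process to specific CPU cores is an OS-level operation, not a TensorFlow setting. A Linux-only sketch using Python's standard library:

import os

# Pin the current process (pid 0) to cores 0 and 1. Combined with the
# thread settings, this is how you would truly cap CPU usage.
os.sched_setaffinity(0, {0, 1})
print(os.sched_getaffinity(0))  # {0, 1}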


Regarding the methods you are using:

The third approach sets the environment variables directly through the os library. These variables are read when TensorFlow initializes, so set them before TensorFlow is imported.

import os

# Set these before importing TensorFlow so the runtime reads them when
# it initializes its thread pools.
os.environ['TF_NUM_INTRAOP_THREADS'] = '2'
os.environ['TF_NUM_INTEROP_THREADS'] = '4'

import tensorflow as tf

The second approach is the TF2 way of doing the same thing, this time through the tf.config API rather than environment variables (Keras is packaged into TF2 now, so no Keras-specific call is needed).

import tensorflow as tf
from tensorflow import keras

# These must be called before TensorFlow executes any ops; afterwards
# the thread pools are fixed and the calls raise a RuntimeError.
tf.config.threading.set_intra_op_parallelism_threads(2)
tf.config.threading.set_inter_op_parallelism_threads(4)
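You can read the values back with the matching getters from the same tf.config.threading module:

print(tf.config.threading.get_intra_op_parallelism_threads())  # 2
print(tf.config.threading.get_inter_op_parallelism_threads())  # 4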

The first approach is for standalone Keras and works when Keras is set to use the TensorFlow backend. Again, it configures the same thread-pool settings, here through a TF 1.x session config.

from keras import backend as K
import tensorflow as tf

# TF 1.x-style session config; on TF 2.x use tf.compat.v1.ConfigProto
# and tf.compat.v1.Session instead.
config = tf.ConfigProto(intra_op_parallelism_threads=2,
                        inter_op_parallelism_threads=4,
                        allow_soft_placement=True,
                        device_count={'CPU': 1})
session = tf.Session(config=config)
K.set_session(session)

If you still have doubts, note that only the third approach actually writes environment variables; after running it you can confirm with:

print(os.environ.get('TF_NUM_INTRAOP_THREADS'))  # '2'

For the other two approaches, check the settings with the tf.config.threading getters shown above instead.



TL;DR: Use the second or third approach if you are working with TF2. Use the first or third approach if you are using standalone Keras with the TensorFlow backend.


