Thread Safety in Python's Dictionary

Python's built-in structures are thread-safe for single operations, but it can sometimes be hard to see where a statement really becomes multiple operations.

Your code should be safe. Keep in mind: a lock here will add almost no overhead, and will give you peace of mind.
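As a minimal sketch of a statement that is really several operations: `counts["hits"] += 1` reads the value, adds to it, and stores it back in separate steps, so the update below is wrapped in a `threading.Lock`. The names `counts`, `counts_lock`, and `increment` are made up for this illustration.

```python
import threading

counts = {"hits": 0}            # shared dictionary (hypothetical example data)
counts_lock = threading.Lock()  # guards every read-modify-write on counts


def increment(n):
    """Add 1 to counts["hits"] n times, holding the lock for each update."""
    for _ in range(n):
        # counts["hits"] += 1 is a read, an add, and a store; the lock
        # keeps another thread from interleaving between those steps.
        with counts_lock:
            counts["hits"] += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counts["hits"])  # 40000
```

Without the lock the final count may or may not come out short, depending on the interpreter version and timing, which is exactly why it is hard to reason about.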

https://web.archive.org/web/20201108091210/http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm has more details.

Is a Python dictionary thread-safe?

Thread safety and modifying a dictionary while iterating over it are two completely different concepts. Thread safety means that two threads cannot modify the same object at the same time, thereby leaving the system in an inconsistent state.

That said, you cannot modify a dictionary while iterating over it. See the documentation:

The dictionary p should not be mutated during iteration. It is safe (since Python 2.1) to
modify the values of the keys as you iterate over the dictionary, but only so long as the
set of keys does not change.
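The rule quoted above can be sketched as follows. This is an illustrative single-threaded example (the dictionary contents are made up); the same error appears when another thread adds or removes keys in the middle of your loop.

```python
d = {"a": 1, "b": 2, "c": 3}

# Changing the values of existing keys during iteration is fine,
# because the set of keys stays the same.
for key in d:
    d[key] *= 10

# Adding a key during iteration changes the dictionary's size.
# CPython detects this and raises RuntimeError on the next step.
try:
    for key in d:
        d[key + "_copy"] = d[key]
    size_change_detected = False
except RuntimeError:
    size_change_detected = True

# To add or remove keys, iterate over a snapshot of the keys instead.
d = {"a": 1, "b": 2, "c": 3}
for key in list(d):
    d[key + "_copy"] = d[key]

print(size_change_detected)  # True
print(len(d))                # 6
```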

Is dict.update() thread-safe?

If the keys are compositions of builtin hashable types, generally "yes", .update() is thread-safe. In particular, for your example with integer keys, yes.

But in general, no. Looking up a key in a dict can invoke arbitrary user-defined Python code in user-supplied __hash__() and __eq__() methods, and those can do anything at all - including performing their own mutations on the dicts involved. As soon as the implementation invokes Python code, other threads can run too, including threads that may be mutating d1 and/or d2 too.

That's not a potential problem for the builtin hashable types (ints, strings, floats, tuples, ...) - their implementations to compute hash codes and decide equality are purely functional (deterministic and no side effects) and don't release the GIL (global interpreter lock).
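To see how a user-defined key pulls Python code into the middle of .update(), here is a small sketch. The `Key` class and its call counter are hypothetical, written only for this illustration; the behaviour shown is CPython's.

```python
class Key:
    """Hypothetical key type whose equality check is written in Python."""

    eq_calls = 0  # counts how often __eq__ runs

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        # Python-level code: the interpreter may switch threads while
        # this runs, so the enclosing update() is no longer one atomic step.
        Key.eq_calls += 1
        return isinstance(other, Key) and self.value == other.value


d1 = {Key(1): "old"}
d2 = {Key(1): "new"}  # a distinct object that compares equal to d1's key

Key.eq_calls = 0
d1.update(d2)  # finds a matching hash in d1, then calls Key.__eq__

print(Key.eq_calls >= 1)   # True
print(list(d1.values()))   # ['new']
```

With int or str keys the same comparison happens entirely in C, so no Python code runs inside the update.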

That's all about CPython (the C implementation of Python). The answer may differ under other implementations! The Language Reference Manual is silent about this.

Is it safe to use .copy() on a dictionary in a multi-threaded program?

I figured out the issue I was having. I was using a nested dictionary and needed deepcopy to stop different threads from wrongly modifying the shared data, like so:

import copy

dictionary = {'test': {'test': 2}}

def test():
    # Deep-copy so each call works on its own nested dict, not the shared one.
    local_dict = copy.deepcopy(dictionary)
    local_dict['test']['test'] += 1
    return local_dict

### multi-threaded logic calling test() function
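The difference between .copy() and deepcopy on a nested dictionary can be sketched like this. The variable names `shallow` and `deep` are illustrative only.

```python
import copy

dictionary = {"test": {"test": 2}}

shallow = dictionary.copy()       # new outer dict, same inner dict object
deep = copy.deepcopy(dictionary)  # new outer dict and new inner dict

shallow["test"]["test"] += 1      # also changes dictionary["test"]["test"]
deep["test"]["test"] += 100       # touches only the deep copy

print(dictionary)  # {'test': {'test': 3}}
print(deep)        # {'test': {'test': 102}}
```

Note that copy.deepcopy walks the structure in Python code, so if other threads may be writing to the original while it is copied, taking the copy under a lock is the safer choice.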

Is list(dict.items()) thread-safe?

Short answer: it might be fine but use a lock anyway.

Using dis you can see that list(d.items()) is effectively two bytecode instructions (6 and 8):

>>> import dis
>>> dis.dis("list(d.items())")
  1           0 LOAD_NAME                0 (list)
              2 LOAD_NAME                1 (d)
              4 LOAD_METHOD              2 (items)
              6 CALL_METHOD              0
              8 CALL_FUNCTION            1
             10 RETURN_VALUE

On the Python FAQ it says that (generally) things implemented in C are atomic (from the point of view of a running Python program):

What kinds of global value mutation are thread-safe?

In general, Python offers to switch among threads only between bytecode instructions; [...]. Each bytecode instruction and therefore all the C implementation code reached from each instruction is therefore atomic from the point of view of a Python program.

[...]

For example, the following operations are all atomic [...]

D.keys()

list() and d.items() are both implemented in C, so each should be atomic, with three exceptions: they end up calling out to Python code (which can happen if they call a dunder method that you overrode with a Python implementation), you're using a subclass of dict rather than a real dict, or their C implementation releases the GIL. It's not a good idea to rely on them being atomic.
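A minimal sketch of the "use a lock anyway" advice, assuming every reader and writer of the dictionary agrees to go through the same lock. The names `d_lock`, `writer`, and `snapshot` are hypothetical.

```python
import threading

d = {"a": 1, "b": 2}
d_lock = threading.Lock()  # hypothetical lock shared by every reader and writer of d


def writer():
    for i in range(1000):
        with d_lock:
            d[f"key{i}"] = i


def snapshot():
    # Copy the items while holding the lock, then work on the copy freely.
    with d_lock:
        return list(d.items())


t = threading.Thread(target=writer)
t.start()
items = snapshot()  # a consistent list, taken at some point during the writes
t.join()

print(len(snapshot()))  # 1002
```

The snapshot no longer depends on whether list() or items() happens to be atomic for the particular dict type in use.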

You mention that iter() will error if its underlying iterable changes size, but that's not relevant here because .keys(), .values() and .items() return a view object and those have no problem with the underlying object changing:

d = {"a": 1, "b": 2}
view = d.items()
print(list(view)) # [("a", 1), ("b", 2)]
d["c"] = 3 # this could happen in a different thread
print(list(view)) # [("a", 1), ("b", 2), ("c", 3)]

If you're modifying the dict in more than one instruction at a time, you'll sometimes get d in an inconsistent state where some of the modifications have been made and some haven't yet, but you shouldn't get a RuntimeError like you do with iter(), unless you modify it in a way that's non-atomic.

Is a Python dictionary thread-safe when keys are thread IDs?

If data is a standard Python dictionary, the __getitem__ call is implemented entirely in C, as is the __hash__ method on the integer value returned by thread.get_ident() (threading.get_ident() in Python 3). At that point the data.__getitem__(<thread identifier>) call is thread safe. The same applies to writing to data; the data.__setitem__() call is entirely handled in C.
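A rough sketch of the pattern being discussed, using Python 3's threading.get_ident(). The `barrier` is an addition for this example only: thread identifiers can be reused once a thread exits, so it keeps all workers alive at the same time to guarantee distinct keys.

```python
import threading

data = {}                       # thread ident -> per-thread value
barrier = threading.Barrier(3)  # keeps all workers alive at once


def worker(value):
    ident = threading.get_ident()  # an int; hashing it runs no Python code
    data[ident] = value            # a single C-level store on a plain dict
    barrier.wait()                 # idents may be reused after a thread exits
    assert data[ident] == value    # each thread reads back only its own slot


threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(data.values()))  # [0, 1, 2]
```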

The moment any of these hooks are implemented in Python code, the GIL can be released between bytecodes and all bets are off.

This all assumes you are using CPython; Jython, IronPython, PyPy and other Python implementations may make different decisions on when to switch threads.

You'd be better off using a threading.local() object instead, as that is guaranteed to provide you with a thread-local namespace. It only supports attribute access, though, not item access like a mapping.
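A short sketch of threading.local() in use. The `results` dictionary and its lock exist only so the example can report what each thread saw; they are not part of the technique.

```python
import threading

local_data = threading.local()  # each thread sees its own attributes
results = {}                    # thread name -> value that thread observed
results_lock = threading.Lock()


def worker(value):
    local_data.value = value  # attribute access, not local_data["value"]
    seen = local_data.value   # always this thread's own value
    with results_lock:
        results[threading.current_thread().name] = seen


threads = [
    threading.Thread(target=worker, args=(i,), name=f"worker-{i}")
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))
# [('worker-0', 0), ('worker-1', 1), ('worker-2', 2)]
print(hasattr(local_data, "value"))  # False: the main thread never set it
```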
