How to Make Built-In Containers (Sets, Dicts, Lists) Thread Safe

How to make built-in containers (sets, dicts, lists) thread safe?

You can use Python's metaprogramming facilities to accomplish this. (Note: written quickly and not thoroughly tested.) I prefer to use a class decorator.

I also think you may need to lock more than add and remove to make a set thread-safe, but I'm not sure. I'll ignore that problem and just concentrate on your question.

Also consider whether delegation (proxying) is a better fit than subclassing. Wrapping objects is the usual approach in Python.

Finally, there is no "magic wand" of metaprogramming that will magically add fine-grained locking to any mutable Python collection. The safest thing to do is to lock any method or attribute access using RLock, but this is very coarse-grained and slow and probably still not a guarantee that your object will be thread-safe in all cases. (For example, you may have a collection that manipulates another non-threadsafe object accessible to other threads.) You really do need to examine each and every data structure and think about what operations are atomic or require locks and which methods might call other methods using the same lock (i.e., deadlock itself).
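For example, even if every individual method of a collection is locked, a check-then-act sequence spanning two calls is still a race unless one lock covers the whole compound operation. A minimal sketch (names are illustrative):

```python
import threading

d = {}
d_lock = threading.Lock()

def append_item(key, value):
    # Locking each dict operation separately would NOT be enough:
    # another thread could insert key between the membership test and
    # the assignment. The lock must span the whole compound operation.
    with d_lock:
        if key not in d:
            d[key] = []
        d[key].append(value)

threads = [threading.Thread(target=append_item, args=('k', i))
           for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(d['k']))  # 50
```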

That said, here are some techniques at your disposal in increasing order of abstraction:

Delegation

from threading import RLock

class LockProxy(object):
    def __init__(self, obj):
        self.__obj = obj
        self.__lock = RLock()
        # RLock because object methods may call own methods
    def __getattr__(self, name):
        def wrapped(*a, **k):
            with self.__lock:
                return getattr(self.__obj, name)(*a, **k)
        return wrapped
        # note: operators like 'in' look up dunder methods on the type
        # and bypass __getattr__, so they are not locked by this proxy

lockedset = LockProxy(set([1, 2, 3]))

Context manager

from threading import Lock

class LockedSet(set):
    """A set where add(), remove(), and 'in' operator are thread-safe"""

    def __init__(self, *args, **kwargs):
        self._lock = Lock()
        super(LockedSet, self).__init__(*args, **kwargs)

    def add(self, elem):
        with self._lock:
            super(LockedSet, self).add(elem)

    def remove(self, elem):
        with self._lock:
            super(LockedSet, self).remove(elem)

    def __contains__(self, elem):
        with self._lock:
            return super(LockedSet, self).__contains__(elem)

Decorator

from threading import Lock

def locked_method(method):
    """Method decorator. Requires a lock object at self._lock"""
    def newmethod(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return newmethod

class DecoratorLockedSet(set):
    def __init__(self, *args, **kwargs):
        self._lock = Lock()
        super(DecoratorLockedSet, self).__init__(*args, **kwargs)

    @locked_method
    def add(self, *args, **kwargs):
        return super(DecoratorLockedSet, self).add(*args, **kwargs)

    @locked_method
    def remove(self, *args, **kwargs):
        return super(DecoratorLockedSet, self).remove(*args, **kwargs)

Class Decorator

I think this is the cleanest and easiest-to-understand of these more abstract techniques, so I've expanded it to allow one to specify the methods to lock and a lock object factory.

from threading import Lock

def lock_class(methodnames, lockfactory):
    return lambda cls: make_threadsafe(cls, methodnames, lockfactory)

def lock_method(method):
    if getattr(method, '__is_locked', False):
        raise TypeError("Method %r is already locked!" % method)
    def locked_method(self, *arg, **kwarg):
        with self._lock:
            return method(self, *arg, **kwarg)
    locked_method.__name__ = '%s(%s)' % ('lock_method', method.__name__)
    locked_method.__is_locked = True
    return locked_method

def make_threadsafe(cls, methodnames, lockfactory):
    init = cls.__init__
    def newinit(self, *arg, **kwarg):
        init(self, *arg, **kwarg)
        self._lock = lockfactory()
    cls.__init__ = newinit

    for methodname in methodnames:
        oldmethod = getattr(cls, methodname)
        newmethod = lock_method(oldmethod)
        setattr(cls, methodname, newmethod)

    return cls

@lock_class(['add', 'remove'], Lock)
class ClassDecoratorLockedSet(set):
    @lock_method  # if you double-lock a method, a TypeError is raised
    def frobnify(self):
        pass

Override Attribute access with __getattribute__

from threading import Lock

class AttrLockedSet(set):
    def __init__(self, *args, **kwargs):
        self._lock = Lock()
        super(AttrLockedSet, self).__init__(*args, **kwargs)

    def __getattribute__(self, name):
        if name in ['add', 'remove']:
            # note: makes a new callable object "lockedmethod" on every access
            # best to add a layer of memoization
            lock = self._lock
            def lockedmethod(*args, **kwargs):
                with lock:
                    return super(AttrLockedSet, self).__getattribute__(name)(*args, **kwargs)
            return lockedmethod
        else:
            return super(AttrLockedSet, self).__getattribute__(name)

Dynamically-added wrapper methods with __new__

from threading import Lock

class NewLockedSet(set):
    def __new__(cls, *args, **kwargs):
        # modify the class by adding new unbound methods
        # you could also attach a single __getattribute__ like above
        for membername in ['add', 'remove']:
            def scoper(membername=membername):
                # You can also return the function or use a class
                def lockedmethod(self, *args, **kwargs):
                    with self._lock:
                        m = getattr(super(NewLockedSet, self), membername)
                        return m(*args, **kwargs)
                lockedmethod.__name__ = membername
                setattr(cls, membername, lockedmethod)
            scoper()  # don't forget to actually call the scoping function
        self = super(NewLockedSet, cls).__new__(cls, *args, **kwargs)
        self._lock = Lock()
        return self

Dynamically-added wrapper methods with __metaclass__

from threading import Lock

def _lockname(classname):
    return '_%s__%s' % (classname, 'lock')

class LockedClass(type):
    def __new__(mcls, name, bases, dict_):
        # we'll bind this after we add the methods
        cls = None
        def lockmethodfactory(methodname, lockattr):
            def lockedmethod(self, *args, **kwargs):
                with getattr(self, lockattr):
                    m = getattr(super(cls, self), methodname)
                    return m(*args, **kwargs)
            lockedmethod.__name__ = methodname
            return lockedmethod
        lockattr = _lockname(name)
        for methodname in ['add', 'remove']:
            dict_[methodname] = lockmethodfactory(methodname, lockattr)
        cls = type.__new__(mcls, name, bases, dict_)
        return cls

    def __call__(self, *args, **kwargs):
        # self is a class--i.e. an "instance" of the LockedClass type
        instance = super(LockedClass, self).__call__(*args, **kwargs)
        setattr(instance, _lockname(self.__name__), Lock())
        return instance

class MetaLockedSet(set):
    __metaclass__ = LockedClass  # Python 2 syntax; in Python 3 write
                                 # class MetaLockedSet(set, metaclass=LockedClass)

Dynamically-created Metaclasses

def LockedClassMetaFactory(wrapmethods):
    class LockedClass(type):
        def __new__(mcls, name, bases, dict_):
            # we'll bind this after we add the methods
            cls = None
            def lockmethodfactory(methodname, lockattr):
                def lockedmethod(self, *args, **kwargs):
                    with getattr(self, lockattr):
                        m = getattr(super(cls, self), methodname)
                        return m(*args, **kwargs)
                lockedmethod.__name__ = methodname
                return lockedmethod
            lockattr = _lockname(name)
            for methodname in wrapmethods:
                dict_[methodname] = lockmethodfactory(methodname, lockattr)
            cls = type.__new__(mcls, name, bases, dict_)
            return cls

        def __call__(self, *args, **kwargs):
            # self is a class--i.e. an "instance" of the LockedClass type
            instance = super(LockedClass, self).__call__(*args, **kwargs)
            setattr(instance, _lockname(self.__name__), Lock())
            return instance
    return LockedClass

class MetaFactoryLockedSet(set):
    __metaclass__ = LockedClassMetaFactory(['add', 'remove'])

I'll bet using a simple, explicit try...finally doesn't look so bad now, right?

Exercise for the reader: let the caller pass in their own Lock() object (dependency injection) using any of these methods.
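A minimal sketch of that exercise, applied to the context-manager subclass above (the class name and the lock keyword argument are illustrative):

```python
from threading import Lock

class InjectableLockedSet(set):
    """A LockedSet variant that accepts a caller-supplied lock."""

    def __init__(self, *args, **kwargs):
        # dependency injection: use the caller's lock if given,
        # otherwise fall back to a private Lock()
        self._lock = kwargs.pop('lock', None) or Lock()
        super(InjectableLockedSet, self).__init__(*args, **kwargs)

    def add(self, elem):
        with self._lock:
            super(InjectableLockedSet, self).add(elem)

    def remove(self, elem):
        with self._lock:
            super(InjectableLockedSet, self).remove(elem)

shared_lock = Lock()
s = InjectableLockedSet([1, 2, 3], lock=shared_lock)
s.add(4)
```

Sharing one lock between several containers this way lets you guard compound operations that touch all of them.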

Are Python built-in containers thread-safe?

You need to implement your own locking for all shared variables that will be modified in Python. You don't have to worry about reading from variables that won't be modified (i.e., concurrent reads are fine), so immutable types (frozenset, tuple, str) are probably safe, though locking them wouldn't hurt. For things you're going to be changing - list, set, dict, and most other objects - you should have your own locking mechanism. While individual in-place operations are atomic on most of these in CPython, threads can still lead to super-nasty bugs, so you might as well implement locking; it's pretty easy.

By the way, I don't know if you know this, but locking is very easy in Python - create a threading.Lock object, and then you can acquire/release it like this:

import threading
list1Lock = threading.Lock()

with list1Lock:
    # change or read from the list here
    pass
# continue doing other stuff (the lock is released when you leave the with block)

In Python 2.5, do from __future__ import with_statement; Python 2.4 and earlier don't have the with statement at all, so you'll want to put the acquire()/release() calls in try...finally blocks:

import threading
list1Lock = threading.Lock()

list1Lock.acquire()
try:
    # change or read from the list here
    pass
finally:
    list1Lock.release()
# continue doing other stuff (the lock has been released by the finally block)

There is some very good information available elsewhere about thread synchronization in Python.

How do you create a sharable object in threading?

Threads share the same process space, so the object that one thread sees is the same object another thread sees. The real question of sharability comes down to whether the operations you are performing on these objects are "thread safe".

When you are executing Python code, the interpreter holds the Global Interpreter Lock, and no two threads can be executing Python bytecode in parallel. But you could write code consisting of many Python statements that manipulate a data structure. If this code runs against the same data structure (object) concurrently in multiple threads, you could end up with a corrupted data structure, since a thread may give up control to another thread mid-computation. In that case you need to use locking to prevent corruption.

On the other hand, if you have a built-in type, such as a list, and you do an append operation, that is thread safe without any explicit locking on your part.
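To make the distinction concrete: in CPython a single list.append is atomic under the GIL, while a read-modify-write like counter += 1 spans several bytecodes and needs a lock. A small sketch (names are illustrative):

```python
import threading

results = []             # list.append is atomic in CPython
counter = 0
counter_lock = threading.Lock()

def worker():
    global counter
    for _ in range(1000):
        results.append(1)       # safe without an explicit lock
        with counter_lock:      # a bare counter += 1 is a
            counter += 1        # read-modify-write race

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results), counter)  # 4000 4000
```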

Is dict.update() thread-safe?

If the keys are compositions of builtin hashable types, generally "yes", .update() is thread-safe. In particular, for your example with integer keys, yes.

But in general, no. Looking up a key in a dict can invoke arbitrary user-defined Python code in user-supplied __hash__() and __eq__() methods, and those can do anything at all - including performing their own mutations on the dicts involved. As soon as the implementation invokes Python code, other threads can run too, including threads that may be mutating d1 and/or d2 too.

That's not a potential problem for the builtin hashable types (ints, strings, floats, tuples, ...) - their implementations to compute hash codes and decide equality are purely functional (deterministic and no side effects) and don't release the GIL (global interpreter lock).

That's all about CPython (the C implementation of Python). The answer may differ under other implementations! The Language Reference Manual is silent about this.
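As a sketch of how user code sneaks into dict.update(): when a key already in the destination collides with a key from the source, the user-defined __eq__ runs in the middle of the update, and anything could happen inside it. (When the source is an exact dict, CPython reuses the cached hashes, but equality checks still execute arbitrary Python. Class and names are illustrative.)

```python
eq_calls = []

class Key(object):
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return hash(self.n)
    def __eq__(self, other):
        # arbitrary user Python code runs here during dict operations;
        # other threads can be scheduled at this point
        eq_calls.append((self.n, other.n))
        return self.n == other.n

d1 = {Key(1): 'old'}
d2 = {Key(1): 'new'}
d1.update(d2)  # comparing the two colliding Key(1) objects invokes __eq__
```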

In Python, is set.pop() threadsafe?

If you look at the set.pop method in the CPython source you'll see that it doesn't release the GIL.

That means that only one set.pop will ever be happening at a time within a CPython process.

Since set.pop checks whether the set is empty, you can't cause anything but a KeyError by trying to pop from an empty set.

So no, you can't corrupt the data by popping from a set in multiple threads with CPython.
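For instance, several threads can safely drain a shared set with pop(), treating KeyError as the "empty" signal (names are illustrative):

```python
import threading

items = set(range(1000))
popped = []
popped_lock = threading.Lock()

def drain():
    while True:
        try:
            item = items.pop()   # atomic in CPython; never corrupts the set
        except KeyError:         # pop from an empty set raises KeyError
            return
        with popped_lock:
            popped.append(item)

threads = [threading.Thread(target=drain) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(popped), len(items))  # 1000 0
```

Because each pop is atomic, every element is handed to exactly one thread, with no duplicates or losses.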

Is Ray thread safe?

Yes; by default, only one method will execute on a Ray actor at a time. Ordering from concurrent calls is not guaranteed.

With Ray 0.8, you'll be able to set ActorClass.options(max_concurrency=N) to override this serial execution guarantee.

Multiple threads writing to the same CSV in Python

I am not sure whether csv.writer is thread-safe. The documentation doesn't specify, so to be safe, if multiple threads use the same object, you should protect the usage with a threading.Lock:

# create the lock
import threading
csv_writer_lock = threading.Lock()

def downloadThread(arguments......):
    # pass csv_writer_lock somehow
    # Note: use csv_writer_lock on *any* access
    # Some code.....
    with csv_writer_lock:
        writer.writerow(re.split(',', line.decode()))

That being said, it may indeed be more elegant for the downloadThread to submit write tasks to an executor, instead of explicitly using locks like this.
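A sketch of that executor idea: route every row through a single-threaded executor so writes are serialized without an explicit lock (the filename and helper names are illustrative):

```python
import csv
import threading
from concurrent.futures import ThreadPoolExecutor

# A single-threaded executor serializes all writes: no lock required.
writer_pool = ThreadPoolExecutor(max_workers=1)

with open('out.csv', 'w', newline='') as f:   # 'out.csv' is illustrative
    writer = csv.writer(f)

    def write_row(row):
        writer.writerow(row)

    def download_thread(rows):
        for row in rows:
            # submit to the writer pool instead of writing directly
            writer_pool.submit(write_row, row)

    workers = [threading.Thread(target=download_thread,
                                args=([[i, i * 2]],)) for i in range(5)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    writer_pool.shutdown(wait=True)   # flush pending writes before closing

with open('out.csv', newline='') as f:
    print(sum(1 for _ in csv.reader(f)))  # 5
```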

Are nested Dictionaries thread safe from parent ConcurrentDictionary?

Of course not, they're different dictionaries with different access rules.

Your example however is fine, because you're accessing different dictionaries from different threads. If you were to do this instead:

outerDict[30]["a"] = "b"; // in thread 1
outerDict[30]["g"] = "h"; // in thread 2

You'd quickly run into issues.


