Thread Local Storage in Python

Thread local storage is useful, for instance, if you have a pool of worker threads and each thread needs access to its own resource, such as a network or database connection. Note that the threading module uses ordinary threads, which share the process's global data. Because of the global interpreter lock (GIL), these threads help mostly with I/O-bound work rather than CPU-bound work. The multiprocessing module instead runs each worker in a separate sub-process, so every global variable is automatically private to that worker, much like thread local storage.
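Here is a minimal sketch of the worker-pool case. It assumes an in-memory sqlite3 database stands in for a real database or network connection. One module-level threading.local object holds each pool thread's private connection, which is opened lazily the first time that thread needs it. The names get_connection and task are illustrative, not from any library.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

# One module-level local object; each thread sees only its own attributes on it.
_local = threading.local()

def get_connection():
    """Return this thread's private connection, creating it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        # By default a sqlite3 connection may only be used by the thread that
        # created it, so giving every worker its own connection fits naturally.
        conn = sqlite3.connect(":memory:")
        _local.conn = conn
    return conn

def task(_):
    conn = get_connection()
    conn.execute("SELECT 1")
    # Report which thread ran the task and which connection object it used.
    return threading.get_ident(), id(conn)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(task, range(20)))

# Group connection ids by worker thread.
per_thread = {}
for tid, cid in results:
    per_thread.setdefault(tid, set()).add(cid)

for tid, ids in per_thread.items():
    print(tid, "used", len(ids), "connection(s)")
```

Each worker thread ends up reusing a single connection across all of its tasks. The connections are never closed explicitly here, so real code would also need a cleanup strategy.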

threading module

Here is a simple example:

import threading
from threading import current_thread

threadLocal = threading.local()

def hi():
    initialized = getattr(threadLocal, 'initialized', None)
    if initialized is None:
        print("Nice to meet you", current_thread().name)
        threadLocal.initialized = True
    else:
        print("Welcome back", current_thread().name)

hi()
hi()

This will print out:

Nice to meet you MainThread
Welcome back MainThread
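The example above only runs in MainThread, so it does not show the per-thread behavior. The following minimal sketch calls the same logic from two extra threads. The thread names Worker-0 and Worker-1 are chosen for illustration, and the greetings are collected in a list rather than printed directly. It shows that each thread is greeted as new exactly once:

```python
import threading
from threading import current_thread

threadLocal = threading.local()
messages = []                     # collected greetings, so the run can be inspected
messages_lock = threading.Lock()

def hi():
    # Same logic as hi() above, but the greeting is recorded instead of printed.
    if getattr(threadLocal, 'initialized', None) is None:
        msg = "Nice to meet you " + current_thread().name
        threadLocal.initialized = True
    else:
        msg = "Welcome back " + current_thread().name
    with messages_lock:
        messages.append(msg)

def twice():
    hi()
    hi()

workers = [threading.Thread(target=twice, name=f"Worker-{i}") for i in range(2)]
for t in workers:
    t.start()
for t in workers:
    t.join()

for line in messages:
    print(line)
```

The order of the two threads' lines can vary, but within each thread "Nice to meet you" always comes before "Welcome back".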

One important thing that is easily overlooked: a threading.local() object only needs to be created once, not once per thread nor once per function call. Module (global) level or class level are ideal locations.

Here is why: threading.local() creates a new, empty instance each time it is called (just like any factory or class call would). Calling it again and rebinding the result to the same name therefore throws away the original object and everything stored in it, which is almost certainly not what you want. When any thread accesses an existing threadLocal variable (or whatever it is called), it gets its own private view of that variable.

This won't work as intended:

import threading
from threading import current_thread

def wont_work():
    threadLocal = threading.local()  # oops, this creates a new, empty local object on every call!
    initialized = getattr(threadLocal, 'initialized', None)
    if initialized is None:
        print("First time for", current_thread().name)
        threadLocal.initialized = True
    else:
        print("Welcome back", current_thread().name)

wont_work()
wont_work()

Will result in this output:

First time for MainThread
First time for MainThread
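The fix is to create the local object once. The following is a minimal sketch of the class-level placement mentioned above, using a hypothetical Greeter class. Because _local is created when the class body executes, every call and every instance reuses the same object, while each thread still gets its own attributes.

```python
import threading

class Greeter:
    # Created exactly once, when the class body runs. All instances and all
    # threads share this object, but each thread sees its own attributes.
    _local = threading.local()

    def greet(self):
        name = threading.current_thread().name
        if getattr(self._local, "initialized", None) is None:
            self._local.initialized = True
            return "First time for " + name
        return "Welcome back " + name

first = Greeter().greet()
second = Greeter().greet()   # a different instance, same thread: state is kept

results = []
t = threading.Thread(target=lambda: results.append(Greeter().greet()), name="Other")
t.start()
t.join()

print(first)
print(second)
print(results[0])
```

Unlike wont_work(), the second call in the same thread is recognized, and a new thread still starts fresh.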

multiprocessing module

With the multiprocessing module, all global variables are effectively thread local, because a Pool runs its tasks in separate worker processes, each with its own copy of the module's globals.

Consider this example, where the processed counter acts as per-worker storage: each worker process counts only the tasks it has handled itself.

from multiprocessing import Pool
from random import random
from time import sleep
import os

processed = 0  # each worker process gets its own copy of this global

def f(x):
    global processed
    sleep(random())
    processed += 1  # only changes the copy in the current worker process
    print("Processed by %s: %s" % (os.getpid(), processed))
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)
    print(pool.map(f, range(10)))

It will output something like this:

Processed by 7636: 1
Processed by 9144: 1
Processed by 5252: 1
Processed by 7636: 2
Processed by 6248: 1
Processed by 5252: 2
Processed by 6248: 2
Processed by 9144: 2
Processed by 7636: 3
Processed by 5252: 3
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

... of course, the process IDs, the counts for each, and the order will vary from run to run.

When to use thread local memory in Python?

threading.local() is for cases when you cannot or don't want to modify classes that implement threads.

In the question's example you are in full control: you created WorkerThread and you started the threads. So you know that there is one instance per running thread, and you can store values in the instance that is bound to that thread. That's why your initial example worked correctly.

But you do not always control the threads. Sometimes threads are started by a library or framework, and you only provide some code that will be run in those threads. In that case you cannot modify the Thread classes and add thread-specific variables to them.

Take a multithreaded web server as an example. You provide functions that process incoming requests. You do not create the infrastructure to listen on the socket, parse HTTP requests, etc.; the framework handles all of that. It starts a pool of threads for you, and when a request comes in, the framework parses it and invokes the handler you provided on a thread from the pool.

Now imagine you want to store some context for the request being processed (for example, the currently logged-in user) so that you can access it during request processing without passing it explicitly to every function. You can't add a currentUser variable to the thread class, because you don't control it. But you can store it in a threading.local() object, and requests processed concurrently in different threads will each see their own copy.
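The following is a minimal sketch of the request-context idea. It assumes a ThreadPoolExecutor stands in for the framework's thread pool and plain dicts stand in for parsed requests. handle_request, current_user and audit are hypothetical names, not part of any web framework.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Module-level request context; each pool thread sees its own attributes.
request_context = threading.local()

def current_user():
    """Return the user of the request handled by the calling thread."""
    return getattr(request_context, "user", None)

def audit(action):
    # Deep inside application code: no user parameter is passed around.
    return f"{current_user()} did {action}"

def handle_request(request):
    # Hypothetical handler that the "framework" invokes on a pool thread.
    request_context.user = request["user"]
    try:
        return audit(request["action"])
    finally:
        # Clear the value so a reused pool thread does not leak it
        # into the next request it handles.
        del request_context.user

incoming = [{"user": f"user{i}", "action": f"action{i}"} for i in range(8)]
with ThreadPoolExecutor(max_workers=3) as pool:   # stands in for the framework
    responses = list(pool.map(handle_request, incoming))

for r in responses:
    print(r)
```

Clearing the value in a finally block matters because pool threads are reused. Without it, a later request on the same thread could see the previous user.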

The same applies to your own code. As a program grows more complex and you need to separate infrastructure code (managing threads) from application logic, you may prefer not to add thread-specific variables to thread classes, and to use threading.local() instead.

What is thread local storage in Python, and why do I need it?

In Python, everything is shared, except for function-local variables (because each function call gets its own set of locals, and threads are always separate function calls). And even then, only the variables themselves (the names that refer to objects) are local to the function; the objects themselves are always shared, and anything can refer to them.

The Thread object for a particular thread is not a special object in this regard. If you store the Thread object somewhere all threads can access (like a global variable), then all threads can access that one Thread object. If you want to atomically modify anything that another thread has access to, you have to protect it with a lock. And all threads must of course share this very same lock, or it wouldn't be very effective.
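Here is a minimal sketch of the lock rule. A global counter is visible to all threads, so every increment is guarded by one Lock object that all threads share. The names counter, counter_lock and add are illustrative.

```python
import threading

counter = 0                      # a global, so it is shared by all threads
counter_lock = threading.Lock()  # one lock object, shared by every thread

def add(n):
    global counter
    for _ in range(n):
        # "counter += 1" is a read-modify-write, so it must run under the lock.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

If each thread created its own Lock instead, the increments would no longer be serialized against each other.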

If you want actual thread-local storage, that's where threading.local comes in. Attributes of threading.local are not shared between threads; each thread sees only the attributes it itself placed in there. If you're curious about its implementation, a pure-Python version is in _threading_local.py in the standard library (CPython normally uses a faster C implementation from the _thread module).
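Here is a minimal sketch of that isolation. An attribute set by the main thread is invisible to a second thread, and the second thread's assignment does not affect the main thread's view.

```python
import threading

data = threading.local()
data.value = "main"              # set by the main thread only

seen = {}

def worker():
    # The main thread's attribute does not exist from this thread's point of view.
    seen["before"] = getattr(data, "value", None)
    data.value = "worker"        # visible only inside this thread
    seen["after"] = data.value

t = threading.Thread(target=worker)
t.start()
t.join()

print(seen, data.value)  # {'before': None, 'after': 'worker'} main
```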

Django with Unicorn losing thread local storage during request

Adding an explicit monkey.patch_all() call somehow fixed the issue. The root of the problem remains unknown.

Is storing data in thread local storage in a Django application safe, in cases of concurrent requests?

Yes, using thread-local storage in Django is safe.

Django uses one thread to handle each request. Django also uses thread-local data itself, for instance for storing the currently activated locale. While app servers such as Gunicorn and uWSGI can be configured to use multiple threads, each request is still handled by a single thread.

However, there have been conflicting opinions on whether using thread-locals is an elegant and well-designed solution. The reasons against using thread-locals boil down to the same reasons why global variables are considered bad practice. This answer discusses a number of them.

Still, storing the request object in thread-local data has become a widely used pattern in the Django community. There is even an app, Django-CRUM, that provides a CurrentRequestUserMiddleware class and the functions get_current_user() and get_current_request().

Note that as of version 3.0, Django has started to implement asynchronous support. I'm not sure what its implications are for apps like Django-CRUM. For the foreseeable future, however, thread-locals can safely be used with Django.

python threading.local() in different module

In a similar situation I ended up doing the following in a separate module:

import threading
from collections import defaultdict

tls = defaultdict(dict)

def get_thread_ctx():
    """ Get thread-local, global context"""
    return tls[threading.get_ident()]

This essentially creates a global variable called tls. Each thread, identified by threading.get_ident(), then gets its own key in that global dict, and the value stored under that key is itself a dict. Example:

from threading import Thread

# get_thread_ctx() comes from the module shown above

class Test(Thread):
    def __init__(self):
        super().__init__()
        # note: we cannot initialize thread local here, since thread
        # is not running yet

    def run(self):
        # Get thread context
        tmp = get_thread_ctx()
        # Create an app-specific entry
        tmp["probe"] = {}
        self.ctx = tmp["probe"]

        while True:
            ...

Now, in a different module:

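# get_thread_ctx() is imported from the module shown above
def get_thread_settings():
    ctx = get_thread_ctx()
    # Default to {} so a thread without a "probe" entry does not raise AttributeError
    probe_ctx = ctx.get("probe", {})

    # Get what you need from the app-specific region of this thread
    return probe_ctx.get("settings", {})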

Hope it helps the next person looking for something similar.
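An alternative sketch keeps the same get_thread_ctx() interface but stores the per-thread dict on a module-level threading.local object instead of in a dict keyed by get_ident(). Thread identifiers can be recycled once a thread exits, so a get_ident()-keyed dict can hand a stale entry to a new thread, and it never frees old entries. threading.local discards a thread's attributes when that thread ends. Other modules would import get_thread_ctx from wherever this code lives; the module layout here is an assumption.

```python
import threading

# Module-level storage; any module that imports get_thread_ctx shares it.
_tls = threading.local()

def get_thread_ctx():
    """Return a dict private to the calling thread, created on first use."""
    ctx = getattr(_tls, "ctx", None)
    if ctx is None:
        ctx = _tls.ctx = {}
    return ctx

def get_thread_settings():
    # Same lookup as in the answer, tolerating a missing "probe" entry.
    return get_thread_ctx().get("probe", {}).get("settings", {})

results = {}

def run(name):
    get_thread_ctx()["probe"] = {"settings": {"owner": name}}
    results[name] = get_thread_settings()

threads = [threading.Thread(target=run, args=(n,)) for n in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)                # each thread only saw its own settings
print(get_thread_settings())  # {} in the main thread
```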

Access thread local object in different module - Python

The problem with your code is that you are not assigning your name to the correct local() context. Your __init__() method runs in the main thread, before you start your A and B threads by calling .start().

Your first thread creation, A = Executor("A"), creates a new thread object A but updates the local context of the main thread. Then, when you start A by calling A.start(), you enter A's context, which has a separate local context. There, name is not defined and you end up with None as output. The same then happens for B.

In other words, to access a thread's local variables, your code must be running in that thread. That is the case inside .run(), which .start() calls in the new thread, but not when creating the objects (running __init__()).

To get your current code working, you could store the data in each object (using self references) and then, when each thread is running, copy the content to the thread local context:

import threading

threadLocal = threading.local()

def print_message():
    name = getattr(threadLocal, 'name', None)
    print(name)

class Executor(threading.Thread):
    def __init__(self, name):
        threading.Thread.__init__(self)
        # Store name in object using self reference
        self.name = name

    def run(self):
        # Here we copy from object to local context,
        # since the thread is running
        threadLocal.name = self.name
        print_message()

A = Executor("A")
A.start()
B = Executor("B")
B.start()

Note, though, that in this situation using the thread local context is somewhat overkill, since we already store the separate values in the different objects. Using them directly from the objects would require a small rewrite of print_message(), though.
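Here is a minimal sketch of that rewrite. It drops threadLocal and passes the value from the object to print_message() as an argument. The messages list exists only so the output can be checked.

```python
import threading

messages = []                    # collected output, for inspection only
messages_lock = threading.Lock()

def print_message(name):
    # The name now arrives as an argument instead of via thread-local storage.
    print(name)
    with messages_lock:
        messages.append(name)

class Executor(threading.Thread):
    def __init__(self, name):
        super().__init__()
        self.name = name         # this also sets the Thread's own name

    def run(self):
        print_message(self.name)

executors = [Executor("A"), Executor("B")]
for e in executors:
    e.start()
for e in executors:
    e.join()
```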

Python 2.7: Thread local storage's instantiation when first accessed?

The first thing to consider is what a thread-local actually is: an independently initialized instance of a particular type that is tied to a particular thread. With that in mind, I would expect some initialization code to be called multiple times, once per thread. While in some languages, like Java, the initialization is more explicit, it does not necessarily need to be.
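Here is a minimal sketch of that per-thread initialization, written for Python 3 and using a hypothetical Storage subclass of threading.local. Its __init__ runs once in the thread that creates the object, and then once more in each other thread, lazily, on that thread's first access.

```python
import threading

init_calls = []                  # which thread ran __init__, in order
init_lock = threading.Lock()

class Storage(threading.local):
    def __init__(self):
        with init_lock:
            init_calls.append(threading.current_thread().name)
        self.values = {}         # every thread gets its own fresh dict

storage = Storage()              # __init__ runs here, for the creating thread

def worker():
    storage.values["x"] = 1      # first access in this thread triggers __init__
    storage.values["y"] = 2      # no second __init__ for the same thread

threads = [threading.Thread(target=worker, name=f"worker-{i}") for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(init_calls)  # creating thread first, then each worker once (worker order may vary)
```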

Let's look at the source for the supertype of the storage container you're using: https://github.com/python/cpython/blob/2.7/Lib/_threading_local.py

Line 186 contains the local type that is being used. Looking at that class, you can see that __setattr__ and __getattribute__ are among the overridden methods. Remember that in Python these methods are called every time you assign or access an attribute on an instance. Their implementations acquire the object's lock and then call the _patch helper. _patch looks up the calling thread's dictionary for this local object, creating a new one if none exists yet, and assigns it to the instance's __dict__ (using the object base methods to avoid infinite recursion: How is the __getattribute__ method used?).

So when you call storage.set(...), you are actually looking up a proxy dictionary for the current thread. If one doesn't exist, the __init__ method of your type is called (see line 182). The result of that lookup is installed as the current instance's __dict__, and then the corresponding method on object is called to retrieve or set the value (lines 193, 206, 219), which uses the newly installed dict.
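The key idea, a per-thread dictionary looked up on every attribute access, can be illustrated with a deliberately simplified toy. ToyLocal is hypothetical and uses a different technique from _threading_local. It does not swap the instance __dict__; it keeps one dict per thread, keyed by threading.get_ident(). Unlike the real implementation, it never cleans up entries for finished threads and does not call a subclass __init__ per thread.

```python
import threading

class ToyLocal:
    """Simplified illustration only: per-thread attribute dicts keyed by thread id."""

    def __init__(self):
        # Bypass our own __setattr__ for the bookkeeping attributes.
        object.__setattr__(self, "_dicts", {})
        object.__setattr__(self, "_lock", threading.Lock())

    def _thread_dict(self):
        # Find (or create) the calling thread's dict under the lock.
        with self._lock:
            return self._dicts.setdefault(threading.get_ident(), {})

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for per-thread attributes.
        try:
            return self._thread_dict()[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self._thread_dict()[name] = value

store = ToyLocal()
store.x = "main"
seen = []

def worker():
    seen.append(getattr(store, "x", None))   # the main thread's value is invisible
    store.x = "worker"
    seen.append(store.x)

t = threading.Thread(target=worker)
t.start()
t.join()

print(seen, store.x)  # [None, 'worker'] main
```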


