Why Is Using Thread Locals in Django Bad

Why is using thread locals in Django bad?

I disagree entirely. TLS is extremely useful. It should be used with care, just as globals should be used with care; but saying it shouldn't be used at all is just as ridiculous as saying globals should never be used.

For example, I store the currently active request in TLS. This makes it accessible from my logging class, without having to pass the request around through every single interface--including many that don't care about Django at all. It lets me make log entries from anywhere in the code; the logger outputs to a database table, and if a request happens to be active when a log is made, it logs things like the active user and what was being requested.

If you don't want one thread to have the capability of modifying another thread's TLS data, then set your TLS up to prohibit this, which probably requires using a native TLS class. I don't find that argument convincing, though; if an attacker can execute arbitrary Python code as your backend, your system is already fatally compromised--he could monkey patch anything to be run later as a different user, for example.

Obviously, you'll want to clear any TLS at the end of a request; in Django, that means clearing it in process_response and process_exception in a middleware class.
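A minimal sketch of that cleanup, written as a new-style middleware (the callable form introduced in Django 1.10) rather than with the `process_response` and `process_exception` hooks named above. A single `try`/`finally` covers both the normal and the exception path. The names `_state`, `get_current_request`, and `CurrentRequestMiddleware` are hypothetical.

```python
import threading

# Hypothetical module-level storage; the name _state is an assumption.
_state = threading.local()


def get_current_request():
    """Return the request active on this thread, or None."""
    return getattr(_state, "request", None)


class CurrentRequestMiddleware:
    """Store the request in thread-local storage for the duration of the call."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        _state.request = request
        try:
            return self.get_response(request)
        finally:
            # Runs whether the view returned or raised, so a reused
            # worker thread never sees a stale request.
            _state.request = None
```

The class imports nothing from Django, so it can be exercised with any callable as `get_response`.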

What is so bad with threadlocals

I don't think there is anything wrong with threadlocals. Yes, it is a global variable, but besides that it's a normal tool. We use it for exactly this purpose (middleware stores the subdomain model in a context that is global to the current request) and it works perfectly.

So I say, use the right tool for the job. In this case threadlocals make your app much more elegant than passing the subdomain model around in all the model methods. That is not even always possible: when you override Django manager methods to limit queries by subdomain, you have no way to pass anything extra to get_query_set (renamed get_queryset in Django 1.6), for example, so threadlocals are the natural and only answer.

Is storing data in thread local storage in a Django application safe, in cases of concurrent requests?

Yes, using thread-local storage in Django is safe.

Django uses one thread to handle each request. Django also uses thread-local data itself, for instance for storing the currently activated locale. While app servers such as Gunicorn and uWSGI can be configured to use multiple threads, each request will still be handled by a single thread.

However, there have been conflicting opinions on whether using thread-locals is an elegant and well-designed solution. The reasons against using thread-locals boil down to the same reasons why global variables are considered bad practice. This answer discusses a number of them.

Still, storing the request object in thread-local data has become a widely used pattern in the Django community. There is even an app Django-CRUM that contains a CurrentRequestUserMiddleware class and the functions get_current_user() and get_current_request().

Note that as of version 3.0, Django has started to implement asynchronous support. I'm not sure what its implications are for apps like Django-CRUM. For the foreseeable future, however, thread-locals can safely be used with Django.

Django - Troubleshooting thread locals & middleware issue with DRF ViewSets in Production

How strange. After some time I had an idea that resolved the issue.

Still not exactly sure of the root cause (would love any insight why it worked just fine in dev, but not in prod until the changes below were made), but it seems that in ViewSets which define the queryset in the class itself, the query is evaluated when the thread begins?

I noticed in my logs that when I started the server, I got a whole bunch of log entries from the OrganizationAwareManager saying that _thread_locals had no associated attributes. The number of these log entries seemed to be about the same as the number of ViewSets in my project that used OrganizationAwareManager. They all got evaluated as the system started up, with organization=None, so any further filtering on organization would be discarded.

ViewSets like the one below did not end up correctly filtering by organization:

class AssetTypeViewSet(viewsets.ModelViewSet):
    queryset = AssetType.objects.all()
    serializer_class = AssetTypeSerializer

When I modified to define the queryset inside get_queryset() so that it gets evaluated when the ViewSet is executed, things now work as expected:

class AssetTypeViewSet(viewsets.ModelViewSet):
    queryset = AssetType.objects.none()
    serializer_class = AssetTypeSerializer

    def get_queryset(self):
        return AssetType.objects.all()

Weird.
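One plausible reading of this behaviour is a timing difference that plain Python can reproduce. A class body runs once, when the module is imported, while a method body runs on every call. The sketch below is a stand-in and not the real OrganizationAwareManager: the manager returns a string instead of a queryset, and `EagerViewSet` and `LazyViewSet` are made-up names.

```python
import threading

_thread_locals = threading.local()


class OrganizationAwareManager:
    """Stand-in manager that reads the organization when all() is called."""

    def all(self):
        org = getattr(_thread_locals, "organization", None)
        return f"rows for organization={org}"


objects = OrganizationAwareManager()


class EagerViewSet:
    # Runs once, while the class body executes at import time,
    # before any middleware has set an organization.
    queryset = objects.all()


class LazyViewSet:
    def get_queryset(self):
        # Runs on every call, after middleware has set the organization.
        return objects.all()


# Simulate middleware setting the organization for the current request.
_thread_locals.organization = "acme"
```

After the last line, `EagerViewSet.queryset` still reflects `organization=None`, while `LazyViewSet().get_queryset()` sees `acme`.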

Thread locals in Python - negatives, with regards to scalability?

Threadlocals aren't the most robust or secure way to do things - check out this note, for instance. [ Though also see Glenn's comment, below ]

I suppose that if you have coded cleanly, knowing that you're putting stuff into a big global pot of info, accepting that data consistency in those thread locals is not guaranteed, and taking care to avoid race conditions and the like, you might well be OK.

But even with that in mind, there's still the 'magic'ness of thread-local vars, so clearly documenting what is going on wherever a thread-local var is used might help you and future developers of the codebase down the line.

Any harm in putting the Django request object in a thread-local dict?

There shouldn't be any. Here you should find everything you need:

http://code.djangoproject.com/wiki/CookBookThreadlocalsAndUser

Does Django use one thread to process several requests in WSGI or Gunicorn?

Consider Gunicorn as the web server. It has one master process and several worker processes. The master manages the workers, and each HTTP request is accepted and handled by one of them. There are three types of workers:

  • synchronous (sync)
  • synchronous with threads (worker shares memory for running threads)
  • asynchronous (async)

Each sync worker without threads handles a single request at a time. If there are two requests, they are handled either by two different workers (separate Python processes, isolated from each other), by the same worker sequentially, or in parallel by one worker if it has more than one thread.
To run an app with sync workers without threads, run it as follows:

gunicorn --workers=3 main:app

To run an app with sync workers and threads:

gunicorn --workers=3 --threads=2 --worker-class=gthread main:app 

In the above example, each worker can handle 2 requests at a time.

With async workers we get request concurrency: each worker (a Python process) handles many requests at a time within a single process. To run Gunicorn with async workers, set the worker class accordingly:

gunicorn --worker-class=gevent --worker-connections=1000 --workers=3 main:app

Requests are guaranteed not to get mixed up if you choose sync workers without threads, and the same seems to hold for async workers. For sync workers with threads, you should implement thread synchronization wherever multiple threads could write to the same data simultaneously.
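For the threaded case, a minimal sketch of such synchronization with the standard library is shown below. `HitCounter` and `hammer` are hypothetical names, and the counter stands in for any state shared by the threads of one worker process.

```python
import threading


class HitCounter:
    """Shared state that several request threads may update at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # The lock makes the read-modify-write step atomic across threads.
        with self._lock:
            self.value += 1


def hammer(counter, n_threads=4, per_thread=10_000):
    """Increment the counter from several threads and return the total."""

    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Data that belongs to a single request needs no lock if it lives on the request object or in thread-local storage; the lock only matters for state that is shared between threads.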

Is Django middleware thread safe?

Why not bind your variable to the request object, like so:

class ThreadsafeTestMiddleware(object):

    def process_request(self, request):
        # Keep per-request state on the request, not on the middleware instance.
        request.thread_safe_variable = some_dynamic_value_from_request

    def process_response(self, request, response):
        # ... do something with request.thread_safe_variable here ...
        return response

