Is there a way to determine the thread local storage model used by a library on Linux?
I ran into this error myself, and while investigating it, I came across a mailing list post with this info:
If you link a shared object containing IE-model access relocs, the object
will have the DF_STATIC_TLS flag set. By the spec, this means that dlopen
might refuse to load it.
Looking at /usr/include/elf.h, we have:
/* Values of `d_un.d_val' in the DT_FLAGS entry. */
...
#define DF_STATIC_TLS 0x00000010 /* Module uses the static TLS model */
So you need to test whether DF_STATIC_TLS is set in the DT_FLAGS entry of the shared library.
To test things, I created a simple piece of code using thread local storage:
static __thread int foo;
void set_foo(int new) {
foo = new;
}
I then compiled it twice with the two different thread local storage models:
gcc -ftls-model=initial-exec -fPIC -c tls.c -o tls-initial-exec.o
gcc -shared tls-initial-exec.o -o tls-initial-exec.so
gcc -ftls-model=global-dynamic -fPIC -c tls.c -o tls-global-dynamic.o
gcc -shared tls-global-dynamic.o -o tls-global-dynamic.so
And sure enough, I can see a difference between the two libraries using readelf:
$ readelf --dynamic tls-initial-exec.so
Dynamic section at offset 0xe00 contains 25 entries:
Tag Type Name/Value
...
0x000000000000001e (FLAGS) STATIC_TLS
The tls-global-dynamic.so version did not have a DT_FLAGS entry, presumably because it didn't have any flags set. So it should be fairly easy to create a script using readelf and grep to find affected libraries.
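The same check can also be done without readelf. The following is a minimal sketch in C, not a replacement for the readelf approach: the function names dyn_has_static_tls and file_has_static_tls are made up for illustration, and the code assumes a 64-bit ELF file in the host's byte order. It walks the program headers to find PT_DYNAMIC, then looks for a DT_FLAGS entry with DF_STATIC_TLS set.

```c
#include <assert.h>
#include <elf.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: scan up to `count` dynamic entries (stopping at
 * DT_NULL) and return 1 if DT_FLAGS has DF_STATIC_TLS set, 0 otherwise. */
int dyn_has_static_tls(const Elf64_Dyn *dyn, size_t count)
{
    assert(dyn != NULL || count == 0);
    for (size_t i = 0; i < count && dyn[i].d_tag != DT_NULL; i++) {
        if (dyn[i].d_tag == DT_FLAGS)
            return (dyn[i].d_un.d_val & DF_STATIC_TLS) != 0;
    }
    return 0; /* no DT_FLAGS entry at all, so no flags are set */
}

/* Hypothetical helper: returns 1 or 0 as above, or -1 if the file cannot
 * be read or is not a 64-bit ELF file (assumption: host byte order). */
int file_has_static_tls(const char *path)
{
    int result = -1;
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size < (off_t)sizeof(Elf64_Ehdr)) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;
    unsigned char *base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return -1;

    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64 &&
        eh->e_phentsize == sizeof(Elf64_Phdr) &&
        eh->e_phoff <= size &&
        (size - eh->e_phoff) / sizeof(Elf64_Phdr) >= eh->e_phnum) {
        const Elf64_Phdr *ph = (const Elf64_Phdr *)(base + eh->e_phoff);
        result = 0; /* valid ELF; stays 0 if there is no PT_DYNAMIC */
        for (size_t i = 0; i < eh->e_phnum; i++) {
            if (ph[i].p_type != PT_DYNAMIC)
                continue;
            if (ph[i].p_offset > size || ph[i].p_filesz > size - ph[i].p_offset) {
                result = -1; /* segment lies outside the file */
                break;
            }
            result = dyn_has_static_tls(
                (const Elf64_Dyn *)(base + ph[i].p_offset),
                ph[i].p_filesz / sizeof(Elf64_Dyn));
            break;
        }
    }
    munmap(base, size);
    return result;
}
```

Calling file_has_static_tls("tls-initial-exec.so") on the library built above would be expected to return 1, matching the STATIC_TLS line in the readelf output, while the global-dynamic build should give 0.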
How does thread_local! work with dynamic libraries in Rust?
This behavior is observed because the shared library contains its own copy of the code of the crates it depends on, resulting in two different thread local declarations.
The solution is to pass a reference to the thread local in question instead of accessing the thread local directly. For more information on how to obtain a reference to a thread local, see: How to create a thread local variable inside of a Rust struct?
How fast is thread local variable access on Linux
How fast is accessing a thread local variable in Linux
It depends on a lot of things.
Some processors (i*86) have a special segment register for this (gs in 32-bit mode, fs in x86_64 mode on Linux). Other processors do not, but they usually have a register reserved for accessing the current thread, and TLS is easy to find using that dedicated register.
On i*86, using the segment register, the access is almost as fast as direct memory access.
I keep on reading horror stories about the slowness of thread local variable access
It would have helped if you provided links to some such horror stories. Without the links, it's impossible to tell whether their authors know what they are talking about.
LD_PRELOAD and thread local variable
This is not possible, since thread-local storage requires per-thread initialisation. LD_PRELOAD will load the library even before the standard library is loaded, which messes up TLS initialisation.
Update:
Please read sections 2 and 3 of ELF Handling For Thread-Local Storage
How does the gcc `__thread` work?
Recent GCC versions (e.g. GCC 5) support C11 and its thread_local (when compiling with e.g. gcc -std=c11). As FUZxxl commented, instead of C11 thread_local you could use the __thread qualifier supported by older GCC versions. Read about Thread Local Storage.
pthread_getspecific is indeed comparatively slow, since it involves a function call (it is part of the POSIX library, so it is provided not by GCC but by e.g. GNU glibc or musl-libc). Using thread_local variables will very probably be faster.
Look into the source code of musl's thread/pthread_getspecific.c file for an example implementation. Read this answer to a related question.
And __thread & thread_local are (often) not magically translated to calls to pthread_getspecific. They usually involve some specific address mode and/or register, with help from the compiler, the linker and the runtime system. The details are implementation specific and related to the ABI; on Linux, I guess that since x86-64 has more registers & address modes, its implementation of TLS is faster than on i386. On the contrary, it could happen that some implementations of pthread_getspecific use internal thread_local variables (in your implementation of POSIX threads).
As an example, compiling the following code
#include <pthread.h>
const extern pthread_key_t key;
__thread int data;
int
get_data (void) {
return data;
}
int
get_by_key (void) {
return *(int*) (pthread_getspecific (key));
}
using GCC 5.2 (on Debian/Sid) with gcc -m32 -S -O2 -fverbose-asm gives the following code for get_data using TLS:
.type get_data, @function
get_data:
.LFB3:
.cfi_startproc
movl %gs:data@ntpoff, %eax # data,
ret
.cfi_endproc
and the following code for get_by_key with an explicit call to pthread_getspecific:
get_by_key:
.LFB4:
.cfi_startproc
subl $24, %esp #,
.cfi_def_cfa_offset 28
pushl key # key
.cfi_def_cfa_offset 32
call pthread_getspecific #
movl (%eax), %eax # MEM[(int *)_4], MEM[(int *)_4]
addl $28, %esp #,
.cfi_def_cfa_offset 4
ret
.cfi_endproc
Hence using TLS with __thread (or thread_local in C11) should probably be faster than using pthread_getspecific (avoiding the overhead of a call).
Notice that thread_local is a convenience macro defined in <threads.h> (a C11 standard header); it expands to the _Thread_local keyword.
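To illustrate the per-thread semantics discussed above, here is a minimal sketch using the C11 _Thread_local keyword together with POSIX threads. The names counter, worker and tls_copies_are_independent are invented for this example, and it assumes a compiler in C11 (or later) mode and linking with -pthread where the C library requires it. Each worker thread increments its own copy of the variable, and the calling thread's copy is left untouched.

```c
#include <pthread.h>

/* C11 keyword; with <threads.h> one could write thread_local instead,
 * and older GCC versions accept __thread in the same position. */
static _Thread_local int counter = 0;

/* Each thread starts from its own zero-initialised copy of counter. */
static void *worker(void *arg)
{
    for (int i = 0; i < 1000; i++)
        counter++;
    *(int *)arg = counter; /* report this thread's final value */
    return NULL;
}

/* Hypothetical demo function: returns 1 if both workers counted to 1000
 * independently and the caller's copy kept its value, 0 if not, and -1
 * if a thread could not be created. */
int tls_copies_are_independent(void)
{
    pthread_t threads[2];
    int seen[2] = {0, 0};
    int created = 0;

    counter = 7; /* only the calling thread's copy is modified here */

    for (int i = 0; i < 2; i++) {
        if (pthread_create(&threads[i], NULL, worker, &seen[i]) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    if (created != 2)
        return -1;

    return seen[0] == 1000 && seen[1] == 1000 && counter == 7;
}
```

No locking is needed around counter because no two threads ever touch the same object; with an ordinary static variable the same loop would be a data race.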