Are C++11 Thread_Local Variables Automatically Static

Are C++11 thread_local variables automatically static?

According to the C++ Standard

When thread_local is applied to a variable of block scope the
storage-class-specifier static is implied if it does not appear
explicitly

So it means that this definition

void f() {
    thread_local vector<int> V;
    V.clear();
    ... // use V as a temporary variable
}

is equivalent to

void f() {
    static thread_local vector<int> V;
    V.clear();
    ... // use V as a temporary variable
}

However, a static variable is not the same as a thread_local variable.

All variables declared with the thread_local keyword have thread
storage duration. The storage for these entities shall last for the
duration of the thread in which they are created. There is a distinct
object or reference per thread, and use of the declared name refers to
the entity associated with the current thread

To distinguish these variables the standard introduces a new term thread storage duration along with static storage duration.
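A minimal sketch of what the implied static means in practice. The helper names calls_on_this_thread and count_on_new_thread are hypothetical, chosen only for this illustration. The variable is written without static, yet it keeps its value between calls on the same thread, and every new thread starts with a fresh copy.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: counts how often the calling thread has called it.
int calls_on_this_thread() {
    thread_local int count = 0;  // no explicit static, but it is implied
    return ++count;              // the value survives between calls on this thread
}

// Calls the counter three times on a fresh thread and returns the last result.
int count_on_new_thread() {
    int last = 0;
    std::thread worker([&last] {
        calls_on_this_thread();
        calls_on_this_thread();
        last = calls_on_this_thread();
    });
    worker.join();
    assert(last == 3);  // the new thread counted from its own zero
    return last;
}
```

Calling count_on_new_thread() repeatedly should give 3 every time, because each new thread gets its own count, while repeated calls to calls_on_this_thread() from one thread keep increasing.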

Are static variables automatically thread local?

The static specifier for block-scope variables existed long before the C language specification acknowledged threads or had any support for them, much less _Thread_local specifically. In that context, when not combined with _Thread_local, it specifies static storage duration, meaning that the variable comes into existence (as if) at the beginning of program execution and exists and maintains its last-stored value for the entire run of the program. An object with static storage duration is shared by all threads.

On the other hand, _Thread_local always specifies thread storage duration, which means that the object so declared exists and maintains its last-stored value for the entire lifetime of a thread, and that the declared identifier designates a different object in each thread. When an object is declared _Thread_local at block scope, it must also bear either the extern or static storage-class specifier, which conveys its linkage -- external or none.

extern declarations of any kind at block scope are unusual, but they do occasionally serve a useful purpose. Most of the time, though, static _Thread_local is what you will want for thread-local, block-scope variables.

Why does C++11 allow you to declare a local variable as thread_local?

According to the standard, a thread_local variable at block scope is also implicitly static. However, not all static variables are thread_local.

 int main()
 {
       thread_local int x;
 }

is actually equivalent to

 int main()
 {
       thread_local static int x;
 }

but different from:

 int main()
 {
       int x;    //  auto implied
 }
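The point that not all static variables are thread_local can be sketched by comparing addresses. This is only an illustration; the function names and the Result struct are made up for this example. A plain static local is one object shared by every thread, while a thread_local local is a different object in each thread.

```cpp
#include <thread>

int* address_of_static()       { static int s = 0;       return &s; }
int* address_of_thread_local() { thread_local int t = 0; return &t; }

struct Result {
    bool same_static;        // did both threads see the same static object?
    bool same_thread_local;  // did both threads see the same thread_local object?
};

// Compares the addresses seen by the caller with those seen by a new thread.
// The comparison happens inside the worker, while both objects are alive.
Result compare_with_new_thread() {
    int* caller_static = address_of_static();
    int* caller_tls    = address_of_thread_local();
    Result r{false, false};
    std::thread worker([&] {
        r.same_static       = (address_of_static() == caller_static);
        r.same_thread_local = (address_of_thread_local() == caller_tls);
    });
    worker.join();
    return r;
}
```

The expectation is that same_static is true and same_thread_local is false.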

Is there any benefit of using static for thread_local variable?

The answer you cite is about C++, and in C++ it appears that the two declarations are identical. But that's not true in C, and since your question is tagged with both C and C++ tags, it is not clear which language you care about.

In C, if you declare a thread-local variable inside a function, you must declare it either static or extern (depending on which linkage it has). See §6.7.1, paragraph 3:

In the declaration of an object with block scope, if the declaration specifiers include _Thread_local, they shall also include either static or extern. If _Thread_local appears in any declaration of an object, it shall be present in every declaration of that object.

So that's an advantage of declaring a variable static thread_local: it allows C compilation, provided you include the <threads.h> header, which defines thread_local as a macro for _Thread_local.

However, it does not affect performance in any way in either language.

thread_local static variables in a dynamic loaded library – when are they created?

What cppreference says is paraphrased. What's actually in the standard is

All variables declared with the thread_local keyword have thread storage duration. The storage for these
entities lasts for the duration of the thread in which they are created. There is a distinct object or reference
per thread, and use of the declared name refers to the entity associated with the current thread.

There's nothing in there about when, exactly, the storage is allocated, just that it lasts for the duration of the thread. This means it could be allocated when the thread is created, or when the variable is first used, or possibly a combination of both.

The variable may not be constructed (I assume this is what you mean when you say "create an instance") when the storage is allocated. That depends on where and how the variable is defined. But, if it is constructed, it won't be destroyed until the thread ends.
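For the block-scope case, the construction and destruction timing can be observed with a small sketch. The Tracker type, the counters, and the function names are hypothetical. A block-scope thread_local is initialised the first time control passes through its declaration in a given thread, so a thread that never reaches the declaration never constructs the object. This says nothing about when the underlying storage is allocated, which remains up to the implementation.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> constructed{0};
std::atomic<int> destroyed{0};

// Hypothetical type that records its own construction and destruction.
struct Tracker {
    Tracker()  { ++constructed; }
    ~Tracker() { ++destroyed; }
    void touch() {}
};

void use_tracker() {
    thread_local Tracker t;  // constructed on this thread's first pass through here
    t.touch();
}

void run_threads() {
    std::thread idle([] { /* never calls use_tracker, so no Tracker is built */ });
    std::thread busy([] { use_tracker(); use_tracker(); });
    idle.join();
    busy.join();  // busy's Tracker is destroyed as that thread exits
}
```

After run_threads() returns, both counters should be 1: only the busy thread constructed a Tracker, it did so once despite two calls, and the object was destroyed when that thread ended.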

Support for dynamically loading libraries via dlopen or LoadLibrary is a compiler/platform extension, and not part of the language. How that interacts with thread_local would also be platform specific.

What does the thread_local mean in C++11?

Thread storage duration refers to data that appears to have global or static storage duration (from the viewpoint of the functions using it) but of which, in actual fact, there is one copy per thread.

It adds to the current options:

automatic (exists during a block or function);
static (exists for the program duration); and
dynamic (exists on the heap between allocation and deallocation).

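Conceptually, something that is thread-local is brought into existence at thread creation time and disposed of when the thread finishes, although the actual initialisation may happen later, for example the first time the variable is used in that thread.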

For example, think of a random number generator where the seed must be maintained on a per-thread basis. Using a thread-local seed means that each thread gets its own random number sequence, independent of all other threads.

If your seed was a local variable within the random function, it would be initialised every time you called it, giving you the same number each time. If it was a global, threads would interfere with each other's sequences.
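A rough sketch of the per-thread seed idea. The generator is a toy linear congruential step used purely for illustration, not a recommendation, and the names next_random and sequence_on_new_thread are invented here. Because the seed is thread_local, it persists between calls but is never touched by other threads.

```cpp
#include <thread>
#include <vector>

// Toy generator: one seed per thread, kept between calls on that thread.
unsigned next_random() {
    thread_local unsigned seed = 12345u;  // initialised once per thread
    seed = seed * 1103515245u + 12345u;   // simple LCG step, unsigned wrap-around
    return (seed >> 16) & 0x7FFFu;
}

// Collects n values on a fresh thread.
std::vector<unsigned> sequence_on_new_thread(int n) {
    std::vector<unsigned> out;
    std::thread worker([&out, n] {
        for (int i = 0; i < n; ++i) {
            out.push_back(next_random());
        }
    });
    worker.join();
    return out;
}
```

Every thread starts from the same seed in this sketch, so two threads produce identical sequences regardless of how their calls interleave. A real program would seed each thread differently.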

Another example is something like strtok where the tokenisation state is stored on a thread-specific basis. That way, a single thread can be sure that other threads won't screw up its tokenisation efforts, while still being able to maintain state over multiple calls to strtok - this basically renders strtok_r (the thread-safe version) redundant.

Yet another example would be something like errno. You don't want separate threads modifying errno after one of your calls fails, but before you've had a chance to check the result.


Is it legal to return a thread_local reference from a function?

This depends on how you are using it. thread_local storage duration lasts for the lifetime of the thread that created the variable. So, if this function was called by a thread then x would live until that thread exited. If you ran from_thread_local in its own thread then it would not be safe, as you wouldn't get the value until after the thread ends and the lifetime of x has ended.
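The safe and unsafe uses can be sketched as follows. The original function is not shown here, so per_thread_counter and the two callers are hypothetical stand-ins. Using the returned reference on the calling thread is fine. When the function runs on another thread, copy the value out before that thread ends instead of keeping the reference.

```cpp
#include <thread>

// Returns a reference to the calling thread's own object.
int& per_thread_counter() {
    thread_local int x = 0;
    return x;  // valid for as long as the calling thread is running
}

// Safe: the reference is used on the thread that owns the object.
int use_on_same_thread() {
    int& ref = per_thread_counter();
    ref = 41;
    ++per_thread_counter();  // same object as ref
    return ref;
}

// Safe: only a copy of the value leaves the worker thread.
// Keeping a reference or pointer to the worker's x after join() would dangle.
int value_from_other_thread() {
    int copy = 0;
    std::thread worker([&copy] {
        per_thread_counter() = 7;
        copy = per_thread_counter();
    });
    worker.join();
    return copy;
}
```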

The static at function level has no effect on this. For a free function, static describes the linkage of the function, not a storage duration. For a member function, it just means that no instance of the class is needed to call the function.

Static thread_local variable with OpenMP

If you write an initialisation

thread_local std::vector<double> A::v {1,2,3};

you will get a copy of v containing {1,2,3} in all threads. But if you write an assignment

A::v = {1,2,3};

you will not. A::v will be initialised afresh for each thread and not copied from any other thread.

If you need a thread local copy of an array initialised to some set of values, you will have to make sure the action (either initialisation or assignment) that puts the values there is performed in each thread.
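A small sketch of the difference, using std::thread instead of OpenMP to keep the example self-contained (that substitution is an assumption; the per-thread behaviour of thread_local is what is being illustrated). The struct A mirrors the one in the question, and the helper names are hypothetical.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

struct A {
    static thread_local std::vector<double> v;
};

// The initialiser is applied separately in every thread that uses A::v.
thread_local std::vector<double> A::v{1, 2, 3};

// Reports how many elements a brand-new thread finds in its own A::v.
std::size_t size_seen_by_new_thread() {
    std::size_t n = 0;
    std::thread worker([&n] { n = A::v.size(); });
    worker.join();
    return n;
}

// Assigns in the calling thread only, then checks what a new thread sees.
std::size_t assign_then_check_new_thread() {
    A::v = {1, 2, 3, 4, 5};            // changes only the caller's copy
    return size_seen_by_new_thread();  // the new thread still starts from {1,2,3}
}
```

After assign_then_check_new_thread(), the caller's A::v should hold five elements while the value returned for the new thread should be 3.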

Can you use thread local variables inside a class or structure

In C and C++, thread-local storage applies to static variables or to variables with external linkage only.

Local (automatic) variables are usually created on the stack and therefore are specific to the thread that executes the code, but global and static variables are shared among all threads since they reside in the data or BSS segment. TLS provides a mechanism to make those global variables local to the thread and that's what the __thread keyword achieves - it instructs the compiler to create a separate copy of the variable in each thread while lexically it remains a global or static one (e.g., it can be accessed by different functions called within the same thread of execution).

Non-static class members and structure members are placed where the object (class or structure) is allocated - either on the stack if an automatic variable is declared or on the heap if new or malloc() is used. Either way, each thread receives a unique storage location for the variable and __thread is just not applicable in this case, hence the compiler error you get.
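In standard C++11 terms the same restriction can be sketched with thread_local in place of the __thread extension. The Stats type and the helper function are hypothetical. Only a static data member can be thread-local; an ordinary member simply lives wherever its enclosing object lives.

```cpp
#include <thread>

struct Stats {
    // thread_local int bad;       // would not compile: non-static member
    static thread_local int hits;  // allowed: one copy per thread
    int id = 0;                    // ordinary member, stored inside each object
};

thread_local int Stats::hits = 0;

// Increments the worker's own Stats::hits and returns what the worker saw.
int hits_after_one_increment_on_new_thread() {
    int seen = -1;
    std::thread worker([&seen] {
        ++Stats::hits;
        seen = Stats::hits;
    });
    worker.join();
    return seen;
}
```

Increments made in the calling thread do not show up in the worker, so the helper should return 1 no matter what value Stats::hits has in the caller.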
