Is Static Init Thread-Safe with Vc2010

Is static init thread-safe with VC2010?

From Visual Studio 2010's documentation on Static:

Assigning a value to a static local variable in a multithreaded application is not thread safe and we do not recommend it as a programming practice.

The second part of your question has some good existing answers.

Updated Nov 22, 2015:

Others have verified, specifically, that static initialization is not thread safe either (see comment and other answer).

User squelart on VS2015:

you may want to add that VS2015 finally gets it right: https://msdn.microsoft.com/en-au/library/hh567368.aspx#concurrencytable ("Magic statics")
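A minimal sketch of what "magic statics" guarantee, assuming a C++11-conforming compiler (VS2015 or later, or a recent gcc/clang) with `std::thread` available. The names `Widget`, `instance`, and `count_constructions` are made up for illustration. Several threads race into the same function, yet the block-scope static should be constructed exactly once; VC2010 gives no such guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical counter: how many times Widget's constructor has run.
std::atomic<int> g_constructions(0);

struct Widget {
    Widget() { ++g_constructions; }
};

// Block-scope static: C++11 requires concurrent callers to wait until
// the first initialization has completed.
Widget& instance() {
    static Widget w;
    return w;
}

// Races `n` threads into instance() and returns the construction count.
int count_constructions(int n) {
    assert(n > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([] { instance(); });
    for (auto& t : threads)
        t.join();
    return g_constructions.load();
}
```

On a conforming compiler, `count_constructions(8)` is expected to report a single construction no matter how the threads interleave.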

Thread-safe initialisation of local statics: MSVC

The C++0x Standard says:

§6.7 Declaration statement [stmt.dcl]

4/ The zero-initialization (8.5) of all block-scope variables with static storage duration (3.7.1) or thread storage duration (3.7.2) is performed before any other initialization takes place. Constant initialization (3.6.2) of a block-scope entity with static storage duration, if applicable, is performed before its block is first entered.
An implementation is permitted to perform early initialization of other block-scope variables with static or thread storage duration under the same conditions that an implementation is permitted to statically initialize a variable with static or thread storage duration in namespace scope (3.6.2). Otherwise such a variable is initialized the first time control passes through its declaration; such a variable is considered initialized upon the completion of its initialization.

If the initialization exits by throwing an exception, the initialization is not complete, so it will be tried again the next time control enters the declaration.

If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.⁸⁸

If control re-enters the declaration recursively while the variable is being initialized, the behavior is undefined.

[ Example:

int foo(int i) {
  static int s = foo(2*i); // recursive call - undefined
  return i+1;
}

—end example ]

88) The implementation must not introduce any deadlock around execution of the initializer.

As expected, it is quite complete.

In fact, even older versions of gcc already complied with this, and they go one step further: in the case of recursive initialization, an exception is thrown.
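The retry-on-exception rule from the quoted paragraph can be sketched as follows. This is an illustration only; `make_value`, `get_value`, `retry_after_throw`, and `g_attempts` are hypothetical names. The initializer throws on its first run, so the static stays uninitialized and is initialized on the next call.

```cpp
#include <cassert>
#include <stdexcept>

int g_attempts = 0;  // hypothetical counter, for illustration only

// Throws on the first attempt, succeeds afterwards.
int make_value() {
    if (++g_attempts == 1)
        throw std::runtime_error("first attempt fails");
    return 42;
}

int get_value() {
    // If make_value() throws, `value` is not considered initialized and
    // the initialization is attempted again on the next call.
    static int value = make_value();
    return value;
}

// Returns true if the first call threw and the second one succeeded.
bool retry_after_throw() {
    bool threw = false;
    try {
        get_value();
    } catch (const std::runtime_error&) {
        threw = true;
    }
    return threw && get_value() == 42 && g_attempts == 2;
}
```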

Finally, regarding a programmer adding it afterward: you can normally do it if you have something like Compare And Swap available, and use a sufficiently small variable, relying on zero-initialization of the variable to mark its non-computed state. However I do agree it's much easier if it's baked in.
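A rough sketch of that compare-and-swap idea, written with C++11 `std::atomic` (an assumption; on VC2010 you would reach for a platform primitive instead). The names `compute`, `cached_value`, and `all_agree` are hypothetical. The value 0 marks the non-computed state, so the computation must never legitimately return 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> g_computations(0);  // how many threads ran compute()

// Hypothetical expensive computation; must never return 0, because 0 is
// reserved to mean "not computed yet".
int compute() {
    ++g_computations;
    return 1234;
}

int cached_value() {
    // Constant-initialized to 0, so no runtime guard is involved.
    static std::atomic<int> cache(0);

    int v = cache.load(std::memory_order_acquire);
    if (v == 0) {
        int mine = compute();  // several threads may get here at once
        int expected = 0;
        // Only the first thread to finish publishes its result.
        if (cache.compare_exchange_strong(expected, mine))
            v = mine;
        else
            v = expected;      // the others adopt the published value
    }
    return v;
}

// Races `n` threads and returns true if all of them saw the same value.
bool all_agree(int n) {
    assert(n > 0);
    std::vector<int> seen(n, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&seen, i] { seen[i] = cached_value(); });
    for (auto& t : threads)
        t.join();
    for (int v : seen)
        if (v != 1234) return false;
    return true;
}
```

Note the trade-off: more than one thread may run `compute()`, which is only acceptable when the computation is cheap and has no side effects.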

I'm afraid I stopped following VC++'s progress, though, so I don't know where it stands now. My only advice would be... look it up at the assembly level.

Cost of thread-safe local static variable initialization in C++11?

A look at the generated assembler code helps.

Source

#include <vector>

std::vector<int> &get(){
  static std::vector<int> v;
  return v;
}
int main(){
  return get().size();
}

Assembler

std::vector<int, std::allocator<int> >::~vector():
        movq    (%rdi), %rdi
        testq   %rdi, %rdi
        je      .L1
        jmp     operator delete(void*)
.L1:
        rep ret
get():
        movzbl  guard variable for get()::v(%rip), %eax
        testb   %al, %al
        je      .L15
        movl    get()::v, %eax
        ret
.L15:
        subq    $8, %rsp
        movl    guard variable for get()::v, %edi
        call    __cxa_guard_acquire
        testl   %eax, %eax
        je      .L6
        movl    guard variable for get()::v, %edi
        movq    $0, get()::v(%rip)
        movq    $0, get()::v+8(%rip)
        movq    $0, get()::v+16(%rip)
        call    __cxa_guard_release
        movl    $__dso_handle, %edx
        movl    get()::v, %esi
        movl    std::vector<int, std::allocator<int> >::~vector(), %edi
        call    __cxa_atexit
.L6:
        movl    get()::v, %eax
        addq    $8, %rsp
        ret
main:
        subq    $8, %rsp
        call    get()
        movq    8(%rax), %rdx
        subq    (%rax), %rdx
        addq    $8, %rsp
        movq    %rdx, %rax
        sarq    $2, %rax
        ret

Compared to

Source

#include <vector>

static std::vector<int> v;
std::vector<int> &get(){
  return v;
}
int main(){
  return get().size();
}

Assembler

std::vector<int, std::allocator<int> >::~vector():
        movq    (%rdi), %rdi
        testq   %rdi, %rdi
        je      .L1
        jmp     operator delete(void*)
.L1:
        rep ret
get():
        movl    v, %eax
        ret
main:
        movq    v+8(%rip), %rax
        subq    v(%rip), %rax
        sarq    $2, %rax
        ret
        movl    $__dso_handle, %edx
        movl    v, %esi
        movl    std::vector<int, std::allocator<int> >::~vector(), %edi
        movq    $0, v(%rip)
        movq    $0, v+8(%rip)
        movq    $0, v+16(%rip)
        jmp     __cxa_atexit

I'm not that great with assembler, but I can see that in the first version v has a lock around it and get is not inlined whereas in the second version get is essentially gone.

You can play around with various compilers and optimization flags, but it seems no compiler is able to inline or optimize out the locks, even though the program is obviously single threaded.

You can add static to get which makes gcc inline get while preserving the lock.

To know how much these locks and additional instructions cost for your compiler, flags, platform and surrounding code you would need to make a proper benchmark.

I would expect the locks to have some overhead and be significantly slower than the inlined code, which becomes insignificant when you actually do work with the vector, but you can never be sure without measuring.
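As a starting point for such a measurement, here is a deliberately naive timing sketch, not a proper benchmark. `get_local`, `get_global`, and `time_calls` are hypothetical names, and an optimizer may still hoist or fold parts of the loop, so treat any numbers with suspicion and prefer a real benchmarking harness.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Function-local static: every call checks the guard variable.
std::vector<int>& get_local() {
    static std::vector<int> v;
    return v;
}

// Namespace-scope static: initialized before main(), no guard on access.
static std::vector<int> g_v;
std::vector<int>& get_global() {
    return g_v;
}

// Calls `f` `iterations` times and returns the elapsed nanoseconds.
// `sink` accumulates the results so the loop is harder to optimize away.
template <typename F>
long long time_calls(F f, long iterations, std::size_t& sink) {
    assert(iterations >= 0);
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i)
        sink += f().size();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Comparing `time_calls(get_local, N, sink)` with `time_calls(get_global, N, sink)` for a large `N` gives a first impression. Repeat the runs and vary the optimization flags before drawing conclusions.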

Double-checked locking and unique_ptr static initialization in Visual C++

Don't optimize prematurely

Unfortunately, MSVC 2010 does not support 'magic statics', which in effect perform automatic double-checked locking. But before you start optimizing here: do you REALLY need it? Don't complicate your code unless it's really necessary, especially since MSVC 2010 does not fully support C++11, so you have no portable, standard way to guarantee correct multi-threaded behavior.

The way to get it to work

However, you can use boost::atomic<Foo*> to deal with the problem and the compiler will most likely handle the problem correctly. If you really want to be sure, check the compiled assembly code in both debug and release mode. The assignment to an atomic pointer is guaranteed to take place after the construction, even if code is inlined. This is due to special compiler intrinsics for atomic operations, which are guaranteed not to be reordered with writes that are supposed to happen before the write to the atomic variable.

The following code should do the trick:

Foo & Foo::Instance()
{
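    // `mutex` is assumed to be a boost::mutex that is declared elsewhere
    // and already constructed before the first call to Instance().
    static boost::atomic<Foo *> instance; // zero-initialized, since static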

    if ( !instance.load() )
    {
        boost::lock_guard<boost::mutex> lock(mutex);
        if ( !instance.load() )
        {
            // this code is guaranteed to be called at most once.
            instance = new Foo;
            std::atexit( []{ delete &Instance(); } );
        }
    }
    return *instance.load();
}

Problem 1

Your compiler might still reorder things in some optimization pass. If the compiler doesn't, then the processor might do some construction reordering. Unless you use genuine atomics with their special instructions or thread-safe constructs like mutexes and condition variables you will get races, if you access a variable through different threads at the same time and at least one of them is writing. Never EVER do that. Again, boost::atomic will do the job (most likely).

Problem 2

That is exactly what magic statics are supposed to do: They safely initialize static variables that are accessed concurrently. MSVC 2010 does not support this. Therefore, don't use it. The code that is produced by the compiler will be unsafe. What you suspected in your question can in theory really happen. By the way: The memory for static variables is reserved at program start-up and is AFAIK zero-initialized. No new operator is called to reserve the memory for the static variable.

Still a problem?

The std::atexit() function might not be thread-safely implemented in MSVC 2010 and should possibly not be used at all, or should only be used in the main() thread. Most implementations of double-checked locking ignore this clean-up problem. And it is no problem as long as the destructor of Foo does nothing important. The unfreed memory, file handles and so forth will be reclaimed by the operating system anyways. Examples of something important that could be done in the destructor are notifying another process about something or serializing state to disk that will be loaded at the next start of the application. If you are interested in double checked locking there's a really good talk Lock-Free Programming (or, Juggling Razor Blades) by Herb Sutter that covers this topic.

How can I create a thread-safe singleton pattern in Windows?

If you are using Visual C++ 2005/2008 you can use the double-checked locking pattern, since "volatile variables behave as fences". This is the most efficient way to implement a lazy-initialized singleton.

From MSDN Magazine:

Singleton* GetSingleton()
{
    volatile static Singleton* pSingleton = 0;

    if (pSingleton == NULL)
    {
        EnterCriticalSection(&cs);

        if (pSingleton == NULL)
        {
            try
            {
                pSingleton = new Singleton();
            }
            catch (...)
            {
                // Something went wrong.
            }
        }

        LeaveCriticalSection(&cs);
    }

    return const_cast<Singleton*>(pSingleton);
}

Whenever you need access to the singleton, just call GetSingleton(). The first time it is called, the static pointer will be initialized. After it's initialized, the NULL check will prevent locking for just reading the pointer.

DO NOT use this on just any compiler, as it's not portable. The standard makes no guarantees on how this will work. Visual C++ 2005 explicitly adds to the semantics of volatile to make this possible.

You'll have to declare and initialize the CRITICAL_SECTION (cs in the code above) elsewhere in code. But that initialization is cheap, so lazy initialization is usually not important.

Thread safe lazy construction of a singleton in C++

Basically, you're asking for synchronized creation of a singleton, without using any synchronization (previously-constructed variables). In general, no, this is not possible. You need something available for synchronization.

As for your other question, yes, static variables which can be statically initialized (i.e. no runtime code necessary) are guaranteed to be initialized before other code is executed. This makes it possible to use a statically-initialized mutex to synchronize creation of the singleton.

From the 2003 revision of the C++ standard:

Objects with static storage duration (3.7.1) shall be zero-initialized (8.5) before any other initialization takes place. Zero-initialization and initialization with a constant expression are collectively called static initialization; all other initialization is dynamic initialization. Objects of POD types (3.9) with static storage duration initialized with constant expressions (5.19) shall be initialized before any dynamic initialization takes place. Objects with static storage duration defined in namespace scope in the same translation unit and dynamically initialized shall be initialized in the order in which their definition appears in the translation unit.

If you know that you will be using this singleton during the initialization of other static objects, I think you'll find that synchronization is a non-issue. To the best of my knowledge, all major compilers initialize static objects in a single thread, so thread-safety is not a concern during static initialization. You can declare your singleton pointer to be NULL, and then check to see if it's been initialized before you use it.

However, this assumes that you know that you'll use this singleton during static initialization. This is also not guaranteed by the standard, so if you want to be completely safe, use a statically-initialized mutex.
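A minimal sketch of the statically-initialized-mutex approach, assuming C++11 `std::mutex`, whose constexpr constructor allows a namespace-scope mutex to be initialized statically (on older toolchains a platform primitive such as `PTHREAD_MUTEX_INITIALIZER` plays that role). `Singleton`, `get_singleton`, and `constructions_after_race` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

struct Singleton {
    static int constructions;  // only modified while holding g_mutex
    Singleton() { ++constructions; }
};
int Singleton::constructions = 0;

// Both objects are statically initialized: the mutex through its constexpr
// constructor, the pointer through constant initialization to null.
static std::mutex g_mutex;
static Singleton* g_instance = nullptr;

Singleton* get_singleton() {
    std::lock_guard<std::mutex> lock(g_mutex);  // taken on every call
    if (g_instance == nullptr)
        g_instance = new Singleton;  // intentionally never deleted
    return g_instance;
}

// Races `n` threads into get_singleton() and returns the construction count.
int constructions_after_race(int n) {
    assert(n > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([] { get_singleton(); });
    for (auto& t : threads)
        t.join();
    return Singleton::constructions;
}
```

This version locks on every access, trading some speed for simplicity. It avoids the subtle ordering issues of double-checked locking.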

Edit: Chris's suggestion to use an atomic compare-and-swap would certainly work. If portability is not an issue (and creating additional temporary singletons is not a problem), then it is a slightly lower overhead solution.
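The compare-and-swap variant might look roughly like this, again assuming C++11 `std::atomic` rather than any particular platform API. The names are hypothetical. A thread that loses the race deletes its spare object, which is the "additional temporary singletons" cost mentioned above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct Singleton {
    int value;
    Singleton() : value(7) {}
};

// Constant-initialized to null before any dynamic initialization runs.
static std::atomic<Singleton*> g_instance(nullptr);

Singleton* get_singleton() {
    Singleton* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        Singleton* candidate = new Singleton;  // may turn out to be a spare
        if (g_instance.compare_exchange_strong(p, candidate)) {
            p = candidate;     // this thread won the race
        } else {
            delete candidate;  // another thread won; p now holds its pointer
        }
    }
    return p;
}

// Races `n` threads and returns true if all of them got the same object.
bool all_same_instance(int n) {
    assert(n > 0);
    std::vector<Singleton*> seen(n);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&seen, i] { seen[i] = get_singleton(); });
    for (auto& t : threads)
        t.join();
    for (Singleton* s : seen)
        if (s != seen[0]) return false;
    return true;
}
```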

Are pre-main global initializers guaranteed to run single-threaded?

Initialization of global variables is guaranteed single-threaded as long as the program doesn't itself start a thread (e.g. in a constructor of some global variable); once that happens, the implementation is then allowed to parallelize remaining initializations, to some extent.

[basic.start.init]/2 ...Variables with ordered initialization defined within a single translation unit shall be initialized in the order of their definitions in the translation unit. If a program starts a thread (30.3), the subsequent initialization of a variable is unsequenced with respect to the initialization of a variable defined in a different translation unit. Otherwise, the initialization of a variable is indeterminately sequenced with respect to the initialization of a variable defined in a different translation unit. If a program starts a thread, the subsequent unordered initialization of a variable is unsequenced with respect to every other dynamic initialization. Otherwise, the unordered initialization of a variable is indeterminately sequenced with respect to every other dynamic initialization. [ Note: This definition permits initialization of a sequence of ordered variables concurrently with another sequence. —end note ]

"Indeterminately sequenced" is the part that guarantees single-threaded execution. By definition, the notion of sequenced {before, after, indeterminately} is only meaningful within a single thread:

[intro.execution]/13 Sequenced before is a ... relation between evaluations executed by a single thread...

Is the default constructor thread-safe in C++?

log_String() is basically a function, but it is also a constructor. So in effect its call during object creation means also calling constructors of all member variables (which have constructors), as well as constructors of all the base classes, and constructors of their member variables, recursively.

So you need to consider all the functions which get called. The two member variables, list and m, should have thread safe constructors, since they are from the standard library, and while I didn't check from the standard (draft should be freely downloadable, if you want to check yourself), things would be just crazy if they didn't have thread-safe constructors. Then there is no base class, and no code in your constructor.

In conclusion, it is thread-safe, because there is nothing in there that would cause problems "even if many threads call simultaneously log_String()". No shared data or other shared resources are visible, and if any shared data is hidden in the member variables, it can be trusted to be handled safely.

Writing thread-unsafe public constructors could be considered stupid, even evil. Still, if you had member variables or base class from 3rd party libraries, or just of your own types, and you aren't 100% sure of their quality, then it's worth it to stop and think if this kind of stupidity has been done.

An example code which one might plausibly write, especially for debugging purposes, and which would make things thread-unsafe:

private:
    static unsigned static_counter;

public:
    log_String() {
        ++static_counter; // not atomic operation! potential data race!
    };

For completeness: Fix for the above code would be to simply use std::atomic<unsigned> for the counter. More complex cases might require static mutexes (beware if you are using old crappy compilers (at least MSVC2010) which might have unavoidable race conditions with static data).
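A sketch of that fix, assuming a C++11 compiler with `<atomic>` and `<thread>`. The class is reduced to the counter, and `count` and `construct_concurrently` are hypothetical helpers added only to exercise it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class log_String {
    static std::atomic<unsigned> static_counter;

public:
    log_String() {
        ++static_counter;  // atomic read-modify-write, no data race
    }
    static unsigned count() { return static_counter.load(); }
};

std::atomic<unsigned> log_String::static_counter(0);

// Constructs `per_thread` objects on each of `n` threads and returns
// the total number of constructions recorded.
unsigned construct_concurrently(unsigned n, unsigned per_thread) {
    assert(n > 0);
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < n; ++i)
        threads.emplace_back([per_thread] {
            for (unsigned j = 0; j < per_thread; ++j) {
                log_String s;
                (void)s;  // only the constructor's side effect matters
            }
        });
    for (auto& t : threads)
        t.join();
    return log_String::count();
}
```

With a plain `unsigned` counter the total could come up short under contention; with the atomic it should always equal `n * per_thread`.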

C++ Thread Safety Summary

The current standard (C++03) doesn't mention threading at all, in any respect. In practice, the standard containers provide thread-safe reading, but require synchronization for writing.

C++0x doesn't talk much (at all?) specifically about containers with respect to thread safety/sharing, but does talk about assignments and such. In the end, it comes out pretty much the same though -- even though the object is in a container, you're reading/writing data, and you have to synchronize when/if at least one thread may modify the data.

POD data doesn't really change much: modifications will require synchronization as a general rule. There's usually some subset of data types for which operations are normally atomic, but the members of that subset vary by platform. It'll typically include types up to the native word size of the hardware allocated with "natural" alignment; anything else is open to a lot more question.
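To illustrate the "synchronize when at least one thread may modify the data" rule, here is a small sketch that guards a shared `std::vector` with a mutex. It assumes C++11 `std::mutex` and `std::thread`; `append` and `fill_concurrently` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

std::vector<int> g_data;
std::mutex g_data_mutex;  // guards every access to g_data while writers exist

// All writers go through this function, so push_back never runs concurrently.
void append(int value) {
    std::lock_guard<std::mutex> lock(g_data_mutex);
    g_data.push_back(value);
}

// Appends `per_thread` values from each of `n` threads, then returns the
// final size (read after all writers have been joined).
std::size_t fill_concurrently(int n, int per_thread) {
    assert(n > 0 && per_thread >= 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([per_thread] {
            for (int j = 0; j < per_thread; ++j)
                append(j);
        });
    for (auto& t : threads)
        t.join();
    return g_data.size();
}
```

Without the lock, concurrent `push_back` calls would be a data race with undefined behavior. Concurrent reads alone would not need it.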
