Does std::atomic<std::string> Work Appropriately

Does std::atomic<std::string> work appropriately?

The standard does not specify a specialization of std::atomic<std::string>, so the generic template <typename T> std::atomic<T> applies. 29.5 [atomics.types.generic] p1 states:

There is a generic class template atomic. The type of the template argument T shall be trivially copyable (3.9).

There is no statement that the implementation must diagnose violations of this requirement. So either (a) your use of std::atomic<std::string> invokes undefined behavior, or (b) your implementation provides std::atomic<std::string> as a conforming extension.

Looking at the MSDN page for std::atomic<T> (http://msdn.microsoft.com/en-us/library/vstudio/hh874651.aspx), it does explicitly mention the requirement that T be trivially copyable, and it does NOT say anything specific about std::atomic<std::string>. If it is an extension, it's undocumented. My money is on undefined behavior.

Specifically, 17.6.4.8/1 applies (with thanks to Daniel Krügler for setting me straight):

In certain cases (replacement functions, handler functions, operations on types used to instantiate standard library template components), the C++ standard library depends on components supplied by a C++ program. If these components do not meet their requirements, the Standard places no requirements on the implementation.

std::string certainly does not meet the std::atomic<T> requirement that the template parameter T be trivially copyable, so the standard places no requirements on the implementation. As a quality of implementation issue, note that static_assert(std::is_trivially_copyable<T>::value, "std::atomic<T> requires T to be trivially copyable"); is an easy diagnostic to catch this violation.
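
If your standard library does not issue that diagnostic, you can get the same effect on the user side. The following is only a sketch: checked_atomic and checked_atomic_impl are made-up names, not anything the standard or MSVC provides.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical helper (not a standard name): refuses to compile for types
// that std::atomic<T> is not required to support.
template <typename T>
struct checked_atomic_impl {
    static_assert(std::is_trivially_copyable<T>::value,
                  "std::atomic<T> requires T to be trivially copyable");
    using type = std::atomic<T>;
};

template <typename T>
using checked_atomic = typename checked_atomic_impl<T>::type;

inline int checked_atomic_demo() {
    checked_atomic<int> counter{0};      // int is trivially copyable: accepted
    counter.fetch_add(5);
    // checked_atomic<std::string> bad;  // would trip the static_assert above
    assert(counter.load() == 5);
    return counter.load();
}
```

With this alias a std::string argument fails at compile time with a readable message instead of silently compiling.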

2016-04-19 Update: I don't know when the change happened, but VS2015 Update 2 does now diagnose std::atomic<std::string>:

error C2338: atomic requires T to be trivially copyable.

Why does std::atomic<std::string> give a trivially copyable error?

std::string cannot be used with std::atomic because it is not TriviallyCopyable.

See explanation here:
https://en.cppreference.com/w/cpp/atomic/atomic

The primary std::atomic template may be instantiated with any
TriviallyCopyable type T satisfying both CopyConstructible and
CopyAssignable. The program is ill-formed if any of following values
is false:

https://en.cppreference.com/w/cpp/named_req/TriviallyCopyable
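
You can run the same checks yourself before instantiating std::atomic. This sketch assumes the conditions are the five type traits below, which is how I read the cppreference page; usable_in_atomic and Pixel are hypothetical names.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper (not a standard trait): true when T satisfies the
// conditions the primary std::atomic template places on its argument.
template <typename T>
constexpr bool usable_in_atomic() {
    return std::is_trivially_copyable<T>::value &&
           std::is_copy_constructible<T>::value &&
           std::is_move_constructible<T>::value &&
           std::is_copy_assignable<T>::value &&
           std::is_move_assignable<T>::value;
}

struct Pixel { unsigned char r, g, b, a; };  // plain aggregate of bytes

static_assert(usable_in_atomic<int>(), "int is accepted");
static_assert(usable_in_atomic<Pixel>(), "a small plain struct is accepted");
static_assert(!usable_in_atomic<std::string>(), "std::string is rejected");
```

std::string fails on the first condition alone, because it has a non-trivial destructor and copy operations.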

std::atomic<std::string>: Access violation writing location 0xFEEEEFEEE in Visual Studio 2013 (VC12) - does not occur when using std::atomic<int>

std::atomic<std::string> is not legal in standard C++, because std::string is not trivially copyable. Unfortunately there is no requirement for a compiler to refuse this code, but there is also no requirement (and no likelihood) that it will work properly either.
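
If what you actually need is a string that several threads can read and replace, the conventional substitute is to guard it with a mutex. A minimal sketch, with SharedString being a made-up class name:

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical wrapper: a std::string whose reads and writes are
// serialized by a mutex instead of relying on std::atomic.
class SharedString {
public:
    void set(std::string value) {
        std::lock_guard<std::mutex> lock(mutex_);
        value_ = std::move(value);
    }

    std::string get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;  // the copy is made while the lock is held
    }

private:
    mutable std::mutex mutex_;  // mutable so get() can lock it
    std::string value_;
};

inline void shared_string_demo() {
    SharedString s;
    s.set("hello");
    assert(s.get() == "hello");
}
```

get() returns a copy on purpose: handing out a reference would let callers touch the string after the lock is released.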

See also: Does std::atomic<std::string> work appropriately?

Is std::atomic<std::optional<std::chrono::time_point<std::chrono::system_clock>>> valid/safe?

It's simple to check:

static_assert(std::is_trivially_copyable_v<
    std::optional<std::chrono::time_point<std::chrono::system_clock>>>);

Before gcc 8.1 (and the associated libstdc++) the assert fails, so it is not safe to use. From gcc 8.1 onward the assert passes, so it is safe.

Before clang 7 (and the associated libc++) the assert fails, so it is not safe to use. From clang 7 onward the assert passes, so it is safe.

What's the closest thing to `std::atomic<std::vector>`?

The goal of having the "simplest, most fool-proof thing" and the goal of treating a complex data structure as atomic contradict each other.

One way of pursuing your goal is a lock-based approach, as the comments suggest. In general, lock-based approaches have their own caveats (deadlock and starvation), but in your case it will work, though it may still be suboptimal.

What you need looks like a producer-consumer queue: single producer, single consumer.

If you expect high performance, it should be lock-free and ring-buffer based.

boost::lockfree::spsc_queue is one possible implementation. There are other implementations.

Also, you may want to avoid string allocation by using boost::lockfree::spsc_queue<char> and delimiting strings with \0.

If you want something even faster than that, you can write your own implementation, optimized for your scenario.
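
To give an idea of what a hand-rolled single-producer, single-consumer ring buffer involves, here is a minimal sketch. It is not the Boost implementation and makes no attempt at cache-line padding or other tuning; SpscRing and its member names are made up for illustration, and T is assumed to be default-constructible.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical minimal SPSC ring buffer. One slot is always left empty to
// tell "full" from "empty", so the usable capacity is N - 1.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "need at least two slots");

public:
    // Call from the single producer thread only.
    bool try_push(T value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) {
            return false;  // full
        }
        slots_[tail] = std::move(value);
        tail_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Call from the single consumer thread only.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) {
            return false;  // empty
        }
        out = std::move(slots_[head]);
        head_.store((head + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to read
    std::atomic<std::size_t> tail_{0};  // next slot to write
};

inline void spsc_ring_demo() {
    SpscRing<std::string, 4> ring;  // 3 usable slots
    assert(ring.try_push("a"));
    std::string out;
    assert(ring.try_pop(out) && out == "a");
}
```

The indices are the only shared state, and each one is written by exactly one thread, which is why plain loads and stores with acquire/release ordering are enough here and no compare-and-swap is needed.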

But if you say "One occasionally pushes strings into the vector", and occasionally means infrequently, you may want a lock-based queue instead.

Boost.Thread has Synchronized Queues -- EXPERIMENTAL. There are other implementations.

The advantage of that over using mutex / condition_variable directly is that you don't have to write your own synchronization, so it really meets the "simplest, most fool-proof thing" requirement.
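
For comparison, this is roughly the mutex / condition_variable code a ready-made queue saves you from writing and getting right. It is a sketch only; BlockingQueue is a made-up name and is not the Boost.Thread interface.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical lock-based queue, safe for any number of producers and consumers.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();  // wake one waiting consumer, outside the lock
    }

    // Blocks until an item is available.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

    // Non-blocking variant: returns false if the queue is empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) {
            return false;
        }
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};

inline void blocking_queue_demo() {
    BlockingQueue<std::string> q;
    q.push("a");
    assert(q.pop() == "a");  // does not block: an item is already queued
}
```

The predicate passed to wait() is what protects against spurious wakeups; forgetting it is one of the classic mistakes when writing this by hand.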

I actually implement a hybrid approach in my program, which becomes lock-based when it needs to wait but is otherwise lock-free. I haven't seen a good open-source implementation of this, but I would like to see one.

Does the “M&M rule” apply to a std::atomic data member?

You got it partly backwards. The article does not suggest making all atomic members mutable. Instead it says:

(1) For a member variable, mutable implies mutex (or equivalent): A
mutable member variable is presumed to be a mutable shared variable
and so must be synchronized internally—protected with a mutex, made
atomic, or similar.
(2) For a member variable, mutex (or similar synchronization type)
implies mutable: A member variable that is itself of a synchronization
type, such as a mutex or a condition variable, naturally wants to be
mutable, because you will want to use it in a non-const way (e.g.,
take a std::lock_guard) inside concurrent const member
functions.

(2) says that you want a mutex member to be mutable, because typically you also want to lock the mutex in const methods. (2) does not mention atomic members.

(1) on the other hand says that if a member is mutable, then you need to take care of synchronization internally, be it via a mutex or by making the member an atomic. That is because of the bullets the article mentions before:

If you are implementing a type, unless you know objects of the type can never be shared (which is generally impossible), this means that each of your const member functions must be either:
truly physically/bitwise const with respect to this object, meaning that they perform no writes to the object’s data; or else
internally synchronized so that if it does perform any actual writes to the object’s data, that data is correctly protected with a mutex or equivalent (or if appropriate are atomic<>) so that any possible concurrent const accesses by multiple callers can’t tell the difference.

A member that is mutable is not "truly const", hence you need to take care of synchronization internally (either via a mutex or by making the member atomic).

TL;DR: The article does not suggest making all atomic members mutable. Rather, it suggests making mutex members mutable and using internal synchronization for all mutable members.
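
A small sketch of how both halves of the rule look in one class. Account and its members are hypothetical, chosen only to illustrate the point.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical class: describe() is const but still writes, so every
// member it writes to is mutable AND internally synchronized.
class Account {
public:
    explicit Account(std::string owner) : owner_(std::move(owner)) {}

    std::string describe() const {
        // (1) mutable implies synchronized: here by being atomic.
        reads_.fetch_add(1, std::memory_order_relaxed);
        // (2) a mutex member is mutable so it can be locked in a const function.
        std::lock_guard<std::mutex> lock(mutex_);
        if (cached_.empty()) {
            cached_ = "Account(" + owner_ + ")";
        }
        return cached_;
    }

    int reads() const { return reads_.load(std::memory_order_relaxed); }

private:
    std::string owner_;                  // never written after construction
    mutable std::mutex mutex_;           // synchronization type, hence mutable
    mutable std::string cached_;         // mutable, hence guarded by mutex_
    mutable std::atomic<int> reads_{0};  // mutable, hence atomic
};

inline void account_demo() {
    const Account a("bob");
    assert(a.describe() == "Account(bob)");
    assert(a.reads() == 1);
}
```

reads_ is mutable only because a const member function modifies it. An atomic member that is merely loaded in const functions does not need to be mutable at all.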

How to use std::atomic efficiently

The reinterpret_cast will yield undefined behaviour. Your variable is either a std::atomic<uint8_t> or a plain uint8_t; you cannot cast between them. The size and alignment requirements may be different: for example, some platforms only provide atomic operations on words, so std::atomic<uint8_t> will use a full machine word where a plain uint8_t can just use a byte. Non-atomic operations may also be optimized in all sorts of ways, including being significantly reordered with surrounding operations, and combined with other operations on adjacent memory locations where that can improve performance.
This does mean that if you want atomic operations on some data then you have to know that in advance, and create suitable std::atomic<> objects rather than just allocating a generic buffer. Of course, you could allocate a buffer and then use placement new to initialize your atomic variable in that buffer, but you'd have to ensure the size and alignment were correct, and you wouldn't be able to use non-atomic operations on that object.
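The placement-new route might look like the sketch below. AtomicByteSlot and construct_atomic are made-up names; the point is only that the storage takes its size and alignment from the atomic type rather than from uint8_t.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <new>

using AtomicByte = std::atomic<std::uint8_t>;

// Hypothetical raw storage sized and aligned for the atomic type itself.
struct AtomicByteSlot {
    alignas(AtomicByte) unsigned char storage[sizeof(AtomicByte)];
};

// Constructs an atomic in the slot and returns a pointer to it.
// From here on, access the object only through this pointer.
inline AtomicByte* construct_atomic(AtomicByteSlot& slot, std::uint8_t initial) {
    return ::new (static_cast<void*>(slot.storage)) AtomicByte(initial);
}

inline int placement_demo() {
    AtomicByteSlot slot;
    AtomicByte* a = construct_atomic(slot, 1);
    a->fetch_add(2);
    const int result = a->load();
    a->~AtomicByte();  // end the object's lifetime before the storage goes away
    assert(result == 3);
    return result;
}
```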
If you really don't care about ordering constraints on your atomic object then use memory_order_relaxed on what would otherwise be the non-atomic operations. However, be aware that this is highly specialized, and requires great care. For example, writes to distinct variables may be read by other threads in a different order than they were written, and different threads may read the values in different orders to each other, even within the same execution of the program.
If CAS is slower for a byte than a word, you may be better off using std::atomic<unsigned>, but this will have a space penalty, and you certainly can't just use std::atomic<unsigned> to access a sequence of raw bytes --- all operations on that data must be through the same std::atomic<unsigned> object. You are generally better off writing code that does what you need and letting the compiler figure out the best way to do that.

For x86/x64, with a std::atomic<unsigned> variable a, a.load(std::memory_order_acquire) and a.store(new_value,std::memory_order_release) are no more expensive than loads and stores to non-atomic variables as far as the actual instructions go, but they do limit the compiler optimizations. If you use the default std::memory_order_seq_cst then one or both of these operations will incur the synchronization cost of a LOCKed instruction or a fence (my implementation puts the price on the store, but other implementations may choose differently). However, memory_order_seq_cst operations are easier to reason about due to the "single total ordering" constraint they impose.
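
As an illustration of the acquire/release pairing, here is a sketch of one thread publishing a plain value to another through a std::atomic<unsigned> flag. Mailbox, publish and try_read are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical one-shot mailbox: payload is plain data, ready is the flag
// that publishes it.
struct Mailbox {
    int payload = 0;
    std::atomic<unsigned> ready{0};
};

// Writer thread: fill in the data, then release-store the flag.
inline void publish(Mailbox& box, int value) {
    box.payload = value;
    box.ready.store(1, std::memory_order_release);
}

// Reader thread: acquire-load the flag. If it is set, the write to payload
// made before the release store is guaranteed to be visible.
inline bool try_read(Mailbox& box, int& out) {
    if (box.ready.load(std::memory_order_acquire) == 0) {
        return false;  // nothing published yet
    }
    out = box.payload;
    return true;
}

inline void mailbox_demo() {
    Mailbox box;
    int value = -1;
    assert(!try_read(box, value));
    publish(box, 42);
    assert(try_read(box, value) && value == 42);
}
```

Dropping the explicit memory orders gives the default memory_order_seq_cst, which is the simpler choice unless profiling shows the extra cost matters.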

In many cases it is just as fast, and a lot less error-prone, to use locks rather than atomic operations. If the overhead of a mutex lock is significant due to contention then you might need to rethink your data access patterns --- cache ping pong may well hit you with atomics anyway.
