C++ Memory Model and Race Conditions on Char Arrays

C++ memory model - does this example contain a data race?

// start with x==0 and y==0
if (x) y = 1; // thread 1
if (y) x = 1; // thread 2

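Since x and y both start at 0, neither condition is ever true, so neither assignment executes. Whatever order the instructions run in, the (correct) result is always that x remains 0 and y remains 0. Because neither write ever happens, the two threads make no conflicting accesses, so the example contains no data race.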

Race condition when accessing adjacent members in a shared struct, according to CERT coding rule POS49-C?

The C11 memory model guarantees that accesses to distinct structure members (which aren't part of a bit-field) are independent, so you'll run into no problems modifying the two flags from different threads (i.e., the "load 8 bytes, modify 4, and write back 8" scenario is not allowed).

This guarantee does not extend in general to bit-fields, since adjacent bit-fields can share a single memory location, so you have to be careful there.

Of course, if you concurrently modify the same flag from more than one thread without synchronization, that is a data race, so don't do that.

printf preventing race conditions

Section 1.4 of the latest OpenMP standard specifies the result of a race condition:

If multiple threads write without synchronization to the same memory
unit, including cases due to atomicity considerations as described
above, then a data race occurs. Similarly, if at least one thread
reads from a memory unit and at least one thread writes without
synchronization to that same memory unit, including cases due to
atomicity considerations as described above, then a data race occurs.
If a data race occurs then the result of the program is unspecified.

What you notice is completely consistent with the last sentence of that passage. Since the behavior of a program containing a data race is unspecified, it makes little sense to argue about why a particular output results from a given run. In particular, it is only by chance that you obtained 720 when inserting a printf before the ans+=ans command, and there is no guarantee that you will always see the same behavior.

How can I make code that concurrently reads and modifies an array well-defined without introducing locking?

This is what the _Atomic type qualifier is for in C11. You would declare your array as

_Atomic unsigned char a[n];

which means that each element of the array can be read or written atomically.

Prior to C11, there's no standard way to do this, but generally, depending on the implementation, certain datatypes will be atomic for reads and writes. To know which those are, you'll have to look at the documentation for the implementation you are using.

Note that the default memory ordering for C11 _Atomic accesses is memory_order_seq_cst (sequential consistency). If you don't need that, you can use the atomic_load_explicit and atomic_store_explicit functions with a weaker memory ordering (i.e., memory_order_relaxed in your example).

Can two threads write to different element of the same array?

By definition, a race condition happens when one or more threads write to a location in memory while other threads read from or write to that same location without synchronization. Would multiple threads each modifying a different array element be writing to the same location in memory? The answer is no. Each array element has a region of memory reserved for it alone within the region allocated to the overall array. Modifications of different elements therefore do not write to any of the same memory locations.

I asked this same question a long time ago and based part of my PhD work on the answer: I fitted hundreds of curves (least-squares fitting) in parallel, with multiple threads updating a single array that holds the results.

Race Conditions in C

As far as count is concerned, there is no race: each of the two processes has its own separate count.

As to the order in which the characters of "Output 1" and "Output 2" appear on stdout, there is indeed a race: the two outputs can end up arbitrarily interleaved.