Why Are Redundant Scope Qualifications Supported by the Compiler, and Is It Legal

Why are redundant scope qualifications supported by the compiler, and is it legal?

Yes, it's allowed (§9/2):

The class-name is also inserted into the scope of the class itself; this is known as the injected-class-name. For purposes of access checking, the injected-class-name is treated as if it were a public member name.
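
As a minimal sketch (with a hypothetical class A), the injected-class-name is what makes redundant qualification compile in the first place:

struct A {
    static const int value = 42;
};

// Inside A's scope the name A is visible again, so redundant
// qualification compiles: the A after A:: is the injected-class-name.
int x = A::A::value;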

For information about the reasoning that led to class name injection, you might want to read N0444.

Why are redundant class name qualifiers allowed?

While the phenomenon can probably be attributed to class name injection, as noted in ephemient's answer, for this specific example it was outlawed by the C++ language quite a while ago:

http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#147

The combination A::A is required to refer to the class constructor, not to the injected class name. A compliant compiler is supposed to interpret A::A(i) as an ill-formed (and therefore meaningless) expression involving the constructor name. The Comeau compiler, for one, will refuse to compile your code for that reason.

Apparently VC11 continues to treat A::A as a reference to the injected class name. Interestingly enough, I don't observe this problem in VS2005.

Back in the day when A::A was interpreted as referring to the injected name, one could declare an A object as

A::A::A::A::A::A a;

and so on, with an arbitrary number of As. But not anymore. Surprisingly, the version of GCC (4.3.4?) used by ideone still suffers from this issue:

http://ideone.com/OkR0F

You can try this with your version of VC11 and see if it allows that.

Why is decltype(class::class::class::member) valid

This works because of the injected-class-name:

(N3337) [class]/2: A class-name is inserted into the scope in which it is declared immediately after the class-name is seen.
The class-name is also inserted into the scope of the class itself; this is known as the injected-class-name.
For purposes of access checking, the injected-class-name is treated as if it were a public member name. [...]

So you can arbitrarily nest these, and they'll work with derived types as well:

struct A { using type = int; };
struct B : public A {};

using foo = B::B::B::A::A::A::type;

Note that in the case of A[::A]*::A, the injected-class-name can be considered to name the constructor instead:

[class.qual]/2: In a lookup in which the constructor is an acceptable lookup result and the nested-name-specifier nominates
a class C:

— if the name specified after the nested-name-specifier, when looked up in C, is the injected-class-name
of C (Clause 9), or

— [...]

the name is instead considered to name the constructor of class C.
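
A minimal sketch of that distinction, with a hypothetical class A: in a plain declaration A::A names the constructor and is ill-formed, while in an elaborated-type-specifier the constructor is not an acceptable lookup result, so the injected-class-name is found instead:

struct A {
    A(int) {}
};

A::A a(1);        // ill-formed: A::A names the constructor here (CWG 147)
struct A::A b(1); // OK: in an elaborated-type-specifier the constructor is
                  // not an acceptable lookup result, so A::A names the class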

extra qualification for class member

The feature is that the name of the type is injected inside the class scope, that is, there is an implicit typedef Foo Foo; of sorts inside the class Foo.

The feature is in the language because in several constructs the nested type is required. For example, when disabling dynamic dispatch by explicitly naming the level of the hierarchy where the overrider is to be selected (obj.Base::f()).

The original list had some ten-odd constructs for which the name had to be present, and it was simplified by making the nested name available in all contexts, which in turn allows the funny syntax you wrote.
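
As a small sketch (hypothetical Base/Derived hierarchy) of the qualified-call construct mentioned above:

#include <iostream>

struct Base {
    virtual void f() { std::cout << "Base::f\n"; }
};
struct Derived : Base {
    void f() override { std::cout << "Derived::f\n"; }
};

void call(Derived& obj) {
    obj.f();        // dynamic dispatch: selects Derived::f
    obj.Base::f();  // qualified name disables dispatch: calls Base::f
}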

Meaning of a structname inside namespace of struct?

It's the same as:

Str1 valS;

The extra struct Str1:: is redundant. See N0444.
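
A minimal illustration, assuming the question's Str1 is an ordinary struct: the elaborated-type-specifier spelling simply re-finds the injected-class-name, so both declarations mean the same thing:

struct Str1 { int x; };

Str1 a;               // the usual spelling
struct Str1::Str1 b;  // equivalent: Str1::Str1 finds the injected-class-name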

Scope resolution operator

This code is not valid.

It was a bug in g++ that it accepted the code. See "g++ does not treat injected class name correctly." The bug was resolved as fixed in 2009, so it should be fixed in any recent version of g++.

Why don't compilers merge redundant std::atomic writes?

The C++11 / C++14 standards as written do allow the three stores to be folded/coalesced into one store of the final value. Even in a case like this:

y.store(1, order);
y.store(2, order);
y.store(3, order); // inlining + constant-folding could produce this in real code

The standard does not guarantee that an observer spinning on y (with an atomic load or CAS) will ever see y == 2. A program that depended on this would have a data-race bug, but only the garden-variety kind of race, not the C++ Undefined Behaviour kind of data race (that's UB only with non-atomic variables). A program that expects to sometimes see the intermediate value is not necessarily even buggy. (See below regarding progress bars.)
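
A hedged sketch of that scenario (the names and orderings here are illustrative, not from the question):

#include <atomic>

std::atomic<int> y{0};

void writer() {
    y.store(1, std::memory_order_release);
    y.store(2, std::memory_order_release);  // legal to fold this away
    y.store(3, std::memory_order_release);
}

void observer() {
    // This loop is allowed to spin forever: nothing guarantees that the
    // intermediate value 2 ever becomes visible to another thread.
    while (y.load(std::memory_order_acquire) != 2) { }
}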

Any ordering that's possible on the C++ abstract machine can be picked (at compile time) as the ordering that will always happen. This is the as-if rule in action. In this case, it's as if all three stores happened back-to-back in the global order, with no loads or stores from other threads happening between the y=1 and y=3.

It doesn't depend on the target architecture or hardware; compile-time reordering of relaxed atomic operations is allowed even when targeting strongly-ordered x86. The compiler doesn't have to preserve anything you might expect from thinking about the hardware you're compiling for, so if you need ordering you need barriers, and those barriers may compile into zero asm instructions.


So why don't compilers do this optimization?

It's a quality-of-implementation issue, and can change observed performance / behaviour on real hardware.

The most obvious case where it's a problem is a progress bar. Sinking the stores out of a loop (that contains no other atomic operations) and folding them all into one would result in a progress bar staying at 0 and then going to 100% right at the end.
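
A sketch of that pattern; do_chunk is a hypothetical work function standing in for real per-iteration work:

#include <atomic>

std::atomic<int> progress{0};  // polled by a UI thread

void do_chunk(int);  // hypothetical work function, defined elsewhere

void work(int n) {
    for (int i = 0; i < n; ++i) {
        do_chunk(i);  // contains no other atomic operations
        progress.store(i + 1, std::memory_order_relaxed);
    }
    // Under the as-if rule a compiler could sink the store out of the loop
    // and emit a single progress.store(n): the bar would sit at 0 and then
    // jump straight to 100%. Current compilers choose not to do this.
}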

There's no C++11 std::atomic way to stop them from doing it in cases where you don't want it, so for now compilers simply choose never to coalesce multiple atomic operations into one. (Coalescing them all into one operation doesn't change their order relative to each other.)

Compiler-writers have correctly noticed that programmers expect that an atomic store will actually happen to memory every time the source does y.store(). (See most of the other answers to this question, which claim the stores are required to happen separately because of possible readers waiting to see an intermediate value.) In other words, coalescing violates the principle of least surprise.

However, there are cases where it would be very helpful, for example avoiding useless shared_ptr ref count inc/dec in a loop.
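
For instance, a sketch of the shared_ptr case (use is a hypothetical consumer):

#include <memory>

void use(int);  // hypothetical consumer

void hot_loop(const std::shared_ptr<int>& p, int n) {
    for (int i = 0; i < n; ++i) {
        std::shared_ptr<int> local = p;  // atomic ref-count increment
        use(*local);
    }                                    // atomic decrement each iteration
    // The count can never reach zero between iterations, so an optimizer
    // could legally hoist one inc/dec pair out of the loop; today's
    // compilers don't coalesce these atomic RMW operations.
}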

Obviously any reordering or coalescing can't violate any other ordering rules. For example, num++; num-- would still have to be a full barrier against runtime and compile-time reordering, even if it no longer touched the memory at num.
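
A short sketch of what that constraint means, assuming the default seq_cst ordering:

#include <atomic>

std::atomic<int> num{0};

void f() {
    num++;  // seq_cst atomic RMW
    num--;  // seq_cst atomic RMW: net effect on num is zero
    // Even if a compiler proved the pair cancels out and dropped the
    // actual modification, it would still have to preserve the ordering
    // the two operations impose on surrounding loads and stores.
}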


Discussion is under way to extend the std::atomic API to give programmers control of such optimizations, at which point compilers will be able to optimize when useful, which can happen even in carefully-written code that isn't intentionally inefficient. Some examples of useful cases for optimization are mentioned in the following working-group discussion / proposal links:

  • http://wg21.link/n4455: N4455 No Sane Compiler Would Optimize Atomics
  • http://wg21.link/p0062: WG21/P0062R1: When should compilers optimize atomics?

See also discussion about this same topic on Richard Hodges' answer to Can num++ be atomic for 'int num'? (see the comments). See also the last section of my answer to the same question, where I argue in more detail that this optimization is allowed. (Leaving it short here, because those C++ working-group links already acknowledge that the current standard as written does allow it, and that current compilers just don't optimize on purpose.)


Within the current standard, volatile atomic<int> y would be one way to ensure that stores to it are not allowed to be optimized away. (As Herb Sutter points out in an SO answer, volatile and atomic already share some requirements, but they are different). See also std::memory_order's relationship with volatile on cppreference.

Accesses to volatile objects are not allowed to be optimized away (because they could be memory-mapped IO registers, for example).
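
A sketch of that workaround applied to the progress-bar pattern above (std::atomic provides volatile-qualified overloads of store, so this compiles; still, treat it as illustrative):

#include <atomic>

volatile std::atomic<int> progress{0};

void work(int n) {
    for (int i = 0; i < n; ++i) {
        // Each volatile access must actually happen, so the stores can't
        // be coalesced: the bar now updates on every iteration.
        progress.store(i + 1, std::memory_order_relaxed);
    }
}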

Using volatile atomic<T> mostly fixes the progress-bar problem, but it's kind of ugly and might look silly in a few years if/when C++ decides on different syntax for controlling optimization so compilers can start doing it in practice.

I think we can be confident that compilers won't start doing this optimization until there's a way to control it. Hopefully it will be some kind of opt-in (like a memory_order_release_coalesce) that doesn't change the behaviour of existing C++11/14 code when compiled as C++whatever. But it could be like the proposal in wg21/p0062: tag don't-optimize cases with [[brittle_atomic]].

wg21/p0062 warns that even volatile atomic doesn't solve everything, and discourages its use for this purpose. It gives this example:

if (x) {
    foo();
    y.store(0);
} else {
    bar();
    y.store(0);  // release a lock before a long-running loop
    for() {...}  // loop contains no atomics or volatiles
}
// A compiler can merge the stores into a y.store(0) here.

Even with volatile atomic<int> y, a compiler is allowed to sink the y.store() out of the if/else and do it just once, because it's still performing exactly one store with the same value (which would land after the long loop in the else branch), especially if the store is only relaxed or release instead of seq_cst.

volatile does stop the coalescing discussed in the question, but this points out that other optimizations on atomic<> can also be problematic for real performance.


Other reasons for not optimizing include: nobody has written the complicated code that would allow the compiler to do these optimizations safely (without ever getting it wrong). That explanation isn't sufficient, though, because N4455 says LLVM already implements, or could easily implement, several of the optimizations it mentions.

The confusing-for-programmers reason is certainly plausible, though. Lock-free code is hard enough to write correctly in the first place.

Don't be casual in your use of atomic weapons: they aren't cheap and don't optimize much (currently not at all). It's not always easy to avoid redundant atomic operations with std::shared_ptr<T>, though, since there's no non-atomic version of it (although one of the answers here gives an easy way to define a shared_ptr_unsynchronized<T> for gcc).


