Can a C Compiler Rearrange Stack Variables

Can a C compiler rearrange stack variables?

As there is nothing in the standard prohibiting that for C or C++ compilers, yes, the compiler can do that.

It is different for aggregates (i.e. structs), where the relative order of the members must be maintained, but even there the compiler may insert padding bytes to achieve preferable alignment.

IIRC newer MSVC compilers use that freedom in their fight against buffer overflows of locals.

As a side note, in C++, the order of destruction must be reverse order of declaration, even if the compiler reorders the memory layout.

(I can't quote chapter and verse, though, this is from memory.)
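A small way to see (but not rely on) this: the sketch below prints the addresses of three locals. Nothing in the standard relates those addresses to the declaration order, so the layout may differ between compilers, optimization levels, or even builds; taking the addresses is also what forces the variables into memory at all.

```cpp
#include <cstdio>

// Sketch only: the relative positions of a, b and c are entirely up to
// the compiler; it may reorder them, pad between them, or (without the
// address-of below) keep them in registers.
bool locals_have_distinct_addresses() {
    int a = 1, b = 2, c = 3;
    std::printf("&a=%p &b=%p &c=%p\n", (void*)&a, (void*)&b, (void*)&c);
    return &a != &b && &b != &c && &a != &c;
}
```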

How to insist a C compiler put local variables on the stack, not in registers

I suppose that you are using some kind of "mark and sweep" GC. In that case you only need to save the registers at the moment the marking phase starts. My advice is to examine your GC, find the place where the mark-and-sweep operation starts, and put code there that spills all registers into accessible memory. setjmp is a semi-portable way to achieve this (unless you are working on SPARC).
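A hedged sketch of that idea: setjmp dumps the register context into a jmp_buf that itself lives on the stack, so a conservative scan of the stack afterwards also covers pointers that were sitting in callee-saved registers. `scan_stack` is a hypothetical stand-in for your GC's marking routine:

```cpp
#include <csetjmp>

// Spill registers onto the stack before a conservative mark phase.
int flush_registers_and_mark(void (*scan_stack)(void* stack_hint)) {
    jmp_buf regs;                  // lives on the stack, so it gets scanned
    if (setjmp(regs) == 0) {       // direct call returns 0; registers saved
        if (scan_stack) scan_stack(&regs);
        return 0;
    }
    return 1;                      // only reached via longjmp, which we never do
}
```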

(A + B + C) ≠ (A + C + B) and compiler reordering

If the optimiser does such a reordering it is still bound to the C specification, so such a reordering would become:

uint64_t u64_z = (uint64_t)u32_x + (uint64_t)u32_y + u64_a;

Rationale:

We start with

uint64_t u64_z = u32_x + u64_a + u32_y;

Addition is performed left-to-right.

The usual arithmetic conversions require that, in the first addition of the original expression, u32_x will be converted to uint64_t. In the second addition, u32_y will likewise be converted to uint64_t.

So, in order to be compliant with the C specification, any optimiser must promote u32_x and u32_y to 64 bit unsigned values. This is equivalent to adding a cast. (The actual optimising is not done at the C level, but I use C notation because that is a notation that we understand.)
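To make that concrete, here is a check (with values of my own choosing, picked so that a 32-bit sum would wrap) that the reordered expression with explicit promotions matches the original, while reordering without promoting does not:

```cpp
#include <cstdint>

uint32_t u32_x = 0x80000000u;
uint32_t u32_y = 0x80000000u;
uint64_t u64_a = 1;

// The original expression: conversions happen per the usual arithmetic
// conversions, addition performed left to right.
uint64_t original()  { return u32_x + u64_a + u32_y; }

// The compliant reordering from the answer, with the casts spelled out.
uint64_t reordered() { return (uint64_t)u32_x + (uint64_t)u32_y + u64_a; }

// A non-compliant reordering: the 32-bit addition wraps to 0 first.
uint64_t wrong()     { return (uint32_t)(u32_x + u32_y) + u64_a; }
```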

What is the problem with the following code in C

a[-1] reads outside the bounds of the array, so the behavior is undefined.

Can a C++ compiler re-order elements in a struct

It normally can't reorder elements, no.

An exception is if there's an access specifier separating them:

struct Foo {
    A a;
    B b;
    C c;
private:
    D d;
    E e;
    F f;
};

a, b and c are guaranteed to be stored in this order, and d, e and f are guaranteed to be stored in this order. But there are no guarantees about where a, b and c are stored relative to d, e and f.

Another thing to keep in mind is that the compiler can insert as much padding as it likes, even if it doesn't reorder anything.

Here's the relevant part of the standard:

Section 9.2, paragraph 12:

"Nonstatic data members of a (non-union) class declared without an intervening access-specifier are allocated so that later members have higher addresses within a class object. The order of allocation of nonstatic data members separated by an access-specifier is unspecified (11.1)."
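A quick way to check the guaranteed part: members with no access specifier between them must get strictly increasing offsets, which offsetof can verify at compile time (the padding amounts, by contrast, are up to the implementation):

```cpp
#include <cstddef>

struct S {        // no access specifier between the members,
    char a;       // so a, b and c must sit at increasing offsets
    int  b;       // (padding between them is still allowed)
    char c;
};

static_assert(offsetof(S, a) < offsetof(S, b), "declaration order preserved");
static_assert(offsetof(S, b) < offsetof(S, c), "declaration order preserved");
// sizeof(S) is typically 12 here because of padding, but that is ABI-specific.
```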

Does as-if rule prevent compiler reordering of accesses to global/member variables?

Absolutely they may. The compiler has no obligation whatsoever to consider side effects to other threads or hardware.

The compiler is only forced to consider this if you use volatile or synchronization (and those two are not interchangeable).

The Standard memory model is known as SC-DRF, or Sequentially Consistent Data Race Free. A data race would be exactly the scenario you've just described- where one thread is observing non-synchronized variables whilst another is mutating them. This is undefined behaviour. Effectively, the Standard explicitly gives the compiler the freedom to assume that there are no other threads or hardware which is reading non-volatile non-synchronized variables. It is absolutely legitimate for a compiler to optimize on that basis.

By the way, that link is kinda crap. His "fix" does not fix anything at all. The only correct fix for multi-threaded code is to use synchronization.
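A minimal sketch of that fix, using a C++11 atomic for the synchronization (the names here are my own): the release store publishes the plain write to `data`, and the acquire load on the reader side establishes the happens-before edge that makes the read well-defined:

```cpp
#include <atomic>
#include <thread>

int data = 0;
std::atomic<bool> ready{false};

int run() {
    std::thread writer([] {
        data = 42;                                     // plain write...
        ready.store(true, std::memory_order_release);  // ...published here
    });
    while (!ready.load(std::memory_order_acquire)) { } // pairs with the release
    int seen = data;                                   // guaranteed to see 42
    writer.join();
    return seen;
}
```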

Is a C compiler allowed to coalesce sequential assignments to volatile variables?

The behavior of volatile seems to be up to the implementation, partly because of a curious sentence which says: "What constitutes an access to an object that has volatile-qualified type is implementation-defined".

In ISO C 99, section 5.1.2.3, there is also:

3. In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

So although requirements are given that a volatile object must be treated in accordance with the abstract semantics (i.e. not optimized), curiously, the abstract semantics itself allows for the elimination of dead code and data flows, which are examples of optimizations!

I'm afraid that to know what volatile will and will not do, you have to go by your compiler's documentation.
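For what it's worth, this is the shape of code the question is about. Mainstream compilers keep both stores because `reg` is volatile; whether they are strictly required to comes down to the implementation-defined notion of "access" quoted above:

```cpp
volatile int reg = 0;  // stand-in for, say, a memory-mapped device register

int write_twice() {
    reg = 1;   // typically emitted, even though it is immediately...
    reg = 2;   // ...overwritten by this second store
    return reg;
}
```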

How compilers rearranging variables optimizes the code?

I'll start with your second question: is memory really "random access"? Memory access in modern hardware is quite complicated, so any simple answer is necessarily (over-)simplified, but stripped to its essence, it goes something like this: memory is random access in the sense that any unit of memory can be read or written at any time, but only one unit at a time. A "unit" here is more than a byte; it might be more than a "word", but the important thing is that units are non-overlapping. So you can read any unit, but you can't read an arbitrary sequence of bytes unless they are all in the same unit. If part of the sequence is in one unit and the rest is in the next unit, you have to read both units. And since it is one at a time, that takes longer than reading a single unit.

But that's still random access. It's not like a rotating disc, where you have to wait for the data you want to arrive at the read head. Or a tape drive, where you might need to wind or rewind the tape quite a distance before you get to the data you want.

Once upon a time, the ability of a CPU to assemble a sequence of bytes was pretty limited. It could, for example, take the first two bytes in a four-byte word, or the last two bytes, but not the middle two bytes. If you needed the middle two bytes, you'd have to read all four bytes, and then shift and mask to get the bits you were interested in. These days, we know how to get a lot more circuitry onto a chip, and copying any consecutive bytes from a memory unit generally works. And the units themselves have gotten somewhat wider. So alignment is less important than it used to be. But it still matters, because reading a sequence that spans two consecutive units still requires two accesses, one after the other.

Also, not all CPUs are so flexible. On some CPUs, memory access to the middle of a word is not possible, so for those chips alignment is still very useful.

Now, alignment is a pretty abstract word; the concrete implementation might have different alignments for different data sizes. And that's the model compilers were generally designed for. So if a two-byte datum could be at the beginning or the end of a four-byte unit, then its address must be even. But the four-byte unit must have an address divisible by four. So order does make a difference. Consider a datum consisting of two individual bytes, one two-byte "half-word", and a four-byte word. If those are arranged byte, half-word, byte, word, then that will take 12 bytes: a byte of data, a byte of padding, a half-word (even alignment), another byte, three bytes of padding, and the word. If we arrange them byte, byte, half-word, word, then there's no padding and the data fits in eight bytes.
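That arithmetic can be checked by writing the two layouts out as structs (the type names are my own). The exact sizes are ABI-dependent, but the 12-versus-8 split holds on typical platforms with natural alignment:

```cpp
#include <cstdint>

struct Unordered {     // byte, half-word, byte, word
    uint8_t  b1;       // 1 byte, then (typically) 1 byte of padding
    uint16_t h;        // 2 bytes
    uint8_t  b2;       // 1 byte, then (typically) 3 bytes of padding
    uint32_t w;        // 4 bytes -> 12 bytes total on common ABIs
};

struct Reordered {     // byte, byte, half-word, word
    uint8_t  b1;       // every member already lands on a suitable
    uint8_t  b2;       // boundary, so no padding is needed
    uint16_t h;
    uint32_t w;        // -> 8 bytes total on common ABIs
};
```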


