Does the Evil Cast Get Trumped by the Evil Compiler

Looks like the compiler is optimizing

printf("IAMCONST %d \n",IAMCONST);

into

printf("IAMCONST %d \n",3);

since you said that IAMCONST is a const int.

But since you're taking the address of IAMCONST, it has to actually be located on the stack somewhere, and the constness can't be enforced, so the memory at that location (*pTOCONST) is mutable after all.

In short: you cast away the constness; don't do that. Poor, defenseless C...

Addendum

Using GCC for x86, with -O0 (no optimizations), the generated assembly

main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $36, %esp
movl $3, -12(%ebp)
leal -12(%ebp), %eax
movl %eax, -8(%ebp)
movl -8(%ebp), %eax
movl $7, (%eax)
movl -12(%ebp), %eax
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
movl -8(%ebp), %eax
movl (%eax), %eax
movl %eax, 4(%esp)
movl $.LC1, (%esp)
call printf

copies the value at -12(%ebp) on the stack into printf's arguments. However, using -O1 (as well as -Os, -O2, -O3, and other optimization levels),

main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
movl $3, 4(%esp)
movl $.LC0, (%esp)
call printf
movl $7, 4(%esp)
movl $.LC1, (%esp)
call printf

you can clearly see that the constant 3 is used instead.

If you are using Visual Studio's CL.EXE, /Od disables optimization. This varies from compiler to compiler.

Be warned that the C standard makes it undefined behavior to modify an object defined as const through a non-const lvalue, so the compiler is allowed to assume that the value of a const int never changes. You really shouldn't be doing this at all if you want predictable behavior.

What is the purpose of const qualifier if I can modify it through a pointer in C?

The reason you could modify the value is that you did a pointer typecast that stripped off the constness:

int *p = (int *)&a;

This typecasts a const int* (namely &a) to an int *, allowing you to freely modify the variable. Normally the compiler would warn you about this, but the explicit typecast suppressed the warning.

The main rationale behind const is to prevent you from accidentally modifying something that you promised not to. It's not sacrosanct, as you've seen: the language lets you cast away constness, much in the same way that it lets you do other unsafe things like converting pointers to integers or vice versa. What it does not do is promise any particular result if you then write to an object that was defined const; that is undefined behavior. The idea is that you should try your best not to mess with const, and the compiler will warn you if you do. Of course, adding in a cast tells the compiler "I know what I'm doing," and so in your case the above doesn't generate any sort of warnings.
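To make the distinction concrete, here is a minimal sketch (the function names are made up for illustration) of the one situation where casting away const and then writing is well defined: the pointer is const-qualified, but the object it points to was not defined const. Pointing the same helper at an object that was defined with const would be undefined behavior.

```c
/* Hypothetical helper: receives a read-only view but writes through it.
   Only defined when the pointed-to object was NOT defined const. */
void set_through_cast(const int *view, int new_value)
{
    int *writable = (int *)view;  /* explicit cast silences the diagnostic */
    *writable = new_value;
}

int demo_cast_on_mutable(void)
{
    int counter = 3;               /* a plain int, so modifying it is fine */
    set_through_cast(&counter, 7);
    return counter;                /* the write is visible: counter is 7 */
}
```

Had counter been declared const int, the compiler would be free to return 3, return 7, or do something else entirely.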

How does int const iVal = 5; (int&)iVal = 10; work internally?

It is undefined behavior.

In the first line you define a constant integer. Henceforth, in your program, the compiler is permitted to just substitute iVal with the value 5. It may load it from memory instead, but probably won't, because that would bring no benefit.

The second line writes to the memory location that your compiler tells you contains the number 5. However, this is not guaranteed to have any effect, as you've already told the compiler that the value won't change.

For example, the following will define an array of 5 elements, and print an undefined value (or it can do anything it wants! it's undefined)

int const iVal = 5;
(int&)iVal = 10;
char arr[iVal];
cout << iVal;

The generated assembly might look something like:

sub ESP, 9      ; allocate mem for arr and iVal. hardcoded 5+sizeof(int) bytes
; (iVal isn't _required_ to have space allocated to it)
mov $iVal, 10 ; the compiler might do this, assuming that you know what
; you're doing. But then again, it might not.
push $cout
push 5
call $operator_ltlt__ostream_int
add ESP, 9

In C, can a const variable be modified via a pointer?

const actually doesn't mean "constant". Something that's "constant" in C has a value that's determined at compile time; a literal 42 is an example. The const keyword really means read-only. Consider, for example:

const int r = rand();

The value of r is not determined until program execution time, but the const keyword means that you're not permitted to modify r after it's been initialized.
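As a small illustration of read-only versus constant, the sketch below (count_char is a hypothetical helper, not something from the question) initializes a const object from a value that only exists at run time. The const stops later assignments to len, but it does not make len a compile-time constant.

```c
#include <stddef.h>
#include <string.h>

/* Counts how often `needle` occurs in `text`. */
size_t count_char(const char *text, char needle)
{
    /* Read-only after initialization, yet only known at run time. */
    const size_t len = strlen(text);
    size_t hits = 0;

    for (size_t i = 0; i < len; ++i) {
        if (text[i] == needle)
            ++hits;
    }
    /* len = 0;  would be rejected by the compiler: len is read-only */
    return hits;
}
```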

In your code:

const int x=1;
int *ptr;
ptr = &x;
*ptr = 2;

the assignment ptr = &x; is a constraint violation, meaning that a conforming compiler is required to complain about it; you can't legally assign a const int* (pointer to const int) value to a non-const int* object. If the compiler generates an executable (which it needn't do; it could just reject it), then the behavior is not defined by the C standard.

For example, the generated code might actually store the value 2 in x -- but then a later reference to x might yield the value 1, because the compiler knows that x can't have been modified after its initialization. And it knows that because you told it so, by defining x as const. If you lie to the compiler, the consequences can be arbitrarily bad.

Actually, the worst thing that can happen is that the program behaves as you expect it to; that means you have a bug that's very difficult to detect. (But the diagnostic you should have gotten will have been a large clue.)

Why can I change a local const variable through pointer casts but not a global one in C?

It's in read only memory!

Basically, your computer resolves virtual to physical addresses using a multi-level page table. Each page table entry carries a bit saying whether or not the page is writable. This is helpful, because user processes probably shouldn't be overwriting their own machine code (although self-modifying code is kind of cool). Of course, they probably also shouldn't be overwriting their own constant variables.

You can't put a "const" function-level variable into read-only memory, because it lives on the stack, which MUST be on a read-write page. For a global const, however, the compiler/linker sees your const and does you a favor by putting it in read-only memory (it's constant, after all). Obviously, overwriting that will cause all kinds of unhappiness for the kernel, which will take out that anger on the process by terminating it.

Why is type punning considered UB?

Ultimately the why is "because the language specification says so". You don't get to argue with that. If that's the way the language is, it's the way it is.

If you want to know the motivation for making it that way, it's that the original C language lacked any way of expressing that two lvalues can't alias one another (and the modern language's restrict keyword is still barely understood by most users of the language). Being unable to assume two lvalues can't alias means the compiler can't reorder loads and stores, and must actually perform loads and stores from/to memory for every access to an object, rather than keeping values in registers, unless it knows the object's address has never been taken.

C's type-based aliasing rules somewhat mitigate this situation, by letting the compiler assume lvalues with different types don't alias.

Note also that in your example, there's not only type-punning but misalignment. The unsigned char array has no inherent alignment, so accessing a uint64_t at that address would be an alignment error (UB for another reason) independent of any aliasing rules.
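A minimal sketch of what type-based aliasing buys the compiler (the function names are hypothetical): an int lvalue and a float lvalue are assumed not to overlap, so the store through scale cannot change *counter, and the function may return the constant 1 without reloading it from memory.

```c
/* The compiler may assume *counter and *scale are different objects,
   because int and float are incompatible types. */
int store_both(int *counter, float *scale)
{
    *counter = 1;
    *scale = 0.0f;     /* assumed not to touch *counter */
    return *counter;   /* may be folded to "return 1" */
}

int demo_store_both(void)
{
    int counter = 0;
    float scale = 1.0f;
    /* Two genuinely distinct objects, so this call is well defined. */
    return store_both(&counter, &scale);
}
```

Calling store_both with both pointers aimed at the same memory (via a cast) is exactly the kind of type punning that the rule makes undefined.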

What are the common undefined/unspecified behavior for C that you run into?

A language lawyer question. Hmkay.

My personal top 3:

  1. violating the strict aliasing rule

  2. violating the strict aliasing rule

  3. violating the strict aliasing rule

    :-)

Edit Here is a little example that does it wrong twice:

(assume 32 bit ints and little endian)

float funky_float_abs (float a)
{
    unsigned int temp = *(unsigned int *)&a;
    temp &= 0x7fffffff;
    return *(float *)&temp;
}

That code tries to get the absolute value of a float by bit-twiddling with the sign bit directly in the representation of a float.

However, accessing an object through a pointer that was cast to an incompatible type is not valid C. The compiler may assume that pointers to different types don't point to the same chunk of memory. This is true for all kinds of pointers except pointers to character types such as char* (signedness does not matter). A void* can point at anything too, but it has to be converted to another pointer type before it can be dereferenced.

In the case above I do that twice. Once to get an int-alias for the float a, and once to convert the value back to float.

There are three valid ways to do the same.

Use a char pointer for the access. Character types may alias anything, so this is safe.

float funky_float_abs (float a)
{
    float temp_float = a;
    // valid, because it's a char pointer. These are special.
    unsigned char * temp = (unsigned char *)&temp_float;
    temp[3] &= 0x7f;
    return temp_float;
}

Use memcpy. memcpy takes void pointers and copies the bytes of the object, so no aliasing rule is violated.

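#include <string.h>

float funky_float_abs (float a)
{
    unsigned int i;
    float result;
    // copy the float's bytes into an integer of the same size
    memcpy (&i, &a, sizeof i);
    i &= 0x7fffffff;   // clear the sign bit
    memcpy (&result, &i, sizeof result);
    return result;
}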

The third valid way: use unions. This is explicitly not undefined since C99:

float funky_float_abs (float a)
{
    union
    {
        unsigned int i;
        float f;
    } cast_helper;

    cast_helper.f = a;
    cast_helper.i &= 0x7fffffff;
    return cast_helper.f;
}
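If you would rather not lean on the "assume 32 bit ints" caveat, the memcpy variant can be sketched with a fixed-width type instead. This assumes float is a 32-bit IEEE 754 value with the sign in the top bit; float_abs_bits is a made-up name, and in real code fabsf from <math.h> does the same job.

```c
#include <stdint.h>
#include <string.h>

/* Refuse to compile where the size assumption does not hold. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "float must be 32 bits wide");

/* Assumes IEEE 754 binary32: bit 31 is the sign bit. */
float float_abs_bits(float a)
{
    uint32_t bits;
    memcpy(&bits, &a, sizeof bits);   /* float bytes -> integer */
    bits &= UINT32_C(0x7fffffff);     /* clear the sign bit */
    memcpy(&a, &bits, sizeof a);      /* integer bytes -> float */
    return a;
}
```

Because the sign bit is masked on the integer value rather than on a specific byte, this version does not depend on byte order the way the temp[3] approach does.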

