Why Do Compilers Allow String Literals Not to Be Const

Why do compilers allow string literals not to be const?

The C standard does not forbid the modification of string literals. It just says that the behaviour is undefined if the attempt is made. According to the C99 rationale, there were people in the committee who wanted string literals to be modifiable, so the standard does not explicitly forbid it.

Note that the situation is different in C++. In C++, string literals are arrays of const char. However, C++ historically allowed an implicit conversion from a string literal to char *. That conversion was deprecated and has been removed as of C++11.

Why doesn't my C compiler warn when I assign a string literal to a non-const pointer?

The answer you have quoted is an opinion without citation, and frankly nonsense. The real reason is simply to avoid breaking the vast quantity of existing legacy C code that should remain compilable with a modern compiler.

However, many compilers will issue a warning if you set the necessary warning level or options. In GCC for example:

-Wwrite-strings

When compiling C, give string constants the type const char[length] so that copying the address of one into a non-const char* pointer produces a warning. These warnings help you find at compile time code that can try to write into a string constant, but only if you have been very careful about using const in declarations and prototypes. Otherwise, it is just a nuisance. This is why we did not make -Wall request these warnings.

When compiling C++, warn about the deprecated conversion from string literals to char *. This warning is enabled by default for C++ programs.

Clang also has -Wwrite-strings, which is a synonym for -Wwritable-strings:

-Wwritable-strings

This diagnostic is enabled by default.

Also controls -Wdeprecated-writable-strings.

Diagnostic text:

warning: ISO C++11 does not allow conversion from string literal to A

The diagnostic text is different for C compilation - I'm just quoting the manual.

In GCC with -Wwrite-strings:

int main()
{
    char* x = "hello" ;
    return 0;
}

produces:

main.c:3:15: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]    

CLANG produces:

source_file.c:3:15: warning: initializing 'char *' with an expression of type 'const char [6]' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]

Why are string literals const?

There are a couple of different reasons.

One is to allow storing string literals in read-only memory (as others have already mentioned).

Another is to allow merging of string literals. If one program uses the same string literal in several different places, it's nice to allow (but not necessarily require) the compiler to merge them, so you get multiple pointers to the same memory, instead of each occupying a separate chunk of memory. This can also apply when two string literals aren't necessarily identical, but do have the same ending:

char *foo = "long string";
char *bar = "string";

In a case like this, it's possible for bar to be foo+5 (if I've counted correctly).

In either of these cases, if you allow modifying a string literal, the write could also modify another string literal that happens to share the same storage. At the same time, there's honestly not a lot of point in mandating either optimization: it's pretty uncommon to have enough overlapping string literals that most people would want the compiler to run slower just to save (maybe) a few dozen bytes or so of memory.

By the time the first standard was written, there were already compilers that used all three of these techniques (and probably a few others besides). Since there was no way to describe one behavior you'd get from modifying a string literal, and nobody apparently thought it was an important capability to support, they did the obvious: said even attempting to do so led to undefined behavior.

Why are strings in C declared with 'const'?

There's no requirement to use const, but it's a good idea.

In C, a string literal is an expression of type char[N], where N is the length of the string plus 1 (for the terminating '\0' null character). But attempting to modify the array that corresponds to the string literal has undefined behavior. Many compilers arrange for that array to be stored in read-only memory (not physical ROM, but memory that's marked read-only by the operating system). (An array expression is, in most contexts, converted to a pointer expression referring to the initial element of the array object.)

It would have made more sense to make string literals const, but the const keyword did not exist in old versions of C, and it would have broken existing code. (C++ did make string literals const).

This:

char *s = "example"; /* not recommended */

is actually perfectly valid in C, but it's potentially dangerous. If, after this declaration, you do:

s[0] = 'E';

then you're attempting to modify the string literal, and the behavior is undefined.

This:

const char *s = "example"; /* recommended */

is also valid; the char* value that results from evaluating the string literal is safely and quietly converted to const char*. And it's generally better than the first version because it lets the compiler warn you if you attempt to modify the string literal (it's better to catch errors at compile time than at run time).

If you get an error on your first example, then it's likely that you're inadvertently compiling your code as C++ rather than as C -- or that you're using gcc's -Wwrite-strings option or something similar. (-Wwrite-strings makes string literals const; it can improve safety, but it can also cause gcc to reject, or at least warn about, valid C code.)

Why do (only) some compilers use the same address for identical string literals?

This is not undefined behavior, but unspecified behavior. For string literals,

The compiler is allowed, but not required, to combine storage for equal or overlapping string literals. That means that identical string literals may or may not compare equal when compared by pointer.

That means the result of A == B might be true or false, on which you shouldn't depend.

From the standard, [lex.string]/16:

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.

String Literals

Now strPtr and strArray are considered to be string literals.

No, they aren't. String literals are the things you see in your code. For example, the "Hello". strPtr is a pointer to the literal (which is now compiled in the executable). Note that it should be const char *; you cannot legally remove the const per the C standard and expect defined behavior when using it. strArray is an array containing a copy of the literal (compiled in the executable).

Both the above statements should be illegal. compiler should throw errors in both cases.

No, it shouldn't. The two statements are completely legal. Due to circumstance, the first one is undefined. It would be an error if they were pointers to const chars, though.

As far as I know, string literals may be defined the same way as other literals and constants. However, there are differences:

// These copy from ROM to RAM at run-time:
char myString[] = "hello";
const int myInt = 42;
float myFloats[] = { 3.1, 4.1, 5.9 };

// These copy a pointer to some data in ROM at run-time:
const char *myString2 = "hello";
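static const float myFloatData[] = { 3.1, 4.1, 5.9 }; // a pointer cannot be initialized from a brace list directly
const float *myFloats2 = myFloatData;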

char *myString3 = "hello"; // Legal, but...
myString3[0] = 'j'; // Undefined behavior! (Most likely segfaults.)

My use of ROM and RAM here is general. If the platform has only RAM (e.g. most Nintendo DS programs), then const data may be in RAM. Writes are still undefined, though. The location of const data shouldn't matter for a normal C++ programmer.

Why don't C compilers warn about incompatible types with literal strings?

TL;DR C compilers do not warn, because they do not "see" a problem there. By definition, C string literals are null terminated char arrays. It's only stated that,

[...] If the program attempts to modify such an array, the behavior is
undefined.

So, during compilation, the compiler simply sees an array of char; nothing in its type marks it as read-only. Only an attempt to modify it has undefined behavior.

Related read: For anybody interested, see Why are C string literals read-only?

That said, I am not sure whether this is a good option, but gcc has the -Wwrite-strings option.

Quoting the online manual,

-Wwrite-strings

When compiling C, give string constants the type const char[length] so that copying the address of one into a non-const char * pointer produces a warning. These warnings help you find at compile time code that can try to write into a string constant, but only if you have been very careful about using const in declarations and prototypes. Otherwise, it is just a nuisance. This is why we did not make -Wall request these warnings.

So, it produces a warning using the backdoor way.

By definition, C string literals (i.e., character string literals) are char arrays with null terminator. The standard does not mandate them to be const qualified.

Ref: C11, chapter

In translation phase 7, a byte or code of value zero is appended to each multibyte
character sequence that results from a string literal or literals. The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the array elements have
type char, and are initialized with the individual bytes of the multibyte character
sequence. [....]

Using the aforesaid option makes the string literals const qualified so using a string literal as the RHS of assignment to a non-const type pointer triggers a warning.

This is done with reference to C11, chapter §6.7.3

If an attempt is made to modify an object defined with a const-qualified type through use
of an lvalue with non-const-qualified type, the behavior is undefined. [...]

So, here the compiler produces a warning for the assignment of const qualified type to a non-const-qualified type.

As for why -Wall -Wextra -pedantic -std=c11 does not produce this warning, quoting the manual once again:

[...] These warnings help you find at compile time code that can try to write into a string constant, but only if you have been very careful about using const in declarations and prototypes. Otherwise, it is just a nuisance. This is why we did not make -Wall request these warnings.

Why does gcc not give a warning when you initialize an array without const with strings?

So why I don't get a warning in the first initialization?

Because the type of a string literal is array of char, not array of const char, notwithstanding the fact that modifying the elements of such an array produces undefined behavior. This comes down from the very first days of C, when there was no const. I'm sure its persistence into modern C revolves around the magnitude and scope of the incompatibility that would arise if the type were changed.

With respect to individual programs, however, GCC can help you out. If you turn on its -Wwrite-strings option then it will indeed give string literals type const char [length], with the result that a construct such as you presented will elicit a warning.

String literals that contain '\0' - why aren't they the same?

Is it guaranteed that a==b?

No. But it is allowed by §2.14.5/12:

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined. The effect of attempting to modify a string literal is undefined.

And as you can see from that last sentence using char* instead of char const* is a recipe for trouble (and your compiler should be rejecting it; make sure you have warnings enabled and high conformance levels selected).

Why doesn't a==c? Shouldn't the compiler be able to see that they're referring to the same string?

No, they're not required to be referring to same array of characters. One has five elements, the other six. An implementation could store the two in overlapping storage, but that's not required.

Is an extra \0 appended at the end of c, even though it already contains one?

Yes.


