Why do compilers allow string literals not to be const?
The C standard does not forbid the modification of string literals. It just says that the behaviour is undefined if the attempt is made. According to the C99 rationale, there were people in the committee who wanted string literals to be modifiable, so the standard does not explicitly forbid it.
Note that the situation is different in C++. In C++, string literals are arrays of const char. However, C++ used to allow an implicit conversion from a string literal to char *. That conversion was deprecated and has been removed as of C++11.
Why doesn't my C compiler warn when I assign a string literal to a non-const pointer?
The answer you have quoted is an opinion without citation, and frankly nonsense. The reason is nothing more than not breaking the vast quantity of existing legacy C code that still needs to compile with a modern compiler.
However, many compilers will issue a warning if you set the necessary warning level or options. In GCC, for example:
-Wwrite-strings
When compiling C, give string constants the type const char[length] so that copying the address of one into a non-const char * pointer produces a warning. These warnings help you find at compile time code that can try to write into a string constant, but only if you have been very careful about using const in declarations and prototypes. Otherwise, it is just a nuisance. This is why we did not make -Wall request these warnings.
When compiling C++, warn about the deprecated conversion from string literals to char *. This warning is enabled by default for C++ programs.
Clang also has -Wwrite-strings, which is a synonym for -Wwritable-strings:
-Wwritable-strings
This diagnostic is enabled by default.
Also controls -Wdeprecated-writable-strings.
Diagnostic text:
warning: ISO C++11 does not allow conversion from string literal to A
The diagnostic text is different for C compilation; I'm just quoting the manual.
In GCC with -Wwrite-strings:
int main()
{
char* x = "hello" ;
return 0;
}
produces:
main.c:3:15: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
Clang produces:
source_file.c:3:15: warning: initializing 'char *' with an expression of type 'const char [6]' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
Why are string literals const?
There are a couple of different reasons.
One is to allow storing string literals in read-only memory (as others have already mentioned).
Another is to allow merging of string literals. If one program uses the same string literal in several different places, it's nice to allow (but not necessarily require) the compiler to merge them, so you get multiple pointers to the same memory, instead of each occupying a separate chunk of memory. This can also apply when two string literals aren't necessarily identical, but do have the same ending:
char *foo = "long string";
char *bar = "string";
In a case like this, it's possible for bar to be foo+5 (if I've counted correctly).
In either of these cases, if you allow modifying a string literal, the write could also change another string literal that happens to share the same storage. At the same time, there's honestly not a lot of point in mandating the merging either: it's pretty uncommon to have enough overlapping string literals that most people would want the compiler to run slower just to save (maybe) a few dozen bytes or so of memory.
By the time the first standard was written, there were already compilers that used all three of these techniques (and probably a few others besides). Since there was no way to describe one behavior you'd get from modifying a string literal, and nobody apparently thought it was an important capability to support, they did the obvious: said even attempting to do so led to undefined behavior.
Why are strings in C declared with 'const'?
There's no requirement to use const, but it's a good idea.
In C, a string literal is an expression of type char[N], where N is the length of the string plus 1 (for the terminating '\0' null character). But attempting to modify the array that corresponds to the string literal has undefined behavior. Many compilers arrange for that array to be stored in read-only memory (not physical ROM, but memory that's marked read-only by the operating system). (An array expression is, in most contexts, converted to a pointer expression referring to the initial element of the array object.)
It would have made more sense to make string literals const, but the const keyword did not exist in old versions of C, and it would have broken existing code. (C++ did make string literals const.)
This:
char *s= "example"; /* not recommended */
is actually perfectly valid in C, but it's potentially dangerous. If, after this declaration, you do:
s[0] = 'E';
then you're attempting to modify the string literal, and the behavior is undefined.
This:
const char *s= "example"; /* recommended */
is also valid; the char* value that results from evaluating the string literal is safely and quietly converted to const char*. And it's generally better than the first version because it lets the compiler warn you if you attempt to modify the string literal (it's better to catch errors at compile time than at run time).
If you get an error on your first example, then it's likely that you're inadvertently compiling your code as C++ rather than as C, or that you're using gcc's -Wwrite-strings option or something similar. (-Wwrite-strings makes string literals const; it can improve safety, but it can also cause gcc to reject, or at least warn about, valid C code.)
Why do (only) some compilers use the same address for identical string literals?
This is not undefined behavior, but unspecified behavior. For string literals,
The compiler is allowed, but not required, to combine storage for equal or overlapping string literals. That means that identical string literals may or may not compare equal when compared by pointer.
That means the result of A == B might be true or false, and you shouldn't depend on either outcome.
From the standard, [lex.string]/16:
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.
String Literals
Now strPtr and strArray are considered to be string literals.
No, they aren't. String literals are the things you see in your code, for example the "Hello". strPtr is a pointer to the literal (which is now compiled into the executable). Note that it should be const char *; writing to the literal through a non-const pointer does not have defined behavior per the C standard. strArray is an array containing a copy of the literal (compiled into the executable).
Both the above statements should be illegal. compiler should throw errors in both cases.
No, it shouldn't. The two statements are completely legal. Due to circumstance, the first one is undefined. It would be an error if they were pointers to const chars, though.
As far as I know, string literals may be defined the same way as other literals and constants. However, there are differences:
// These copy from ROM to RAM at run-time:
char myString[] = "hello";
const int myInt = 42;
float myFloats[] = { 3.1, 4.1, 5.9 };
// These copy a pointer to some data in ROM at run-time:
const char *myString2 = "hello";
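static const float myFloatData[] = { 3.1, 4.1, 5.9 }; // A pointer can't be brace-initialized with several values,
const float *myFloats2 = myFloatData; // so point it at a named const array instead.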
char *myString3 = "hello"; // Legal, but...
myString3[0] = 'j'; // Undefined behavior! (Most likely segfaults.)
My use of ROM and RAM here is general. If the platform has only RAM (e.g. most Nintendo DS programs), then const data may be in RAM. Writes are still undefined, though. The location of const data shouldn't matter for a normal C++ programmer.
Why don't C compilers warn about incompatible types with literal strings?
TL;DR C compilers do not warn, because they do not "see" a problem there. By definition, C string literals are null-terminated char arrays. It's only stated that,
[...] If the program attempts to modify such an array, the behavior is undefined.
So, during compilation, a string literal looks to the compiler like any other char array; nothing in its type marks it as read-only. Only an attempt to modify it is undefined.
Related read: For anybody interested, see Why are C string literals read-only?
That said, I am not very sure whether this is a good option, but gcc has the -Wwrite-strings option.
Quoting the online manual,
-Wwrite-strings
When compiling C, give string constants the type const char[length] so that copying the address of one into a non-const char * pointer produces a warning. These warnings help you find at compile time code that can try to write into a string constant, but only if you have been very careful about using const in declarations and prototypes. Otherwise, it is just a nuisance. This is why we did not make -Wall request these warnings.
So, it produces a warning in a backdoor way.
By definition, C string literals (i.e., character string literals) are char arrays with a null terminator. The standard does not mandate them to be const qualified.
Ref: C11, chapter
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence. [....]
Using the aforesaid option makes the string literals const qualified, so using a string literal as the RHS of an assignment to a pointer to non-const type triggers a warning.
This is done with reference to C11, chapter §6.7.3
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined. [...]
So, here the compiler produces a warning for the assignment of a const-qualified type to a non-const-qualified type.
As for why -Wall -Wextra -pedantic -std=c11 does not produce this warning, quoting the manual once again:
[...] These warnings help you find at compile time code that can try to write into a string constant, but only if you have been very careful about using const in declarations and prototypes. Otherwise, it is just a nuisance. This is why we did not make -Wall request these warnings.
Why gcc does not give a warning when you initialize an array without const with strings?
So why I don't get a warning in the first initialization?
Because the type of a string literal is array of char, not array of const char, notwithstanding the fact that modifying the elements of such an array produces undefined behavior. This comes down from the very first days of C, when there was no const. I'm sure its persistence into modern C revolves around the magnitude and scope of the incompatibility that would arise if the type were changed.
With respect to individual programs, however, GCC can help you out. If you turn on its -Wwrite-strings option then it will indeed give string literals type const char [length], with the result that a construct such as you presented will elicit a warning.
String literals that contain '\0' - why aren't they the same?
Is it guaranteed that a==b?
No. But it is allowed by §2.14.5/12:
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined. The effect of attempting to modify a string literal is undefined.
And as you can see from that last sentence, using char* instead of char const* is a recipe for trouble (and your compiler should be rejecting it; make sure you have warnings enabled and high conformance levels selected).
Why doesn't a==c? Shouldn't the compiler be able to see that they're referring to the same string?
No, they're not required to be referring to the same array of characters. One has five elements, the other six. An implementation could store the two in overlapping storage, but that's not required.
Is an extra \0 appended at the end of c, even though it already contains one?
Yes.