Why Are String Literals Const

Why are string literals const?

There are a couple of different reasons.

One is to allow storing string literals in read-only memory (as others have already mentioned).

Another is to allow merging of string literals. If one program uses the same string literal in several different places, it's nice to allow (but not necessarily require) the compiler to merge them, so you get multiple pointers to the same memory, instead of each occupying a separate chunk of memory. This can also apply when two string literals aren't necessarily identical, but do have the same ending:

char *foo = "long string";
char *bar = "string";

In a case like this, it's possible for bar to be foo+5 (if I've counted correctly).

In either of these cases, if modifying a string literal were allowed, the change could also show up in another string literal that happens to share the same storage. At the same time, there's honestly not a lot of point in mandating merging either: it's pretty uncommon to have enough overlapping string literals to matter, and most people probably don't want the compiler to run slower just to save (maybe) a few dozen bytes or so of memory.
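As a rough sketch of the merging described above, the following checks whether a particular compiler happened to share storage for the foo and bar literals. The MergeReport type and inspect_literal_merging function are hypothetical names introduced only for this illustration, and the two boolean results are implementation-specific: either value is valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical report type, introduced only for this illustration.
struct MergeReport {
    bool identical_merged;    // did two identical literals end up at one address?
    bool tail_merged;         // does "string" live inside "long string"?
    std::size_t tail_offset;  // where the shared tail would start
};

MergeReport inspect_literal_merging() {
    const char* foo  = "long string";
    const char* foo2 = "long string";
    const char* bar  = "string";

    // "long " is five characters, so the tail would start at foo + 5.
    const std::size_t offset = std::strlen(foo) - std::strlen(bar);
    assert(std::strcmp(foo + offset, bar) == 0);  // same contents either way

    // Whether these addresses coincide is up to the implementation.
    MergeReport report = { foo == foo2, bar == foo + offset, offset };
    return report;
}
```

Only tail_offset is guaranteed to be the same everywhere; the two flags may differ between compilers and optimization levels, which is exactly why the standard cannot promise what a write to a literal would affect.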

By the time the first standard was written, there were already compilers that used all three of these techniques (and probably a few others besides). Since there was no way to describe one behavior you'd get from modifying a string literal, and nobody apparently thought it was an important capability to support, they did the obvious: said even attempting to do so led to undefined behavior.

Are string literals const?

In C, they are of type char[N], where N is the number of characters including the terminating \0. So yes, you can assign them to a char*, but you still must not write to them (the behavior is undefined).
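A small compile-time sketch of the "N includes the terminating \0" point, written in C++ (where the element type is const char rather than C's plain char). The array_length helper is a hypothetical name used only here.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: deduces N from a reference to an array of N chars.
template <std::size_t N>
constexpr std::size_t array_length(const char (&)[N]) { return N; }

// "Hello" has five visible characters plus the terminating '\0'.
static_assert(array_length("Hello") == 6, "N counts the terminating null");
static_assert(sizeof("Hello") == 6, "sizeof sees the whole array, not a pointer");

// In C++ the element type is const char; in C it would be plain char.
static_assert(std::is_same<decltype("Hello"), const char (&)[6]>::value,
              "a C++ string literal is an lvalue of type const char[6]");
```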

Regarding argv: it points to an array of pointers to strings. Those strings are explicitly modifiable. You can change them, and they are required to hold the last stored value.
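To contrast with literals, here is a minimal sketch of modifying argument strings in place. The uppercase_args function is a hypothetical helper, not part of any standard API; it only rewrites existing characters and never writes past the terminating null.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical helper: upper-cases every argument string in place.
void uppercase_args(int argc, char** argv) {
    assert(argv != nullptr);
    for (int i = 0; i < argc; ++i) {
        for (char* p = argv[i]; *p != '\0'; ++p) {
            // The cast to unsigned char avoids undefined behavior for negative chars.
            *p = static_cast<char>(std::toupper(static_cast<unsigned char>(*p)));
        }
    }
}
```

From int main(int argc, char** argv) this would be called as uppercase_args(argc, argv). Passing a pointer to a string literal instead would require a cast and would be undefined behavior.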

Why do string literals (char*) in C++ have to be constants?

Expanding on Christian Gibbons' answer a bit...

In C, string literals, like "Hello, World!", are stored in arrays of char such that they are visible over the lifetime of the program. String literals are supposed to be immutable, and some implementations will store them in a read-only memory segment (such that attempting to modify the literal's contents will trigger a runtime error). Some implementations don't, and attempting to modify the literal's contents may not trigger a runtime error (it may even appear to work as intended). The C language definition leaves the behavior "undefined" so that the compiler is free to handle the situation however it sees fit.

In C++, string literals are stored in arrays of const char, so that any attempt to modify the literal's contents will trigger a diagnostic at compile time.

As Christian points out, the const keyword was not originally a part of C. It was, however, originally part of C++, and it makes using string literals a little safer.

Remember that the const keyword does not mean "store this in read-only memory", it only means "this thing may not be the target of an assignment."
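One way to see that const is an access restriction rather than a storage class is to look at a writable object through a const reference. This is only an illustrative sketch; observe_through_const_view is a made-up name.

```cpp
#include <cassert>

int observe_through_const_view() {
    int counter = 1;             // an ordinary, writable object
    const int& view = counter;   // a const-qualified way of looking at it

    // view = 2;                 // would not compile: view may not be assigned to
    counter = 2;                 // fine: the object itself is not read-only

    assert(&view == &counter);   // both names refer to the same storage
    return view;                 // the change is visible through the const view
}
```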

Also remember that, unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.

In C++, when you write

const char *str = "Hello, world";

the address of the first character of the string is stored to str. You can set str to point to a different string literal:

str = "Goodbye cruel world";

but what you cannot do is modify the contents of the string, something like

str[0] = 'h';

or

strcpy( str, "Something else" );
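If a modifiable string is what you need, the array-initialization exception mentioned earlier gives you one. The sketch below (edit_array_copy is a hypothetical name) keeps the rejected statements as comments so the rest compiles.

```cpp
#include <cassert>
#include <cstring>

char edit_array_copy() {
    const char* str = "Hello, world";  // points at the literal itself
    char buffer[]   = "Hello, world";  // array initialized from the literal: a writable copy

    static_assert(sizeof(buffer) == 13, "12 characters plus the terminating null");

    buffer[0] = 'h';                   // fine: modifies the copy
    // str[0] = 'h';                   // would not compile: *str is const char
    // std::strcpy(str, "x");          // would not compile: needs char*, not const char*

    assert(std::strcmp(str, "Hello, world") == 0);  // the literal is untouched
    return buffer[0];
}
```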

Why does a function returning const char * with string literals work?

Those string literals are placed in a static read-only section of the executable at compile time. They are separate from the heap or the stack. The function is simply returning a pointer that points to those strings.
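A minimal sketch of such a function, with made-up names (status_name, is_ok) rather than the code from the question. The returned pointers stay valid after the call because the literals have static storage duration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical example function: each return value points at a string literal.
const char* status_name(int code) {
    switch (code) {
        case 0:  return "ok";
        case 1:  return "warning";
        default: return "unknown";
    }
}

// The pointer is still usable here, long after status_name has returned.
bool is_ok(int code) {
    const char* name = status_name(code);
    assert(name != nullptr);
    return std::strcmp(name, "ok") == 0;
}

// By contrast, returning the address of a local array would dangle:
// const char* broken() { char local[] = "ok"; return local; }
```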

Reference

The implementation of this is platform and compiler specific, but the C11 standard has a few relevant requirements on this in section 6.4.5.

In translation phase 7, a byte or code of value zero is appended to each multibyte
character sequence that results from a string literal or literals. The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence.

So we know it must be stored in a static location at compile time.

If the program attempts to modify such an array, the behavior is
undefined.

This tells us the data must be treated as read-only.

Edit

Some people are complaining that this is incorrect, citing specific platforms or architectures. As noted above this is platform and compiler specific.

Some platforms may not support read-only data, but the compiler will almost certainly try to prevent you from modifying it. Since the behavior is undefined, the intent is that you never do this, so for all intents and purposes the data is read-only.

In the context of the question, this answer is correct.

Role of const for a C array of string literals

Your declaration const char* const myArray[] expands to -

an array of constant pointers to const characters

Which means, it is an array of pointers which are constant, so cannot be changed once initialized. And the characters it points to are also constant, meaning you can dereference the pointer to only read the characters but cannot overwrite them.

This array is being initialized to an array containing three strings "one", "two" and "three".

So the operations like

myArray[i] = ...;
*myArray[i] = ...;

would fail, but operations like -

otherArray = myArray[i];
char t = *myArray[i];

would be fine.
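The allowed and rejected operations above can be sketched as follows. The first_letter function and the otherPointer name are assumptions made for the example; the rejected statements are left as comments so the rest compiles.

```cpp
#include <cassert>
#include <cstddef>

const char* const myArray[] = { "one", "two", "three" };

char first_letter(std::size_t i) {
    assert(i < sizeof(myArray) / sizeof(myArray[0]));

    const char* otherPointer = myArray[i];  // copying the pointer is fine
    char t = *myArray[i];                   // reading a character is fine

    // myArray[i] = "uno";                  // would not compile: the pointer is const
    // *myArray[i] = 'u';                   // would not compile: the character is const

    assert(*otherPointer == t);
    return t;
}
```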

This is also the reason why code like myArray[0] = "uno" doesn't compile.

You are trying to assign a new string to myArray[0], but it is declared as a const (the const that comes after the * causes this).

Now coming to your question about myArray[0][0] = 'u';. Even if you remove the first const, string literals are effectively constant: you must not change the characters they contain, and attempting to do so is undefined behavior. The compiler doesn't complain about the assignment because, as far as the type system is concerned, myArray[0] is then just a pointer to modifiable char.

This is a wart in the C standard: string literals have type char[N] (which decays to char*) rather than const char[N]. It cannot be changed now because a lot of legacy code that relies on it would break.

Why can a const reference to a string parameter take string literals?

When you do

result.doSomething("asdas");

The compiler looks to see if you have a doSomething(const char[]); and finds nothing. Since there is no suitable function, it then tries to find an overload that takes something that can be constructed from a const char[], and it finds doSomething(const string& str). Since the compiler is allowed to make one user-defined conversion, it constructs a temporary std::string from the string literal and passes that temporary, via reference to const, to the function.

Shouldn't the parameter list consist of string& str instead of the const reference, so that the literal would be used in the construction of str?

No, this only works with a reference to const and will not work with a regular reference as regular references cannot bind to temporaries.

And doesn't a const reference keep the referenced entity alive for as long as the reference is alive? If so, why would you do that to a literal?

The reference to const will extend the lifetime of the object only when it is a function local object. Inside the scope of the function the object will be alive, as the expression that called that function has not ended, but if you were to try to keep a reference to the std::string in the class, that would not work.

Effectively your code is translated into

int main()
{
    CVector result;
    {
        // temporary std::string built from the literal; destroyed at the closing brace
        std::string temp = "asdas";
        result.doSomething(temp);
    }
    result.print();
    return 0;
}
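For something that compiles on its own, here is a hedged sketch using a free function as a stand-in for the question's member function (the CVector class is not shown in the question, so it is not reproduced here). The needsLvalue and call_both_ways names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical free-function stand-in for the member function in the question.
std::size_t doSomething(const std::string& str) { return str.size(); }

// A non-const reference parameter would reject the literal:
// std::size_t needsLvalue(std::string& str);
// needsLvalue("asdas");   // would not compile: cannot bind to a temporary

std::size_t call_both_ways() {
    // Implicit: a temporary std::string is built from the const char[6] literal.
    const std::size_t implicit_length = doSomething("asdas");

    // Explicit: roughly what the compiler arranges behind the scenes.
    std::size_t explicit_length = 0;
    {
        std::string temp = "asdas";
        explicit_length = doSomething(temp);
    }

    assert(implicit_length == explicit_length);
    return implicit_length;
}
```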

What happens when a string literal is passed to a function accepting a const std::string & in C++?

When you pass a string literal to a function that accepts const std::string&, the following events occur:

  • The string literal is converted to const char*
  • A temporary std::string object is created. Its internal buffer is allocated, and initialized by copying the data from the const char* until the terminating null is seen. The parameter refers to this temporary object.
  • The function body runs.
  • Assuming the function returns normally, the temporary object is destroyed at the end of the full-expression that contains the call, that is, after the function has returned.

If the c_str() pointer is saved from the parameter, it becomes a dangling pointer after the temporary object is destroyed since it points into the temporary object's internal buffer.

A similar problem will occur if the function accepts std::string. The std::string object will be created when the function is called and destroyed when the function returns or soon afterward, so any saved c_str() pointer will become dangling.

If the function accepts const std::string& and the argument has type std::string, however, no new object is created when the function is called. The reference refers to the existing object. The c_str() pointer will remain valid until the original std::string object is destroyed.
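A short sketch of the safe pattern, assuming a hypothetical Label class: store a std::string copy rather than a saved c_str() pointer, so nothing dangles when the argument was a temporary built from a literal. The refers_to helper (also hypothetical) shows when the parameter is the caller's own object and when it is a temporary.

```cpp
#include <cassert>
#include <string>

// Hypothetical class that keeps its own copy instead of a c_str() pointer.
class Label {
public:
    void set(const std::string& text) { text_ = text; }  // copies the characters
    const std::string& get() const { return text_; }
private:
    std::string text_;
};

// True when the parameter is the caller's object rather than a temporary.
bool refers_to(const std::string& param, const std::string* original) {
    return &param == original;
}

std::string stored_after_temporary_dies() {
    Label label;
    label.set("hello");                // the temporary std::string dies after this statement
    assert(label.get().size() == 5);   // the copy inside Label is still valid
    return label.get();
}
```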


