Why Do String Literals (Char*) in C++ Have to Be Constants

Expanding on Christian Gibbons' answer a bit...

In C, string literals, like "Hello, World!", are stored in arrays of char such that they are visible over the lifetime of the program. String literals are supposed to be immutable, and some implementations will store them in a read-only memory segment (such that attempting to modify the literal's contents will trigger a runtime error). Some implementations don't, and attempting to modify the literal's contents may not trigger a runtime error (it may even appear to work as intended). The C language definition leaves the behavior "undefined" so that the compiler is free to handle the situation however it sees fit.

In C++, string literals are stored in arrays of const char, so that any attempt to modify the literal's contents will trigger a diagnostic at compile time.

As Christian points out, the const keyword was not originally a part of C. It was, however, originally part of C++, and it makes using string literals a little safer.

Remember that the const keyword does not mean "store this in read-only memory", it only means "this thing may not be the target of an assignment."

Also remember that, unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted (will "decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.
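To make the decay rule concrete, here is a minimal sketch of the three exceptions next to the ordinary case. The helper names (copied_size, decays_to_first_element) are made up for illustration.

```cpp
#include <cstddef>
#include <type_traits>

// Exception 1: sizeof sees the whole array, including the terminating '\0'.
constexpr std::size_t literal_size = sizeof("Hello");   // 6 bytes, not the size of a pointer

// Exception 2: unary & yields a pointer to the whole array, not to its first element.
static_assert(std::is_same<decltype(&"Hello"), const char (*)[6]>::value,
              "&literal has type pointer to array of 6 const char");

// Exception 3: a literal used as an initializer is copied into the array.
inline std::size_t copied_size() {
    char buf[] = "Hello";         // buf has type char[6]
    return sizeof(buf);
}

// Everywhere else the array decays to a pointer to its first element.
inline bool decays_to_first_element() {
    char buf[] = "Hello";
    const char *p = buf;          // array-to-pointer conversion
    return p == &buf[0];
}
```
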

In C++, when you write

const char *str = "Hello, world";

the address of the first character of the string is stored to str. You can set str to point to a different string literal:

str = "Goodbye cruel world";

but what you cannot do is modify the contents of the string, something like

str[0] = 'h';

or

strcpy( str, "Something else" );

Why is conversion from string constant to 'char*' valid in C but invalid in C++

Up through C++03, your first example was valid, but used a deprecated implicit conversion--a string literal should be treated as being of type char const *, since you can't modify its contents (without causing undefined behavior).

As of C++11, the implicit conversion that had been deprecated was officially removed, so code that depends on it (like your first example) should no longer compile.

You've noted one way to allow the code to compile: although the implicit conversion has been removed, an explicit conversion still works, so you can add a cast. I would not, however, consider this "fixing" the code.

Truly fixing the code requires changing the type of the pointer to the correct type:

char const *p = "abc"; // valid and safe in either C or C++.

As to why it was allowed in C++ (and still is in C): simply because there's a lot of existing code that depends on that implicit conversion, and breaking that code (at least without some official warning) apparently seemed to the standard committees like a bad idea.

Why do I get "forbids converting a string constant to ‘char*’" in C++?

The error message means that you are passing a string literal to a function whose parameter has the type char *.

String literals in C++ have the type of constant character arrays. When passed by value to a function, they are implicitly converted to the type const char *. Any attempt to change a string literal results in undefined behavior.

Instead, you could pass the function a character array initialized by a string literal, for example

char s[] = "Hello";
std::cout << invertirCase( s ) << '\n';

In turn the function can be defined the following way

#include <cctype>

char * invertirCase( char *str )
{
    for ( char *p = str; *p; ++p )
    {
        // Convert to unsigned char before calling the <cctype> functions.
        unsigned char c = *p;

        if ( std::isalpha( c ) )
        {
            if ( std::islower( c ) )
            {
                *p = std::toupper( c );
            }
            else
            {
                *p = std::tolower( c );
            }
        }
    }

    return str;
}

or

char * invertirCase( char *str )
{
    for ( char *p = str; *p; ++p )
    {
        unsigned char c = *p;

        if ( std::islower( c ) )
        {
            *p = std::toupper( c );
        }
        else if ( std::isupper( c ) )
        {
            *p = std::tolower( c );
        }
    }

    return str;
}

Pointers create constant string literals, why?

The premise is wrong: pointers don’t create any string literals, neither read-only nor writeable.

What does create a read-only string literal is the literal itself: "foo" is a read-only string literal. And if you assign it to a pointer, then that pointer points to a read-only memory location.

With that, let’s turn to your questions:

Why are string literals created in First Case read-only?

The real question is: why not? In most cases, you won’t want to change the value of a string literal later on so the default assumption makes sense. Furthermore, you can create writeable strings in C via other means.

Why are only string literals allowed to be created and not other constants like float?

Again, wrong assumption. You can create other constants:

float f = 1.23f;

Here, the 1.23f literal is read-only. You can also assign it to a constant variable:

const float f = 1.23f; 

Why is Second Case not giving me compiler errors?

Because the compiler cannot check in general whether your pointer points to read-only memory or to writeable memory. Consider this:

char* p = "Hello";
char str[] = "world"; // `str` is a writeable string!

p = &str[0];

p[1] = 'x';

Here, p[1] = 'x' is entirely legal – if we hadn’t re-assigned p beforehand, it would have been illegal. Checking this cannot be generally done at compile-time.

Difference between char* and const char*?

char* is a mutable pointer to a mutable character/string.

const char* is a mutable pointer to an immutable character/string. You cannot change the contents of the location(s) it points to through this pointer, and compilers are required to issue a diagnostic when you try to do so. For the same reason, there is no implicit conversion from const char* to char*; it requires an explicit cast such as const_cast.

char* const is an immutable pointer (it cannot point to any other location) but the contents of location at which it points are mutable.

const char* const is an immutable pointer to an immutable character/string.
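A small sketch of the four declarations side by side may help. The function name demo_const_placement is hypothetical, and the commented-out lines are the ones a conforming compiler should reject.

```cpp
#include <string>

// Hypothetical helper that exercises all four pointer declarations
// on writable buffers and returns the final contents of the first one.
inline std::string demo_const_placement() {
    char a[] = "abc";
    char b[] = "xyz";

    char *p1 = a;               // mutable pointer, mutable characters
    p1[0] = 'A';                // OK: a is now "Abc"
    p1 = b;                     // OK: pointer may be reseated

    const char *p2 = a;         // mutable pointer, characters read-only through p2
    p2 = b;                     // OK: pointer may be reseated
    // p2[0] = 'X';             // error: assignment to a const char

    char *const p3 = a;         // immutable pointer, mutable characters
    p3[1] = 'B';                // OK: a is now "ABc"
    // p3 = b;                  // error: p3 itself is const

    const char *const p4 = a;   // immutable pointer, characters read-only through p4
    // p4[0] = 'X';             // error
    // p4 = b;                  // error

    (void)p1;                   // silence unused-variable warnings
    (void)p2;
    return std::string(p4);
}
```

Note that const here restricts access through the pointer only: a is still writable through p3 even while p4 points at it.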

Addresses of two char pointers to different string literals are same

Whether two different string literals with the same content are placed in the same memory location or in different memory locations is implementation-dependent.

You should always treat p and p1 as two different pointers (even though they have the same content) as they may or may not point to the same address. You shouldn't rely on compiler optimizations.

C11 Standard, 6.4.5, String literals, semantics

It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.


The format for printing must be %p:

  printf("%p %p", (void*)p, (void*)p1);

The %p conversion expects an argument of type void*, which is why both pointers are cast.
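As a rough sketch of the practical consequence: compare string contents with strcmp and treat the addresses as informational only. The helper names below are made up for illustration, and no particular output is promised for the printed addresses, since they may or may not be equal.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: compares what the pointers point at, not the pointers themselves.
inline bool same_contents(const char *a, const char *b) {
    return std::strcmp(a, b) == 0;
}

// Hypothetical helper: prints both addresses; whether they match is unspecified.
inline void print_addresses() {
    const char *p = "hello";
    const char *p1 = "hello";
    std::printf("%p %p\n",
                static_cast<const void *>(p),
                static_cast<const void *>(p1));
}
```
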

String (constants) Literals pointer

When you use the syntax

const char* cstr="string";

C++ defines:

  • An array of 7 characters in the read-only section of memory, with the contents "string\0" in it.
  • A pointer on the stack (or in the writable global section of memory), holding the address of that array.

However, when you use the syntax:

char str[7]="string";

C++ defines:

  • An array of 7 characters on the stack (or in the writable global section of memory), with the contents "string\0" in it.

In the first case, the actual values are in read-only memory, so you can't change them. In the second case, they are in writable memory (stack or global).
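The difference between the two declarations can be sketched as follows. The helper names are hypothetical, and the sketch only shows what the type system allows; where the literal is physically stored is up to the implementation.

```cpp
#include <cstddef>
#include <string>

// The literal itself occupies 7 bytes: six characters plus the terminating '\0'.
static_assert(sizeof("string") == 7, "literal includes the terminator");

// Hypothetical helper: copies the literal into a local array and edits the copy.
inline std::string edit_array_copy() {
    char str[7] = "string";     // characters are copied into writable storage
    str[0] = 'S';               // fine: modifies the local array, not the literal
    return std::string(str);
}

// Hypothetical helper: a pointer only stores the literal's address.
inline std::size_t pointer_size() {
    const char *cstr = "string";
    // cstr[0] = 'S';           // would not compile: the pointee is const
    return sizeof(cstr);        // size of a pointer, unrelated to the string length
}
```
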

C++ tries to enforce these semantics: if the data is defined in read-only memory, you should access it through a pointer to const (const char*).

Note that not all architectures have read-only memory. Most of them do, however, and an implementation may use it for string literals, so C++ programmers should assume (for the purpose of pointer types) that string literals are going to be placed in read-only memory.

What is the type of string literals in C and C++?

In C the type of a string literal is a char[] - it's not const according to the type, but it is undefined behavior to modify the contents. Also, 2 different string literals that have the same content (or enough of the same content) might or might not share the same array elements.

From the C99 standard 6.4.5/5 "String Literals - Semantics":

In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters...

It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

In C++, "An ordinary string literal has type 'array of n const char'" (from 2.13.4/1 "String literals"). But there's a special case in the C++ standard that makes pointer to string literals convert easily to non-const-qualified pointers (4.2/2 "Array-to-pointer conversion"):

A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”.

As a side note - because arrays in C/C++ convert so readily to pointers, a string literal can often be used in a pointer context, much as any array in C/C++.
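If you want to see the C++ types for yourself, a sketch along these lines lets the compiler confirm them, assuming a C++11 or later compiler. The helper name literal_elements_are_const is made up.

```cpp
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to the array type.
static_assert(std::is_same<decltype("abc"), const char (&)[4]>::value,
              "ordinary literal: array of 4 const char");
static_assert(std::is_same<decltype(L"abc"), const wchar_t (&)[4]>::value,
              "wide literal: array of 4 const wchar_t");

// Hypothetical helper: reports whether the literal's element type is const-qualified.
inline bool literal_elements_are_const() {
    using array_type   = std::remove_reference<decltype("abc")>::type;  // const char[4]
    using element_type = std::remove_extent<array_type>::type;          // const char
    return std::is_const<element_type>::value;
}
```
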


Additional editorializing: what follows is really mostly speculation on my part about the rationale for the choices the C and C++ standards made regarding string literal types. So take it with a grain of salt (but please comment if you have corrections or additional details):

I think that the C standard chose to give string literals non-const types because there was (and is) so much code that expects to be able to use non-const-qualified char pointers that point to literals. When the const qualifier got added (which, if I'm not mistaken, was done around ANSI standardization time, but long after K&R C had been around to accumulate a ton of existing code), if they had made pointers to string literals only assignable to char const* types without a cast, nearly every program in existence would have required changing. Not a good way to get a standard accepted...

I believe the change to C++ that string literals are const qualified was done mainly to support allowing a literal string to more appropriately match an overload that takes a "char const*" argument. I think that there was also a desire to close a perceived hole in the type system, but the hole was largely opened back up by the special case in array-to-pointer conversions.

Annex D of the standard indicates that the "implicit conversion from const to non-const qualification for string literals (4.2) is deprecated", but I think so much code would still break that it'll be a long time before compiler implementers or the standards committee are willing to actually pull the plug (unless some other clever technique can be devised - but then the hole would be back, wouldn't it?).

understand how char works in c++

char * str = "Test"; is not allowed in C++. A string literal can only be pointed to by a pointer to const. You would need const char * str = "Test";.

If your compiler accepts char * str = "Test"; it is likely outdated. This conversion has not been allowed since C++11 (which came out over 10 years ago).



how does char * str = "Test"; work?

String literals are implicitly convertible to a pointer to the start of the literal. In C++, arrays are implicitly convertible to a pointer to their first element. For example, int x[10] is implicitly convertible to int*, and the conversion results in &(x[0]). This applies to string literals too: their type is an array of const char (const char[N]).



how come the following code prints out "Test" twice in a row?

In C++ most features related to character strings assume the string is null terminated, which is implied in string literals. You would need {'T','e','s','t','\0'} to be equivalent to "Test".
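A quick way to convince yourself of that equivalence is the sketch below. The function name equivalent_to_literal is hypothetical.

```cpp
#include <cstring>

// Hypothetical helper: checks that an explicitly terminated array has the same
// size and the same bytes as the literal "Test".
inline bool equivalent_to_literal() {
    const char with_nul[] = {'T', 'e', 's', 't', '\0'};   // 5 elements
    return sizeof(with_nul) == sizeof("Test") &&
           std::memcmp(with_nul, "Test", sizeof(with_nul)) == 0;
}
```

Without the trailing '\0', the array would have only 4 elements, and functions that look for a terminator would read past its end.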

Are string literals constant or not?

That is correct, you are not allowed to modify string literals.

However, it's legal to do this:

char s[] = "anusha";
s[3] = 'k';

The difference here is that the literal's characters are copied into a local array, and that array can be modified.


