C++ String Literal Data Type Storage

What is the data type of a string literal in C++?

Expressions have type. String literals have type if they are used as an expression. Yours isn't.

Consider the following code:

#include <stdio.h>

#define STR "HelloHelloHello"

char global[] = STR;

int main(void)
{
    char local[] = STR;
    puts(STR);
}

There are three string literals in this program formed using the same tokens, but they are not treated the same.

The first, the initializer for global, is part of static initialization of an object with static lifetime. By section 3.6.2, static initialization doesn't have to take place at runtime; the compiler can arrange for the result to be pre-formatted in the binary image so that the process starts execution with the data already in place, and it has done so here. It would also be legal to initialize this object in the same fashion as local[], as long as it was performed before the beginning of dynamic initialization of globals.

The second, the initializer for local, is a string literal, but it isn't really an expression. It is handled under the special rules of 8.5.2, which state that the characters within the string literal are independently used to initialize the array elements; the string literal is not used as a unit. This object has dynamic initialization, resulting in loading the value at runtime.

The third, an argument to the puts() call, actually does use the string literal as an expression, and it will have type const char[N], which decays to const char* for the call. If you really want to study object code used to handle the runtime type of a string literal, you should be using the literal in an expression, like this function call does.
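As a rough check of that third case, the sketch below asks the compiler for the type of a literal that is used as an expression. It assumes a C++11 or later compiler; the helper names literal_length, first_char_is and check_literal are made up for illustration and are not part of the question's code.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A string literal used as an expression is an lvalue of type
// "array of N const char", where N includes the terminating '\0'.
static_assert(std::is_same<decltype("Hello"), const char (&)[6]>::value,
              "literal is an lvalue of type const char[6]");

// Array-to-pointer decay turns that array type into const char*.
static_assert(std::is_same<std::decay<decltype("Hello")>::type, const char*>::value,
              "literal decays to const char*");

// Hypothetical helper: binds to the array itself, so N is still visible.
template <std::size_t N>
constexpr std::size_t literal_length(const char (&)[N]) { return N; }

// Hypothetical helper: here the argument has already decayed to a pointer.
inline bool first_char_is(const char* s, char c) { return s[0] == c; }

inline void check_literal() {
    assert(sizeof("Hello") == 6);          // sizeof sees the array, not a pointer
    assert(literal_length("Hello") == 6);  // no decay when binding to a reference
    assert(first_char_is("Hello", 'H'));   // decay happens for the call
}
```

Calling check_literal() from main() exercises the runtime checks; the two static_assert lines are verified at compile time.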

String literals: Where do they go?

A common technique is for string literals to be put in a "read-only data" section, which gets mapped into the process space as read-only (which is why you can't change them).

It does vary by platform. For example, simpler chip architectures may not support read-only memory segments so the data segment will be writable.

Rather than try to figure out a trick to make string literals changeable (it will be highly dependent on your platform and could change over time), just use arrays:

char foo[] = "...";

The compiler will arrange for the array to get initialized from the literal and you can modify the array.
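A minimal sketch of that advice, with "hello" standing in for the elided text and modify_array_copy as a hypothetical name chosen for this example:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical demo: the array receives its own copy of the characters,
// so writing to it is well defined on every platform.
inline bool modify_array_copy() {
    char foo[] = "hello";    // 6 elements: 'h','e','l','l','o','\0'
    assert(sizeof foo == 6); // the size is deduced from the literal
    foo[1] = 'a';            // modifies the array, never the literal
    return std::strcmp(foo, "hallo") == 0;
}
```

The function returns true, because only the array's copy of the characters was changed.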

C: do all string literals have static storage duration?

I've been reading in various sources that string literals remain in
memory for the whole lifetime of the program.

Yes.

In that case, what is
the difference between those two functions

char *f1() { return "hello"; }
char *f2() {
    char str[] = "hello";
    return str;
}

f1 returns a pointer to the first element of the array represented by a string literal, which has static storage duration. f2 returns a pointer to the first element of the automatic array str. str has a string literal for an initializer, but it is a separate object.

While f1 compiles fine, f2 complains that I'm returning stack
allocated data. What happens here?

  • if the str points to the actual string literal (which has static duration), why do I get an error?

It does not. In fact, it itself does not point to anything. It is an array, not a pointer.

  • if the string literal is copied to the local variable str, where does the original string literal go? does it remain in memory with no
    reference to it?

C does not specify, but in practice, yes, some representation of the string literal must be stored somewhere in the program, perhaps in the function implementation, because it needs to be used to initialize str anew each time f2 is called.
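The distinction can be sketched in C++ as follows. This is an illustration rather than the asker's code: f1 returns const char* here because C++ literals are const, and f2_fixed is a hypothetical replacement for f2 that returns an owning std::string instead of a pointer to the automatic array.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Fine: the array behind the literal has static storage duration,
// so the pointer stays valid after the function returns.
inline const char* f1() { return "hello"; }

// Hypothetical fix for f2: copy the automatic array into an owning
// object instead of returning a pointer to storage that is about to end.
inline std::string f2_fixed() {
    char str[] = "hello";    // separate automatic array, initialized from the literal
    str[0] = 'H';            // changes the local array only
    return std::string(str); // the copy outlives the array
}

inline void check_f1_f2() {
    assert(std::strcmp(f1(), "hello") == 0); // the literal is untouched
    assert(f2_fixed() == "Hello");           // the array copy was modified
}
```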

where string literals will be stored

String literals have static storage duration.

then how come it is possible to change the value in it.

You didn't change the string literal (which is something that cannot be done in C++).

You've created an object of type std::string. std::string contains a (potentially) dynamically allocated buffer. You've copied the string literal into that dynamic buffer, and you're modifying the copy of the string literal.

But with "hrllo", it should allocate new memory for "hrllo", right? And make a point to the new location?

No. Modifying characters of a std::string will not cause reallocation. Inserting characters however may potentially cause reallocation.
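A small sketch of that point, assuming a C++11 or later standard library (where element access through operator[] does not invalidate pointers into the string). The function name modify_without_realloc is made up for this example; the variable a follows the question.

```cpp
#include <cassert>
#include <string>

// Hypothetical demo: overwrite one character and confirm the buffer did not move.
inline bool modify_without_realloc() {
    const char* literal = "hello";
    std::string a = literal;        // copies the characters into a's own buffer
    const char* before = a.data();  // address of that buffer
    a[1] = 'r';                     // in-place write, no reallocation
    assert(literal[1] == 'e');      // the literal itself is unchanged
    return a == "hrllo" && a.data() == before;
}
```

By contrast, an insertion such as a.append(100, 'x') may move the buffer, so the comparison against before would no longer be meaningful.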

Is it safe to store string literals pointers?

Is the above code well defined?

Yes.

Are there any dark corners of standard that I have to be aware of?

Perhaps not a dark corner in the standard but one problem is that you have a pointer and you allow for Base to be instantiated and used like this:

Base foo(nullptr);
foo.print();

From operator<<:
"The behavior is undefined if s is a null pointer."

A somewhat safer constructor:

template<size_t N>
constexpr Base(const char(&name)[N]) : _name(name) {}

I say somewhat because you still can do this:

auto Foo() {
    const char scoped[] = "Fragile";
    Base foo(scoped);
    foo.print(); // OK
    return foo;
} // "scoped" goes out of scope, "_name" is now dangling

int main() {
    auto f = Foo();
    f.print(); // UB
}
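One way to sidestep the dangling pointer, at the cost of a copy, is to store the characters rather than the pointer. This is a different trade-off from the one discussed above, not a correction of it. The question's Base class is not shown here, so OwningBase below is a hypothetical stand-in, and make_from_scoped mirrors Foo().

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Base that owns a copy of the name.
class OwningBase {
public:
    explicit OwningBase(const std::string& name) : _name(name) {}
    const std::string& name() const { return _name; }
private:
    std::string _name; // owns its characters; no pointer into caller storage
};

inline OwningBase make_from_scoped() {
    const char scoped[] = "Fragile";
    OwningBase foo(scoped); // copies the characters out of the local array
    return foo;             // still valid after "scoped" is gone
}

inline void check_owning() {
    assert(make_from_scoped().name() == "Fragile");
}
```

This version gives up constexpr construction and the zero-copy storage of a literal, and it does not address the null-pointer case, so it only fits when those properties are not needed.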

How does C work with memory of local string literals?

   char data[] = "aaa";

This is not a string literal but just an array. So there's no pointer there and memory is allocated only for the data.

If the literal has static storage duration, no new memory is allocated
for the local literal by the second call

This is true for string literals like: char *s="aaa"; From the standard:

2.13. String literals

[...]An ordinary string literal has type “array of n const char” and static storage duration (3.7)

How are string literals stored in memory for c++?

First, the way the source file is interpreted as a sequence of characters is implementation-defined. You have to consult your compiler documentation to determine this.

Second, the character set that is used is also implementation-defined, so again you have to consult your compiler documentation.

What's likely to happen when you insert non-ASCII characters (and possibly ASCII ones too) is that different compilers interpret them differently. You have to check that the compilers you use can all handle the same source encoding; the encoding most likely to work portably is UTF-8.

In addition, you may be better off using UTF-8-encoded text for most of the program (only code near an API that requires wchar_t would need to handle strings that way).

Bottom line: make sure your compiler stores the string literal verbatim, use ordinary (narrow) strings, and use an editor that saves in UTF-8 encoding.
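To see what "stored verbatim" means for a narrow string, the sketch below spells out the UTF-8 bytes of "é" (U+00E9) with hex escapes, so the result does not depend on how the compiler decodes the source file. The names e_acute, utf8_code_points and check_utf8 are made up for this example, and the counting helper assumes its input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// "é" written as its two UTF-8 bytes, plus the terminating '\0'.
static const char e_acute[] = "\xC3\xA9";

// Hypothetical helper: counts code points in valid UTF-8 by skipping
// continuation bytes, which all have the bit pattern 10xxxxxx.
inline std::size_t utf8_code_points(const char* s) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) ++n;
    }
    return n;
}

inline void check_utf8() {
    assert(sizeof(e_acute) == 3);           // two bytes of text plus '\0'
    assert(std::strlen(e_acute) == 2);      // strlen counts bytes, not characters
    assert(utf8_code_points(e_acute) == 1); // one code point
}
```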

Is a string literal in C++ created in static memory?

Where it's created is an implementation decision by the compiler writer, really. Most likely, string literals will be stored in read-only segments of memory since they never change.

In the old compiler days, you used to have static data like these literals, and global but changeable data. These were stored in the TEXT (code) segment and DATA (initialised data) segment.

Even when you have code like char *x = "hello";, the hello string itself is stored in read-only memory while the variable x is on the stack (or elsewhere in writeable memory if it's a global). x just gets set to the address of the hello string. This allows all sorts of tricky things like string folding, so that "invalid option" (0x1000) and "valid option" (0x1002) can use the same memory block as follows:

       +-> plus:0   1   2   3   4   5   6   7   8   9   A   B   C   D   E
       |      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+----+
0x1000 |      | i | n | v | a | l | i | d |   | o | p | t | i | o | n | \0 |
              +---+---+---+---+---+---+---+---+---+---+---+---+---+---+----+

Keep in mind I don't mean read-only memory in terms of ROM, just memory that's dedicated to storing unchangeable stuff (which may be marked really read-only by the OS).
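Whether a given compiler actually folds the two strings is unspecified, so code must never rely on it. The sketch below only detects the situation; shares_tail and check_folding are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: true if `shorter` is stored as the tail of `longer`,
// i.e. both pointers refer into the same block of memory.
inline bool shares_tail(const char* longer, const char* shorter) {
    const std::size_t ll = std::strlen(longer);
    const std::size_t sl = std::strlen(shorter);
    return ll >= sl && longer + (ll - sl) == shorter;
}

inline void check_folding() {
    const char* a = "invalid option";
    const char* b = "valid option";
    // The text matches either way; only the addresses may coincide.
    assert(std::strcmp(a + 2, b) == 0);
    // Unspecified result: true only if the compiler folded b into a's tail.
    const bool folded = shares_tail(a, b);
    (void)folded;
    // Guaranteed: a pointer into a is trivially a tail of a.
    assert(shares_tail(a, a + 2));
}
```

Printing the value of folded on different compilers and optimization levels is a quick way to see whether the optimization was applied.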

They're also never destroyed; they exist for the whole execution of the program.

Where does a string literal passed to a function call get stored in C?

A C string literal represents an array object of type char[len+1], where len is the length, plus 1 for the terminating '\0'. This array object has static storage duration, meaning that it exists for the entire execution of the program. This applies regardless of where the string literal appears.

The literal itself is an expression of type char[len+1]. (In most but not all contexts, it will be implicitly converted to a char* value pointing to the first character.)

Compilers may optimize this by, for example, storing identical string literals just once, or by not storing them at all if they're never referenced.

If you write this:

const char *s="Hello World\n";

inside a function, the literal's meaning is as I described above. The pointer object s is initialized to point to the first character of the array object.

For historical reasons, string literals are not const in C, but attempting to modify the corresponding array object has undefined behavior. Declaring the pointer const, as you've done here, is not required, but it's an excellent idea.


