Is a String Literal in C++ Created in Static Memory

Is a string literal in C++ created in static memory?

Where it is created is really an implementation decision made by the compiler writer. Most likely, string literals will be stored in read-only segments of memory, since they are never supposed to change (attempting to modify one is undefined behavior).

In the days of older compilers, there was static data like these literals, and global but changeable data. These were stored in the TEXT (code) segment and the DATA (initialised data) segment, respectively.

Even when you have code like char *x = "hello"; (valid in C, but ill-formed in C++11 and later, where the pointer has to be a const char *), the hello string itself is stored in read-only memory, while the variable x is on the stack (or elsewhere in writable memory if it's a global). x is simply set to the address of the hello string. This allows all sorts of tricks such as string folding, so that "invalid option" (at 0x1000) and "valid option" (at 0x1002) can share the same memory block:

                +0  +1  +2  +3  +4  +5  +6  +7  +8  +9  +A  +B  +C  +D  +E
              +---+---+---+---+---+---+---+---+---+---+---+---+---+---+----+
       0x1000 | i | n | v | a | l | i | d |   | o | p | t | i | o | n | \0 |
              +---+---+---+---+---+---+---+---+---+---+---+---+---+---+----+
                        ^ "valid option" starts here (0x1002)
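A small C++ sketch of the pointer/literal split described above; the function names invalid_msg, valid_msg and tail_merged are made up for illustration. In C++ the pointer is declared const char *. Whether the two literals are actually folded is left to the compiler and linker, so tail_merged() may legitimately return either value; only the string contents are guaranteed.

```cpp
#include <cassert>
#include <cstring>

// x is an ordinary pointer variable (here, on the stack); it only stores
// the address of the literal, whose characters live in static storage.
const char *invalid_msg() {
    const char *x = "invalid option";
    return x;  // fine: the literal outlives this function
}

const char *valid_msg() {
    return "valid option";
}

// True only if the toolchain folded "valid option" into the tail of
// "invalid option". This is an optional optimization, so the result is
// implementation-dependent and may well be false.
bool tail_merged() {
    return valid_msg() == invalid_msg() + 2;
}
```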

Keep in mind that by read-only memory I don't mean ROM, just memory dedicated to storing unchangeable data (which the OS may also mark as truly read-only).

They also have static storage duration, so they are never destroyed while the program runs; they stay valid until it terminates.

C: do all string literals have static storage duration?

I've been reading in various sources that string literals remain in memory for the whole lifetime of the program.

Yes.

In that case, what is the difference between these two functions?

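char *f1() { return "hello"; }   /* pointer into the static literal */
char *f2() {
    char str[] = "hello";        /* local array initialized from the literal */
    return str;                  /* address of str, which dies on return */
}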

f1 returns a pointer to the first element of the array represented by a string literal, which has static storage duration. f2 returns a pointer to the first element of the automatic array str. str has a string literal for an initializer, but it is a separate object.
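To make the contrast concrete, here is one possible C++ rendering (assuming C++11 or later, where f1 has to return const char *). Returning a std::string from f2 is just one way to fix it; a static array or a caller-supplied buffer would also work.

```cpp
#include <cassert>
#include <string>

// f1: returns a pointer into the string literal, which has static
// storage duration, so the pointer stays valid after f1 returns.
const char *f1() { return "hello"; }

// f2, fixed: str is still a local copy of the literal, but the result is
// returned by value as a std::string, so the caller gets its own copy
// instead of a pointer to a dead stack array.
std::string f2() {
    char str[] = "hello";
    return std::string(str);
}
```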

While f1 compiles fine, the compiler complains that f2 returns stack-allocated data. What happens here?

  • If str points to the actual string literal (which has static storage duration), why do I get an error?

It does not. In fact, str does not point to anything at all: it is an array, not a pointer.

  • If the string literal is copied to the local variable str, where does the original string literal go? Does it remain in memory with no reference to it?

C does not specify, but in practice, yes, some representation of the string literal must be stored somewhere in the program, perhaps in the function implementation, because it needs to be used to initialize str anew each time f2 is called.
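A C++ sketch of the two answers above: str has array type (so sizeof gives 6, not the size of a pointer), and it is initialized anew on every call, unlike a static array. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Inside f2, str is an array of 6 char ("hello" plus '\0'), not a
// pointer; sizeof reports the size of the whole array.
std::size_t local_array_size() {
    char str[] = "hello";
    static_assert(std::is_array<decltype(str)>::value, "str is an array");
    return sizeof str;
}

// str is re-initialized from the literal on every call, so a change
// made during one call is gone by the next one.
char local_first_then_modify() {
    char str[] = "hello";
    char before = str[0];
    str[0] = 'j';  // modifies the local copy, never the literal
    return before;
}

// For contrast, a static array is initialized only once, so the
// modification persists across calls.
char static_first_then_modify() {
    static char str[] = "hello";
    char before = str[0];
    str[0] = 'j';
    return before;
}
```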

How does memory get allocated for a string literal in C, and do we need to free it?

where does this string get stored

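Usually in read-only memory, so you cannot modify it. With GCC on ELF-based systems such as Linux, string literals typically end up in the .rodata (read-only data) section.

how does it get de-allocated

Upon program termination. You never free it yourself: it was not obtained from malloc(), and passing it to free() is undefined behavior.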
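Put differently, only memory you allocated yourself needs releasing. A minimal C++ sketch, assuming you want a modifiable copy of a literal (greeting and mutable_copy are hypothetical helpers):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// A pointer to a string literal: never pass it to free() or delete.
const char *greeting() { return "hello"; }

// If you need a modifiable or separately owned copy, allocate it
// yourself; std::unique_ptr releases that heap copy automatically.
std::unique_ptr<char[]> mutable_copy(const char *s) {
    std::size_t n = std::strlen(s) + 1;  // include the '\0'
    std::unique_ptr<char[]> copy(new char[n]);
    std::memcpy(copy.get(), s, n);
    return copy;
}
```

Only the copy is released (when the unique_ptr goes out of scope); the literal itself stays in place until the program ends.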

How are string literals stored in memory for c++?

First, how the source file is interpreted as a sequence of characters is implementation-defined. You have to consult your compiler documentation to determine this.

Second, the execution character set used to encode string literals is also implementation-defined, so again you have to consult your compiler documentation.

What is likely to happen when you insert non-ASCII characters (and possibly even ASCII ones) is that different compilers interpret them differently. Check that the compilers you use can actually handle the same encoding; the source encoding most likely to work portably is UTF-8.

In addition, you may be better off using UTF-8-encoded text for most of the program (only code near APIs that require wchar_t would need to handle strings that way).

Bottom line: make sure your compiler stores the string literal verbatim, use ordinary (narrow) strings, and use an editor that saves in UTF-8 encoding.
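A sketch of one way to keep a non-ASCII literal portable, assuming C++11 or later: a u8 literal is always UTF-8 encoded, and writing the character as the universal character name \u00E9 keeps the source file itself pure ASCII, so its encoding no longer matters. The helper name e_acute_utf8 is illustrative.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// U+00E9 ('é') is two bytes in UTF-8 (0xC3 0xA9); sizeof also counts
// the terminating '\0'.
static_assert(sizeof(u8"\u00E9") == 3, "two UTF-8 bytes plus the terminator");

// Copy the bytes into an ordinary narrow std::string. This compiles
// whether u8 literals are arrays of char (C++11/14/17) or of char8_t
// (C++20).
std::string e_acute_utf8() {
    const auto &lit = u8"\u00E9";  // reference to the array, no decay
    return std::string(std::begin(lit), std::end(lit) - 1);  // drop '\0'
}
```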

How does C work with memory of local string literals?

   char data[] = "aaa";

data is not a string literal but an ordinary array initialized from one. There is no pointer involved, and memory is allocated only for the array's elements.

If the literal has static storage duration, no new memory is allocated for the local literal by the second call

This is true for a pointer initialized with a string literal, such as char *s = "aaa"; (in C++, const char *s = "aaa";). From the C++ standard:

2.13 String literals

[...] An ordinary string literal has type “array of n const char” and static storage duration (3.7)
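A compile-time check of the quoted rule, as a C++ sketch (literal_ptr is a hypothetical name). decltype of a string literal yields a reference to the array type, because the literal is an lvalue:

```cpp
#include <cassert>
#include <type_traits>

// "aaa" is an lvalue of type "array of 4 const char" (3 characters plus
// the terminating '\0').
static_assert(std::is_same<decltype("aaa"), const char (&)[4]>::value,
              "\"aaa\" is an array of 4 const char");

// The pointer only stores the address of that static array; the
// characters themselves are not copied.
const char *literal_ptr() {
    const char *s = "aaa";
    return s;
}
```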

where string literals will be stored

String literals have static storage duration.

Then how come it is possible to change the value in it?

You didn't change the string literal (which is something that cannot be done in C++).

You've created an object of type std::string. std::string contains a (potentially) dynamically allocated buffer. You've copied the string literal into that dynamic buffer, and you're modifying the copy of the string literal.

But with "hrllo", it should allocate new memory for "hrllo", right? And make a point to the new location?

No. Modifying characters of a std::string will not cause reallocation. Inserting characters however may potentially cause reallocation.
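A small C++ sketch of both points; edit_in_place and insert_forces_growth are illustrative names. Overwriting a character keeps the same buffer, whereas adding more characters than the current capacity forces the string to grow (append is simply insertion at the end):

```cpp
#include <cassert>
#include <cstring>
#include <string>

// a owns a copy of the literal's characters, so editing a leaves the
// literal untouched. Overwriting a character via operator[] does not
// invalidate pointers into a, so its buffer address is unchanged.
bool edit_in_place() {
    const char *lit = "hello";
    std::string a = lit;         // copies "hello" into a's buffer
    const char *buf = a.data();
    a[1] = 'r';                  // modifies the copy only
    return a == "hrllo"
        && a.data() == buf                  // no reallocation
        && std::strcmp(lit, "hello") == 0;  // literal unchanged
}

// Adding characters can force a reallocation: once the new size exceeds
// the current capacity, the buffer has to grow.
bool insert_forces_growth() {
    std::string a = "hello";
    std::string::size_type old_cap = a.capacity();
    a.append(old_cap, 'x');          // new size is old_cap + 5
    return a.capacity() > old_cap;   // capacity had to increase
}
```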

Memory usage of literal strings in C

When passing the string literal is it copied entirely as a local variable of sizeof(myString) or the compiler "knows" to pass it by reference since arrays are always passed by reference in C?

A string literal is stored as an array such that it's available over the lifetime of the program, and is subject to the same conversion rule as any other array expression; that is, except when it is the operand of the sizeof or unary & operators or is a string literal being used to initialize an array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element in the array. Thus, in the calls

myFunction( mystring );

and

myFunction( "A string" );

both of the arguments are expressions of array type, neither is the operand of the sizeof or unary & operators, so in both cases the expression decays to a pointer to the first element. As far as the function call is concerned, there's absolutely no difference between the two.
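The same point as a C++ sketch (len_via_pointer and len_via_array_ref are hypothetical helpers): passing either argument copies only an address, and a function that takes a reference to an array also avoids any copy while still seeing the array's size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Both an array variable and a string literal decay to const char *
// when passed here; only the address of the first element is copied.
std::size_t len_via_pointer(const char *str) {
    return std::strlen(str);
}

// Binding to a reference to an array does not copy either, and it keeps
// the array's size N (which includes the terminating '\0').
template <std::size_t N>
constexpr std::size_t len_via_array_ref(const char (&)[N]) {
    return N - 1;
}
```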

So let's look at a real-world example (SLES 10, x86_64, gcc 4.1.2)

#include <stdio.h>

void myFunction( const char *str )
{
    printf( "str = %p\n", (void *) str );
    printf( "str = %s\n", str );
}

int main( void )
{
    static const char mystring[] = "A string";
    myFunction( mystring );
    myFunction( "A string" );

    return 0;
}

myFunction prints out the address and contents of both the string literal and the mystring variable. Here are the results:

[fbgo448@n9dvap997]~/prototypes/literal: gcc -o literal -std=c99 -pedantic -Wall -Werror literal.c
[fbgo448@n9dvap997]~/prototypes/literal: ./literal
str = 0x400654
str = A string
str = 0x40065d
str = A string

Both the string literal and the mystring array are being stored in the .rodata (read-only) section of the executable:

[fbgo448@n9dvap997]~/prototypes/literal: objdump -s literal
...
Contents of section .rodata:
40063c 01000200 73747220 3d202570 0a007374 ....str = %p..st
40064c 72203d20 25730a00 41207374 72696e67 r = %s..A string
40065c 00412073 7472696e 6700 .A string.
...

The static keyword in the declaration of mystring tells the compiler that the memory for mystring should be set aside at program start and held until the program terminates. The const keyword says that memory should not be modifiable by the code. In this case, sticking it in the .rodata section makes perfect sense.

This means that no additional memory is allocated for mystring at runtime; it's already allocated as part of the image. In this particular case, for this particular platform, there's absolutely no difference between using one or the other.

If I don't declare mystring as static, as in

int main( void )
{
    const char mystring[] = "A string";
    ...

then we get:

str = 0x7fff2fe49110
str = A string
str = 0x400674
str = A string

meaning that only the string literal is being stored in .rodata:

Contents of section .rodata:
40065c 01000200 73747220 3d202570 0a007374 ....str = %p..st
40066c 72203d20 25730a00 41207374 72696e67 r = %s..A string
40067c 00 .

Since it's declared local to main and not declared static, mystring is allocated with auto storage duration; in this case, that means memory will be allocated from the stack at runtime, and will be held for the duration of mystring's enclosing scope (i.e., the main function). As part of the declaration, the contents of the string literal will be copied to the array. Since it's allocated from the stack, the array is modifiable in principle, but the const keyword tells the compiler to reject any code that attempts to modify it.

Is it safe to store string literals pointers?

Is the above code well defined?

Yes.

Are there any dark corners of standard that I have to be aware of?

Perhaps not a dark corner in the standard but one problem is that you have a pointer and you allow for Base to be instantiated and used like this:

Base foo(nullptr);
foo.print();

From the documentation of operator<< for const char *:
"The behavior is undefined if s is a null pointer."
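The original Base is not shown here, so the following is a hypothetical reconstruction, assuming it stores a non-owning const char *_name and streams it in print(). It sketches one runtime guard against the null case; the ostream parameter and the "(null)" placeholder are choices made for this sketch only.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical Base: keeps a non-owning pointer and streams it in
// print(). The ostream parameter is an addition so output can be captured.
class Base {
public:
    constexpr Base(const char *name) : _name(name) {}

    void print(std::ostream &os = std::cout) const {
        // Streaming a null const char * is undefined behavior, so check
        // first. Printing "(null)" is an arbitrary choice for this sketch.
        os << (_name ? _name : "(null)") << '\n';
    }

private:
    const char *_name;
};

// Helper for checking what print() writes.
std::string printed(const Base &b) {
    std::ostringstream out;
    b.print(out);
    return out.str();
}
```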

A somewhat safer constructor:

template<size_t N>
constexpr Base(const char(&name)[N]) : _name(name) {}

I say "somewhat" because you can still do this:

auto Foo() {
    const char scoped[] = "Fragile";
    Base foo(scoped);
    foo.print(); // OK
    return foo;
} // "scoped" goes out of scope, "_name" is now dangling

int main() {
    auto f = Foo();
    f.print(); // UB
}
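If that fragility matters more than avoiding a copy, one alternative (a design change, not the original code) is to let the object own its name. OwningBase below is a hypothetical name; with it, the same Foo() pattern no longer depends on the lifetime of scoped.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical owning variant: the object stores its own copy of the
// name, so it cannot dangle when the argument goes out of scope.
class OwningBase {
public:
    explicit OwningBase(std::string name) : _name(std::move(name)) {}
    const std::string &name() const { return _name; }

private:
    std::string _name;
};

OwningBase Foo() {
    const char scoped[] = "Fragile";
    OwningBase foo(scoped);  // copies the characters out of scoped
    return foo;              // safe: foo no longer refers to scoped
}
```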

String literals: Where do they go?

A common technique is to put string literals in a "read-only data" section, which is mapped into the process address space as read-only (which is why you can't change them).

It does vary by platform. For example, simpler chip architectures may not support read-only memory segments, so the data segment will be writable.

Rather than try to figure out a trick to make string literals changeable (it will be highly dependent on your platform and could change over time), just use arrays:

char foo[] = "...";

The compiler will arrange for the array to get initialized from the literal and you can modify the array.
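A short C++ sketch of that approach (edit_array_copy is an illustrative name). Giving the array an explicit size larger than the literal also leaves room to grow; the elements past the literal are zero-initialized.

```cpp
#include <cassert>
#include <cstring>

// foo is an ordinary writable array initialized from the literal, so
// its characters can be changed freely; the literal is never touched.
bool edit_array_copy() {
    char foo[32] = "hello";
    foo[0] = 'j';                 // "jello"
    std::strcat(foo, ", world");  // fits: 12 chars + '\0' <= 32
    return std::strcmp(foo, "jello, world") == 0;
}
```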