String Literals

What is the difference between string literals and string values?

A string literal is a piece of text you can write in your program's source code, beginning and ending with quotation marks, that tells Python to create a string with certain contents. It looks like

'asdf'

or

'''
multiline
content
'''

or

'the thing at the end of this one is a line break\n'

In a string literal (except for raw string literals), special sequences of characters known as escape sequences are replaced with different characters in the actual string. For example, the escape sequence \n in a string literal is replaced with a line feed character in the actual string. Escape sequences begin with a backslash.
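To make the difference concrete, here is a small sketch comparing an ordinary literal with a raw one. The variable names escaped and raw are made up for illustration; nothing here goes beyond the core language.

```python
# Ordinary literal: the two source characters \ and n become ONE line feed.
escaped = 'ab\n'

# Raw literal: the backslash and the n are kept as TWO separate characters.
raw = r'ab\n'

print(len(escaped))                  # 3
print(len(raw))                      # 4

# chr(10) is a line feed, chr(92) is a backslash.
print(escaped[-1] == chr(10))        # True
print(raw[-2:] == chr(92) + 'n')     # True
```

The lengths differ because the escape sequence is processed when the literal is compiled, not when the string is used.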


A string is a Python object representing a text value. It can be built from a string literal, or it could be read from a file, or it could originate from many other sources.

Backslashes in a string have no special meaning, and backslashes in most possible sources of strings have no special meaning either. For example, if you have a file with backslashes in it, looking like this:

asdf\n

and you do

with open('that_file.txt') as f:
    text = f.read()

the \n in the file will not be replaced by a line break. Backslashes are special in string literals, but not in most other contexts.
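If you want to check this yourself, the following sketch writes such a file into a temporary directory and reads it back. The use of tempfile and the name file_contents are my own choices for the demonstration, not part of the original example.

```python
import os
import tempfile

# The six characters a, s, d, f, backslash, n.
# chr(92) is a backslash; building it this way sidesteps literal escaping.
file_contents = 'asdf' + chr(92) + 'n'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'that_file.txt')
    with open(path, 'w') as f:
        f.write(file_contents)
    with open(path) as f:
        text = f.read()

print(len(text))            # 6
print(chr(10) in text)      # False: no line feed was created
print(text == r'asdf\n')    # True: same as the raw literal
```

The backslash and the n come back as two ordinary characters, exactly as they were stored.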


When you ask for the repr representation of a string, either by calling repr or by displaying the string interactively:

>>> some_string = "asdf"
>>> some_string
'asdf'

Python will build a new string whose contents are a string literal that would evaluate to the original string. In this example, some_string does not have ' or " characters in it. The contents of the string are the four characters asdf, the characters displayed if you print the string:

>>> print(some_string)
asdf

However, the repr representation has ' characters in it, because 'asdf' is a string literal that would evaluate to the string. Note that 'asdf' is not the same string literal as the "asdf" we originally used - many different string literals can evaluate to equal strings.
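A short sketch of that round trip, assuming ast.literal_eval from the standard library as a safe way to evaluate a literal (the names shown and round_tripped are hypothetical):

```python
import ast

some_string = "asdf"

# repr builds a NEW string whose contents are a string literal.
shown = repr(some_string)
print(shown)                            # 'asdf'
print(len(some_string), len(shown))     # 4 6

# Evaluating that literal gives back a string equal to the original.
round_tripped = ast.literal_eval(shown)
print(round_tripped == some_string)     # True

# Different literals can evaluate to equal strings.
print('asdf' == "asdf" == '''asdf''')   # True
```

The repr is two characters longer than the string itself because the quotation marks are part of the literal, not part of the value.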

Is storage for string literals with the same content guaranteed to be the same?

The C++ standard does not guarantee that the addresses of string literals with the same content will be the same. In fact, [lex.string]/16 says:

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.

The second part even says you might not get the same address when a function containing a string literal is called a second time! Though I've never seen a compiler do that.

So using the same character array object when a string literal is repeated is an optional compiler optimization. With my installation of g++ and default compiler flags, I also find I get the same address for two identical string literals in the same translation unit. But as you guessed, I get different ones if the same string literal content appears in different translation units.


A related interesting point: it's also permitted for different string literals to use overlapping arrays. That is, given

const char* abcdef = "abcdef";
const char* def = "def";
const char* def0gh = "def\0gh";

it's possible you might find abcdef+3, def, and def0gh are all the same pointer.

Also, this rule about reusing or overlapping string literal objects applies only to the unnamed array object directly associated with the literal, used if the literal immediately decays to a pointer or is bound to a reference to array. A literal can also be used to initialize a named array, as in

const char a1[] = "XYZ";
const char a2[] = "XYZ";
const char a3[] = "Z";

Here the array objects a1, a2 and a3 are initialized using the literal, but are considered distinct from the actual literal storage (if such storage even exists) and follow the ordinary object rules, so the storage for those arrays will not overlap.

How does concatenation of two string literals work?

It's defined by the ISO C standard: adjacent string literals are combined into a single one.

The language is a little dry (it is a standard after all) but section 6.4.5 String literals of C11 states:

In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and identically-prefixed wide string literal tokens are concatenated into a single multibyte character sequence.

This is also mentioned in 5.1.1.2 Translation phases, point 6 of the same standard, though a little more succinctly:

Adjacent string literal tokens are concatenated.

This basically means that "abc" "def" is no different to "abcdef".

It's often useful for making long strings while still having nice formatting, something like:

const char *myString = "This is a really long "
                       "string and I don't want "
                       "to make my lines in the "
                       "editor too long, because "
                       "I'm basically anal retentive :-)";

What is the type of a string literal in C++?

The type of the string literal "Hello" is "array of 6 const char".

Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string [...]

It can, however, be converted to a const char* by array-to-pointer conversion. Array-to-pointer conversion results in a pointer to the first element of the array.

Extra element in string literal array of template literals

Because there's an empty string "" after the ${a - b}. The last element simply cannot be removed, because a non-empty string could follow the final expression. You get the same behavior by placing two expressions side by side without any string between them (${...}${...}). Example:

function tag(...all) {
    console.log(all);
}

tag `a${1}b${2}${3}c`;

Is it safe to store string literals pointers?

Is the above code well defined?

Yes.

Are there any dark corners of standard that I have to be aware of?

Perhaps not a dark corner in the standard but one problem is that you have a pointer and you allow for Base to be instantiated and used like this:

Base foo(nullptr);
foo.print();

From operator<<:
"The behavior is undefined if s is a null pointer."

A somewhat safer constructor:

template<size_t N>
constexpr Base(const char(&name)[N]) : _name(name) {}

I say somewhat because you still can do this:

auto Foo() {
    const char scoped[] = "Fragile";
    Base foo(scoped);
    foo.print(); // OK
    return foo;
} // "scoped" goes out of scope, "_name" is now dangling

int main() {
    auto f = Foo();
    f.print(); // UB
}

Is there a downside to using ES6 template literals syntax without a templated expression?

The most significant reason not to use them is that ES6 is not supported in all environments.

Of course, that might not affect you at all, but still: YAGNI. Don't use template literals unless you need interpolation, multiline literals, or unescaped quotes and apostrophes. Many of the arguments from "When to use double or single quotes in JavaScript?" carry over as well. As always, keep your code base consistent and use only one string literal style where you don't need a special one.

Why do string literals (char*) in C++ have to be constants?

Expanding on Christian Gibbons' answer a bit...

In C, string literals, like "Hello, World!", are stored in arrays of char such that they are visible over the lifetime of the program. String literals are supposed to be immutable, and some implementations will store them in a read-only memory segment (such that attempting to modify the literal's contents will trigger a runtime error). Some implementations don't, and attempting to modify the literal's contents may not trigger a runtime error (it may even appear to work as intended). The C language definition leaves the behavior "undefined" so that the compiler is free to handle the situation however it sees fit.

In C++, string literals are stored in arrays of const char, so that any attempt to modify the literal's contents will trigger a diagnostic at compile time.

As Christian points out, the const keyword was not originally a part of C. It was, however, originally part of C++, and it makes using string literals a little safer.

Remember that the const keyword does not mean "store this in read-only memory", it only means "this thing may not be the target of an assignment."

Also remember that, unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted (will "decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.

In C++, when you write

const char *str = "Hello, world";

the address of the first character of the string is stored to str. You can set str to point to a different string literal:

str = "Goodbye cruel world";

but what you cannot do is modify the contents of the string, something like

str[0] = 'h';

or

strcpy( str, "Something else" );

Template string literal inside of another template string literal

How about this

const nColumn = 'columns' + this.props.data.headers.length
<div className={`${styles.dataGridHeader} ${styles[nColumn]}`}>

FYI, there's an awesome library called classnames which, applied to your code, looks something like this

import classNames from 'classnames'
const nColumn = 'columns' + this.props.data.headers.length
const divCls = classNames({
    [`${styles.dataGridHeader}`]: true,
    [`${styles[nColumn]}`]: true
})
<div className={divCls} />

Why are string literals &str instead of String in Rust?

To understand the reasoning, consider that Rust wants to be a systems programming language. In general, this means that it needs to be (among other things) (a) as efficient as possible and (b) give the programmer full control over allocations and deallocations of heap memory. One use case for Rust is for embedded programming where memory is very limited.

Therefore, Rust does not want to allocate heap memory where this is not strictly necessary. String literals are known at compile time and can be written into the .rodata section of an executable/library, so they don't consume stack or heap space.

Now, given that Rust does not want to allocate the values on the heap, it is basically forced to treat string literals as &str: Strings own their values and can be moved and dropped, but how do you drop a value that is in .rodata? You can't really do that, so &str is the perfect fit.

Furthermore, treating string literals as &str (or, more accurately &'static str) has all the advantages and none of the disadvantages. They can be used in multiple places, can be shared without worrying about using heap memory and never have to be deleted. Also, they can be converted to owned Strings at will, so having them available as String is always possible, but you only pay the cost when you need to.


