Why Does Gcc Allow Char Array Initialization with String Literal Larger Than Array

Initializing a char array with a string literal whose characters exactly fill the array, leaving no room for the terminating null, is fine in C but ill-formed in C++. That explains the difference in behavior between gcc and VC++.

You would get no error if you compiled the same code as a C file with VC++. And you would get an error if you compiled it as a C++ file with g++.

The C standard says:

An array of character type may be initialized by a character string
literal or UTF-8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.

[...]

EXAMPLE 8

The declaration

char s[] = "abc", t[3] = "abc";

defines "plain" char array objects s and t whose elements are initialized
with character string literals.
This declaration is identical to

char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };

(Section 6.7.9 of the C11 draft standard, actual wording in final standard might be different.)

This means that it's perfectly correct to drop the terminating null character if the array doesn't have room for it. It may be unexpected, but it's exactly how the language is supposed to work, and (at least to me) a well-known feature.

In contrast, the C++ standard says:

There shall not be more initializers than there are array elements.

Example:

 char cv[4] = "asdf"; // error

is ill-formed since there is no space for the implied trailing '\0'.

(8.5.2 of the C++ 2011 draft n3242.)

Why initializer-string for array of chars is too long compiles fine in C & not in C++?

Short answer: Because C and C++ are different languages with different rules.

Long answer: In both cases the reason is that the array is too small for the string literal. The literal consists of the five visible characters, with a zero terminator on the end, so the total size is 6.

In C, the terminating null is simply dropped when the array has no room for it; if the string is longer still, gcc also ignores the extra characters:

C99 6.7.8/14: An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

gcc helpfully warns that the string is too long, since that almost certainly indicates an error; but by default it does not reject the code unless you tell it to treat warnings as errors.

In C++, the initialiser isn't allowed to be larger than the array:

C++11 8.5.2/2: There shall not be more initializers than there are array elements.

so, for that language, the compiler should give an error.

In both languages, when you want a character array to be the right size for a string literal initialiser, you can leave the size out and the compiler will do the right thing.

char a[] = "hello";  // size deduced to be 6

String literals vs array of char when initializing a pointer

I think you're confused because char *p = "ab"; and char p[] = "ab"; have similar syntax, but different meanings.

I believe that the latter case (char p[] = "ab";) is best regarded as a short-hand notation for char p[] = {'a', 'b', '\0'}; (initializes an array with the size determined by the initializer). Actually, in this case, you could say "ab" is not really used as a string literal.

However, the former case (char *p = "ab";) is different in that it simply initializes the pointer p to point to the first element of the read-only string literal "ab".

I hope you see the difference. While char p[] = "ab"; is representable as an initialization such as you described, char *p = "ab"; is not, as pointers are, well, not arrays, and initializing them with an array initializer does something entirely different (namely give them the value of the first element, 0x61 in your case).

Long story short, C compilers only "replace" a string literal with a char array initializer if it is suitable to do so, i.e. it is being used to initialize a char array.

Why is it ok to use a string literal to initialize an unsigned char array but not to initialize an unsigned char pointer?

Technically, because the standard explicitly allows an array of any character type to be initialized with a string literal (6.7.9p14):

An array of character type may be initialized by a character string
literal or UTF-8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.

while for most pointer conversions, the standard requires explicit casts (6.5.4p3):

Conversions that involve pointers, other than where permitted by the
constraints of 6.5.16.1, shall be specified by means of an explicit
cast.

Intuitively, because you can do:

unsigned char a0 = 'f', a1 = 'o', a2 = 'o';

or in other words, because you can initialize an integer type with a different integer type without having to cast explicitly.

Does array initialization with a string literal cause two memory storage?

Yes, that's right.

Note that the object on the right of the = is not a string literal as such; it's an initialisation expression and the compiler is under no obligation to store it as though it were a string. It might break it up into pieces, or emit a series of immediate stores instead of copying the initial value, or even (in theory) emit code which decompresses a shortened version of the initial data. But however it is compiled, some memory will be occupied.

If the compiler does choose to store the initial value as a string, that value will need to be immutable so it may be placed in read-only memory and/or be shared with a string literal with the same value. a, on the other hand, is mutable and may be altered. So clearly it must have its own memory.

Finally, there is one case in which the compiler might not reserve any space at all. It might optimise away either or both the unused array and the initial value if it can prove that no visible changes will result from the removal, as would be possible with the sample code in your question.

When a fixed-length char array is initialized with a short string, how is the remaining space initialized?

The general rule for any array (or struct) where not all members are initialized explicitly is that the remaining ones are initialized "as if they had static storage duration", which means that they are set to zero.

So it will actually work just fine to write something weird like this: char s[10] = {'T','e','s','t'};, since the remaining bytes are set to zero and the first of them will be treated as the null terminator.

Initializing std::array<char,x> member in constructor using string literal. GCC bug?

Yes, your code is valid; this is a bug in gcc.

Here's a simpler program that demonstrates the bug. I've replaced std::array<char, 4> with S and got rid of A, as we can demonstrate the bug with just a function return; this makes the analysis simpler, as we don't have to worry about constructor overloading:

struct S { char c[4]; };
S f() { return {"xxx"}; }

Here we have a destination object of type S that is copy-initialized (8.5p15) from the braced-init-list {"xxx"}, so the object is list-initialized (8.5p17b1). S is an aggregate (8.5.1p1) so aggregate initialization is performed (8.5.4p3b1). In aggregate initialization, the member c is copy-initialized from the corresponding initializer-clause "xxx" (8.5.1p2). We now return to 8.5p17 with destination object of type char[4] and initializer the string literal "xxx", so 8.5p17b3 refers us to 8.5.2 and the elements of the char array are initialized by the successive characters of the string (8.5.2p1).

Note that gcc is fine with the copy-initialization S s = {"xxx"}; while breaking on various other forms of copy- and direct-initialization: argument passing (including to constructors), function return, and base- and member-initialization:

struct S { char c[4]; };
S f() { return {"xxx"}; }
void g(S) { g({"xxx"}); }
auto p = new S({"xxx"});
struct T { S s; T(): s({"xxx"}) {} };
struct U: S { U(): S({"xxx"}) {} };
S s({"xxx"});

The last is particularly interesting as it indicates that this may be related to bug 43453.


