Why Are Strings in C++ Usually Terminated with '\0'

The title of your question references C strings. C++ std::string objects are handled differently than standard C strings. \0 is important when using C strings, and when I use the term string in this answer, I'm referring to standard C strings.

\0 acts as a string terminator in C. It is known as the null character, or NUL, and standard C strings are null-terminated. This terminator signals code that processes strings - standard libraries but also your own code - where the end of a string is. A good example is strlen which returns the length of a string: strlen works using the assumption that it operates on strings that are terminated using \0.

When you declare a constant string with:

const char *str = "JustAString";

then the \0 is appended automatically for you. In other cases, where you'll be managing a non-constant string as with your array example, you'll sometimes need to deal with it yourself. The docs for strncpy, which is used in your example, are a good illustration: strncpy copies over the null terminator character except in the case where the specified length is reached before the entire string is copied. Hence you'll often see strncpy combined with the possibly redundant assignment of a null terminator. strlcpy and strcpy_s were designed to address the potential problems that arise from neglecting to handle this case.

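In your particular example, array[s.size()] = '\0'; is not one of the redundant cases: array is of size s.size() + 1, but strncpy is copying only s.size() characters, so it stops before reaching the source's terminator and does not append the \0 itself. The assignment would be redundant only if the count passed to strncpy were s.size() + 1.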

The documentation for standard C string utilities will indicate when you'll need to be careful to include such a null terminator. But read the documentation carefully: as with strncpy the details are easily overlooked, leading to potential buffer overflows.

Why/when to include terminating '\0' character for C Strings?

In case 1, you are creating a string literal (a constant that is typically placed in read-only memory), which has the \0 implicitly added to it.

Since \0's position is relied upon to find the end of string, your stringLength() function prints 5.

In case 2, you are trying to initialise a character array of size 5 with a string of 5 characters leaving no space for the \0 delimiter. The memory adjacent to the string can be anything and might have a \0 somewhere. This \0 is considered the end of string here which explains those weird characters that you get. It seems that for the output you gave, this \0 was found only after 3 more characters which were also taken into account while calculating the string length. Since the contents of the memory change over time, the output may not always be the same.

In case 3, you are initialising a character array of size 6 with a string of size 5 leaving enough space to store the \0 which will be implicitly stored. Hence, it will work properly.

Case 4 is similar to case 3. No modification is done by

stack4[5] = '\0';

because the size of stack4 is 6 and hence its last index is 5. You are overwriting an element with the value it already held: stack4[5] had \0 in it even before you overwrote it.

In case 5, you have completely filled the character array with characters without leaving space for \0. Yet when you print the string, it prints right. I think it is because the memory adjacent to the memory allocated by malloc() merely happened to be zero which is the value of \0. But this is undefined behavior and should not be relied upon. What really happens depends on the implementation.

Note that malloc(), unlike calloc(), does not initialise the memory that it allocates.

Both

str[2]='\0';

and

str[2]=0;

are just the same.

But you cannot rely upon it being zero. Dynamically allocated memory may happen to contain zeros because of how the operating system hands out memory, partly for security reasons.

If you need the default value of dynamically allocated memory to be zero, you can use calloc().

Case 6 has the \0 in the end and characters in the other positions. The proper string should be displayed when you print it.

When/Why is '\0' necessary to mark end of an (char) array?

The terminating zero is necessary if a character array contains a string. It makes it possible to find the point where the string ends.

As for your example, which I think looks like this:

char line[100] = "hello\n";

then, for starters, the string literal has 7 characters, including the terminating zero. This string literal has type char[7]. You can picture it as

char no_name[] = { 'h', 'e', 'l', 'l', 'o', '\n', '\0' };

When a string literal is used to initialize a character array, all of its characters are used as initializers. In this example, the seven characters of the string literal initialize the first 7 elements of the array. All remaining elements of the array are implicitly initialized to zero.

If you want to determine the length of the string stored in a character array, you can use the standard C function strlen declared in the header <string.h>. It returns the number of characters in the array before the terminating zero.

Consider the following example

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[100] = "hello\n";

    printf( "The size of the array is %zu"
            "\nand the length of the stored string \n%s is %zu\n",
            sizeof( line ), line, strlen( line ) );

    return 0;
}

Its output is

The size of the array is 100
and the length of the stored string
hello
is 6

In C you may also use a string literal to initialize a character array that has no room for the terminating zero of the string literal. For example

char line[6] = "hello\n";

In this case you cannot say that the array contains a string, because the sequence of characters stored in the array does not have the terminating zero.

What happened when we do not include '\0' at the end of string in C?

Since the %s format specifier expects a null-terminated string, the behavior of your code is undefined. Your program can produce any output at all, produce no output, crash, and so on. In short, don't do that.

This is not to say that all arrays of characters must be null-terminated: the rule applies only to arrays of characters intended to be used as C strings, e.g. to be passed to printf with the %s format specifier, or to be passed to strlen or other string functions of the Standard C library.

If you intend to use your char array for something else, it does not need to be null-terminated. For example, this use is fully defined:

char full_name[] = {
    't', 'o', 'a', 'n'
};
for (size_t i = 0 ; i != sizeof(full_name) ; i++) {
    printf("%c", full_name[i]);
}

String termination - char c=0 vs char c='\0'

http://en.wikipedia.org/wiki/Ascii#ASCII_control_code_chart

Binary   Oct  Dec  Hex  Abbr  Unicode  Control char  C escape code  Name
0000000  000  0    00   NUL   ␀        ^@            \0             Null character

There's no difference, but the more idiomatic one is '\0'.

Writing it as char c = 0; could suggest that you intend to use it as a number (e.g. a counter). '\0' is unambiguous.

Are C constant character strings always null terminated?

A string is only a string if it contains a null character.

"A string is a contiguous sequence of characters terminated by and including the first null character." (C11 §7.1.1 ¶1)

"abc" is a string literal. It also always contains a null character. A string literal may contain more than 1 null character.

"def\0ghi"  // 2 null characters.

In the following, though, x is not a string (it is an array of char without a null character). y and z are both arrays of char and both are strings.

char x[3] = "abc";
char y[4] = "abc";
char z[] = "abc";

With OP's code, s points to a string, the string literal "abc"; *(s + 3) and s[3] both have the value 0. Attempting to modify s[3] is undefined behavior because 1) s is a const char * and 2) the data pointed to by s is a string literal, and attempting to modify a string literal is also undefined behavior.

const char* s = "abc";

Deeper: C does not define "constant character strings".

The language defines a string literal, like "abc" to be a character array of size 4 with the value of 'a', 'b', 'c', '\0'. Attempting to modify these is UB. How this is used depends on context.

The standard C library defines string.

With const char* s = "abc";, s is a pointer to data of type char. As it is a const some_type * pointer, using s to modify data is UB. s is initialized to point to the string literal "abc". s itself is not a string. The memory s initially points to is a string.

Why does C need a null zero as a string terminator while Java doesn't?

In C there is no string type; there is only a pointer to a char. When you need a string in C, you either need to know how many characters it holds or need an indicator that tells you when you have reached its end.

Traditionally there are two approaches to this requirement. In the C world the convention is to terminate the string with the \0 character. In the PASCAL world the convention is to use another variable to store the length of the string.

Java uses the PASCAL convention and stores the length of the string in a variable separate from the content of the string.

Both approaches have their merits. In the Java/PASCAL world, it is easy to know the length of the strings and a string can contain the \0 character. In C you can reuse the same character array for tail substrings etc.

Null terminated string in C

Is it absolutely necessary? No, because functions such as scanf and strcpy write the null terminator for you (strncpy is the exception: you need to add the zero manually if the source exceeds the given size). Is it good to do it anyway? Not really: it doesn't help with buffer overflows, since those functions will write past the end of the buffer anyway. Then what's the best way? Use C++ with std::string.

By the way, if you access or write to string1[257], that will be out of bounds, since you're accessing the 258th element of an array of size 257 (indexing is 0-based).


