What Is a Null-Terminated String

What is a null-terminated string?

A null-terminated string is a contiguous sequence of characters, the last one of which has the binary bit pattern all zeros. I'm not sure what you mean by a "usual string", but if you mean std::string, then a std::string is not required (until C++11) to be contiguous, and is not required to have a terminator. Also, a std::string's string data is always allocated and managed by the std::string object that contains it; for a null-terminated string, there is no such container, and you typically refer to and manage such strings using bare pointers.

All of this should really be covered in any decent C++ textbook. I recommend getting hold of Accelerated C++, one of the best of them.

What is the difference between a null-terminated string and a string that is not terminated by null in x86 assembly language?

There's nothing specific to asm here; it's the same issue in C. It's all about how you store strings in memory and keep track of where they end.

What is the difference between a null-terminated string and a string that is not terminated by null?

A null-terminated string has a 0 byte after it, so you can find the end with strlen (e.g. with a slow repne scasb). This makes it usable as an implicit-length string, like C uses.
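As a rough illustration of what "finding the end" means, here is a minimal C sketch of the same scan that strlen performs. The name scan_length is made up for this example; real strlen implementations are usually much more optimized.

```c
#include <stddef.h>

/* Hypothetical helper (not a library function): walks the bytes until it
   reaches the 0 terminator and returns how many bytes came before it. */
size_t scan_length(const char *s)
{
    const char *p = s;
    while (*p != '\0') {
        ++p;
    }
    return (size_t)(p - s);
}
```

The terminator itself is not counted, and anything stored after the first 0 byte is invisible to this kind of scan.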

Two related questions cover the NASM side: "NASM Assembly - what is the ", 0" after this variable for?" explains the NASM syntax for creating one in static storage with db, and "db usage in nasm, try to store and print string" shows what happens when you forget the 0 terminator.

Are they interchangeable?

If you know the length of a null-terminated string, you can pass pointer+length to a function that wants an explicit-length string. That function will never look at the 0 byte, because you will pass a length that doesn't include the 0 byte. It's not part of the string data proper.

But if you have a string without a terminator, you can't pass it to a function or system-call that wants a null-terminated string. (If the memory is writeable, you could store a 0 after the string to make it into a null-terminated string.)
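When the memory after the data is not yours to write, the usual workaround is to copy the bytes into a buffer that has room for the terminator. A minimal C sketch, assuming a hypothetical helper named to_cstring and heap allocation:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd, 0-terminated copy of the len
   bytes starting at data, or NULL on failure. The caller must free it. */
char *to_cstring(const char *data, size_t len)
{
    if (len == SIZE_MAX) {
        return NULL;              /* len + 1 would overflow */
    }
    char *copy = malloc(len + 1); /* one extra byte for the terminator */
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, data, len);
    copy[len] = '\0';
    return copy;
}
```

Going the other way needs no copy: for a null-terminated string s, passing s together with strlen(s) gives an explicit-length view of the same bytes.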


In Linux, many system calls take strings as C-style implicit-length null-terminated strings. (i.e. just a char* without passing a length).

For example, open(2) takes a string for the path: int open(const char *pathname, int flags); You must pass a null-terminated string to the system call. It's impossible to create a file with a name that includes a '\0' in Linux (same as most other Unix systems), because all the system calls for dealing with files use null-terminated strings.

OTOH, write(2) takes a memory buffer which isn't necessarily a string. It has the signature ssize_t write(int fd, const void *buf, size_t count);. It doesn't care if there's a 0 at buf+count because it only looks at the bytes from buf to buf+count-1.

You can pass a string to write(). It doesn't care. It's basically just a memcpy into the kernel's pagecache (or into a pipe buffer or whatever for non-regular files). But like I said, you can't pass an arbitrary non-terminated buffer as the path arg to open().
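To make the write() side concrete, here is a small C sketch that sends a buffer with no terminator at all. The wrapper name write_bytes is an assumption for this example, and it deliberately skips details such as retrying on EINTR.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes exactly count bytes from buf to fd, looping
   over short writes. Returns 0 on success, -1 on error. No byte at or
   after buf + count is ever read. */
int write_bytes(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n <= 0) {
            return -1;
        }
        p += n;
        count -= (size_t)n;
    }
    return 0;
}
```

For example, write_bytes(STDOUT_FILENO, raw, sizeof raw) works for a char array raw that holds only 'h', 'e', 'l', 'l', 'o'. The same array could not be handed to open() as a path.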

Or they are not equivalent of each other?

Implicit-length and explicit-length are the two major ways of keeping track of string data/constants in memory and passing them around. They solve the same problem, but in opposite ways.

Long implicit-length strings are a bad choice if you sometimes need to find their length before walking through them. Looping through a string is a lot slower than just reading an integer. Finding the length of an implicit-length string is O(n), but an explicit-length string is of course O(1) time to find the length. (It's already known!). At least the length in bytes is known, but the length in Unicode characters might not be known, if it's in a variable-length encoding like UTF-8 or UTF-16.
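The difference between byte length and character count can be sketched in C. The struct byte_string and the function utf8_code_points are hypothetical names for this example, and the counter assumes the input is valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical explicit-length string: a pointer plus a byte count.
   No terminator is required, and len is available in O(1). */
struct byte_string {
    const char *data;
    size_t len;
};

/* Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
   This is still O(n), even though the byte length is already known. */
size_t utf8_code_points(struct byte_string s)
{
    size_t count = 0;
    for (size_t i = 0; i < s.len; ++i) {
        if (((unsigned char)s.data[i] & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the 6-byte UTF-8 encoding of "héllo", s.len is 6 while utf8_code_points(s) is 5.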

What's the rationale for null terminated strings?

From the horse's mouth

None of BCPL, B, or C supports
character data strongly in the
language; each treats strings much
like vectors of integers and
supplements general rules by a few
conventions. In both BCPL and B a
string literal denotes the address of
a static area initialized with the
characters of the string, packed into
cells. In BCPL, the first packed byte
contains the number of characters in
the string; in B, there is no count
and strings are terminated by a
special character, which B spelled
*e. This change was made partially
to avoid the limitation on the length
of a string caused by holding the
count in an 8- or 9-bit slot, and
partly because maintaining the count
seemed, in our experience, less
convenient than using a terminator.

Dennis M Ritchie, Development of the C Language

Why do we need a null terminator in C++ strings?

The other characters in the array char john[100] = "John" would be filled with zeros, which are all null-terminators. In general, when you initialize an array and don't provide enough elements to fill it up, the remaining elements are value-initialized, which for built-in types such as char and int means they are set to zero:

int foo[3] {5};          // this is {5, 0, 0}
int bar[3] {};           // this is {0, 0, 0}

char john[5] = "John";   // this is {'J', 'o', 'h', 'n', 0}
char peter[5] = "Peter"; // ERROR in C++: initializer string too long
                         // (one null-terminator is mandatory)

Also see cppreference on Array initialization. To find the length of such a string, we just loop through the characters until we find 0 and exit.
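The zero-fill rule holds in C as well, and it is easy to check. A minimal C sketch, with a made-up function name, that inspects every element after the four letters:

```c
#include <stddef.h>

/* Hypothetical check: returns 1 if every element of john after "John"
   is a zero byte, 0 otherwise. */
int john_is_zero_padded(void)
{
    char john[100] = "John";   /* 4 letters, then 96 zero bytes */
    for (size_t i = 4; i < sizeof john; ++i) {
        if (john[i] != '\0') {
            return 0;
        }
    }
    return 1;
}
```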

The motivation behind null-terminating strings in C++ is to ensure compatibility with C-libraries, which use null-terminated strings. Also see What's the rationale for null terminated strings?

Containers like std::string don't require the string to be null-terminated and can even store a string containing null-characters. This is because they store the size of the string separately. However, the characters of a std::string are often null-terminated anyways so that std::string::c_str() doesn't require a modification of the underlying array.

C++-only libraries will rarely, if ever, pass C-strings between functions.

What are null-terminated strings?

What are null-terminating strings?

In C, a "null-terminated string" is a tautology. A string is, by definition, a contiguous null-terminated sequence of characters (an array, or a part of an array). Other languages may address strings differently. I am only discussing C strings.

How are they different from non-null-terminated strings?

There are no non-null-terminated strings in C. A non-null-terminated array of characters is just an array of characters.

What is this null that terminates the string? Is it different from NULL?

The "null character" is a character with the integer value of zero. (Characters are, in essence, small integers). It is sometimes, especially in the context of ASCII, referred to as NUL (single L). This is distinct from NULL (double L), which is a null pointer. The null character can be written as '\0' or just 0 in the source code. The two forms are interchangeable in C (but not in C++). The former is usually preferred because it shows the intent better.
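A short C sketch of both points. The function names are invented for this example, and the size comparison holds only in C, where a character constant has type int.

```c
#include <stddef.h>

/* Hypothetical helper: cuts the string in buf short at index i by storing
   the null character there. Writing buf[i] = 0; has the same effect. */
void truncate_at(char *buf, size_t i)
{
    buf[i] = '\0';
}

/* In C, '\0' and 0 have the same value and the same type (int). */
int nul_matches_zero(void)
{
    return '\0' == 0 && sizeof('\0') == sizeof(int);
}
```

NULL, by contrast, is for pointers (char *p = NULL;) and should not be stored into a char.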

Should I null-terminate my strings myself, or will the compiler do it for me?

If you are writing a string literal, you don't need to explicitly insert a null character at the end. The compiler will do it.

const char *str1 = "a string";   // ok, \0 is inserted automatically
const char *str2 = "a string\0"; // extra \0 is not needed

The compiler will not insert a null character when you declare an array with an explicit size and initialise it with a string literal whose characters exactly fill the array. C accepts this; C++ rejects it, and a literal with even more characters is not valid in either language.

char str3[5] = "hello"; // not enough space in the array for the null terminator
char str4[] = "hello";  // ok, there is \0 at the end, the total size is 6

The compiler will not insert a null character when declaring an array and not initialising it with a string literal.

char str5[] = { 'h', 'e', 'l', 'l', 'o' };       // no null terminator
char str6[] = { 'h', 'e', 'l', 'l', 'o', '\0' }; // null terminator

If you are building a string at run-time out of some data that comes from IO or from a different part of the program, you need to make sure a null terminator is inserted. Standard library functions such as fread and POSIX functions such as read never null-terminate the buffers they fill. strncpy adds a null terminator only if there is enough space for it, so use it with care. Confusingly, strncat will always add a null terminator.
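One common way to use strncpy with care is to copy at most one byte less than the destination holds and then store the terminator yourself. A minimal C sketch, with copy_terminated as a hypothetical helper name:

```c
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity dst_size bytes),
   truncating if necessary, and always leaves dst null-terminated.
   Does nothing if dst_size is 0. */
void copy_terminated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0) {
        return;
    }
    strncpy(dst, src, dst_size - 1); /* may stop without writing a 0 */
    dst[dst_size - 1] = '\0';        /* so always add one explicitly */
}
```

The same "reserve one byte, then terminate by hand" pattern applies after read or fread: ask for at most size - 1 bytes and store '\0' at the index equal to the number of bytes actually read.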

Why are null-terminated strings needed?

Many functions from the standard C library, and many functions from third-party libraries, operate on strings (and all strings need to be null-terminated). If you pass a non-null-terminated character array to a function that expects a string, the behavior is undefined. So if you want to interoperate with the world around you, you need null-terminated strings. If you never use any standard-library or third-party functions that expect string arguments, you may do what you want.

How do I set up my code/data to handle null-terminated strings?

If you plan to store strings of length up to N, allocate N+1 characters for your data. The character needed for the null terminator is not included in the length of the string, but it is included in the size of the array required to store it.
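The N versus N+1 distinction shows up directly in code as strlen versus the allocation size. A minimal C sketch, assuming a hypothetical helper named copy_string:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of s, or NULL if allocation
   fails. The caller must free the result. */
char *copy_string(const char *s)
{
    size_t n = strlen(s);        /* N: the terminator is not counted */
    char *p = malloc(n + 1);     /* N + 1: room for the terminator */
    if (p != NULL) {
        memcpy(p, s, n + 1);     /* copies the terminator as well */
    }
    return p;
}
```

For char hello[] = "hello"; the length strlen(hello) is 5 while sizeof hello is 6.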

Null termination of char array

If you type more than four characters then the extra characters and the null terminator will be written outside the end of the array, overwriting memory not belonging to the array. This is a buffer overflow.

C does not prevent you from clobbering memory you don't own. This results in undefined behavior. Your program could do anything—it could crash, it could silently trash other variables and cause confusing behavior, it could be harmless, or anything else. Notice that there's no guarantee that your program will either work reliably or crash reliably. You can't even depend on it crashing immediately.

This is a great example of why a bare scanf("%s") is dangerous and should never be used. It doesn't know the size of your array, so unless you hard-code a field width into the format there is no way to use it safely. Instead, avoid scanf and use something safer, like fgets():

fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.

Example:

if (fgets(A, sizeof A, stdin) == NULL) {
    /* error reading input */
}

Annoyingly, fgets() will leave a trailing newline character ('\n') at the end of the array. So you may also want code to remove it.

size_t length = strlen(A);
if (length > 0 && A[length - 1] == '\n') { /* guard against an empty string */
    A[length - 1] = '\0';
}

Ugh. A simple (but broken) scanf("%s") has turned into a 7 line monstrosity. And that's the second lesson of the day: C is not good at I/O and string handling. It can be done, and it can be done safely, but C will kick and scream the whole time.
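The read-then-strip steps can be folded into one small function. This is only a sketch: read_line is a made-up name, and it uses strcspn as an alternative way to drop the newline.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from in into buf (capacity size),
   removing the trailing newline if there is one. Returns 1 on success,
   0 on EOF, read error, or an unusable size. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (size == 0 || size > INT_MAX) {
        return 0;                      /* fgets takes its size as an int */
    }
    if (fgets(buf, (int)size, in) == NULL) {
        return 0;
    }
    /* strcspn gives the index of the first '\n', or of the terminator if
       there is none, so this store is always inside the string. */
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

A call such as read_line(stdin, A, sizeof A) then replaces both snippets above. Input longer than the buffer is left in the stream for the next read rather than overflowing the array.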

null terminating a string

To your first question:
I would go with Paul R's comment and terminate with '\0'. The value 0 itself also works fine; it is a matter of taste. But don't use the macro NULL, which is meant for pointers.

To your second question:
If your string is not terminated with '\0', it might still print the expected output because the byte that follows your string in memory happens to be zero. This is a really nasty bug though, since it might blow up when you least expect it. Always terminate a string with '\0'.

Null-terminate string: Use '\0' or just 0?

Use '\0' or just 0?

There is no difference in value.

In C, there is no difference in type: both are int.

In C++ they have different types: char and int. So the edge goes to '\0' as there is no type conversion involved.

Different style guides promote one over the other. '\0' for clarity. 0 for lack of clutter.

Right answer: Use the style based on your group's coding standards/guidelines. If your group does not have such a guideline, make one. Better to use one than have divergent styles.


