How to Construct a std::string With an Embedded Null

How do you construct a std::string with an embedded null?

Since C++14

We have been able to create std::string literals with the s suffix:

#include <iostream>
#include <string>

int main()
{
    using namespace std::string_literals;

    std::string s = "pl-\0-op"s; // <- Notice the "s" at the end.
                                 // This is a std::string literal, not
                                 // a C-string literal.
    std::cout << s << "\n";
}
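
To make the difference visible without relying on how a terminal renders the null byte, the sizes of the two kinds of literal can be compared. This is a minimal sketch assuming a C++14 compiler; the function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper names, used only for this illustration.

// std::string literal: the full length is known, so the '\0' and the
// characters after it are kept.
std::size_t size_from_string_literal()
{
    using namespace std::string_literals;
    return "pl-\0-op"s.size();
}

// Plain C-string literal: the const char* constructor stops at the first '\0'.
std::size_t size_from_c_string()
{
    return std::string("pl-\0-op").size();
}

void compare_sizes()
{
    assert(size_from_string_literal() == 7); // p l - \0 - o p
    assert(size_from_c_string() == 3);       // p l -
}
```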

Before C++14

The problem is that the std::string constructor that takes a const char* assumes the input is a C-string. C-strings are \0-terminated, so parsing stops when it reaches the first \0 character.

To compensate for this, you need to use the constructor that builds the string from a char array (not a C-String). This takes two parameters - a pointer to the array and a length:

std::string x("pq\0rs");    // Two characters, because the input is assumed to be a C-string
std::string y("pq\0rs", 5); // Five characters, because the input is now a char array with an explicit length
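
Counting the length by hand is easy to get wrong. One possible way around that, sketched here under the assumption that the input is always a string literal (or another char array that ends in a terminator), is to let the compiler deduce the array size. The helper name from_char_array is hypothetical, not a standard library function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the standard library): deduces the array
// size N at compile time so the length never has to be counted by hand.
template <std::size_t N>
std::string from_char_array(const char (&arr)[N])
{
    // N includes the terminating '\0' that every string literal carries,
    // so drop that one character and keep any embedded nulls.
    return std::string(arr, N - 1);
}

void from_char_array_example()
{
    const std::string x = from_char_array("pq\0rs");
    assert(x.size() == 5);
    assert(x[2] == '\0');
}
```

Note that this would silently drop the last character of a char array that is not null-terminated, which is why the assumption above matters.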

Note: std::string does NOT rely on a \0 terminator to mark its end (contrary to what other posts suggest); it stores its length explicitly. However, you can obtain a pointer to an internal buffer that holds a \0-terminated C-string with the c_str() method.

Also check out Doug T's answer below about using a vector<char>.

Also check out RiaD for a C++14 solution.

Can a std::string contain embedded nulls?

Yes, you can have embedded nulls in your std::string.

Example:

std::string s;
s.push_back('\0');
s.push_back('a');
assert(s.length() == 2);

Note: std::string's c_str() member always returns a buffer with a null character after the last character. Before C++11, the data() member was not required to append one; since C++11, data() and c_str() return the same null-terminated buffer.
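
The practical consequence is that the string knows its real length, while any code that scans the c_str() buffer for a terminator does not. A small sketch, assuming C++11 or later (the function name is made up for this example):

```cpp
#include <cassert>
#include <cstring>
#include <string>

void c_str_hides_embedded_nulls()
{
    const std::string s("ab\0cd", 5);

    // The string itself knows its real length.
    assert(s.size() == 5);

    // A C function that scans for the terminator only sees the first two characters.
    assert(std::strlen(s.c_str()) == 2);

    // The buffer is additionally terminated after the last real character.
    assert(s.c_str()[s.size()] == '\0');
}
```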

Be careful of operator+=

One thing to look out for: do not use operator+= with a char* on the right-hand side. It only appends characters up to the first null character.

For example:

std::string s = "hello";
s += "\0world";
assert(s.length() == 5);

The correct way:

std::string s = "hello";
s += std::string("\0world", 6);
assert(s.length() == 11);
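
Building a temporary std::string is not the only option. The append overloads that take an explicit count, and operator+= with a single char, should behave the same way. A sketch (the function name is hypothetical):

```cpp
#include <cassert>
#include <string>

std::string append_with_nulls()
{
    std::string s = "hello";
    s.append("\0world", 6); // pointer + count: appends all 6 characters
    s += '\0';              // a single char is appended as-is
    s.append(2, '\0');      // count + char: appends two more nulls
    assert(s.size() == 14); // 5 + 6 + 1 + 2
    return s;
}
```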

Storing binary data: std::vector is more common

Generally it's more common to use std::vector to store arbitrary binary data.

std::vector<char> buf;
buf.resize(1024);
char *p = &buf.front();

This is probably more common because, before C++17, std::string's data() and c_str() members both returned const pointers, so the memory could not be modified through them (C++17 added a non-const data() overload). With &buf.front() you are free to modify the contents of the buffer directly.
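
As a rough sketch of how such a writable buffer is typically used, the code below fills both a std::vector<char> and a std::string from a C-style function. fill_buffer is a hypothetical stand-in for a real API such as a socket or file read; C++11 or later is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a C-style API that fills a caller-supplied buffer
// and returns the number of bytes written.
std::size_t fill_buffer(char* dst, std::size_t capacity)
{
    static const char payload[] = {'a', '\0', 'b', '\0'};
    const std::size_t n = sizeof(payload) < capacity ? sizeof(payload) : capacity;
    std::memcpy(dst, payload, n);
    return n;
}

std::vector<char> read_into_vector()
{
    std::vector<char> buf(1024);
    const std::size_t n = fill_buffer(buf.data(), buf.size());
    buf.resize(n); // keep only the bytes actually written
    return buf;
}

std::string read_into_string()
{
    std::string buf(1024, '\0');
    // &buf[0] is a writable pointer to contiguous storage since C++11.
    const std::size_t n = fill_buffer(&buf[0], buf.size());
    assert(n <= buf.size());
    buf.resize(n);
    return buf;
}
```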

string with embedded null characters

Just initialize the string with the actual size of the char array. The rest will follow naturally.

#include <sstream>
#include <string>
#include <cstring>
#include <iostream>
#include <iomanip>

int main() {
    const char array[] = "125 320 512 750 333\0 xyz";

    // to get the string after the null, just add strlen
    const char *after_the_null_character = array + strlen(array) + 1;
    std::cout << "after_the_null_character:" << after_the_null_character << std::endl;

    // initialized with array and proper, actual size of the array
    std::string str{array, sizeof(array) - 1};
    std::istringstream ss{str};
    std::string word;
    while (ss >> word) {
        std::cout << "size:" << word.size() << ": " << word.c_str() << " hex:";
        for (auto&& i : word) {
            std::cout << std::hex << std::setw(2) << std::setfill('0') << (unsigned)i;
        }
        std::cout << "\n";
    }
}

would output:

after_the_null_character: xyz
size:3: 125 hex:313235
size:3: 320 hex:333230
size:3: 512 hex:353132
size:3: 750 hex:373530
size:4: 333 hex:33333300
size:3: xyz hex:78797a

Note the zero byte after reading 333.
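
The zero byte ends up inside a word because operator>> splits on whitespace only. If the null is meant to be the separator, one option is std::getline with '\0' as the delimiter. This is a sketch of that alternative, not part of the answer above, and split_on_nul is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits the input at every '\0' instead of at whitespace.
std::vector<std::string> split_on_nul(const std::string& input)
{
    std::vector<std::string> fields;
    std::istringstream ss(input);
    std::string field;
    while (std::getline(ss, field, '\0')) {
        fields.push_back(field);
    }
    return fields;
}

void split_on_nul_example()
{
    // 7 characters, one null, 4 characters: 12 in total.
    const std::string input("125 320\0 xyz", 12);
    const std::vector<std::string> fields = split_on_nul(input);
    assert(fields.size() == 2);
    assert(fields[0] == "125 320");
    assert(fields[1] == " xyz");
}
```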

C++: How do null characters work in std::string?

std::string supports embedded NUL characters*. Your example code doesn't produce the expected result because you are constructing a std::string from a pointer to a zero-terminated string. There is no length information, so the c'tor stops at the first NUL character. s contains Hello, hence the output.

If you want to construct a std::string with an embedded NUL character, you have to use a c'tor that takes an explicit length argument:

std::string s("Hello\0, World", 13);
std::cout << s << std::endl;

produces this output:

Hello, World



* std::string maintains an explicit length member, so it doesn't need to reserve a character to act as the end-of-string sentinel.

Are std::string with null-character possible?

Contrary to what you seem to think, C++ strings do not rely on a null terminator to mark their end.

The difference in behavior comes from the << operator overloads.

This code:

cout << a.c_str(); // a.c_str() is const char*

uses the << overload for const char* that comes with cout: it prints a C-style char array and stops at the first null char (the char array must be null-terminated).

This code:

cout << a; // a is a std::string

uses the << overload that comes with std::string: it prints a string object that knows its own length internally, so null chars are written like any other character.
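
Because a null byte is usually invisible on a terminal, the difference is easier to see by capturing the output in a std::ostringstream and checking its size. A minimal sketch with hypothetical function names:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Streams the std::string itself: all a.size() characters are written.
std::string printed_as_string(const std::string& a)
{
    std::ostringstream out;
    out << a;
    return out.str();
}

// Streams the const char* from c_str(): output stops at the first '\0'.
std::string printed_as_c_string(const std::string& a)
{
    std::ostringstream out;
    out << a.c_str();
    return out.str();
}

void printing_example()
{
    const std::string a("ab\0cd", 5);
    assert(printed_as_string(a).size() == 5);
    assert(printed_as_c_string(a) == "ab");
}
```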

std::string equivalent for data with null characters?

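std::string should be safe for this. You only have to be careful with the .c_str() method, because code that treats the result as a C-string stops at the first null. Use .data() together with .size() instead.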
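
For instance, when the data has to be handed to an interface that takes a pointer and a length, passing data() and size() keeps every byte. The sketch below uses std::ostream::write as that interface; write_all is a hypothetical wrapper.

```cpp
#include <cassert>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical wrapper: writes every byte of s, embedded nulls included.
void write_all(std::ostream& out, const std::string& s)
{
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
}

void write_all_example()
{
    const std::string payload("key\0value", 9);
    std::ostringstream sink;
    write_all(sink, payload);
    assert(sink.str() == payload); // all 9 bytes arrived
}
```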

How to assemble a string of wide characters with some null ones inserted in the middle of it?

You can push_back a null char into a std::wstring as you build it.

Example:

std::wstring str;
str += L"DSN=NiceDB";
str.push_back(L'\0');
str += L"DBQ=C:\\Users\\who\\AppData\\Local\\NiceApp\\niceDB.accdb";
str.push_back(L'\0');

You can also manually append the null char using the += operator:

std::wstring str;
str += L"DSN=NiceDB";
str += L'\0';
str += L"DBQ=C:\\Users\\who\\AppData\\Local\\NiceApp\\niceDB.accdb";
str += L'\0';

You can also tell the append method to copy one extra character from the source string. That picks up the null terminator already present in the source and appends it to the std::wstring:

std::wstring str;
const wchar_t* header = L"DSN=NiceDB";
const wchar_t* footer = L"DBQ=C:\\Users\\who\\AppData\\Local\\NiceApp\\niceDB.accdb";

str.append(header, wcslen(header) + 1);
str.append(footer, wcslen(footer) + 1);

Then to get the pointer to the start of the final string:

LPCWSTR wcAttrs = str.c_str();

The pointer returned by .c_str() is only valid for the lifetime of the backing wstring, and only until that wstring is next modified. Don't let the wstring instance go out of scope while something still references wcAttrs.
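
If the list of attributes is only known at run time, the same pattern can be wrapped in a small function. This is a sketch only; join_with_nulls is a hypothetical name and the attribute values in the example are placeholders.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: appends each part followed by a null character.
// The terminator supplied by c_str() then acts as the second, final null.
std::wstring join_with_nulls(const std::vector<std::wstring>& parts)
{
    std::wstring out;
    for (const std::wstring& part : parts) {
        out += part;
        out.push_back(L'\0');
    }
    return out;
}

void join_with_nulls_example()
{
    std::vector<std::wstring> parts;
    parts.push_back(L"DSN=NiceDB");
    parts.push_back(L"DBQ=placeholder.accdb"); // placeholder value

    const std::wstring attrs = join_with_nulls(parts);
    assert(attrs.size() == parts[0].size() + parts[1].size() + 2);
    assert(attrs[parts[0].size()] == L'\0');     // null after the first part
    assert(attrs.back() == L'\0');               // null after the last part
    assert(attrs.c_str()[attrs.size()] == L'\0'); // terminator from c_str()
}
```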

Why can std::string::find() handle '\0'?

The std::string class is a C++ class that represents a string which can contain a null character. Its member functions, like find, are designed to handle those embedded nulls.

strstr (a function from C) works with char* pointers, which point to C-style strings. Because C-style strings are null-terminated, they cannot handle embedded nulls. To this effect, strstr is documented as follows:

Locate substring

Returns a pointer to the first occurrence of str2 in str1, or a null pointer if str2 is not part of str1.

The matching process does not include the terminating null-characters, but it stops there.

The last sentence of that description is the relevant part here.
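
The contrast can be checked directly with a string that has a null in the middle. A minimal sketch (the function name is made up for this example):

```cpp
#include <cassert>
#include <cstring>
#include <string>

void find_vs_strstr()
{
    const std::string haystack("abc\0def", 7);

    // std::string::find searches all 7 characters.
    assert(haystack.find("def") == 4);
    assert(haystack.find('\0') == 3);

    // A needle that itself contains a null must also carry its length.
    assert(haystack.find(std::string("\0d", 2)) == 3);

    // strstr treats the buffer as a C-string, so it only sees "abc".
    assert(std::strstr(haystack.c_str(), "def") == nullptr);
}
```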


