String C_Str() VS. Data()

c_str() vs. data() when it comes to return type

The new overload was added by P0272R1 for C++17. Neither the paper itself nor the links therein discuss why only data was given new overloads but c_str was not. We can only speculate at this point (unless people involved in the discussion chime in), but I'd like to offer the following points for consideration:

  • Even just adding the overload to data broke some code; keeping this change conservative was a way to minimize negative impact.

  • The c_str function had so far been entirely identical to data and is effectively a "legacy" facility for interfacing with code that takes a "C string", i.e. an immutable, null-terminated char array. Since you can always replace c_str by data, there's no particular reason to add to this legacy interface.

I realize that the very motivation for P0272R1 was that there do exist legacy APIs that erroneously or for C reasons take only mutable pointers even though they don't mutate. All the same, I suppose we don't want to add more to string's already massive API than absolutely necessary.

One more point: as of C++17 you are now allowed to write to the null terminator, as long as you write the value zero. (Previously, it used to be UB to write anything to the null terminator.) A mutable c_str would create yet another entry point into this particular subtlety, and the fewer subtleties we have, the better.
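To make the effect of the new overload concrete, here is a minimal sketch that assumes a C++17 compiler. The functions legacy_length and call_legacy are hypothetical names invented for this illustration; legacy_length stands in for an old API that takes a mutable pointer but only reads through it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical legacy API: takes char* although it never writes through it.
std::size_t legacy_length(char* text) {
    return std::strlen(text);
}

// Hypothetical helper showing both ways to reach such an API.
std::size_t call_legacy(std::string& s) {
    // Since C++17, data() on a non-const string returns char*.
    std::size_t via_data = legacy_length(s.data());

    // c_str() still returns const char*, so a cast is needed here.
    std::size_t via_c_str = legacy_length(const_cast<char*>(s.c_str()));

    assert(via_data == via_c_str);  // both point at the same buffer
    return via_data;
}
```

With a pre-C++17 compiler the first call would not compile, because data() returned const char* there as well.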

What's the difference between std::string::c_str and std::string::data?

Before C++11, c_str() guaranteed NUL termination and data() did not. Since C++11, both return a pointer to the same null-terminated buffer.

What actually is done when `string::c_str()` is invoked?

Since C++11, std::string::c_str() and std::string::data() are both required to return a pointer to the string's internal buffer. And since the result must be null-terminated, that effectively requires the internal buffer to always be null-terminated, though the null terminator is not counted by size()/length(), or returned by std::string iterators, etc.
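These C++11 guarantees can be checked directly. The following is a small sketch, not a definitive test; shares_terminated_buffer and c_length are hypothetical helper names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: true when c_str() and data() refer to one buffer
// whose element at index size() is the null terminator.
bool shares_terminated_buffer(const std::string& s) {
    const char* p = s.c_str();
    return p == s.data() && p[s.size()] == '\0';
}

// Hypothetical helper: the length as a C function would see it.
std::size_t c_length(const std::string& s) {
    std::size_t n = std::strlen(s.c_str());
    assert(n <= s.size());  // strlen stops at the first '\0', embedded or terminating
    return n;
}
```

Note that size() does not count the terminator, but it does count any '\0' characters stored inside the string, so c_length can be smaller than size().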

Prior to C++11, the behavior of c_str() was technically implementation-specific, but most implementations I've ever seen worked this way, as it is the simplest and sanest way to implement it. C++11 just standardized the behavior that was already in wide use.

UPDATE

Since C++11, the buffer is always null-terminated, even for an empty string. However, that does not mean the buffer is required to be dynamically allocated when the string is empty. It could point to an SSO buffer, or even to a single static nul character. There is no guarantee that the pointer returned by c_str()/data() remains pointing at the same memory address as the content of the string changes.
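A short sketch of what this means in practice, assuming C++11 or later. The helper names empty_is_terminated and append_and_refetch are hypothetical; the point is only that an empty string still yields a valid terminated pointer, and that the pointer should be fetched again after the string is modified.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: an empty string still yields a valid, terminated pointer.
bool empty_is_terminated() {
    std::string empty;
    const char* p = empty.c_str();
    assert(p != nullptr);   // never null, wherever the implementation stores it
    return *p == '\0';
}

// Hypothetical helper: modifies the string, then fetches the pointer again.
const char* append_and_refetch(std::string& s, const std::string& extra) {
    s += extra;         // may reallocate; pointers obtained earlier may be invalid
    return s.c_str();   // re-query after every non-const operation
}
```

Whether the address actually changes after the append depends on the implementation and on the capacity at that moment, so code should never rely on either outcome.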

std::string::substr() returns a new std::string with its own null-terminated buffer. The string being copied from is unaffected.

c_str() vs std::string - what's the real difference in this piece of small code?


"hello" should be interpreted as either const char* or std::string both are valid.

No. "hello" is never interpreted as std::string. It has the type const char[6], which in this case is converted to const char*. This conversion for arrays is called decaying.

But why doesn't it print YES.

When you compare two pointers, you compare whether they point to the same object. The pointers that you use compare unequal because the string literal and the buffer of the std::string are not the same object.

is this a compiler bug?

No. It is a bug in your code.

so what would you suggest as the right approach with using c_str() and std::string?

The correct way to compare the content of null-terminated character arrays is std::strcmp.

Alternatively, you could use the comparison operator with the std::string directly, without using the pointer returned by c_str. The comparison operator of std::string compares with the content of a null terminated string.
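The three kinds of comparison discussed here can be put side by side. This is a minimal sketch; the question's original code is not reproduced, and the function names below are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: compares contents through the C API.
bool equal_via_strcmp(const std::string& s, const char* literal) {
    assert(literal != nullptr);
    return std::strcmp(s.c_str(), literal) == 0;
}

// Hypothetical helper: compares contents through std::string's operator==.
bool equal_via_operator(const std::string& s, const char* literal) {
    return s == literal;
}

// Hypothetical helper: compares addresses only. The string's buffer is a
// separate object from the literal, so this is false even for equal text.
bool same_address(const std::string& s, const char* literal) {
    return s.c_str() == literal;
}
```

For a std::string holding "hello" and the literal "hello", the first two functions return true and the third returns false.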

Usage of string::c_str() and data() under pre-C++11 standards

No, since the specification of C++98 clearly states what you get: a contiguous array of characters.

The internal implementation of the string storage is not necessarily reflected in the results of these methods. If the string is not stored in one contiguous piece, the methods have to make sure that you still get what the standard promises. This could mean that the whole content is copied to a different place.

That is why you should not alter the string representation you get.

You and the person implementing the methods must both read carefully the standard describing what you get.

Difference in comparing string .c_str() and normal string

std::get<1>(*it) returns an object of type std::string. This class has an overloaded operator == that compares objects of type std::string with character arrays.

std::get<1>(*it).c_str() returns a pointer to a character array. Character arrays have no comparison operator that compares their contents. To compare character arrays you should use the standard C function std::strcmp

So you could write

if( std::strcmp( std::get<1>(*it).c_str(), "PAUSE" ) == 0 )

If you simply write

if(std::get<1>(*it).c_str()=="PAUSE")  

then the compiler will compare two pointers, because in such expressions it converts arrays to pointers to their first elements. As a result, this expression is always false when the arrays occupy different areas of memory.

Does a pointer returned by std::string.c_str() or std::string.data() have to be freed?

No, you don't need to deallocate the ptr pointer.

ptr points to a non-modifiable string stored at an internal location (where exactly is an implementation detail of the standard library).


Reference:

C++ documentation:

const char* c_str ( ) const;

Get C string equivalent

Generates a null-terminated sequence of characters (c-string) with the same content as the string object and returns it as a pointer to an array of characters.

A terminating null character is automatically appended.

The returned array points to an internal location with the required storage space for this sequence of characters plus its terminating null-character, but the values in this array should not be modified in the program and are only guaranteed to remain unchanged until the next call to a non-constant member function of the string object.
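If the characters have to outlive the string or survive later modifications, one option is to copy them into storage you own. The sketch below assumes C++11; copy_c_string is a hypothetical helper, and std::unique_ptr is used here only so the copy is released automatically.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical helper: makes an owned copy that can outlive the std::string.
std::unique_ptr<char[]> copy_c_string(const std::string& s) {
    const char* ptr = s.c_str();  // owned by s: never free() or delete it
    std::unique_ptr<char[]> copy(new char[s.size() + 1]);
    std::memcpy(copy.get(), ptr, s.size() + 1);  // +1 copies the terminator too
    assert(copy[s.size()] == '\0');
    return copy;
}
```

Only the copy is owned by the caller; the pointer returned by c_str() stays the property of the string object.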

Difference between a C++ string and a C-string ( .c_str() )


What is the real difference between a C++ string and a null-terminated sequence of characters (C-string) .c_str() ?

A C++ std::string object encapsulates:

  • a char array storing the semantic (presumably textual) value
    • some implementations store short text strings directly in the std::string object
    • otherwise heap memory is typically used to store the actual string content
  • a pointer (possibly via some other control structure) to the character array
  • std::string::size_type variables recording the size and capacity of the string
  • possibly other things

In practice, the std::string's textual data, whether internally buffered or kept on the heap, is overwhelmingly likely in real-world implementations to be stored as a C-string ASCIIZ value, such that c_str() can trivially return its address, but before C++11 that was not required by the Standard. A near-worst-case (just within the boundaries of credibility) scenario is that the string has a second pointer, and c_str() copies the non-NUL-terminated string content into a newly allocated heap area that it NUL terminates. The only time this would seem beneficial is if the NUL itself tipped the string over some capacity boundary, such as from a short-string optimisation / internal buffer to heap, or from 1 page of heap memory to 2, 2 to 3, etc...

I think automatic type conversion should do its job, that is, automatically convert a C++ string to .c_str(). Am I wrong?

Yes it can do it, but not safely (see linked possible-dupe questions).

Case 1 gives an error, and case 2 works fine. Is it possible to convert case 1 to case 2 over using static_cast<>?

static_cast<> can't convert a std::string object to a const char*... remember the string object itself has all those other things in, and typically (always for all but the smallest of strings) only has a pointer to the actual textual data.
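Since the question's "case 1" and "case 2" are not reproduced here, the following sketch uses hypothetical functions (count_chars, count_in_string) to illustrate the same point: there is no conversion from std::string to const char*, implicit or via static_cast, so the accessor has to be called explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style function expecting a null-terminated string.
std::size_t count_chars(const char* text) {
    assert(text != nullptr);
    return std::strlen(text);
}

// Hypothetical wrapper showing which forms compile.
std::size_t count_in_string(const std::string& s) {
    // count_chars(s);                            // error: no implicit conversion
    // count_chars(static_cast<const char*>(s));  // error: invalid static_cast
    return count_chars(s.c_str());                // the explicit accessor works
}
```

The opposite direction is implicit: a const char* or string literal can be passed where a const std::string& is expected, which is why count_in_string("hello") compiles.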

Difference in c_str function specification between C++03 and C++11

Except for the range increment by one element since C++11, there is still a big difference between:

data()[i] == operator[](i)

and:

data() + i == &operator[](i)

The main difference is the & operator in the second form.

The old wording allowed a copy to be made when a write operation occurred, since the pointer returned could point to a buffer other than the one holding the original string.

The other difference in the prototypes between data()[i] and data() + i, is not critical, since they are equivalent.
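The address-based guarantee can be verified index by index. This is a small sketch assuming C++11 or later; data_aliases_elements is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: checks data() + i == &operator[](i) for every i
// in [0, size()], the range guaranteed since C++11.
bool data_aliases_elements(const std::string& s) {
    assert(s.data()[s.size()] == '\0');  // index size() is the terminator
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (s.data() + i != &s[i]) {
            return false;  // would mean data() points at a separate copy
        }
    }
    return true;
}
```

Under the older value-based wording, a conforming implementation could in principle have failed this check, because equal characters do not imply equal addresses.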


A difference between C++03 and C++11 is that in the former, the standard did not explicitly specify whether a std::string would have a null terminator or not. In the latter, however, this is specified.

In other words: Will std::string always be null-terminated in C++11? Yes.


