Is It Legal to Modify the Result of std::string::op[]

Is it legal to modify the result of std::string::op[]?

The quote means that you cannot modify the value returned by operator[](size()), even though reading it is well defined. That is, you must not modify the NUL terminator of the string, even through the non-const overload.

This is basically your first option: i.e. pos >= size(), but because of the requirement pos <= size() the only possible value for that condition is pos == size().

The English wording of the clause can be read as ambiguous (at least to me), but Annex C, and in particular C.2.11, deals with changes in semantics in the string library, and it makes no mention of this change, which would break user code. In C++03 the "referenced value shall not be modified" wording is not present and there is no ambiguity. The lack of mention in C.2.11 is not normative, but it is a hint that the authors of the standard had no intention of changing this particular behavior.
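A minimal sketch of the rule, assuming a C++11 or later compiler: reading the element at index size() is well defined and yields the null character, while writing a different value through that reference is what the quoted clause forbids. The helper names `terminator_of` and `overwrite_first` are made up for illustration.

```cpp
#include <string>

// Hypothetical helper: reads the element at index size().
// Since C++11 this is well defined and refers to a null character.
char terminator_of(const std::string& s) {
    return s[s.size()];
}

// Hypothetical helper: writes only to an index strictly below size(),
// which is always allowed through the non-const overload.
void overwrite_first(std::string& s) {
    if (!s.empty()) {
        s[0] = 'X';          // fine: 0 < size()
    }
    // s[s.size()] = 'X';    // not allowed: would modify the terminator
}
```

Calling `terminator_of` on any string, including an empty one, should give '\0'; the commented-out line is the case the clause rules out.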

Effects of modifying std::string using op[] beyond its size?

Although you already got a correct comment saying the behaviour is undefined, there is something worthy of an actual answer too.

A C++ string object can contain any sequence of characters you like. A C-style string is terminated by the first '\0'. Consequently, a C++ string object must store the size somewhere other than by searching for the '\0': it may contain embedded '\0' characters.

#include <string>
#include <iostream>

int main() {
    std::string s = "abc";
    s += '\0';                           // embedded null character
    s += "def";
    std::cout << s << std::endl;         // prints all 7 characters
    std::cout << s.c_str() << std::endl; // stops at the first '\0'
}

Running this, and piping the output through cat -v to make control characters visible, I see:


abc^@def
abc

This explains what you're seeing: you're overwriting the '\0' terminator, but you're not overwriting the size, which is stored separately.

As pointed out by kec, you might have seen garbage except you were lucky enough to have an additional zero byte after your extra characters.
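The same point can be checked numerically. The sketch below (helper names are illustrative, not from the original answer) compares the length std::string stores with the length a C function derives by scanning for the first '\0'.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Builds the string used in the example above: "abc", an embedded '\0', then "def".
std::string make_embedded() {
    std::string s = "abc";
    s += '\0';
    s += "def";
    return s;
}

// Length as std::string sees it: stored separately from the characters.
std::size_t cpp_length(const std::string& s) { return s.size(); }

// Length as a C function sees it: stops at the first '\0'.
std::size_t c_length(const std::string& s) { return std::strlen(s.c_str()); }
```

For the string built by `make_embedded`, `cpp_length` reports 7 and `c_length` reports 3.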

Is it safe to pass std::string to C style APIs?

If the C API function requires read-only access to the contents of the std::string, use the std::string::c_str() member function to pass the string. The result is guaranteed to be a null-terminated string.

If you intend to use the std::string as an out parameter, C++03 doesn't guarantee that the stored string is contiguous in memory, but C++11 does. With the latter it is OK to modify the string via operator[] as long as you don't modify the terminating NUL character.
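As a rough sketch of the out-parameter case, assuming C++11 or later: `c_get_name` below is a made-up stand-in for a C API that fills a caller-supplied buffer, and `get_name` is a hypothetical wrapper. The string is sized first, so the C side writes only to indices below size().

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a C API (hypothetical): writes a NUL-terminated name into buf,
// using at most cap bytes including the terminator.
extern "C" void c_get_name(char* buf, std::size_t cap) {
    if (cap == 0) return;
    std::strncpy(buf, "example", cap - 1);
    buf[cap - 1] = '\0';
}

std::string get_name() {
    std::string out(64, '\0');             // 64 writable characters, indices 0..63
    c_get_name(&out[0], out.size());       // the C side never touches out[out.size()]
    out.resize(std::strlen(out.c_str()));  // trim to what was actually written
    return out;
}
```

The capacity of 64 is an arbitrary choice for the sketch; a real API would document how large the buffer must be.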

Can you avoid using temporary buffers when using std::string to interact with C style APIs?

In C++11 you can simply pass a pointer to the first element of the string (&str[0]): its elements are guaranteed to be contiguous.

Before C++11, you could use .data() or .c_str(), but both return a pointer to const, so the string is not mutable through them.

Otherwise, yes, you must perform a copy. But I wouldn't worry about this too much until profiling indicates that it's really an issue for you.
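One way this might look in practice, assuming C++11 or later, is formatting with std::snprintf straight into the string's own storage. The function name `format_id` and the "id=%d" format are invented for the example.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats "id=<n>" directly into a std::string, with no temporary char array.
std::string format_id(int n) {
    // First pass: a null buffer with size 0 only measures the required length.
    int len = std::snprintf(nullptr, 0, "id=%d", n);
    if (len < 0) return std::string();

    // One extra element so snprintf's terminator lands inside [0, size()).
    std::string out(static_cast<std::size_t>(len) + 1, '\0');
    std::snprintf(&out[0], out.size(), "id=%d", n);
    out.resize(static_cast<std::size_t>(len));  // drop the extra element
    return out;
}
```

The extra element is a deliberate choice: it keeps snprintf from writing to the position at index size(), which the string itself owns.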

Is it safe to ever cast the result of string's c_str to a char*?

Yes, it's safe as long as the function you're passing it to does not attempt to modify the contents of the string.

You can even avoid the const_cast using

c_api_lib_func(&str[0]);

Note that this is technically not safe with a pre-C++11 compiler because std::string was not required to have contiguous storage for its internal buffer.

Using &str[0], the function may even modify the contents of the string's internal buffer as long as it leaves the terminating null character alone.
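Both approaches side by side, as a sketch: `legacy_count_spaces` is a hypothetical stand-in for an old C function that takes char* but only reads, and the two wrapper names are likewise made up.

```cpp
#include <string>

// Stand-in for a legacy C function (hypothetical) whose signature takes
// char* even though it only reads the string.
extern "C" int legacy_count_spaces(char* s) {
    int n = 0;
    for (; *s != '\0'; ++s) {
        if (*s == ' ') ++n;
    }
    return n;
}

int count_spaces_cast(const std::string& str) {
    // Safe only because the callee never writes through the pointer.
    return legacy_count_spaces(const_cast<char*>(str.c_str()));
}

int count_spaces_nocast(std::string& str) {
    // No cast needed, but requires a non-const string (and C++11 for contiguity).
    return legacy_count_spaces(&str[0]);
}
```

The cast version works on a const string; the &str[0] version needs a mutable one, which is the trade-off between the two.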

How to replace all occurrences of a character in string?

std::string doesn't have such a member function, but you can use the stand-alone std::replace function from the <algorithm> header.

#include <algorithm>
#include <string>

void some_func() {
    std::string s = "example string";
    std::replace(s.begin(), s.end(), 'x', 'y'); // replace every 'x' with 'y'
}

Why std::string object when constructed with default constructor behaves differently?

You have a string of length 0 and then you try to modify its contents using the subscript operator. That's undefined behavior, so at this point, no particular outcome is guaranteed. If you used at() instead, it would have exposed the mistake and thrown an exception instead.

why the length is returning as 0

It started out as 0 and you didn't do anything to add to it (such as push_back or +=). But then again, since what you did earlier was undefined behavior, anything could have happened here.

In addition, I didn't get any kind of exception.

You can try std::string::at instead, which will throw an std::out_of_range exception when you try that.
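A small sketch of the difference, with a made-up helper name `write_is_rejected`: at() checks the index against size() and throws, whereas operator[] performs no check.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns true if writing index i through at() is rejected with std::out_of_range.
bool write_is_rejected(std::string& s, std::size_t i) {
    try {
        s.at(i) = 'x';      // bounds-checked, unlike s[i]
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

On a default-constructed string every index is rejected, and the length stays 0 because nothing was ever appended.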

Are the days of passing const std::string & as a parameter over?

The reason Herb said what he said is because of cases like this.

Let's say I have function A which calls function B, which calls function C. And A passes a string through B and into C. A does not know or care about C; all A knows about is B. That is, C is an implementation detail of B.

Let's say that A is defined as follows:

void A()
{
    B("value");
}

If B and C take the string by const&, then it looks something like this:

void B(const std::string &str)
{
    C(str);
}

void C(const std::string &str)
{
    //Do something with `str`. Does not store it.
}

All well and good. You're just passing pointers around, no copying, no moving, everyone's happy. C takes a const& because it doesn't store the string. It simply uses it.

Now, I want to make one simple change: C needs to store the string somewhere.

void C(const std::string &str)
{
    //Do something with `str`.
    m_str = str;
}

Hello, copy constructor and potential memory allocation (ignore the Short String Optimization (SSO)). C++11's move semantics are supposed to make it possible to remove needless copy-constructing, right? And A passes a temporary; there's no reason why C should have to copy the data. It should just abscond with what was given to it.

Except it can't. Because it takes a const&.

If I change C to take its parameter by value, that just causes B to do the copy into that parameter; I gain nothing.

So if I had just passed str by value through all of the functions, relying on std::move to shuffle the data around, we wouldn't have this problem. If someone wants to hold on to it, they can. If they don't, oh well.

Is it more expensive? Yes; moving into a value is more expensive than using references. Is it less expensive than the copy? Not for small strings with SSO. Is it worth doing?

It depends on your use case. How much do you hate memory allocations?
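The by-value alternative described above might look like the following sketch. The global `g_stored` is an assumption standing in for the member `m_str`, which the original snippets never declare.

```cpp
#include <string>
#include <utility>

// Hypothetical storage standing in for the member `m_str` used above.
static std::string g_stored;

void C(std::string str) {
    g_stored = std::move(str);   // move assignment: takes over the buffer, no copy
}

void B(std::string str) {
    C(std::move(str));           // hands the string on to C without copying it
}

void A() {
    B("value");                  // the std::string is constructed here, from the literal
}
```

Each hop costs a move rather than a copy, which is the trade-off weighed above: more work than passing a reference, less than allocating and copying when the string is too long for SSO.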

Function for both C-style strings and c++ std::string

As long as you know how big the output buffer needs to be you can create a std::string and resize it to the buffer size. You can then pass a pointer to the std::string buffer into the C-style overload.

#include <cstddef>
#include <cstring>
#include <iostream>
#include <string>

// Writes strlen(in_c_string) '*' characters followed by "abc" and a terminator,
// so out_c_string must have room for strlen(in_c_string) + 4 chars.
void TransformString(const char *in_c_string, char *out_c_string) {
    std::size_t length = std::strlen(in_c_string);

    for (std::size_t i = 0; i < length; ++i)
        out_c_string[i] = '*';

    out_c_string[length] = 'a';
    out_c_string[length+1] = 'b';
    out_c_string[length+2] = 'c';
    out_c_string[length+3] = '\0';
}

std::string TransformString(const std::string &in_string) {
    std::string out;
    // Size the buffer from the input so long inputs cannot overflow it;
    // +4 covers "abc" and the '\0'.
    out.resize(in_string.size() + 4);

    TransformString(in_string.c_str(), &out[0]);
    out.resize(std::strlen(&out[0]));

    // In C++11, a local returned by value is moved automatically
    // if the copy is not elided (RVO).
    return out;
}

int main() {
    std::string string_out = TransformString("hello world");

    char charstar_out[100];
    TransformString("hello world", charstar_out);

    std::cout << string_out << "\n";
    std::cout << charstar_out << "\n";

    return 0;
}

Here is a live example: http://ideone.com/xwVWCh.


