Why Is Modifying a String Through a Retrieved Pointer to Its Data Not Allowed

Why can't we write directly to this buffer?

I'll state the obvious point: because it's const. And casting away const and then modifying that data is... rude.

Now, why is it const? That goes back to the days when copy-on-write was considered a good idea, so std::basic_string had to allow implementations to support it. It would be very useful to get an immutable pointer to the string (for passing to C APIs, for example) without incurring the overhead of a copy. So c_str needed to return a pointer to const.

As for why it's still const? Well... that goes to an oddball thing in the standard: the null terminator.

This is legitimate code:

std::string stupid;
const char *pointless = stupid.c_str();

pointless must be a NUL-terminated string. Specifically, it must be a pointer to a NUL character. So where does the NUL character come from? There are a couple of ways for a std::string implementation to allow this to work:

  1. Use small-string optimization, which is a common technique. In this scheme, every std::string object has an internal buffer it can use for a single NUL character.
  2. Return a pointer to static memory containing a NUL character. Every empty std::string in such an implementation then returns the same pointer.

Not everyone should be forced to implement SSO. So the standards committee needed a way to keep #2 on the table. And part of that is giving you a pointer to const from c_str(). And since this memory is likely real const, not fake "Please don't modify this memory" const, giving you a mutable pointer to it is a bad idea.

Of course, you can still get such a pointer by doing &str[0], but the standard is very clear that modifying the NUL terminator is a bad idea.

Now, that being said, it is perfectly valid to modify the array of characters through the &str[0] pointer, so long as you stay in the half-open range [0, str.size()). You just can't do it through the pointer returned by data or c_str. Yes, even though the standard in fact requires str.c_str() == &str[0] to be true.
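As a minimal sketch of the "write through &str[0], stay inside [0, str.size())" rule: the string is sized first, a C-style function writes into it, and the string is then shrunk to what was written. Both c_style_writer and read_into_string are hypothetical names invented for this illustration, and the sketch assumes C++11 or later, where the character storage is contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C API: writes at most `cap` bytes into `out`
// (no terminator) and returns how many bytes it wrote.
std::size_t c_style_writer(char* out, std::size_t cap) {
    const char msg[] = "hello";
    std::size_t n = std::strlen(msg);
    if (n > cap) n = cap;
    std::memcpy(out, msg, n);
    return n;
}

// Hypothetical helper: lets the C-style function write into the string's own storage.
std::string read_into_string(std::size_t cap) {
    std::string s(cap, '\0');          // size() == cap, so [0, cap) is writable
    if (s.empty()) return s;           // nothing writable; leave the terminator alone
    std::size_t n = c_style_writer(&s[0], s.size());
    assert(n <= s.size());             // the writer never went past size()
    s.resize(n);                       // keep only what was actually written
    return s;
}
```

The writer is never handed the position at index size(), so the terminator is left for the string itself to manage.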

That's standardese for you.

Using &front() to modify underlying character array in std::string

According to my read of C++11:

const charT& front() const;

charT& front();

Requires: !empty()

Effects: Equivalent to operator[](0).

Let's look at operator[]:

const_reference operator[](size_type pos) const;

reference operator[](size_type pos);

1 Requires: pos <= size().

2 Returns: *(begin() + pos) if pos < size(), otherwise a reference to
an object of type T with value charT(); the referenced value shall not be modified.

Since you preallocate your buffer, pos will be less than size(). begin() returns an iterator, so all the usual iterator traits apply.

So, according to my interpretation of C++11, this approach should be legal.
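A small sketch of that reading, assuming C++11 or later: because front() requires !empty(), the empty case is checked before taking the address. The helper name with_first_char is made up for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: overwrite the first character through &front().
// front() requires !empty(), so the empty case is handled separately.
std::string with_first_char(std::string s, char c) {
    if (s.empty()) return s;
    char* p = &s.front();     // equivalent to &s[0] per the wording quoted above
    assert(p == &s[0]);
    *p = c;                   // index 0 is inside [0, size()), so this write is fine
    return s;
}
```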

Why is the const overload of std::string::data still restricted in modern C++?

I'm not talking about a const std::string, but a non-const one.

And that right there is why that statement exists (and continues to exist even in C++17, when a non-const data was added). Because the const overload of data cannot know whether the object it was called on was actually declared const.

In a small-string optimized string implementation, the string object itself stores an array of characters. If that string object is declared const, then so too are its subobjects. Modifying objects declared as const is UB.

By contrast, vector::data has no such statement, because a vector always allocates its array dynamically, outside the vector object itself, even when the vector is const. So while the array is logically const from the outside, it is technically well-defined (but you really, really shouldn't) to const_cast the return value from a const vector::data, because you're modifying an object that was not created as const.

If basic_string::data had no such statement, an SSO-based implementation would be impossible, because it would be legal to modify the elements of a const string, just like it's legal to modify the elements of a const vector. But it can't be legal to modify it, because it might be a const object whose data is stored internally.
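To make the "data is stored internally" point concrete, here is a rough probe that checks whether a string's character buffer lies within the bytes of the string object itself. It is only an illustration: stored_inside_object is a hypothetical helper, it relies on std::uintptr_t being available, and the answer for short strings depends entirely on the library implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical probe: does the character buffer live inside the string object?
// Addresses are compared as integers, since ordering unrelated pointers
// directly with < is not guaranteed to be meaningful.
bool stored_inside_object(const std::string& s) {
    assert(s.data() != nullptr);
    const auto obj_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto obj_end   = obj_begin + sizeof(s);
    const auto buf       = reinterpret_cast<std::uintptr_t>(s.data());
    return buf >= obj_begin && buf < obj_end;
}
```

On an implementation with SSO, a short string would typically report true here, and for a const std::string those bytes are part of a const object. A string far longer than sizeof(std::string) cannot fit inside the object, so it reports false.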

Do c_str()'s requirements make modification illegal?

The particular phrasing is a relic from the C++03-era specification that permitted copy-on-write strings. At some point in the past the spec for c_str() read:

Returns: A pointer to the initial element of an array of length
size() + 1 whose first size() elements equal the corresponding
elements of the string controlled by *this and whose last element is
a null character specified by charT().

Requires: The program shall not alter any of the values stored in the array. Nor shall the program treat the returned value as a valid
pointer value after any subsequent call to a non-const member
function of the class basic_string that designates the same object as
this.

in which context the requirement made a lot more sense. If c_str() returned a pointer to a string shared between different std::strings, modifying the values in the array would be really bad.

In C++14, this prohibition makes very little sense. Reading it as prohibiting modifying the string at all after a c_str() call won't make much sense, as you pointed out; reading it as prohibiting modifying the string through the returned pointer would make slightly more sense, but not much. There's no real reason why the semantics should be different between the pointer returned by c_str() and the pointer obtained using &operator[](0).

Why is the pointer not const in the method argument?

I thought having the const pointer as const char should be sufficient but it is not

A pointer to const means that indirecting through the pointer yields a const lvalue (reference). Having a const lvalue simply means that you cannot modify the object through the lvalue.

Thus, having a pointer to const merely means that you cannot modify the object through the pointer:

bool query_nonblocking(const char * const query,
                       unsigned long length) {
    query[0] = 'x'; // ill-formed, because of pointer to const
}

Pointer to const does not mean that the pointed object is necessarily const. Nor does it mean that the pointed data cannot change.


Having a const pointer simply means that you cannot make the pointer point to somewhere else:

bool query_nonblocking(const char * const query,
                       unsigned long length) {
    query = "another string"; // ill-formed, because of const pointer
}

[1] A const argument does not mean that the function is given the same address every time it is called.
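The two kinds of constness can be seen side by side in a short sketch. The function names below are invented for the illustration; the commented-out lines are the ones a compiler would reject.

```cpp
#include <cassert>
#include <cstddef>

// Pointer to const: the pointee cannot be modified through p,
// but p itself can be moved.
std::size_t count_chars(const char* p) {
    assert(p != nullptr);
    std::size_t n = 0;
    while (*p != '\0') {
        ++p;   // fine: the pointer itself is not const
        ++n;
    }
    // *p = 'x';   // would be ill-formed: p points to const char
    return n;
}

// Const pointer: p cannot be reseated, but the pointee can be modified.
void set_first(char* const p, char c) {
    p[0] = c;       // fine: the pointee is not const
    // p = nullptr; // would be ill-formed: p itself is const
}

// The pointed-to data can still change while a pointer to const observes it.
bool pointee_changes_behind_const_view() {
    char buf[] = "abc";
    const char* view = buf;   // pointer to const, but buf is not a const object
    set_first(buf, 'x');
    return view[0] == 'x';
}
```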



I expected the following code to fail to compile

You violate neither kind of constness described above, so there is no reason to expect this.

because the str() returns string by value that will keep on changing on every call

See note 1 above in particular.



(1) Why is the pointer not constant in the method argument after compilation?

query_nonblocking(char const*, unsigned long):

Because top-level const on a parameter is not part of the function's type: the language treats const and non-const arguments the same for overload resolution purposes. The constness of the argument is indistinguishable from outside the function definition, so once the compiler has checked that the function definition does not violate the constness, it no longer needs to care about it.
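This can be checked directly. In the sketch below, the declaration omits the top-level const and the definition adds it, yet both name the same function. The function body is a placeholder made up for the example, not the original poster's code.

```cpp
#include <cassert>
#include <type_traits>

// The two function types are identical: top-level const on a parameter
// is dropped when the function type is formed.
static_assert(std::is_same<bool(const char* const, unsigned long),
                           bool(const char*, unsigned long)>::value,
              "top-level const on a parameter is not part of the function type");

// Declaration without the top-level const...
bool query_nonblocking(const char* query, unsigned long length);

// ...and a definition of the very same function with it.
// The body is a hypothetical placeholder.
bool query_nonblocking(const char* const query, unsigned long length) {
    assert(query != nullptr);
    return length > 0 && query[0] != '\0';
}
```

Only the const in "const char" (the pointee) survives into the signature, which is why the compiled symbol shows char const* with no trailing const.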

Mutation of a mutable data-member via pointer-to-member

Just before that note you will find the following:

The restrictions on cv-qualification, and the manner in which the cv-qualifiers of the operands are combined to produce the cv-qualifiers of the result, are the same as the rules for E1.E2 given in 5.2.5

Jump up to 5.2.5 and you'll find this:

If E2 is a non-static data member and the type of E1 is “cq1 vq1 X”, and the type of E2 is “cq2 vq2 T”, the expression designates the named member of the object designated by the first expression. If E1 is an lvalue, then E1.E2 is an lvalue; otherwise E1.E2 is an xvalue. Let the notation vq12 stand for the “union” of vq1 and vq2; that is, if vq1 or vq2 is volatile, then vq12 is volatile. Similarly, let the notation cq12 stand for the “union” of cq1 and cq2; that is, if cq1 or cq2 is const, then cq12 is const. If E2 is declared to be a mutable member, then the type of E1.E2 is “vq12 T”. If E2 is not declared to be a mutable member, then the type of E1.E2 is “cq12 vq12 T”.

The union of the const qualifiers in cs.*pm is const; the exception for mutable members doesn't apply to pointers to members.

It's easier to understand if you consider that storage class specifiers are not part of the type, so how would the compiler be able to distinguish between mutable and non-mutable pointers to member?

struct S;

void f(const S& s, int S::* pm)
{
s.*pm = 1; // How do I know if pm points to a mutable member? S isn't even defined!
}

Simply put, there is no such thing as a pointer to a mutable member, just as there is no such thing as a pointer to a static member. The pointed-to member's storage class is unknown (the type can only be qualified by const and/or volatile).
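A short sketch of the difference, with a struct definition and a function name (touch_cache) invented for the illustration: direct member access honors mutable, while access through a pointer to member does not.

```cpp
#include <cassert>

struct S {
    mutable int cache = 0;
    int value = 0;
};

// Hypothetical helper contrasting s.cache with s.*pm on a const object.
int touch_cache(const S& s) {
    s.cache = 42;               // fine: named access to a mutable member
    int S::* pm = &S::cache;    // the type is plain int S::*; "mutable" is not in it
    // s.*pm = 1;               // ill-formed: s.*pm has type const int
    assert(s.*pm == 42);        // reading through the pointer to member is fine
    return s.*pm;
}
```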

c_str() vs. data() when it comes to return type

The new overload was added by P0272R1 for C++17. Neither the paper itself nor the links therein discuss why only data was given new overloads but c_str was not. We can only speculate at this point (unless people involved in the discussion chime in), but I'd like to offer the following points for consideration:

  • Even just adding the overload to data broke some code; keeping this change conservative was a way to minimize negative impact.

  • The c_str function had so far been entirely identical to data and is effectively a "legacy" facility for interfacing with code that takes a "C string", i.e. an immutable, null-terminated char array. Since you can always replace c_str by data, there's no particular reason to add to this legacy interface.

I realize that the very motivation for P0272R1 was that there do exist legacy APIs that erroneously or for C reasons take only mutable pointers even though they don't mutate. All the same, I suppose we don't want to add more to string's already massive API than absolutely necessary.

One more point: as of C++17 you are now allowed to write to the null terminator, as long as you write the value zero. (Previously, it used to be UB to write anything to the null terminator.) A mutable c_str would create yet another entry point into this particular subtlety, and the fewer subtleties we have, the better.
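The return types involved can be pinned down with a few compile-time checks. This is only a sketch: the C++17 check is guarded by __cplusplus, so it is skipped when compiling for an older standard, and same_buffer is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// c_str() yields const char* even on a non-const string.
static_assert(std::is_same<decltype(std::declval<std::string&>().c_str()),
                           const char*>::value,
              "c_str() returns a pointer to const");

// The const overload of data() also yields const char*.
static_assert(std::is_same<decltype(std::declval<const std::string&>().data()),
                           const char*>::value,
              "const data() returns a pointer to const");

#if __cplusplus >= 201703L
// Since C++17, data() on a non-const string yields char*.
static_assert(std::is_same<decltype(std::declval<std::string&>().data()),
                           char*>::value,
              "non-const data() returns a mutable pointer");
#endif

// Hypothetical helper: since C++11, data() and c_str() refer to the same
// null-terminated array.
bool same_buffer(const std::string& s) {
    assert(s.c_str()[s.size()] == '\0');
    return s.data() == s.c_str();
}
```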


