Writing Directly to std::string Internal Buffers


I'm not sure the standard guarantees that the data in a std::string is stored as a char*. The most portable way I can think of is to use a std::vector, which is guaranteed to store its data in a contiguous chunk of memory:

std::vector<char> buffer(100);
FunctionInDLL(&buffer[0], buffer.size());
std::string stringToFillIn(&buffer[0]);

This will of course require the data to be copied twice, which is a bit inefficient.

Directly write into char* buffer of std::string

C++98/03

Impossible. String can be copy on write so it needs to handle all reads and writes.

C++11/14

In [string.require]:

The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string
object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

So &str.front() and &str[0] should work.
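As a rough sketch of what this permits in C++11, the string below is sized first and then handed to a C-style function through &str[0]. The names fill_buffer and read_via_string are made up for illustration; fill_buffer only stands in for whatever C API you actually call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C-style API: writes at most `capacity` bytes
// into `out` and returns how many bytes it actually wrote.
std::size_t fill_buffer(char* out, std::size_t capacity) {
    const char message[] = "hello";
    std::size_t n = sizeof(message) - 1;  // exclude the terminating '\0'
    if (n > capacity) n = capacity;
    std::memcpy(out, message, n);
    return n;
}

std::string read_via_string() {
    std::string str(16, '\0');  // size(), not just capacity(), must cover the writes
    assert(!str.empty());       // &str.front() would require a non-empty string
    std::size_t written = fill_buffer(&str[0], str.size());
    str.resize(written);        // drop the unused tail
    return str;
}
```

Calling read_via_string() should yield a string holding only the bytes that fill_buffer reported as written.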

C++17

str.data(), &str.front() and &str[0] work.

Here it says:

charT* data() noexcept;

Returns: A pointer p such that p + i == &operator[](i) for each i in [0, size()].

Complexity: Constant time.

Requires: The program shall not alter the value stored at p + size().

The non-const .data() just works.
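A minimal sketch of the C++17 form, assuming a C++17 compiler: std::snprintf formats straight into the string through the non-const data(). The function name format_value and the 32-byte size are arbitrary choices for the example. Passing size() as the buffer length keeps the terminator that snprintf writes inside [0, size()), so the element at p + size() is never touched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats an integer into a std::string without a temporary char array.
// Relies on the non-const data() overload added in C++17.
std::string format_value(int value) {
    std::string str(32, '\0');
    // snprintf writes at most str.size() bytes including its terminator,
    // so every write stays within [data(), data() + size()).
    int n = std::snprintf(str.data(), str.size(), "value=%d", value);
    assert(n >= 0 && static_cast<std::size_t>(n) < str.size());
    str.resize(static_cast<std::size_t>(n));  // keep only the formatted text
    return str;
}
```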

The recent draft has the following wording for .front():

const charT& front() const;

charT& front();

Requires: !empty().

Effects: Equivalent to operator[](0).

And the following for operator[]:

const_reference operator[](size_type pos) const;

reference operator[](size_type pos);

Requires: pos <= size().

Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.

Throws: Nothing.

Complexity: Constant time.

So it uses iterator arithmetic, which means we need to look at what the standard says about iterators. Here it says:

3 A basic_string is a contiguous container ([container.requirements.general]).

So we need to go here:

A contiguous container is a container that supports random access iterators ([random.access.iterators]) and whose member types iterator and const_iterator are contiguous iterators ([iterator.requirements.general]).

Then here:

Iterators that further satisfy the requirement that, for integral values n and dereferenceable iterator values a and (a + n), *(a + n) is equivalent to *(addressof(*a) + n), are called contiguous iterators.

Apparently, contiguous iterators are a C++17 feature which was added in these papers.

The requirement can be rewritten as:

assert(*(a + n) == *(&*a + n));

So, on the right-hand side we dereference the iterator, take the address of the value it refers to, do pointer arithmetic on that address, and dereference the result. The requirement says this is the same as advancing the iterator and then dereferencing it. In other words, a contiguous iterator points into memory where each value is stored right after the previous one, hence "contiguous". Since functions that take char* expect contiguous memory, you can pass the result of &str.front() or &str[0] to these functions.
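The identity can also be checked at run time by comparing addresses rather than values. The helper below is only an illustration (is_contiguous is not a standard function), and a passing check on one implementation does not replace the guarantee in the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if every element reached through the iterator lives at the
// address obtained by plain pointer arithmetic from the first element.
bool is_contiguous(const std::string& s) {
    if (s.empty()) return true;  // nothing to dereference
    const char* first = &*s.begin();
    for (std::size_t n = 0; n < s.size(); ++n) {
        auto offset = static_cast<std::string::difference_type>(n);
        if (&*(s.begin() + offset) != first + n) return false;
    }
    return true;
}
```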

Are there downsides to using std::string as a buffer?

Don't use std::string as a buffer.

It is bad practice to use std::string as a buffer, for several reasons (listed in no particular order):

  • std::string was not intended for use as a buffer; you would need to double-check the description of the class to make sure there are no "gotchas" which would prevent certain usage patterns (or make them trigger undefined behavior).
  • As a concrete example: Before C++17, you can't even write through the pointer you get with data() - it's const charT*; so your code would cause undefined behavior. (But &(str[0]), &(str.front()), or &(*(str.begin())) would work.)
  • Using std::strings for buffers is confusing to readers of your function's definition, who assume you would be using std::string for, well, strings. In other words, doing so breaks the Principle of Least Astonishment.
  • Worse yet, it's confusing for whoever might use your function - they too may think what you're returning is a string, i.e. valid human-readable text.
  • std::unique_ptr would be fine for your case, or even std::vector. In C++17, you can use std::byte for the element type, too. A more sophisticated option is a class with an SSO-like feature, e.g. Boost's small_vector (thank you, @gast128, for mentioning it).
  • (Minor point:) libstdc++ had to change its ABI for std::string to conform to the C++11 standard, so in some cases (which by now are rather unlikely), you might run into some linkage or runtime issues that you wouldn't with a different type for your buffer.

Also, your code may make two instead of one heap allocations (implementation dependent): Once upon string construction and another when resize()ing. But that in itself is not really a reason to avoid std::string, since you can avoid the double allocation using the construction in @Jarod42's answer.
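As a sketch of the std::vector alternative from the list above: the buffer is allocated once at its maximum size and then shrunk to the number of bytes actually produced. produce_bytes is a hypothetical producer standing in for a real API, and unsigned char is used so the example also compiles before C++17 (with C++17, std::byte could be the element type instead).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical producer of binary data; stands in for a real C API.
// Writes at most `capacity` bytes and returns the number written.
std::size_t produce_bytes(unsigned char* out, std::size_t capacity) {
    std::size_t n = capacity < 4 ? capacity : 4;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<unsigned char>(0xF0 + i);
    }
    return n;
}

std::vector<unsigned char> read_bytes(std::size_t max_size) {
    std::vector<unsigned char> buffer(max_size);  // single allocation, zero-filled
    std::size_t n = produce_bytes(buffer.data(), buffer.size());
    assert(n <= buffer.size());
    buffer.resize(n);  // shrink to the bytes actually produced
    return buffer;
}
```

The return type tells callers that they are getting raw bytes, which avoids the "is this text?" confusion described above.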

Is there a way to get std::string's buffer

Use std::vector<char> if you want a real buffer.

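#include <windows.h> // MAX_PATH, DWORD, GetCurrentDirectoryA
#include <string>
#include <vector>

int main(){
    std::vector<char> buff(MAX_PATH+1);
    // The ANSI variant is called explicitly because the buffer holds char.
    // It returns the number of characters written, excluding the null terminator (0 on failure).
    DWORD len = ::GetCurrentDirectoryA(static_cast<DWORD>(buff.size()), &buff[0]);
    if (len >= buff.size()) len = 0; // buffer too small: the return value is the required size
    // Copy only the characters written, not the zero-filled rest of the buffer.
    std::string path(buff.begin(), buff.begin() + len);
}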


Is it permitted to modify the internal std::string buffer returned by operator[] in C++11

operator[]

operator[] returns a reference to the character. So if the string is NOT const, you can modify it safely.

In C++11, the characters are stored contiguously, so you can take &str[0] as the beginning of the underlying array, whose size is str.size(). You can modify any element in the range [ &str[0], &str[0] + str.size() ) if the string is NOT const. For example, you can pass &str[0] and str.size() to void func(char *arr, size_t arr_size): func(&str[0], str.size())
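To make the func(&str[0], str.size()) call concrete, here is a small sketch in which func upper-cases the array in place. The body of func and the wrapper shout are invented for the example; only the signature comes from the text above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// C-style function with the signature used above: upper-cases
// arr[0 .. arr_size) in place.
void func(char* arr, std::size_t arr_size) {
    for (std::size_t i = 0; i < arr_size; ++i) {
        arr[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(arr[i])));
    }
}

std::string shout(std::string str) {
    if (!str.empty()) {
        func(&str[0], str.size());  // writes only within [0, size())
    }
    return str;
}
```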

data() and c_str() members

In C++11 and C++14, both data() and c_str() return const CharT*, so you CANNOT modify elements through the returned pointer. However, since C++17, data() returns CharT* if the string is NOT const, and data() is then equivalent to &str[0].
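The return types can be checked at compile time. The sketch below assumes a C++17 compiler (the first static_assert fails under C++11/14, which is exactly the difference described above); replace_first is a made-up helper that writes through data().

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Non-const string: data() yields a writable pointer (C++17 and later).
static_assert(std::is_same<decltype(std::declval<std::string&>().data()), char*>::value,
              "non-const data() returns char* since C++17");
// Const string: data() stays read-only.
static_assert(std::is_same<decltype(std::declval<const std::string&>().data()), const char*>::value,
              "const data() returns const char*");
// c_str() is read-only even on a non-const string.
static_assert(std::is_same<decltype(std::declval<std::string&>().c_str()), const char*>::value,
              "c_str() returns const char*");

// Overwrites the first character through the pointer returned by data().
std::string replace_first(std::string s, char c) {
    assert(!s.empty());
    *s.data() = c;
    return s;
}
```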

Can you avoid using temporary buffers when using std::string to interact with C style APIs?

In C++11 you can simply pass a pointer to the first element of the string (&str[0]): its elements are guaranteed to be contiguous.

Before C++11, you could use .data() or .c_str(), but the string is not mutable through these.

Otherwise, yes, you must perform a copy. But I wouldn't worry about this too much until profiling indicates that it's really an issue for you.

is there a way to set the length of a std::string without modifying the buffer content?

You should be using resize(), not reserve(), and then resize() again to set the final length.

Otherwise when you resize() from zero to the result returned by strlen() the array will be filled with zero characters, overwriting what you wrote into it. The string is allowed to do that, because it (correctly) assumes that everything from the current size to the current reserved capacity is uninitialized data that doesn't contain anything.

In order for the string to know that the characters are actually valid and their contents should be preserved, you need to use resize() initially, not reserve(). Then when you resize() again to make the string smaller it only truncates the unwanted end of the string and adds a null terminator, it won't overwrite what you wrote into it.
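The resize-then-resize pattern might look like the sketch below. write_c_string is a hypothetical C-style function that writes a null-terminated string, and read_c_string is an illustrative wrapper; neither comes from the original question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C API: writes a null-terminated string into `out`.
void write_c_string(char* out, std::size_t capacity) {
    const char text[] = "hello";
    assert(capacity >= sizeof(text));
    std::memcpy(out, text, sizeof(text));  // includes the terminating '\0'
}

std::string read_c_string(std::size_t max_len) {
    std::string str;
    str.resize(max_len);                   // size() now covers the whole buffer
    write_c_string(&str[0], str.size());
    str.resize(std::strlen(str.c_str()));  // shrinking keeps the characters already written
    return str;
}
```

With reserve() in place of the first resize(), the final resize() would grow the string from size zero and fill it with zero characters, as described above.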

N.B. the initial resize() will zero-fill the string, which is not strictly necessary in your case because you're going to overwrite the portion you care about and then discard the rest anyway. If the strings are very long and profiling shows the zero-filling is a problem then you could do this instead:

std::unique_ptr<char[]> str(new char[SOME_MAX_VALUE]);
some_C_API_func(str.get());

How do I perform string formatting to a static buffer in C++?

My thanks to all that posted suggestions (even in the comments).

I appreciate the suggestion by SJHowe, being the briefest solution to the problem, but one of the things I am looking to do with this attempt is to start coding for the C++ of the future, and not use anything deprecated.

The solution I decided to go with stems from the comment by Remy Lebeau:

#include <iostream>  // For std::ostream and std::streambuf
#include <cstring> // For std::memset
#include <string> // For std::string

template <int bufferSize>
class FixedBuffer : public std::streambuf
{
public:
FixedBuffer()
: std::streambuf()
{
std::memset(buffer, 0, sizeof(buffer));
setp(buffer, &buffer[bufferSize-1]); // Remember the -1 to preserve the terminator.
setg(buffer, buffer, &buffer[bufferSize-1]); // Technically not necessary for an std::ostream.
}

std::string get() const
{
return buffer;
}

private:
char buffer[bufferSize];
};

//...

constexpr int BUFFER_SIZE = 200;
FixedBuffer<BUFFER_SIZE> buffer;
std::ostream ostr(&buffer);

ostr << "PartA: " << intA << std::endl << "PartB: " << intB << std::endl << std::ends;
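For completeness, here is a self-contained sketch of the same idea that can be compiled on its own. The wrapper format_parts and its parameters a and b are assumptions made for the example (they stand in for intA and intB above), and '\n' is used instead of std::endl only to skip the flush.

```cpp
#include <cassert>
#include <cstring>
#include <ostream>
#include <streambuf>
#include <string>

template <int bufferSize>
class FixedBuffer : public std::streambuf {
public:
    FixedBuffer() {
        std::memset(buffer, 0, sizeof(buffer));
        // The put area excludes the last byte so the text stays null-terminated.
        setp(buffer, buffer + bufferSize - 1);
    }

    // Copies the text up to the first '\0' into a std::string.
    std::string get() const { return buffer; }

private:
    char buffer[bufferSize];
};

// Formats two integers into a fixed-size buffer of bufferSize bytes.
template <int bufferSize>
std::string format_parts(int a, int b) {
    FixedBuffer<bufferSize> buf;
    std::ostream ostr(&buf);
    ostr << "PartA: " << a << '\n' << "PartB: " << b << '\n';
    return buf.get();
}
```

When the text does not fit, the default overflow() reports failure, the stream goes into a failed state, and the output is cut off at bufferSize - 1 characters.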

Use an external buffer for a string without copying

Yes, copying is always happening. BTW, you don't need to wrap std::string(buffer) as the constructor std::string(char const*) is implicit and a simple

foo(buffer);

will implicitly copy the buffer into the string. If you are the author of foo you can add an overload

void foo(char const*)

that avoids the copying. However, C strings suffer from the problem that the null terminator is part of the string APIs, so you can't easily create substrings without mutating the underlying string (a la strtok).

The Library Fundamentals Technical Specification contains a string_view class (standardized in C++17 as std::string_view) that eliminates the copying, like char const*, but preserves the substring capability of std::string:

#include <iostream>
#include <experimental/string_view>

void foo(std::experimental::string_view v) { std::cout << v.substr(2,8) << '\n'; }

int main()
{
char const* buffer = "war and peace";
foo(buffer);
}

Live Example (requires libstdc++ 4.9 or higher in C++14 mode).
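With a C++17 compiler the same idea works with std::string_view from <string_view>, with no experimental header. The sketch below returns the substring instead of printing it; the function name middle is made up for the example.

```cpp
#include <cassert>
#include <string_view>

// Returns up to 8 characters starting at index 2 without copying any data.
// The result refers to the caller's buffer, so that buffer must outlive it.
std::string_view middle(std::string_view v) {
    assert(v.size() >= 2);  // substr throws std::out_of_range if the position exceeds size()
    return v.substr(2, 8);
}
```

Passing a char const* such as "war and peace" builds the view without copying the characters, although the constructor does scan for the null terminator to determine the length.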


