Is It Reasonable to Use std::basic_string<T> as a Contiguous Buffer When Targeting C++03

Is it reasonable to use std::basic_string<t> as a contiguous buffer when targeting C++03?

I'd consider it quite safe to assume that std::string allocates its storage contiguously.

At the present time, all known implementations of std::string allocate space contiguously.

Moreover, the current draft of C++0x (N3000) requires that the space be allocated contiguously (§21.4.1/5):

The char-like objects in a basic_string object shall be stored
contiguously. That is, for any basic_string object s, the identity
&*(s.begin() + n) == &*s.begin() + n shall hold for all values of n
such that 0 <= n < s.size().

As such, the chances of a current or future implementation of std::string using non-contiguous storage are essentially nil.
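As a rough illustration of what relying on this looks like in practice, the sketch below sizes a std::string up front and hands its storage to a C-style function. The function fill_greeting is a hypothetical stand-in for whatever API needs a raw char buffer; it is not part of any real library. The sketch assumes contiguous storage, which C++03 does not formally promise but every known implementation provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style API: writes at most `capacity` bytes into `out`
// and returns the number of bytes actually written.
std::size_t fill_greeting(char* out, std::size_t capacity)
{
    static const char text[] = "hello";
    std::size_t n = sizeof(text) - 1;   // length without the terminator
    if (n > capacity) n = capacity;
    std::memcpy(out, text, n);
    return n;
}

std::string read_greeting()
{
    std::string buffer(16, '\0');       // size it first; &buffer[0] needs a non-empty string
    assert(!buffer.empty());
    // Assumption: the characters are stored contiguously (guaranteed only from C++11 on).
    std::size_t written = fill_greeting(&buffer[0], buffer.size());
    buffer.resize(written);             // trim to what was actually written
    return buffer;
}
```

The string is resized before the call and trimmed afterwards, so the raw pointer never reaches past size().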

Does std::string need to store its character in a contiguous piece of memory?

The C++11 standard requires it, in §21.4.1/5 [string.require]:

The char-like objects in a basic_string object shall be stored
contiguously. That is, for any basic_string object s, the identity
&*(s.begin() + n) == &*s.begin() + n shall hold for all values of n
such that 0 <= n < s.size().

C++11 Allocation Requirement on Strings

Section 21.4.1, paragraph 5 of the 2011 standard states:

The char-like objects in a basic_string object shall be stored
contiguously. That is, for any basic_string object s, the identity
&*(s.begin() + n) == &*s.begin() + n shall hold for all values of
n such that 0 <= n < s.size().

The two parts of the identity expression are:

  1. Take the begin() iterator, advance by n, then dereference and take the address of the resulting element.
  2. Take the begin() iterator, dereference and take the address of the resulting element. Add n to this pointer.

Since both are required to be identical, this enforces contiguous storage; that is, the iterator cannot move over any non-contiguous storage without violating this requirement.
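The identity can also be checked directly. The following is a small sketch, with function names chosen here for illustration, that evaluates both sides for every valid n. On a conforming C++11 library it should always return true; on a C++03 library it merely reports what that implementation happens to do.

```cpp
#include <cassert>
#include <string>

// Returns true if &*(s.begin() + n) == &*s.begin() + n for all 0 <= n < s.size().
bool satisfies_contiguity_identity(const std::string& s)
{
    for (std::string::size_type n = 0; n < s.size(); ++n) {
        const char* via_iterator = &*(s.begin() + n);  // part 1: advance, then take the address
        const char* via_pointer  = &*s.begin() + n;    // part 2: take the address, then advance
        if (via_iterator != via_pointer) return false;
    }
    return true;  // vacuously true for an empty string (begin() is never dereferenced)
}

// Example use on a sample string.
void check_sample_string()
{
    std::string sample("contiguous storage");
    assert(satisfies_contiguity_identity(sample));
}
```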

Is it legal to write to std::string?

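std::string will be required to have contiguous storage under the new C++0x standard. Under the current standard (C++03) contiguity is not guaranteed, so writing to the string through a raw pointer to its storage is undefined behavior.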
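If the C++03 gap is a concern, one conservative option is to write into a std::vector<char>, whose elements C++03 does require to be contiguous, and build the string from it afterwards. The helper name below is made up for this sketch, and the extra copy is the price of staying inside the guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Copies a C string through a vector<char> scratch buffer.
// std::memcpy stands in for any API that writes through a raw char*.
std::string copy_via_vector(const char* source)
{
    assert(source != 0);
    std::size_t length = std::strlen(source);
    if (length == 0) return std::string();     // &buffer[0] is invalid on an empty vector

    std::vector<char> buffer(length);          // contiguous by the C++03 rules
    std::memcpy(&buffer[0], source, length);   // raw write into the vector's storage
    return std::string(buffer.begin(), buffer.end());
}
```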

Does std::string really wrap up a C char array?

For all I know an std::string could be doing anything internally!

For all you know. The standard, of course, describes and demands certain semantics that rule out "just anything". It says the following about the basic_string template:

§21.4 [basic.string] p1

The class template basic_string describes objects that can store a sequence consisting of a varying number of arbitrary char-like objects with the first element of the sequence at position zero. Such a sequence is also called a “string” if the type of the char-like objects that it holds is clear from context. In the rest of this Clause, the type of the char-like objects held in a basic_string object is designated by charT.

And a "char-like object" is defined by the following text:

§21.1 [strings.general] p1

This Clause describes components for manipulating sequences of any non-array POD (3.9) type. In this Clause such types are called char-like types , and objects of char-like types are called char-like objects or simply characters.

This effectively means that you can stuff anything you want into basic_string, as long as it's not an array and it is a POD. These char-like objects are then manipulated with the help of character traits, which define the specific behaviour of and relationship between them.


[...] but how do I know if the method doesn't create a new c char array from whatever data it stores inside and return it?

In C++03 an implementation was allowed to do exactly this; it was a known defect that has since been corrected in C++11:

§21.4.1 [string.require] p5

The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

See also these related questions:

  • Is string::c_str() allowed to allocate anything on the heap?
  • Is it legal to write to std::string?
  • Is it reasonable to use std::basic_string<t> as a contiguous buffer when targeting C++03?

Which string classes to use in C++?

I would use std::string.

  • It promotes decoupling from MFC
  • It interacts better with existing C++ libraries

The "return by value" issue is mostly a non-issue. Compilers are very good at performing Return Value Optimization (RVO) which actually eliminates the copy in most cases when returning by value. If it doesn't, you can usually tweak the function.
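To see whether a copy actually happens on return, a type that counts its own copy-constructor calls can help. This is only a sketch with invented names (Tracked, make_tracked). Eliding the copy in this form is guaranteed from C++17 onwards; under earlier standards it is permitted rather than required, though mainstream compilers typically do it.

```cpp
#include <cassert>
#include <string>

// Counts how many times the copy constructor runs.
struct Tracked {
    static int copies;
    std::string text;
    explicit Tracked(const std::string& t) : text(t) {}
    Tracked(const Tracked& other) : text(other.text) { ++copies; }
};
int Tracked::copies = 0;

// Returns a temporary by value, the classic candidate for RVO.
Tracked make_tracked()
{
    return Tracked("payload");
}

// Reports how many copies the by-value return cost on this compiler.
int copies_made_by_return()
{
    Tracked::copies = 0;
    Tracked result = make_tracked();
    assert(result.text == "payload");
    return Tracked::copies;
}
```

If copies_made_by_return() reports zero, the by-value return cost nothing beyond constructing the object in place.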

Copy-on-write (COW) has been rejected for a reason: it doesn't scale well, and the hoped-for speed increase has not really been measured (see Herb Sutter's article). Atomic operations are not as cheap as they appear. On a single-processor, single-core machine they were cheap, but multi-core processors are now commodity hardware and multi-processor machines are widely available (for servers). Such architectures have multiple caches that need to be synchronized, and the more distributed the architecture, the more costly the atomic operations.

Does CString implement the Small String Optimization? It's a simple trick that lets a string avoid allocating any memory for small strings (usually a few characters). It is very useful because most strings turn out to be small: how many strings in your application are fewer than 8 characters long?
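Whether a given std::string implementation uses the Small String Optimization can be probed by checking whether the character data lives inside the string object itself. The helper below is a sketch with a made-up name. Its result for short strings is implementation-specific; only the long-string case is predictable, because a buffer larger than the object cannot fit inside it.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Returns true if the character data lies within the bytes of the string object,
// which is what an implementation using SSO does for short strings.
bool stored_inside_object(const std::string& s)
{
    const char* object_begin = reinterpret_cast<const char*>(&s);
    const char* object_end   = object_begin + sizeof(std::string);
    const char* data         = s.data();
    std::less<const char*> before;  // std::less gives a total order over unrelated pointers
    return !before(data, object_begin) && before(data, object_end);
}

// A string far larger than the object itself must keep its data elsewhere.
void check_long_string()
{
    std::string long_text(1000, 'x');
    assert(!stored_inside_object(long_text));
}
```

Calling stored_inside_object on a short string such as "abc" shows what a particular library does; the answer differs between implementations.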

So, unless you show me a real benchmark that clearly demonstrates a net gain from using CString, I'd prefer to stick with std::string: it's standard, and likely better optimized.

How can I get the address of the buffer allocated by vector::reserve()?

The vector buffer will not be moved after a call to reserve unless you exceed the reserved capacity. Your problem is getting the pointer to the first element. The obvious answer is to push a single fake entry into the vector, get the pointer to it, and then remove it.
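The push-then-remove trick might look roughly like the sketch below (function names are invented for illustration). The returned pointer does not refer to a live element while the vector is empty, so it should be treated as an address only, not read or written through, until the vector actually holds elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reserves `capacity` elements and returns the address of the buffer's first slot.
int* buffer_address_after_reserve(std::vector<int>& v, std::size_t capacity)
{
    v.reserve(capacity);
    assert(v.capacity() >= capacity);
    if (v.capacity() == 0) return 0;   // nothing was allocated

    bool was_empty = v.empty();
    if (was_empty) v.push_back(0);     // fake entry so that &v[0] is valid
    int* first = &v[0];
    if (was_empty) v.pop_back();       // remove it again; capacity is unchanged
    return first;
}

// Checks that filling the vector up to the reserved capacity does not move the buffer.
bool address_is_stable(std::size_t capacity)
{
    std::vector<int> v;
    int* before = buffer_address_after_reserve(v, capacity);
    for (std::size_t i = 0; i < capacity; ++i) v.push_back(static_cast<int>(i));
    return capacity == 0 || before == &v[0];
}
```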

A nicer approach would be if the library accepted a functor rather than a pointer, which it would call when it needed to access the buffer - you could make the functor put off getting the address until the buffer had some real contents. However, I realise you don't have the luxury of rewriting the library.

C++: using std::wstring in API function

The second parameter is an out parameter, so you can't just pass c_str() (which returns a pointer to const) directly. It would probably be simplest just to do:

wchar_t wstrPath[MAX_PATH];
BOOL f = SHGetPathFromIDList(pidl, wstrPath);

MAX_PATH is currently 260 characters.
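Once the fixed-size array has been filled, it can be copied into a std::wstring for the rest of the program. The sketch below is portable on purpose: get_path_stub and kMaxPath are hypothetical stand-ins for the Windows call and MAX_PATH, so that the pattern can be shown without the Windows headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

const std::size_t kMaxPath = 260;  // stand-in for MAX_PATH

// Hypothetical stand-in for an API that fills a caller-supplied wchar_t buffer.
bool get_path_stub(wchar_t* out)
{
    assert(out != 0);
    static const wchar_t sample[] = L"C:\\Temp";
    // Copy the sample path including its terminating null character.
    std::wmemcpy(out, sample, sizeof(sample) / sizeof(sample[0]));
    return true;
}

// Fills a stack buffer through the out parameter, then copies it into a wstring.
std::wstring path_as_wstring()
{
    wchar_t raw[kMaxPath] = { 0 };
    if (!get_path_stub(raw)) return std::wstring();
    return std::wstring(raw);      // copies up to the first null character
}
```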

Mystical restriction on std::binary_search

If your goal is to find if there is a Human with a given name, then the following should work for sure:

const std::string& get_name(const Human& h)
{
    return h.name;
}

...

bool result = std::binary_search(
    boost::make_transform_iterator(v.begin(), &get_name),
    boost::make_transform_iterator(v.end(), &get_name),
    name_to_check_against);
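If Boost is not available, a different technique gives the same answer: std::lower_bound with a comparator that compares a Human against a name. This is a sketch rather than the approach above; the Human fields other than name, and all helper names, are assumptions made for the example. It also assumes the vector is already sorted by name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Human {
    std::string name;
    int age;   // assumed extra field, only to make the type non-trivial
};

// Orders by name; overloads cover every argument order a library might use.
struct NameLess {
    bool operator()(const Human& h, const std::string& n) const { return h.name < n; }
    bool operator()(const std::string& n, const Human& h) const { return n < h.name; }
    bool operator()(const Human& a, const Human& b) const { return a.name < b.name; }
};

// Precondition: `sorted_by_name` is sorted in ascending order of name.
bool has_human_named(const std::vector<Human>& sorted_by_name, const std::string& name)
{
    std::vector<Human>::const_iterator it =
        std::lower_bound(sorted_by_name.begin(), sorted_by_name.end(), name, NameLess());
    return it != sorted_by_name.end() && it->name == name;
}

// Builds a small vector that is already sorted by name.
std::vector<Human> sample_humans()
{
    const char* names[] = { "Alice", "Bob", "Carol" };
    std::vector<Human> v;
    for (int i = 0; i < 3; ++i) {
        Human h;
        h.name = names[i];
        h.age = 30 + i;
        v.push_back(h);
    }
    return v;
}

void check_sample_humans()
{
    assert(has_human_named(sample_humans(), "Bob"));
    assert(!has_human_named(sample_humans(), "Dave"));
}
```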

