BSTR to std::string (std::wstring) and vice versa

BSTR to std::wstring:

// given BSTR bs
assert(bs != nullptr);
std::wstring ws(bs, SysStringLen(bs));

 

std::wstring to BSTR:

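// given std::wstring ws
assert(!ws.empty());  // optional: SysAllocStringLen also accepts a zero length
BSTR bs = SysAllocStringLen(ws.data(), static_cast<UINT>(ws.size()));
// the caller owns bs and must release it with SysFreeString(bs)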

Doc refs:

  1. std::basic_string<typename CharT>::basic_string(const CharT*, size_type)
  2. std::basic_string<>::empty() const
  3. std::basic_string<>::data() const
  4. std::basic_string<>::size() const
  5. SysStringLen()
  6. SysAllocStringLen()

Why is my BSTR to std::wstring conversion so slow? Is my tester bad?

Interesting, and a bit surprising. In Visual C++ 2013 Update 4, the difference in performance comes down to how the two std::wstring constructors are implemented in its standard library. Generally speaking, the constructor taking a pair of iterators has to handle more cases: the iterators are not necessarily pointers, and they can point to types other than the string's character type (the character type only needs to be constructible from the type the iterators point to). Still, I was expecting the implementation to handle your case separately with optimized code.

std::wstring wstr(CHECKNULLSTR(bstr)); scans the string for the terminating null, then allocates, then copies the string data in the fastest possible way using memcpy, which is implemented in assembly code.

std::wstring wstr(bstr, bstr + ::SysStringLen(bstr)); avoids the scan thanks to ::SysStringLen (which is very fast, as it just reads the stored length), then allocates, but then copies the string data using the following loop:

for (; _First != _Last; ++_First)
    append((size_type)1, (_Elem)*_First);

VC12 decides not to inline the append call (understandably so, the body is pretty big), and all this, as you can imagine, carries quite a bit of overhead compared to a blazing memcpy.


One solution is to use the std::basic_string constructor that takes a pointer and a count (also mentioned by Ben Voigt in his comment), like this:

std::wstring wstr(CHECKNULLSTR(bstr), ::SysStringLen(bstr));

I've just tested it, and it does bring the expected benefits on Visual C++ 2013 - it sometimes takes just half the time of the first version, and about 75% in the worst case (these are approximate measurements anyway).


The standard library implementation in Visual C++ 2015 CTP6 has an optimized code path for the constructor taking an iterator pair when the iterators are actually pointers to the same character type as the string to be constructed, resulting in essentially the same code as the pointer-and-count variant above. So, on this version, it doesn't matter which of these two constructor variants you use for your case - they're both faster than the version taking only a pointer.
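
To make the three constructor variants concrete, here is a minimal sketch. It assumes a plain wchar_t buffer as a stand-in for the BSTR and an explicit length in place of ::SysStringLen, so that it compiles without Windows headers; the helper names are hypothetical. It only illustrates what each constructor copies, not how fast it is.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers: `p` stands in for a BSTR and `len` for the value
// ::SysStringLen would return, so the sketch needs no Windows headers.

// Pointer only: scans for the terminating null, so it stops at the first L'\0'.
inline std::wstring from_pointer(const wchar_t* p) {
    return std::wstring(p);
}

// Pointer and count: no scan, copies exactly `len` characters.
inline std::wstring from_pointer_and_count(const wchar_t* p, std::size_t len) {
    return std::wstring(p, len);
}

// Iterator pair: same result as pointer and count; the speed depends on
// how the standard library implements this constructor.
inline std::wstring from_iterator_pair(const wchar_t* p, std::size_t len) {
    return std::wstring(p, p + len);
}
```

With a buffer holding L"ab\0cd" and a length of 5, the pointer-only variant yields 2 characters while the other two yield all 5. That matters because a BSTR carries its own length and may contain embedded nulls.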

Advantage of std::wstring over CComBSTR

std::wstring has more methods for actual string handling, whereas CComBSTR is meant specifically for holding a BSTR string. BSTRs are used mostly by COM methods and have a different memory layout. Generally you should use std::wstring or CString unless you actually need the memory layout of BSTRs.
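
As a rough picture of that "different memory layout", the sketch below models a BSTR as a 4-byte length prefix (a byte count), followed by the UTF-16 characters and a 2-byte null terminator, with the BSTR pointer aimed at the first character. FakeBstr is a hypothetical, portable stand-in for illustration only; real BSTRs must come from SysAllocString and friends.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Illustrative stand-in for the BSTR layout, not the real COM allocator:
// [4-byte length in bytes][UTF-16 code units][2-byte null terminator].
class FakeBstr {
public:
    explicit FakeBstr(const std::u16string& text)
        : storage_(sizeof(std::uint32_t) + (text.size() + 1) * sizeof(char16_t)) {
        const std::uint32_t byteLength =
            static_cast<std::uint32_t>(text.size() * sizeof(char16_t));
        std::memcpy(storage_.data(), &byteLength, sizeof(byteLength));
        std::memcpy(storage_.data() + sizeof(byteLength), text.data(),
                    text.size() * sizeof(char16_t));
        // The terminator is already zero: the vector value-initializes its bytes.
    }

    // What a BSTR variable would hold: a pointer to the first character,
    // with the length prefix sitting just before it.
    const unsigned char* chars() const {
        return storage_.data() + sizeof(std::uint32_t);
    }

    // Mirrors the idea behind SysStringLen: read the prefix instead of scanning.
    std::size_t length() const {
        std::uint32_t byteLength = 0;
        std::memcpy(&byteLength, chars() - sizeof(byteLength), sizeof(byteLength));
        return byteLength / sizeof(char16_t);
    }

    // Copy the characters back out using the stored length.
    std::u16string to_u16string() const {
        std::u16string out(length(), u'\0');
        std::memcpy(&out[0], chars(), length() * sizeof(char16_t));
        return out;
    }

private:
    std::vector<unsigned char> storage_;
};
```

A std::wstring has no such prefix visible to COM, which is why a COM method that expects a BSTR cannot simply be handed ws.c_str().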

When using W2A to convert BSTR to std::string, is there any clean up needed?

Be very cautious with the W2A/A2W macros. They are implemented with alloca (dynamic allocation directly on the stack). In certain circumstances involving loops, recursion, or long strings, you will get a stack overflow (no kidding).

The recommended way is to use the "new" helper templates. See ATL and MFC String Conversion Macros.

A a;
CComBSTR textValue;
// some function which fills textValue
CW2A pszValue( textValue );
a.name = pszValue;

The conversion uses a regular on-stack buffer of 128 bytes. If it's too small, the heap is used automatically. You can adjust the trade-off by using the template types directly:

A a;
CComBSTR textValue;
// some function which fills textValue
CW2AEX<32> pszValue( textValue );
a.name = pszValue;

Don't worry: you have just reduced your stack usage, and if 32 bytes is not enough, the heap will be used. As I said, it's a trade-off. If you don't mind the default, use CW2A.

In either case, no cleanup is needed :-)

Beware, when pszValue goes out of scope, any pending char* to the conversion may be pointing to junk. Be sure to read the "Example 3 Incorrect use of conversion macros." and "A Warning Regarding Temporary Class Instances" in the above link.
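
The fixed-buffer-with-heap-fallback idea can be sketched in portable C++. NarrowBuffer below is a hypothetical stand-in, not the ATL implementation, and it assumes ASCII-only input so that each wchar_t narrows to a single char. It also shows the safe pattern for the lifetime issue: copy into an owning std::string before the buffer goes out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for the CW2AEX<N> idea: convert into a fixed
// in-object buffer when the text fits, otherwise fall back to the heap.
// Assumptions: N > 0, and the input is ASCII only.
template <std::size_t N>
class NarrowBuffer {
public:
    explicit NarrowBuffer(const std::wstring& wide) {
        const std::size_t needed = wide.size() + 1;  // include the terminator
        char* dest = inline_;
        if (needed > N) {
            heap_.reset(new char[needed]);
            dest = heap_.get();
        }
        for (std::size_t i = 0; i < wide.size(); ++i)
            dest[i] = static_cast<char>(wide[i]);
        dest[wide.size()] = '\0';
        ptr_ = dest;
    }
    // Copying would leave ptr_ aimed at another object's buffer.
    NarrowBuffer(const NarrowBuffer&) = delete;
    NarrowBuffer& operator=(const NarrowBuffer&) = delete;

    // Valid only while this object is alive.
    const char* c_str() const { return ptr_; }
    bool uses_heap() const { return heap_ != nullptr; }

private:
    char inline_[N];
    std::unique_ptr<char[]> heap_;
    const char* ptr_ = nullptr;
};

// Safe pattern: copy into an owning std::string before the buffer is destroyed.
inline std::string to_narrow(const std::wstring& wide) {
    NarrowBuffer<32> buffer(wide);
    return std::string(buffer.c_str());
}
```

A smaller N trades stack space for more frequent heap allocations; either way the destructor releases whatever was allocated, so the caller has nothing to free.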

How do you convert CString, std::string, and std::wstring to each other?

According to CodeGuru:

CString to std::string:

CString cs("Hello");
std::string s((LPCTSTR)cs);

BUT: std::string cannot always be constructed from an LPCTSTR, i.e. the code will fail to compile for UNICODE builds, where LPCTSTR is a wide-character pointer.

As std::string can be constructed only from LPSTR / LPCSTR, a programmer who uses VC++ 7.x or later can use conversion classes such as CT2CA as an intermediary.

CString cs ("Hello");
// Convert a TCHAR string to a LPCSTR
CT2CA pszConvertedAnsiString (cs);
// construct a std::string using the LPCSTR input
std::string strStd (pszConvertedAnsiString);
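
The reason the direct construction breaks in UNICODE builds can be checked at compile time. The sketch below uses two hypothetical aliases to model what LPCTSTR expands to in each build mode, so it needs no Windows headers.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical aliases modelling what LPCTSTR expands to in each build mode.
using AnsiBuildLpctstr = const char*;        // UNICODE not defined
using UnicodeBuildLpctstr = const wchar_t*;  // UNICODE defined

static_assert(std::is_constructible<std::string, AnsiBuildLpctstr>::value,
              "std::string accepts a narrow pointer");
static_assert(!std::is_constructible<std::string, UnicodeBuildLpctstr>::value,
              "std::string has no constructor taking a wide pointer");
static_assert(std::is_constructible<std::wstring, UnicodeBuildLpctstr>::value,
              "std::wstring accepts a wide pointer");

// Runtime-visible versions of the same facts.
constexpr bool string_accepts_ansi_lpctstr() {
    return std::is_constructible<std::string, AnsiBuildLpctstr>::value;
}
constexpr bool string_accepts_unicode_lpctstr() {
    return std::is_constructible<std::string, UnicodeBuildLpctstr>::value;
}
```

This is why an intermediary such as CT2CA is needed: it produces the narrow pointer that std::string can accept regardless of the build mode.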

std::string to CString: (From Visual Studio's CString FAQs...)

std::string s("Hello");
CString cs(s.c_str());

CStringT can be constructed from both character and wide-character strings, i.e. it can convert from char* (LPSTR) or from wchar_t* (LPWSTR).

In other words, the char specialization of CStringT (CStringA), the wchar_t specialization (CStringW), and the TCHAR specialization (CString) can each be constructed from either char or wide-character, null-terminated (null-termination is very important here) string sources.

Although IInspectable amends the "null-termination" part in the comments:

NUL-termination is not required.

CStringT has conversion constructors that take an explicit length argument. This also means that you can construct CStringT objects from std::string objects with embedded NUL characters.
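
What the explicit length buys can be shown with std::string alone. This is a portable sketch with hypothetical helper names; the CStringA line in the comment is the assumed ATL counterpart and is not compiled here.

```cpp
#include <cassert>
#include <string>

// Copy via a null-terminated pointer: everything after the first NUL is lost.
inline std::string copy_via_c_str(const std::string& s) {
    return std::string(s.c_str());
}

// Copy via pointer plus explicit length: embedded NULs survive.
// Assumed ATL counterpart: CStringA cs(s.data(), static_cast<int>(s.size()));
inline std::string copy_via_length(const std::string& s) {
    return std::string(s.data(), s.size());
}
```

For a three-character string holding 'a', NUL, 'b', the first helper returns a one-character copy while the second returns all three characters.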


