How to Use Standard Library (Stl) Classes in My Dll Interface or Abi

How can I use Standard Library (STL) classes in my dll interface or ABI?

Keep in mind one thing before you read further: My answer is coming from the point of view of writing portable code that can be used in applications made up of modules compiled under different compilers. This can include different versions or even different patch levels of the same compiler.

How can I use stl-classes in my
dll-interface?

Answer: You often can't1.

Reason: The STL is a code library, not a binary library like a DLL. It does not have a single ABI that is guaranteed to be the same wherever you might use it. Indeed, STL does stand for "Standard Template Library," but a key operative word here besides Standard is Template.

The Standard defines the methods and data members each STL class is required to provide, and it defines what those methods are to do; but no more. In particular, the Standard doesn't specify how compiler writers should implement the Standard-defined functionality. Compiler writers are free to provide a implementation of an STL class that adds member functions and member variables not listed in the Standard, so long as those members which are defined in the Standard are still there and do what the Standard says.

Maybe an example is in order. The basic_string class is defined in the Standard as having certain member functions & variables. The actual definition is almost 4 pages in the Standard, but here's just a snippet of it:

namespace std {
template<class charT, class traits = char_traits<charT>,
class Allocator = allocator<charT> >
class basic_string {
[snip]
public:
// 21.3.3 capacity:
size_type size() const;
size_type length() const;
size_type max_size() const;
void resize(size_type n, charT c);
void resize(size_type n);
size_type capacity() const;
void reserve(size_type res_arg = 0);
void clear();
bool empty() const;
[snip]
};

Consider the size() and length() member functions. There is nothing in the Standard that specified member variables for holding this information. Indeed, there are no member variables defined at all, not even to hold the string itself. So how is this implemented?

The answer is, many different ways. Some compilers might use a size_t member variable to hold the size and a char* to hold the string. Another one might use a pointer to some other data store which holds that data (this might be the case in a reference-counted implementation). In fact, different versions or even patch levels of the same compiler may change these implementation details. You can't rely on them. So, MSVC 10's implementation might look like this:

namespace std {
template<class charT, class traits = char_traits<charT>,
class Allocator = allocator<charT> >
class basic_string {
[snip]
char* m_pTheString;
};

size_t basic_string::size() const { return strlen(m_pTheString;) }

...Whereas MSVC 10 with SP1 might look like this:

namespace std {
template<class charT, class traits = char_traits<charT>,
class Allocator = allocator<charT> >
class basic_string {
[snip]
vector<char> m_TheString;
};

size_t basic_string::size() const { return m_TheString.size(); }

I'm not saying they do look like this, I'm saying they might. The point here is the actual implementation is platform-dependent, and you really have no way of knowing what it will be anywhere else.

Now say you use MSVC10 to write a DLL that exports this class:

class MyGizmo
{
public:
std::string name_;
};

What is the sizeof(MyGizmo)?

Assuming my proposed implementations above, under MSVC10 its going to be sizeof(char*), but under SP1 it will be sizeof(vector<char>). If you write an application in VC10 SP1 that uses the DLL, the size of the object will look different than it actually is. The binary interface is changed.


For another treatment of this, please see C++ Coding Standards (Amazon link) issue # 63.


1: "You often can't" You actually can export Standard Library components or any other code library components (such as Boost) with a fair amount of reliability when you have complete control over the toolchains and the libraries.

The fundamental problem is that with source code libraries the sizes and definitions of things can be different between different compilers and different versions of the library. If you are working in an environment where you control both of these things everywhere your code is used, then you probably won't have a problem. For example at a trading firm where all the systems are written in-house and used only in-house, it might be possible to do this.

How do I safely pass objects, especially STL objects, to and from a DLL?

The short answer to this question is don't. Because there's no standard C++ ABI (application binary interface, a standard for calling conventions, data packing/alignment, type size, etc.), you will have to jump through a lot of hoops to try and enforce a standard way of dealing with class objects in your program. There's not even a guarantee it'll work after you jump through all those hoops, nor is there a guarantee that a solution which works in one compiler release will work in the next.

Just create a plain C interface using extern "C", since the C ABI is well-defined and stable.


If you really, really want to pass C++ objects across a DLL boundary, it's technically possible. Here are some of the factors you'll have to account for:

Data packing/alignment

Within a given class, individual data members will usually be specially placed in memory so their addresses correspond to a multiple of the type's size. For example, an int might be aligned to a 4-byte boundary.

If your DLL is compiled with a different compiler than your EXE, the DLL's version of a given class might have different packing than the EXE's version, so when the EXE passes the class object to the DLL, the DLL might be unable to properly access a given data member within that class. The DLL would attempt to read from the address specified by its own definition of the class, not the EXE's definition, and since the desired data member is not actually stored there, garbage values would result.

You can work around this using the #pragma pack preprocessor directive, which will force the compiler to apply specific packing. The compiler will still apply default packing if you select a pack value bigger than the one the compiler would have chosen, so if you pick a large packing value, a class can still have different packing between compilers. The solution for this is to use #pragma pack(1), which will force the compiler to align data members on a one-byte boundary (essentially, no packing will be applied). This is not a great idea, as it can cause performance issues or even crashes on certain systems. However, it will ensure consistency in the way your class's data members are aligned in memory.

Member reordering

If your class is not standard-layout, the compiler can rearrange its data members in memory. There is no standard for how this is done, so any data rearranging can cause incompatibilities between compilers. Passing data back and forth to a DLL will require standard-layout classes, therefore.

Calling convention

There are multiple calling conventions a given function can have. These calling conventions specify how data is to be passed to functions: are parameters stored in registers or on the stack? What order are arguments pushed onto the stack? Who cleans up any arguments left on the stack after the function finishes?

It's important you maintain a standard calling convention; if you declare a function as _cdecl, the default for C++, and try to call it using _stdcall bad things will happen. _cdecl is the default calling convention for C++ functions, however, so this is one thing that won't break unless you deliberately break it by specifying an _stdcall in one place and a _cdecl in another.

Datatype size

According to this documentation, on Windows, most fundamental datatypes have the same sizes regardless of whether your app is 32-bit or 64-bit. However, since the size of a given datatype is enforced by the compiler, not by any standard (all the standard guarantees is that 1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)), it's a good idea to use fixed-size datatypes to ensure datatype size compatibility where possible.

Heap issues

If your DLL links to a different version of the C runtime than your EXE, the two modules will use different heaps. This is an especially likely problem given that the modules are being compiled with different compilers.

To mitigate this, all memory will have to be allocated into a shared heap, and deallocated from the same heap. Fortunately, Windows provides APIs to help with this: GetProcessHeap will let you access the host EXE's heap, and HeapAlloc/HeapFree will let you allocate and free memory within this heap. It is important that you not use normal malloc/free as there is no guarantee they will work the way you expect.

STL issues

The C++ standard library has its own set of ABI issues. There is no guarantee that a given STL type is laid out the same way in memory, nor is there a guarantee that a given STL class has the same size from one implementation to another (in particular, debug builds may put extra debug information into a given STL type). Therefore, any STL container will have to be unpacked into fundamental types before being passed across the DLL boundary and repacked on the other side.

Name mangling

Your DLL will presumably export functions which your EXE will want to call. However, C++ compilers do not have a standard way of mangling function names. This means a function named GetCCDLL might be mangled to _Z8GetCCDLLv in GCC and ?GetCCDLL@@YAPAUCCDLL_v1@@XZ in MSVC.

You already won't be able to guarantee static linking to your DLL, since a DLL produced with GCC won't produce a .lib file and statically linking a DLL in MSVC requires one. Dynamically linking seems like a much cleaner option, but name mangling gets in your way: if you try to GetProcAddress the wrong mangled name, the call will fail and you won't be able to use your DLL. This requires a little bit of hackery to get around, and is a fairly major reason why passing C++ classes across a DLL boundary is a bad idea.

You'll need to build your DLL, then examine the produced .def file (if one is produced; this will vary based on your project options) or use a tool like Dependency Walker to find the mangled name. Then, you'll need to write your own .def file, defining an unmangled alias to the mangled function. As an example, let's use the GetCCDLL function I mentioned a bit further up. On my system, the following .def files work for GCC and MSVC, respectively:

GCC:

EXPORTS
GetCCDLL=_Z8GetCCDLLv @1

MSVC:

EXPORTS
GetCCDLL=?GetCCDLL@@YAPAUCCDLL_v1@@XZ @1

Rebuild your DLL, then re-examine the functions it exports. An unmangled function name should be among them. Note that you cannot use overloaded functions this way: the unmangled function name is an alias for one specific function overload as defined by the mangled name. Also note that you'll need to create a new .def file for your DLL every time you change the function declarations, since the mangled names will change. Most importantly, by bypassing the name mangling, you're overriding any protections the linker is trying to offer you with regards to incompatibility issues.

This whole process is simpler if you create an interface for your DLL to follow, since you'll just have one function to define an alias for instead of needing to create an alias for every function in your DLL. However, the same caveats still apply.

Passing class objects to a function

This is probably the most subtle and most dangerous of the issues that plague cross-compiler data passing. Even if you handle everything else, there's no standard for how arguments are passed to a function. This can cause subtle crashes with no apparent reason and no easy way to debug them. You'll need to pass all arguments via pointers, including buffers for any return values. This is clumsy and inconvenient, and is yet another hacky workaround that may or may not work.


Putting together all these workarounds and building on some creative work with templates and operators, we can attempt to safely pass objects across a DLL boundary. Note that C++11 support is mandatory, as is support for #pragma pack and its variants; MSVC 2013 offers this support, as do recent versions of GCC and clang.

//POD_base.h: defines a template base class that wraps and unwraps data types for safe passing across compiler boundaries

//define malloc/free replacements to make use of Windows heap APIs
namespace pod_helpers
{
void* pod_malloc(size_t size)
{
HANDLE heapHandle = GetProcessHeap();
HANDLE storageHandle = nullptr;

if (heapHandle == nullptr)
{
return nullptr;
}

storageHandle = HeapAlloc(heapHandle, 0, size);

return storageHandle;
}

void pod_free(void* ptr)
{
HANDLE heapHandle = GetProcessHeap();
if (heapHandle == nullptr)
{
return;
}

if (ptr == nullptr)
{
return;
}

HeapFree(heapHandle, 0, ptr);
}
}

//define a template base class. We'll specialize this class for each datatype we want to pass across compiler boundaries.
#pragma pack(push, 1)
// All members are protected, because the class *must* be specialized
// for each type
template<typename T>
class pod
{
protected:
pod();
pod(const T& value);
pod(const pod& copy);
~pod();

pod<T>& operator=(pod<T> value);
operator T() const;

T get() const;
void swap(pod<T>& first, pod<T>& second);
};
#pragma pack(pop)

//POD_basic_types.h: holds pod specializations for basic datatypes.
#pragma pack(push, 1)
template<>
class pod<unsigned int>
{
//these are a couple of convenience typedefs that make the class easier to specialize and understand, since the behind-the-scenes logic is almost entirely the same except for the underlying datatypes in each specialization.
typedef int original_type;
typedef std::int32_t safe_type;

public:
pod() : data(nullptr) {}

pod(const original_type& value)
{
set_from(value);
}

pod(const pod<original_type>& copyVal)
{
original_type copyData = copyVal.get();
set_from(copyData);
}

~pod()
{
release();
}

pod<original_type>& operator=(pod<original_type> value)
{
swap(*this, value);

return *this;
}

operator original_type() const
{
return get();
}

protected:
safe_type* data;

original_type get() const
{
original_type result;

result = static_cast<original_type>(*data);

return result;
}

void set_from(const original_type& value)
{
data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type))); //note the pod_malloc call here - we want our memory buffer to go in the process heap, not the possibly-isolated DLL heap.

if (data == nullptr)
{
return;
}

new(data) safe_type (value);
}

void release()
{
if (data)
{
pod_helpers::pod_free(data); //pod_free to go with the pod_malloc.
data = nullptr;
}
}

void swap(pod<original_type>& first, pod<original_type>& second)
{
using std::swap;

swap(first.data, second.data);
}
};
#pragma pack(pop)

The pod class is specialized for every basic datatype, so that int will automatically be wrapped to int32_t, uint will be wrapped to uint32_t, etc. This all occurs behind the scenes, thanks to the overloaded = and () operators. I have omitted the rest of the basic type specializations since they're almost entirely the same except for the underlying datatypes (the bool specialization has a little bit of extra logic, since it's converted to a int8_t and then the int8_t is compared to 0 to convert back to bool, but this is fairly trivial).

We can also wrap STL types in this way, although it requires a little extra work:

#pragma pack(push, 1)
template<typename charT>
class pod<std::basic_string<charT>> //double template ftw. We're specializing pod for std::basic_string, but we're making this specialization able to be specialized for different types; this way we can support all the basic_string types without needing to create four specializations of pod.
{
//more comfort typedefs
typedef std::basic_string<charT> original_type;
typedef charT safe_type;

public:
pod() : data(nullptr) {}

pod(const original_type& value)
{
set_from(value);
}

pod(const charT* charValue)
{
original_type temp(charValue);
set_from(temp);
}

pod(const pod<original_type>& copyVal)
{
original_type copyData = copyVal.get();
set_from(copyData);
}

~pod()
{
release();
}

pod<original_type>& operator=(pod<original_type> value)
{
swap(*this, value);

return *this;
}

operator original_type() const
{
return get();
}

protected:
//this is almost the same as a basic type specialization, but we have to keep track of the number of elements being stored within the basic_string as well as the elements themselves.
safe_type* data;
typename original_type::size_type dataSize;

original_type get() const
{
original_type result;
result.reserve(dataSize);

std::copy(data, data + dataSize, std::back_inserter(result));

return result;
}

void set_from(const original_type& value)
{
dataSize = value.size();

data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type) * dataSize));

if (data == nullptr)
{
return;
}

//figure out where the data to copy starts and stops, then loop through the basic_string and copy each element to our buffer.
safe_type* dataIterPtr = data;
safe_type* dataEndPtr = data + dataSize;
typename original_type::const_iterator iter = value.begin();

for (; dataIterPtr != dataEndPtr;)
{
new(dataIterPtr++) safe_type(*iter++);
}
}

void release()
{
if (data)
{
pod_helpers::pod_free(data);
data = nullptr;
dataSize = 0;
}
}

void swap(pod<original_type>& first, pod<original_type>& second)
{
using std::swap;

swap(first.data, second.data);
swap(first.dataSize, second.dataSize);
}
};
#pragma pack(pop)

Now we can create a DLL that makes use of these pod types. First we need an interface, so we'll only have one method to figure out mangling for.

//CCDLL.h: defines a DLL interface for a pod-based DLL
struct CCDLL_v1
{
virtual void ShowMessage(const pod<std::wstring>* message) = 0;
};

CCDLL_v1* GetCCDLL();

This just creates a basic interface both the DLL and any callers can use. Note that we're passing a pointer to a pod, not a pod itself. Now we need to implement that on the DLL side:

struct CCDLL_v1_implementation: CCDLL_v1
{
virtual void ShowMessage(const pod<std::wstring>* message) override;
};

CCDLL_v1* GetCCDLL()
{
static CCDLL_v1_implementation* CCDLL = nullptr;

if (!CCDLL)
{
CCDLL = new CCDLL_v1_implementation;
}

return CCDLL;
}

And now let's implement the ShowMessage function:

#include "CCDLL_implementation.h"
void CCDLL_v1_implementation::ShowMessage(const pod<std::wstring>* message)
{
std::wstring workingMessage = *message;

MessageBox(NULL, workingMessage.c_str(), TEXT("This is a cross-compiler message"), MB_OK);
}

Nothing too fancy: this just copies the passed pod into a normal wstring and shows it in a messagebox. After all, this is just a POC, not a full utility library.

Now we can build the DLL. Don't forget the special .def files to work around the linker's name mangling. (Note: the CCDLL struct I actually built and ran had more functions than the one I present here. The .def files may not work as expected.)

Now for an EXE to call the DLL:

//main.cpp
#include "../CCDLL/CCDLL.h"

typedef CCDLL_v1*(__cdecl* fnGetCCDLL)();
static fnGetCCDLL Ptr_GetCCDLL = NULL;

int main()
{
HMODULE ccdll = LoadLibrary(TEXT("D:\\Programming\\C++\\CCDLL\\Debug_VS\\CCDLL.dll")); //I built the DLL with Visual Studio and the EXE with GCC. Your paths may vary.

Ptr_GetCCDLL = (fnGetCCDLL)GetProcAddress(ccdll, (LPCSTR)"GetCCDLL");
CCDLL_v1* CCDLL_lib;

CCDLL_lib = Ptr_GetCCDLL(); //This calls the DLL's GetCCDLL method, which is an alias to the mangled function. By dynamically loading the DLL like this, we're completely bypassing the name mangling, exactly as expected.

pod<std::wstring> message = TEXT("Hello world!");

CCDLL_lib->ShowMessage(&message);

FreeLibrary(ccdll); //unload the library when we're done with it

return 0;
}

And here are the results. Our DLL works. We've successfully reached past STL ABI issues, past C++ ABI issues, past mangling issues, and our MSVC DLL is working with a GCC EXE.

The image that showing the result afterward.


In conclusion, if you absolutely must pass C++ objects across DLL boundaries, this is how you do it. However, none of this is guaranteed to work with your setup or anyone else's. Any of this may break at any time, and probably will break the day before your software is scheduled to have a major release. This path is full of hacks, risks, and general idiocy that I probably should be shot for. If you do go this route, please test with extreme caution. And really... just don't do this at all.

Using Templated Classes and Functions in a Shared Object/DLL

As you may know, the templates in your export file are in fact a 'permission to fill in whatever you think necessary' for the compiler.

That means that if you compile your header file with compiler A, it may instantiate a completely different deque<int> than compiler B. The order of some members may change, for one, or even the actual type of some member variables.

And that's what the compiler is warning you for.

EDIT: addes some consequences to the explanation

So your shared libraries will only work together nicely when compiled by the same compiler. If you want them to work together, you can either make sure that all client code 'sees' the same declaration (through using the same stl implementation), or step back from adding templates to your API.

Export a STL class member containing the same class in a DLL interface

Taken from here, the family of these errors are essentially noise;

C4251 is essentially noise and can be silenced

- Stephan T. Lavavej (one of the maintainer's of Micrsoft's C++ library).

So long as the compiler options are consistent through the project, just silencing this warning should be just fine.

As a more comprehensive alternative you can look at the pimpl pattern and remove the std::vector from the class definition and thus it would not need to be exported from the dll.

class __declspec(dllexport) Widget {
protected:
struct Pimpl;
Pimpl* pimpl_;
}

// in the cpp compiled into the dll
struct Widget::Pimpl {
// ...
std::vector<Widget*> children;
// ...
}

MSDN has a nice article on this as part of an introduction to newer C++11 features.

In general, it is easier to not use std classes in a dll interface, especially if interoperability is required.

Creating a class in a DLL using std::string. C4251 warning

std::string is just a typedef:

typedef basic_string<char, char_traits<char>, allocator<char> >
string;

Regarding using STL on the dll interface boundary, generally it is simpler if you avoid that and I personally prefer it if possible. But the compiler is just giving you a warning and it doesn't mean that you will end in a problem. Have a look at: DLLs and STLs and static data (oh my!)

Exposing std::vector over a dll boundary - best practice for compiler independence

You've pretty much nailed it. The standard requires vector elements to be contiguous in memory, and the vector elements won't be stack-allocated unless you're playing games with the vector's allocator, so it's always possible to represent the vector's data as a start and end pointer (or a start pointer and a size, if that's your thing). What you have should work fine.

However, I'm not sure how much use that is. std::vector doesn't really offer you anything except automatic memory management, and you don't want that; otherwise any vectors you construct in the DLL will, when destructed, deallocate your original array of doubles. You can get around that by copying the array, but you've said you don't want to do that either. STL algorithms work fine on pointers, so maybe there's no need to convert in the DLL.



Related Topics



Leave a reply



Submit