Memory Allocation and Deallocation Across DLL Boundaries

Allocating and Deallocating memory across shared lib boundaries

As long as you stick with glibc or another "normal" allocator (jemalloc, tcmalloc, etc.), the heap state is shared by all libraries in the process, so memory allocated with malloc in one library can be freed anywhere else.

In theory it is possible to break this assumption. For example, a library may be linked with a custom implementation of malloc/free (via linker-script symbol trickery or -Bsymbolic) that has its own private heap and therefore does not interact well with the rest of your program. But I've never seen anything like this in real life.

STL containers allocate through malloc/free by default (via operator new/delete), so they can be passed and modified across library boundaries as well. Of course, different libraries may be compiled with different compilers and incompatible STL implementations (e.g. libstdc++, libc++), but in that case their C++ container types differ and the build will normally refuse to let you pass them between incompatible modules.

Memory relocation for vector across DLL boundaries

The easiest way around this is to keep the vector hidden behind functions exported from the DLL: other DLLs can obtain a const vector to read, but cannot modify it. Then add a function to DLL B that appends an entry to the vector on the caller's behalf.

That way the vector stays inside DLL B for all modifications, and every reallocation happens on DLL B's heap.
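A minimal single-file sketch of that layout follows. The names dll_b, entries and add_entry are hypothetical, and the two "modules" are only marked by comments. It also assumes both sides are built with the same compiler and standard library, since even a const view requires a matching vector layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- would be compiled into DLL B ----
namespace dll_b {
namespace {
std::vector<int> g_entries;  // owned, grown and freed only by DLL B
}  // namespace

// Read-only view for other modules; callers cannot resize through it.
const std::vector<int>& entries() { return g_entries; }

// Every push_back (and any reallocation it triggers) runs inside DLL B.
void add_entry(int value) { g_entries.push_back(value); }
}  // namespace dll_b

// ---- would be compiled into the calling module ----
std::size_t add_from_caller() {
    const std::size_t before = dll_b::entries().size();
    dll_b::add_entry(42);                      // ask DLL B to modify its own vector
    assert(dll_b::entries().back() == 42);     // the caller can still read the result
    return dll_b::entries().size() - before;   // number of entries added
}
```

In a real build, entries and add_entry would be the exported functions and g_entries would never appear in a public header.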

Memory / heap management across DLLs

As you suggested, you can use a boost::shared_ptr to handle that problem. Its constructor accepts a custom cleanup function, which can be the deleteObject function of the DLL that created the pointer. Example:

boost::shared_ptr< MyObject > Instance( getObject( ), deleteObject );

If you do not need a C interface for your DLL, getObject can return a shared_ptr directly.
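To make the one-liner above concrete, here is a small sketch that swaps in std::shared_ptr, which takes a deleter the same way as boost::shared_ptr. MyObject's contents, the bodies of getObject and deleteObject, and the g_live_objects counter are stand-ins invented for illustration; in practice the two functions would be exported by the DLL.

```cpp
#include <cassert>
#include <memory>

struct MyObject { int value = 7; };

// ---- stand-ins for the DLL's exported C-style functions ----
static int g_live_objects = 0;  // bookkeeping so the sketch can be checked

MyObject* getObject() { ++g_live_objects; return new MyObject(); }
void deleteObject(MyObject* p) { delete p; --g_live_objects; }

// ---- client side ----
int use_object() {
    // The deleter is stored alongside the pointer, so the final release
    // calls deleteObject (inside the DLL) instead of the client's delete.
    std::shared_ptr<MyObject> instance(getObject(), deleteObject);
    assert(g_live_objects == 1);
    return instance->value;
}   // instance goes out of scope here and deleteObject runs
```

Copies of instance share the same deleter, so it does not matter which module drops the last reference.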

How can you track memory across DLL boundaries

The right way to implement this is to use detours and a separate tool that runs in its own process. The procedure is roughly the following:

1. Allocate memory in the remote process.
2. Write the code of a small loader there; the loader will load your DLL.
3. Call the CreateRemoteThread API to run the loader.
4. From inside the loaded DLL, install detours (hooks, interceptors) on the alloc/dealloc functions.
5. Process the intercepted calls and track the activity.

If you implement your tool this way, it does not matter whether the memory allocation routines are called from a DLL or directly from the exe. You can also track activity in any process, not only in ones you compiled yourself.
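The injection steps are Windows-specific and are not reproduced here. The portable sketch below only illustrates the last two steps, replacing allocation entry points with counting wrappers. It does not use the Detours library or patch any code: a hypothetical function-pointer table (AllocTable, g_table) stands in for the hooked functions, and all names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical indirection table standing in for the patched entry points.
struct AllocTable {
    void* (*alloc)(std::size_t);
    void  (*release)(void*);
};

static AllocTable g_real  = { std::malloc, std::free };  // original routines
static AllocTable g_table = g_real;                      // what callers go through

static std::size_t g_alloc_calls = 0;
static std::size_t g_free_calls = 0;
static std::size_t g_bytes_requested = 0;

// Wrappers: record the call, then forward to the original routine.
void* tracked_alloc(std::size_t n) {
    ++g_alloc_calls;
    g_bytes_requested += n;
    return g_real.alloc(n);
}
void tracked_free(void* p) {
    if (p != nullptr) ++g_free_calls;
    g_real.release(p);
}

void install_hooks() { g_table.alloc = tracked_alloc; g_table.release = tracked_free; }
void remove_hooks()  { g_table = g_real; }

// Allocations seen by the hooks that have not been freed yet.
std::size_t outstanding_allocations() { return g_alloc_calls - g_free_calls; }
```

A real interceptor rewrites the first instructions of the target function instead of relying on an indirection table, which is why it can catch calls from every module in the process.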

MS Windows lets you inspect the virtual address space of a remote process. The usage data collected this way can be summarized in a histogram like the following:

[Image: histogram of virtual allocations by size in the target process]

From this picture you can see how many virtual allocations of each size exist in the target process.

[Image: overview of virtual address space usage in 32-bit MSVC DevEnv]

The picture above shows an overview of virtual address space usage in 32-bit MSVC DevEnv. A blue stripe is a committed piece of memory, a magenta stripe is reserved memory, and green is the unoccupied part of the address space.

You can see that the lower addresses are quite fragmented, while the middle area is not. The blue lines at high addresses are the various DLLs loaded into the process.

Issue cleaning up heap allocated resources across a Windows DLL module boundary

Your deleter is part of the unique_ptr type and is invoked from main when the pointer goes out of scope, so the memory is released by the application module rather than by the DLL that allocated it.

You should either provide GetMyLibrary()/FreeMyLibrary() in the DLL and handle allocation and deallocation there (using RAII on the application side), or pass an allocator to GetMyLibrary() and make allocation and deallocation the application's responsibility.
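The first option could look roughly like the sketch below. The MyLibrary struct, the function bodies, the LibraryDeleter functor and the g_open_handles counter are all assumptions made for illustration; only the names GetMyLibrary and FreeMyLibrary come from the text above.

```cpp
#include <cassert>
#include <memory>

struct MyLibrary { int version = 3; };

// ---- stand-ins for the DLL exports ----
static int g_open_handles = 0;  // bookkeeping for the sketch only
MyLibrary* GetMyLibrary() { ++g_open_handles; return new MyLibrary(); }
void FreeMyLibrary(MyLibrary* lib) { delete lib; --g_open_handles; }

// ---- application side ----
// The deleter only forwards to the DLL, so the delete itself runs there.
struct LibraryDeleter {
    void operator()(MyLibrary* lib) const { FreeMyLibrary(lib); }
};
using LibraryPtr = std::unique_ptr<MyLibrary, LibraryDeleter>;

int query_version() {
    LibraryPtr lib(GetMyLibrary());  // RAII: released through FreeMyLibrary
    assert(g_open_handles == 1);
    return lib->version;
}
```

Because the deleter is part of the unique_ptr type, the application cannot accidentally fall back to plain delete for this pointer.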

Within a DLL function returning memory, where to allocate and where to deallocate?

Another method is to change your function to the following:

DECL_EXPORT LONG organizeArgs(LPCSTR args, LPSTR outbuf, LONG length);

Then the API could be documented like this:

args - is the set of arguments
outbuf - is the output buffer or NULL
length - length of the output buffer, ignored if outbuf is NULL

Returns: 
Number of characters written to outbuf, or if outbuf is NULL, 
returns the maximum number of characters that would have been written.

So it is up to the client whether to call the function twice. If the client is confident that its buffer is big enough to hold the information, it allocates the buffer and calls your function once, using the length argument to limit the number of characters written.

If they are not confident or want to ensure that they get all the arg information, then the client is responsible for calling your function twice, the first time with outbuf being NULL and getting the return value, and a second time with outbuf being the allocated buffer.

This is exactly how a few Windows API functions work. The DLL allocates no memory whatsoever.
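The two-call pattern might be sketched as follows. This is not the original function: plain C types replace LPCSTR/LPSTR/LONG so the sketch builds anywhere, the body simply copies args as a hypothetical stand-in for the real work, and the organize helper on the client side is an invented name.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Stand-in for the DLL export. Writes at most `length` characters and
// no terminating NUL; with outbuf == nullptr it only reports the size.
long organizeArgs(const char* args, char* outbuf, long length) {
    const long needed = static_cast<long>(std::strlen(args));
    if (outbuf == nullptr) return needed;             // size query only
    const long n = needed < length ? needed : length; // clamp to the buffer
    if (n <= 0) return 0;
    std::memcpy(outbuf, args, static_cast<std::size_t>(n));
    return n;                                         // characters written
}

// Client side: query the size, allocate, then call again.
std::string organize(const char* args) {
    const long needed = organizeArgs(args, nullptr, 0);
    if (needed == 0) return std::string();
    std::vector<char> buffer(static_cast<std::size_t>(needed));
    const long written = organizeArgs(args, buffer.data(), needed);
    assert(written <= needed);
    return std::string(buffer.data(), static_cast<std::size_t>(written));
}
```

The buffer is created and destroyed entirely on the client side, so no allocator mismatch can occur.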

Is it bad practice to allocate memory in a DLL and give a pointer to it to a client app?

Here are some reasons for having the caller supply a pointer:

Symmetric ownership semantics. This is already explained by several other answers.
Avoids mismatching the allocator and deallocator. As mentioned in Aesthete's answer, if the DLL allocates a pointer and returns it, the caller must call the corresponding deallocator to free it. This is not necessarily trivial: the DLL might be statically linked against one version of, say, malloc/free while the .exe is linked against a different version of malloc/free. (For example, the DLL could be using release versions while the .exe is using specialized debug versions.)
Flexibility. If the DLL is meant for general use, having the caller allocate the memory gives the caller more options. Suppose the caller doesn't want to use malloc and instead wants memory to be allocated from some specific memory pool. Maybe it's a case where the caller could provide a pointer to memory allocated on the stack. If the DLL allocated the memory itself, the caller does not have any of these options.

(The second and third points also mostly can be addressed by having the .exe supply an allocator/deallocator for the DLL code to use.)
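As a rough illustration of that last remark, the sketch below has the application hand an allocation callback to the library code. AllocFn, makeGreeting, appAlloc and appFree are hypothetical names; a real interface would usually take a matching deallocator as well if the DLL ever needs to free memory itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical callback type the DLL would publish in its header.
using AllocFn = void* (*)(std::size_t);

// ---- stand-in for a DLL export ----
// Builds a NUL-terminated string using only the caller's allocator.
char* makeGreeting(const char* name, AllocFn alloc) {
    static const char prefix[] = "hello, ";
    const std::size_t total = sizeof(prefix) - 1 + std::strlen(name) + 1;
    char* out = static_cast<char*>(alloc(total));
    if (out == nullptr) return nullptr;
    std::strcpy(out, prefix);
    std::strcat(out, name);
    return out;
}

// ---- application side ----
static int g_outstanding = 0;  // blocks handed out and not yet freed
void* appAlloc(std::size_t n) {
    void* p = std::malloc(n);
    if (p != nullptr) ++g_outstanding;
    return p;
}
void appFree(void* p) {
    if (p != nullptr) { --g_outstanding; std::free(p); }
}

bool greet_roundtrip() {
    char* text = makeGreeting("world", appAlloc);
    const bool ok = text != nullptr && std::strcmp(text, "hello, world") == 0;
    appFree(text);  // freed by the same module that allocated it
    assert(g_outstanding == 0);
    return ok;
}
```

The callback could just as well draw from a memory pool or a fixed arena, which is the flexibility the third point describes.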

Correct use of shared_ptr to eliminate deallocation across DLL boundaries

You're right with both statements. A second correct way would be to have createObject(..) return a raw pointer, initialize a shared_ptr with it, and pass a custom deleter to the shared_ptr. The custom deleter is a library function such as releaseObject(..).

Edit:
With your version (createObject(..) returns a shared_ptr<..>), the library and the library user are both bound to one specific shared_ptr implementation. With the approach I propose, this restriction is gone.