How to Properly Replace Global New & Delete Operators

How to properly replace global new & delete operators

That's not how this works. You replace the two operators, and this is done at link time. All you need to do is write a single TU that defines these operators and link it into the mix. Nobody else ever needs to know about this:

// optional_ops.cpp

void * operator new(std::size_t n) throw(std::bad_alloc)
{
  //...
}
void operator delete(void * p) throw()
{
  //...
}

In principle, there's no need for any header files to declare these functions (operator new, operator delete), since the declarations of those two functions are already hardcoded into the language, if you will. However, the names std, std::bad_alloc and std::size_t are not predeclared, so you will probably want to include <new> or some other header to provide those names.

In C++11 and beyond, you can alternatively use decltype(sizeof(0)) to get the size of the first parameter in a way that doesn't require any kind of library. C++11 also has a simpler exception model without dynamic exception specifications (which were finally removed from the language entirely in C++17).

void * operator new(decltype(sizeof(0)) n) noexcept(false)
{
  //...
}

How do I replace global operator new and delete in an MFC application (debug only)

The MSVCRT debug heap is actually pretty good and has some useful features you can use, such as breakpoint on the nth allocation etc.

http://msdn.microsoft.com/en-us/library/974tc9t1(v=VS.80).aspx

Among other things you can insert an allocation hook which outputs debugging information etc which you can use to debug this sort of issue.

http://msdn.microsoft.com/en-us/library/z2zscsc2(v=vs.80).aspx

In your case all you really need to do is output the address, and file and line of each allocation. Then when you experience a corrupt block, find the block whose address immediately precedes it, which will almost certainly be the one which overran. You can use the memory view in the Visual Studio debugger to look at the memory address which was corrupted and see the preceding blocks. That should tell you all you need to know to find out when it was allocated.

The debug heap also has a numerical allocation ID on each block allocated, and can break on the nth! allocation, so if you can get a reasonably consistent repro, so the same numerical block is corrupted each time, then you should be able to use the "break on nth" functionality to get a full call stack for the time it was allocated.

You may also find _CrtCheckMemory useful to find out if corruption has occurred much earlier. Just call it periodically, and once you have the bug bracketed (error didn't occur in one, did occur in the other) move them closer and closer together.

Why would one replace default new and delete operators?

One may try to replace new and delete operators for a number of reasons, namely:

To Detect Usage Errors:

There are a number of ways in which incorrect usage of new and delete may lead to the dreaded beasts of Undefined Behavior & Memory leaks.
Respective examples of each are:

Using more than one delete on newed memory & not calling delete on memory allocated using new.

An overloaded operator new can keep a list of allocated addresses and the overloaded operator delete can remove addresses from the list, then it is easy to detect such usage errors.

Similarly, a variety of programming mistakes can lead to data overruns(writing beyond the end of an allocated block) and underruns(writing prior to the beginning of an allocated block).

An Overloaded operator new can over-allocate blocks and put known byte patterns ("signatures") before and after the memory made available to clients. The overloaded operator deletes can check to see if the signatures are still intact.
Thus by checking if these signatures are not intact it is possible to determine that an overrun or under-run occurred sometime during the life of the allocated block, and operator delete can log that fact, along with the value of the offending pointer, thus helping in providing a good diagnostic information.

To Improve Efficiency(speed & memory):

The new and delete operators work reasonably well for everybody, but optimally for nobody. This behavior arises from the fact that they are designed for general purpose use only. They have to accommodate allocation patterns ranging from the dynamic allocation of a few blocks that exist for the duration of the program to constant allocation and deallocation of a large number of short-lived objects. Eventually, the operator new and operator delete that ship with compilers take a middle-of-the-road strategy.

If you have a good understanding of your program's dynamic memory usage patterns, you can often find that custom versions of operator new and operator delete outperform (faster in performance, or require less memory up to 50%)the default ones. Of course, unless you are sure of what you are doing it is not a good idea to do this(don't even try this if you don't understand the intricacies involved).

To Collect Usage Statistics:

Before thinking of replacing new and delete for improving efficiency as mentioned in #2, You should gather information about how your application/program uses dynamic allocation. You may want to collect information about:

Distribution of allocation blocks,

Distribution of lifetimes,

Order of allocations(FIFO or LIFO or random),

Understanding usage patterns changes over a period of time,maximum amount of dynamic memory used etc.

Also, sometimes you may need to collect usage information such as:

Count the number of dynamically objects of a class,

Restrict the number of objects being created using dynamic allocation etc.

All, this information can be collected by replacing the custom new and delete and adding the diagnostic collection mechanism in the overloaded new and delete.

To compensate for suboptimal memory alignment in `new`:

Many computer architectures require that data of particular types be placed in memory at particular kinds of addresses. For example, an architecture might require that pointers occur at addresses that are a multiple of four (i.e., be four-byte aligned) or that doubles must occur at addresses that are a multiple of eight (i.e., be eight-byte aligned). Failure to follow such constraints can lead to hardware exceptions at run-time. Other architectures are more forgiving, and may allow it to work though reducing the performance.The operator new that ship with some compilers don't guarantee eight-byte alignment for dynamic
allocations of doubles. In such cases, replacing the default operator new with one that guarantees eight-byte alignment could yield big increases in program performance & can be a good reason to replace new and delete operators.

To cluster related objects near one another:

If you know that particular data structures are generally used together and you'd like to minimize the frequency of page faults when working on the data, it can make sense to create a separate heap for the data structures so they are clustered together on as few pages as possible. custom Placement versions of new and delete can make it possible to achieve such clustering.

To obtain unconventional behavior:

Sometimes you want operators new and delete to do something that the compiler-provided versions don't offer.

For example: You might write a custom operator delete that overwrites deallocated memory with zeros in order to increase the security of application data.

Is it possible to replace the global operator new() everywhere?

The C++ standard explicitly allows you to write your own global operator new and delete (and array variants). The linker has to make it work, though exactly how is up to the implementors (e.g., things like weak externals can be helpful for supplying something if and only if one isn't already present).

As far as DLLs go, it's going to be tricky: a statically linked DLL clearly won't use your code without a lot of extra work. Static linking means it already has a copy of the library code copied into the DLL, and any code in the DLL that used it has the address of that code already encoded. To get around that, you'd have to figure out where the code for new is in the DLL and dynamically patch all the code that calls it to call yours instead).

If the DLL links to the standard library dynamically, it gets only marginally easier -- the import table still encodes the name of the DLL and function in that DLL that provides what it needs. That can be gotten around (e.g. with something like Microsoft's Detours library) but it's somewhat non-trivial (though certainly easier than when the DLL links the standard library statically).

Globaly replace all new operators

The default behaviour of the standard library for many of the operator new and operator delete overloads is to forward to the "basic" versions of operator new and operator delete.

The "basic" operators are:

void* operator new (std::size_t count);
void* operator new (std::size_t count, std::align_val_t alignment); // C++17 only
void operator delete(void* ptr) noexcept;
void operator delete(void* ptr, std::align_val_t alignment) noexcept; // C++17 only

Assuming that you have a standard library implementation that implements the default behaviour as defined in the C++ standard¹, the above operators are the only ones that you need to replace.

The default behaviour was first defined in the C++11 standard, so your standard library implementation must support this at minimum.

[1]: See the section titled "Storage allocation and deallocation [new.delete]" in a C++ standard.

Any reason to overload global new and delete?

We overload the global new and delete operators where I work for many reasons:

pooling all small allocations -- decreases overhead, decreases fragmentation, can increase performance for small-alloc-heavy apps
framing allocations with a known lifetime -- ignore all the frees until the very end of this period, then free all of them together (admittedly we do this more with local operator overloads than global)
alignment adjustment -- to cacheline boundaries, etc
alloc fill -- helping to expose usage of uninitialized variables
free fill -- helping to expose usage of previously deleted memory
delayed free -- increasing the effectiveness of free fill, occasionally increasing performance
sentinels or fenceposts -- helping to expose buffer overruns, underruns, and the occasional wild pointer
redirecting allocations -- to account for NUMA, special memory areas, or even to keep separate systems separate in memory (for e.g. embedded scripting languages or DSLs)
garbage collection or cleanup -- again useful for those embedded scripting languages
heap verification -- you can walk through the heap data structure every N allocs/frees to make sure everything looks ok
accounting, including leak tracking and usage snapshots/statistics (stacks, allocation ages, etc)

The idea of new/delete accounting is really flexible and powerful: you can, for example, record the entire callstack for the active thread whenever an alloc occurs, and aggregate statistics about that. You could ship the stack info over the network if you don't have space to keep it locally for whatever reason. The types of info you can gather here are only limited by your imagination (and performance, of course).

We use global overloads because it's convenient to hang lots of common debugging functionality there, as well as make sweeping improvements across the entire app, based on the statistics we gather from those same overloads.

We still do use custom allocators for individual types too; in many cases the speedup or capabilities you can get by providing custom allocators for e.g. a single point-of-use of an STL data structure far exceeds the general speedup you can get from the global overloads.

Take a look at some of the allocators and debugging systems that are out there for C/C++ and you'll rapidly come up with these and other ideas:

valgrind
electricfence
dmalloc
dlmalloc
Application Verifier
Insure++
BoundsChecker
...and many others... (the gamedev industry is a great place to look)

(One old but seminal book is Writing Solid Code, which discusses many of the reasons you might want to provide custom allocators in C, most of which are still very relevant.)

Obviously if you can use any of these fine tools you will want to do so rather than rolling your own.

There are situations in which it is faster, easier, less of a business/legal hassle, nothing's available for your platform yet, or just more instructive: dig in and write a global overload.

How to Properly Replace Global New & Delete Operators