Why Is Std::Map Implemented as a Red-Black Tree

Why is std::map implemented as a red-black tree?

Probably the two most common self balancing tree algorithms are Red-Black trees and AVL trees. To balance the tree after an insertion/update both algorithms use the notion of rotations where the nodes of the tree are rotated to perform the re-balancing.

While in both algorithms the insert/delete operations are O(log n), in the case of Red-Black tree re-balancing rotation is an O(1) operation while with AVL this is a O(log n) operation, making the Red-Black tree more efficient in this aspect of the re-balancing stage and one of the possible reasons that it is more commonly used.

Red-Black trees are used in most collection libraries, including the offerings from Java and Microsoft .NET Framework.

Why std::map is red black tree and not hash table ?

It's largely a historical accident. The standard containers (along with iterators and algorithms) were one of the very last additions before the feature set of the standard was frozen. As it happened, they didn't have what they considered an adequate definition of a hash-based map at the time, and there wasn't time to add it before features were frozen, so the original specification included only a tree-based map.

C++ 11 added std::unordered_map (as well as std::unordered_set and multi versions of both), which is based on hashing though.

I am using red-black tree as a data structure. Is it better than std::unordered_map?

What do you use the RB tree for? If you need the objects you're storing to respect a sorting criterion, then your RB Trees are probably a better option. However, I suggest using STL's containers which are implemented using balanced BSTs std::map and std::set.

If you don't care about objects ordering, then use a HashTable. C++11 introduced std::unordered_map and std::unordered_set which have constant insert and look up time in average.

Using STL's Internal Implementation of Red-Black Tree

Actually - the answer is very simple, and independent of your version of gcc. You can download the stl source code from sgi's website, and see the implementation and use for yourself.

For example, in version 3.2, you can see the red-black tree implementation in the stl_tree.h file, and an example of its use in stl_set.h.

Note that since the stl classes are template classes, the implementations are actually inside the header files.

Red-black tree in C++ STL

std::map, std::multimap, std::set and std::multiset are often implemented in terms of red-black trees but doing so is not mandated by the standard. Since using a red-black tree is not required there is also no requirement for any particular flavor of RB tree.

I believe (though am not certain) that SGI's STL (upon which much of the original standard library is based) actually does have a red-black tree available. If it helps, I know boost::intrusive does have a reusable red-black tree implementation.

Does any std::set implementation not use a red-black tree?

Some folks over at Google actually built a B-tree based implementation of the C++ standard library containers. They seem to have much better performance than standard binary tree implementations.

There is a catch, though. The C++ standard guarantees that deleting an element from a map or set only invalidates other iterators pointing to the same element in the map or set. With the B-tree based implementation, due to node splits and consolidations, the erase member functions on these new structures may invalidate iterators to other elements in the tree. As a result, these implementations aren't perfect replacements for the standard implementations and couldn't be used in a conformant implementation.

Hope this helps!

Can hinted insert to std::map lead to unbalanced tree?

Regardless of the underlying implementation of ordered associative containers (which is not specified by the standard, although it is usually a self-balancing binary search tree), the requirements for hinted insert (iterator insert( const_iterator hint, const value_type& value );) can be satisfied with the following simple algorithm:

Compare value with *hint and *prev(hint)
If it fits between them, insert it there.
Otherwise, ignore the hint.

As long as prev(hint) is amortized constant time, that algorithm is amortized constant time providing that hint was correct. ("Correct" in the sense that it is the position just after the insertion point.)

It is perfectly acceptable to ignore the hint if it is incorrect, so providing an incorrect hint does not result in any difference to the datastructure; it is still as balanced (logarithmically accessible) as it was before the insert. But providing an incorrect hint forces the computation of O(1) extra comparisons, so the hinted version of insert should only be used if the hint is usually correct, for some value of "usually".

A common use case is when a search has already been made for the entry to be inserted, so that the insertion position is definitely known. This avoids the overhead of searching twice when something needs to be done prior to the insert, without sacrificing safety in the event that some other process modified the set between the find() and hinted insert() (assuming appropriate locking, of course).

Do every implementation of Red-Black Tree have the same number of red nodes for the same input?

Yes. There are choices to make about how to maintain the RB-tree constraints, so different implementations can produce different trees.

Why Is Std::Map Implemented as a Red-Black Tree