insert vs emplace vs operator[] in c++ map
In the particular case of a map the old options were only two: operator[] and insert (different flavors of insert). So I will start explaining those.
The operator[] is a find-or-add operator. It will try to find an element with the given key inside the map, and if it exists it will return a reference to the stored value. If it does not, it will create a new element in place with a value-initialized mapped value (zero for built-in types such as int) and return a reference to it.
The insert function (in the single element flavor) takes a value_type (std::pair<const Key,Value>), uses the key (first member) and tries to insert it. Because std::map does not allow duplicates, if there is an existing element with that key it will not insert anything.
The first difference between the two is that operator[] needs to be able to construct a default initialized value, and it is thus unusable for value types that cannot be default initialized. The second difference between the two is what happens when there is already an element with the given key. The insert function will not modify the state of the map, but instead return an iterator to the element (and a false indicating that it was not inserted).
// assume m is a std::map<int,int> that already has an element with key 5 and value 0
m[5] = 10; // postcondition: m[5] == 10
m.insert(std::make_pair(5,15)); // m[5] is still 10
In the case of insert the argument is an object of value_type, which can be created in different ways. You can directly construct it with the appropriate type or pass any object from which the value_type can be constructed, which is where std::make_pair comes into play, as it allows for simple creation of std::pair objects, although it is probably not what you want...
The net effect of the following calls is similar:
K t; V u;
std::map<K,V> m; // std::map<K,V>::value_type is std::pair<const K,V>
m.insert( std::pair<const K,V>(t,u) ); // 1
m.insert( std::map<K,V>::value_type(t,u) ); // 2
m.insert( std::make_pair(t,u) ); // 3
But they are not really the same... [1] and [2] are actually equivalent. In both cases the code creates a temporary object of the same type (std::pair<const K,V>) and passes it to the insert function. The insert function will create the appropriate node in the binary search tree and then copy the value_type part from the argument to the node. The advantage of using value_type is that, well, value_type always matches value_type, you cannot mistype the type of the std::pair arguments!
The difference is in [3]. The function std::make_pair is a template function that will create a std::pair. The signature is:
template <typename T, typename U>
std::pair<T,U> make_pair(T const & t, U const & u );
I have intentionally not provided the template arguments to std::make_pair, as that is the common usage. The implication is that the template arguments are deduced from the call, in this case to be T==K,U==V, so the call to std::make_pair will return a std::pair<K,V> (note the missing const). The signature of insert requires value_type, which is close to but not the same as the type returned from the call to std::make_pair. Because it is close enough, a temporary of the correct type is created and copy initialized from the returned pair. That temporary will in turn be copied to the node, creating a total of two copies.
This can be fixed by providing the template arguments:
m.insert( std::make_pair<const K,V>(t,u) ); // 4
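But that is still error prone in the same way as explicitly typing the type in case [1]. Note also that this form relies on the pre-C++11 signature: since C++11 std::make_pair takes forwarding references, so explicitly specified template arguments no longer compile when t and u are lvalues.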
Up to this point, we have different ways of calling insert that require the creation of the value_type externally and the copy of that object into the container. Alternatively you can use operator[] if the type is default constructible and assignable (intentionally focusing only on m[k]=v), which requires the default initialization of one object and the copy of the value into that object.
In C++11, with variadic templates and perfect forwarding, there is a new way of adding elements into a container by means of emplacing (creating in place). The emplace functions in the different containers do basically the same thing: instead of getting a source from which to copy into the container, the function takes the parameters that will be forwarded to the constructor of the object stored in the container.
m.emplace(t,u); // 5
In [5], the std::pair<const K, V> is not created and passed to emplace; rather, references to the t and u objects are passed to emplace, which forwards them to the constructor of the value_type subobject inside the data structure. In this case no copies of the std::pair<const K,V> are made at all, which is the advantage of emplace over the C++03 alternatives. As in the case of insert, it will not override the value in the map.
An interesting question that I had not thought about is how emplace can actually be implemented for a map, and that is not a simple problem in the general case.
Is c++11 operator[] equivalent to emplace on map insertion?
There are a lot of differences between the two.
If you use operator[], then the map will default construct the value. The return value from operator[] will be a reference to this default constructed object, to which you then assign with operator=.
If you use emplace, the map will directly construct the value with the parameters you provide.
So the operator[] method will always use two-stage construction. If the default constructor is slow, or if copy/move construction is faster than copy/move assignment, then it could be problematic.
However, emplace will not replace the value if the provided key already exists, whereas operator[] followed by operator= will always replace the value, whether there was one there or not.
There are other differences too. If copying/moving throws, emplace guarantees that the map will not be changed. By contrast, operator[] will always insert a default constructed element. So if the later copy/move assignment fails, then the map has already been changed. That key will exist with a default constructed mapped value.
Really, performance is not the first thing you should be thinking about when deciding which one to use. You need to focus first on whether it has the desired behavior.
C++17 provides insert_or_assign, which has the effect of map[key] = v;, but with the exception safety of insert/emplace.
how can you assign a new value to a reference in the first place?
It's fundamentally no different from assigning to any non-const reference:
int i = 5;
int &j = i;
j = 30;
i == 30; //This is true.
What is the difference between unordered_map::emplace and unordered_map::insert in C++?
unordered_map::insert copies or moves a key-value pair into the container. It is overloaded to accept reference-to-const or an rvalue reference:
std::pair<iterator,bool> insert(const std::pair<const Key, T>& value);
template<class P>
std::pair<iterator,bool> insert(P&& value);
unordered_map::emplace allows you to avoid unnecessary copies or moves by constructing the element in place. It uses perfect forwarding and a variadic template to forward arguments to the constructor of the key-value pair:
template<class... Args>
std::pair<iterator,bool> emplace(Args&&... args);
But there is a great deal of overlap between the two functions. emplace can be used to forward to the copy/move constructor of the key-value pair, which allows it to be used just as insert would. This means that use of emplace doesn't guarantee you will avoid copies or moves. Also, the version of insert that takes an rvalue reference is actually templated and accepts any type P such that the key-value pair is constructible from P.
Scott Meyers says:
In principle, emplacement functions should sometimes be more efficient
than their insertion counterparts, and they should never be less
efficient.
(Edit: Howard Hinnant ran some experiments that showed sometimes insert is faster than emplace)
If you definitely do want to copy/move into the container it might be wise to use insert, because you are more likely to get a compilation error if you pass incorrect arguments. You need to be more careful that you are passing the correct arguments to the emplacement functions.
Most implementations of unordered_map::emplace will cause memory to be dynamically allocated for the new pair even if the map already contains an item with that key and the emplace will fail. This means that if there is a good chance that an emplace will fail, you may get better performance using insert to avoid unnecessary dynamic memory allocations.
Small example:
#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>

int main() {
    auto employee1 = std::pair<int, std::string>{1, "John Smith"};
    auto employees = std::unordered_map<int, std::string>{};
    employees.insert(employee1);                       // copy insertion
    employees.insert(std::make_pair(2, "Mary Jones")); // move insertion
    employees.emplace(3, "James Brown");               // construct in-place
    for (const auto& employee : employees)
        std::cout << employee.first << ": " << employee.second << "\n";
}
Edit2: On request. It is also possible to use unordered_map::emplace with a key or value that takes more than one constructor parameter. Using the std::pair piecewise constructor you can still avoid unnecessary copies or moves.
#include <string>
#include <tuple>
#include <unordered_map>
#include <utility>

struct Employee {
    std::string firstname;
    std::string lastname;
    Employee(const std::string& firstname, const std::string& lastname)
        : firstname(firstname), lastname(lastname) {}
};

int main() {
    auto employees = std::unordered_map<int, Employee>{};
    auto employee1 = std::pair<int, Employee>{1, Employee{"John", "Smith"}};
    employees.insert(employee1);                                    // copy insertion
    employees.insert(std::make_pair(2, Employee{"Mary", "Jones"})); // move insertion
    employees.emplace(3, Employee("Sam", "Thomas"));                // emplace with pre-constructed Employee
    employees.emplace(std::piecewise_construct,
                      std::forward_as_tuple(4),
                      std::forward_as_tuple("James", "Brown"));     // construct in-place
}
C++ Set emplace vs insert when an object is already created
emplace does its work by perfectly forwarding its parameters to the right constructor (most likely through a placement new in most implementations).
Because of that, in your case it forwards an lvalue reference and thus most likely invokes the copy constructor.
What is then the difference from a push_back that explicitly calls the copy constructor?
Meyers also mentions that in one of his books, and he says that there is no actual gain in calling emplace if you already have an instance of the object.
In STL maps, is it better to use map::insert than []?
When you write
map[key] = value;
there's no way to tell if you replaced the value for key, or if you created a new key with value.
map::insert() will only create:
using std::cout; using std::endl;
typedef std::map<int, std::string> MyMap;
MyMap map;
// ...
std::pair<MyMap::iterator, bool> res = map.insert(MyMap::value_type(key, value));
if ( ! res.second ) {
    cout << "key " << key << " already exists "
         << " with value " << (res.first)->second << endl;
} else {
    cout << "created key " << key << " with value " << value << endl;
}
For most of my apps, I usually don't care if I'm creating or replacing, so I use the easier to read map[key] = value.
How best to use emplace with std::map
Should you just pass the arguments of the constructor
Yes, because this is literally what all emplace() functions are designed for. With insert(), you have to construct an object, and then [usually] copy it into your container. And generally, if you're using a container, you're only constructing objects so you can put them into the container. As you can see in your tests, it's a bit of extra work.
emplace() was designed to allow you to construct directly into the container. And you do so by providing constructor parameters to the emplace function. insert() is used if you've already got an object and want to put it in a container.
I had a snarky comment that others have noted is worth explaining a bit more. If your class (which I'll call Foo) has single parameter constructors, it may appear that you can do the same thing as emplace() by just passing the single parameter to something like insert() or push_back() or any place that would take a Foo as a parameter. This is a 'feature' of the language where the compiler will implicitly construct a Foo for you and use it. The problem is that under the hood, it's not doing the same thing. Where emplace() will build your object directly in the container, faking it by taking advantage of a single parameter constructor still causes copies to be made. Another downside to consider is this implicit conversion. It can hurt readability of your code or worse, break things. This can be avoided by marking the constructor as explicit.
STL map insertion efficiency: [] vs. insert
Both accomplish different things.
m[key] = val;
Will insert a new key-value pair if the key doesn't exist already, or it will overwrite the old value mapped to the key if it already exists.
m.insert(make_pair(key, val));
Will only insert the pair if key doesn't exist yet; it will never overwrite the old value. So, choose according to what you want to accomplish.
For the question what is more efficient: profile. :P Probably the first way I'd say though. The assignment (aka copy) is the case for both ways, so the only difference lies in construction. As we all know and should implement, a default construction should basically be a no-op, and thus be very efficient. A copy is exactly that - a copy. So in way one we get a "no-op" and a copy, and in way two we get two copies.
Edit: In the end, trust what your profiling tells you. My analysis was off like @Matthieu mentions in his comment, but that was my guessing. :)
Then there is C++11 (C++0x when this was written), where the double copy in the second way goes away, as the pair can simply be moved. So in the end, I think it falls back on my first point: Use the right way to accomplish the thing you want to do.
Understand std::map::insert & emplace with hint
First, let's create a dataset more meaningful than 12 integers:
std::vector<int> v(10000);
std::iota(v.rbegin(), v.rend(), 0);
Results from all functions are now more comparable: https://quick-bench.com/q/HW3eYL1RaFMCJvDdGLBJwEbDLdg
However, there's a worse thing. Notice that looping over state makes it perform the same operations several times to measure the average time. But, since you are reusing the same map, each insert or emplace after the first loop iteration is failing, so you mostly measure the time of failed inserts, where hint doesn't help.
Test cases should look more like this:
std::vector<int> v(1000);
std::iota(v.rbegin(), v.rend(), 0);
for (auto _ : state) {
    std::map<int, int> mapInt;
    auto where(std::end(mapInt));
    for (const auto &n : v) { // Items in non-incremental order
        where = mapInt.emplace_hint(where, n, n+1);
    }
}
And with this, hints start to shine (had to limit data to 1000, otherwise I'd get timeouts): https://quick-bench.com/q/2cR4zU_FZ5HQ6owPj9Ka_y9FtZE
I'm not sure if the benchmarks are correct, but a quick glance at the assembly suggests that the inserts were not optimized away altogether, so there's a chance it's good enough.
As noticed by Ted Lyngmo, try_emplace() with hint tends to perform (slightly) better:
https://quick-bench.com/q/evwcw4ovP20qJzfsyl6M-_37HzI