Difference Between Erase and Remove

Difference between erase and remove

remove() doesn't actually delete elements from the container -- it only shunts non-deleted elements forwards on top of deleted elements. The key is to realise that remove() is designed to work on not just a container but on any arbitrary forward iterator pair: that means it can't actually delete the elements, because an arbitrary iterator pair doesn't necessarily have the ability to delete elements.

For example, pointers to the beginning and end of a regular C array are forward iterators and as such can be used with remove():

int foo[100];

...

remove(foo, foo + 100, 42);    // Remove all elements equal to 42

Here it's obvious that remove() cannot resize the array!

Difference between std::remove and erase for vector?

Is this just for efficiency reason? By using erase all elements in a std::vector will be shifted by 1 causing a large amount of copies; std::remove does just a 'logical' delete and leaves the vector unchanged by moving things around. If the objects are heavy that difference mihgt matter, right?

The reason for using this idiom is exactly that. There is a benefit in performance, but not in the case of a single erasure. Where it does matter is if you need to remove multiple elements from the vector. In this case, the std::remove will copy each not removed element only once to its final location, while the vector::erase approach would move all of the elements from the position to the end multiple times. Consider:

std::vector<int> v{ 1, 2, 3, 4, 5 };
// remove all elements < 5

If you went over the vector removing elements one by one, you would remove the 1, causing copies of the remainder elements that get shifted (4). Then you would remove 2 and shift all remainding elements by one (3)... if you see the pattern this is a O(N^2) algorithm.

In the case of std::remove the algorithm maintains a read and write heads, and iterates over the container. For the first 4 elements the read head will be moved and the element tested, but no element is copied. Only for the fifth element the object would be copied from the last to the first position, and the algorithm will complete with a single copy and returning an iterator to the second position. This is a O(N) algorithm. The later std::vector::erase with the range will cause destruction of all the remainder elements and resizing the container.

As others have mentioned, in the standard library algorithms are applied to iterators, and lack knowledge of the sequence being iterated. This design is more flexible than other approaches on which algorithms are aware of the containers in that a single implementation of the algorithm can be used with any sequence that complies with the iterator requirements. Consider for example, std::remove_copy_if, it can be used even without containers, by using iterators that generate/accept sequences:

std::remove_copy_if(std::istream_iterator<int>(std::cin),
                    std::istream_iterator<int>(),
                    std::ostream_iterator<int>(std::cout, " "),
                    [](int x) { return !(x%2); } // is even
                    );

That single line of code will filter out all even numbers from standard input and dump that to standard output, without requiring the loading of all numbers into memory in a container. This is the advantage of the split, the disadvantage is that the algorithms cannot modify the container itself, only the values referred to by the iterators.

difference between erase and remove/remove_if algorithms?

No, std::remove_if will move the elements that don't match the predicate to the end of list and will return an iterator to the new "end". Erase will effectively drop the element (call the dtor) from the container.

The difference is perfectly illustrated by the examples here and here.

What is the difference between remove-erase and find-erase

Your remove-erase code is not correct. The remove-erase idiom looks like this:

vector<int>::iterator it = remove(v.begin(), v.end(), 5);
v.erase(it, v.end());

In that case, it has the effect of erasing all values equal to 5 but it minimises the amount of copying required to achieve that.

Your find-erase code removes only the first value equal to 5, so it does what you want.

Your remove-erase code moves all the values not equal to 5 to the front of the vector (that's what std::remove does), erases one of the remaining elements of the vector, and leaves any remaining elements after that with unspecified values (which is also what remove does). It has undefined behavior if the vector doesn't contain a 5 to begin with, because remove would return v.end() in that case.

So, if you only want to erase a single element of several equal to 5 then std::remove is no use to you because it doesn't preserve the (other) 5s. If you want to move the non-5 values to the start and the 5 values to the end, before removing the first of the 5s, then you could in fact do that with std::partition just not with std::remove:

auto it = partition(v.begin(), v.end(), [](int i) { return i != 5; });
if (it != v.end()) v.erase(it);

Although, since one 5 is as good as another you get the same result by erasing the last of the 5s rather than the first, and it's more efficient when there's more than one of them:

auto it = partition(v.begin(), v.end(), [](int i) { return i != 5; });
if (it != v.end()) v.pop_back();

If you can somehow be sure that the vector initially contains exactly one element equal to 5 (no more or less), then your two bits of code do the same thing. And in that case you wouldn't need the test for it != v.end() in the find-erase code, you would know it isn't equal. You could just do v.erase(find(v.begin(), v.end(), 5)).

Where is the performance gain of the erase-remove idiom coming from

The performance issue here is not about erasing the elements that are to be removed, or moving them to the end (which does not happen actually), but about moving the elements that are to be kept.

If you use erase on each element you want to remove, you need to move all the elements after these... for each call to erase. Typically, if you want to remove k elements, you will move the elements after the latest one (in the vector) k times instead of only one.

But if you call remove, you will only move these once (see example below).

A small example to better understand how these two methods work:

Let's say you have a vector of size 1000 and the elements you want to remove are at position 17 and 37.

With erase acting on the two elements to be removed:

when you call erase() for the 17th element, you need to move elements 18 to 999, 982 elements.
when you call erase() for the 36th element (it's the 36th one now!), you need to move elements 37 to 998, 962 elements.

In total, you have moved 962 + 982 = 1944 elements, 962 of these have been moved twice for nothing.

With remove, what happens is as follows:

element 0 does not change;
element 1 does not change;
...
element 17 is "discarded";
element 18 is moved at position 17;
element 19 is moved at position 18;
...
element 36 is moved at position 35;
element 37 is "discarded";
element 38 is moved at position 36;
...
element 999 is moved at position 997.

In total, you have moved 998 elements (1000 minus the two you removed), which is much better than the 1943 elements of the previous methods. This is even better if you have more than 2 elements to remove.

You can have a look at the possible implementation on en.cppreference.com to better understand how std::remove works.

Difference between two features of vector

The only sure way to release the memory that I can think of is to swap with a temporary vector:

vector<array<double,1000>> S1(1000);
...
vector<array<double,1000>>().swap(S1);

Although this might look strange at first, it is a known, widely used idiom.

In C++11, moving from the original vector might be an option, although it is not guaranteed to clear the memory or even clear the vector (I cannot think of a reason why an implementation wouldn't do that though):

{
  vector<array<double,1000>> tmp(std::move(S1));
} // tmp dies on exiting scope, memory is cleared

Altenratively, a call to std::vector::shrink_to_fit could result in memory de-allocation, but there are no guarantees:

S1.clear();
S1.shrink_to_fit();

In Lists, is remove same as erase?

Here is what the standard says:

Erase:

iterator erase(const_iterator position);
iterator erase(const_iterator first, const_iterator last);

Invalidates only the iterators and references to the erased elements.

Remove:
```
void remove(const T& value);
template <class Predicate> void remove_if(Predicate pred);
```
Erases all the elements in the list referred by a list iterator i for which the following conditions hold: *i == value, pred(*i) != false. Invalidates only the iterators and references to the erased elements.

If you don't have an iterator or a couple of iterators to be used with erase and you have to rearrange the element(s) first, using remove could be a better choice.

Note that remove will drop all the elements matching the predicate and its complexity is:

Exactly size() applications of the corresponding predicate

On the other side, find has still complexity O(N), but it actually stops when it finds the first occurrence that satisfies the condition.

In particular, it's complexity is defined by the standard as:

At most last - first applications of the corresponding predicate.

Where the following is the signature:

template<class InputIterator, class T>
InputIterator find(InputIterator first, InputIterator last, const T& value);

Differences in behavior: use your first proposal to drop all the elements matching the predicate, use the second proposal to (find and) drop only the first element matching the predicate.

Differences in performance: same complexity, in the worst case you won't have any benefit (well, in the best case you would have benefits on large containers, for O(N) is irrelevant if N is equal to 100).

To reply to your last question, if you want to remove nullptrs from a list and it could happen that there are more than one, prefer remove instead of find/erase.

map::erase: difference between erase by key or by iterator?

In the scenario you describe (m1.erase(2); vs. m1.erase(m1.find(2));), the two methods are should be entirely equivalent, give or take the cost of creating and returning iterators, though this depends on your STL implementation.

The point of erase by iterator is to remove an entry from the key, when you already have an iterator because of other operations your program needed to execute on the element this iterator refers to. For instance:

void processEntry(const std::pair<int, double>& p) {
     // do something like maybe writing it to a file
}

std::map<int, double> m1 = { { 1, 1.1 }, { 2, 2.2 }, { 3, 3.3 } };

const auto it = std::find_if(m1.begin(), m1.end(), [](const std::pair<int, double>& p) {
    return p.first > 1 && p.second < 3.0;
});

if (it != m1.end()) {
    processEntry(*it);
    m1.erase(it);
}