How to remove duplicates from unsorted std::vector while keeping the original ordering using algorithms?
The naive way is to use std::set, as everyone tells you. It's overkill and has poor cache locality (slow).
The smart* way is to use std::vector appropriately (make sure to see the footnote at the bottom):
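#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Compare iterators by the values they point to.
struct target_less
{
    template<class It>
    bool operator()(It const &a, It const &b) const { return *a < *b; }
};
struct target_equal
{
    template<class It>
    bool operator()(It const &a, It const &b) const { return *a == *b; }
};
template<class It> It uniquify(It begin, It const end)
{
    // Collect an iterator to every element.
    std::vector<It> v;
    v.reserve(static_cast<std::size_t>(std::distance(begin, end)));
    for (It i = begin; i != end; ++i)
    { v.push_back(i); }
    // Group equal values together. stable_sort keeps equal elements in
    // positional order, so std::unique retains the first occurrence of each
    // value (plain std::sort gives no such guarantee).
    std::stable_sort(v.begin(), v.end(), target_less());
    v.erase(std::unique(v.begin(), v.end(), target_equal()), v.end());
    // Put the surviving iterators back into positional order.
    std::sort(v.begin(), v.end());
    // Move the survivors to the front, preserving their relative order.
    std::size_t j = 0;
    for (It i = begin; i != end && j != v.size(); ++i)
    {
        if (i == v[j])
        {
            using std::iter_swap; iter_swap(i, begin);
            ++j;
            ++begin;
        }
    }
    return begin; // new logical end, as with std::unique
}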
Then you can use it like:
int main()
{
    std::vector<int> v;
    v.push_back(6);
    v.push_back(5);
    v.push_back(5);
    v.push_back(8);
    v.push_back(5);
    v.push_back(8);
    v.erase(uniquify(v.begin(), v.end()), v.end());
}
*Note: That's the smart way in typical cases, where the number of duplicates isn't too high. For a more thorough performance analysis, see this related answer to a related question.
erase duplicate elements keeping order
One of many ways this can be accomplished is to use std::unordered_set to keep track of duplicates and std::stable_partition to partition the duplicates from the lone values while preserving the order of the items:
#include <iostream>
#include <vector>
#include <algorithm>
#include <unordered_set>
int main()
{
    std::unordered_set<int> numSet;
    std::vector<int> v = {1, 7, 2, 3, 8, 4, 5, 3, 2, 3, 2, 6, 2, 3, 2, 9, 10, 1, 2, 2, 1};
    auto iter = std::stable_partition(v.begin(), v.end(), [&](int n)
        { bool ret = !numSet.count(n); numSet.insert(n); return ret; }); // returns true if the item has not been "seen"
    v.erase(iter, v.end());
    for (auto p : v)
        std::cout << p << " ";
}
Output:
1 7 2 3 8 4 5 6 9 10
The lambda passed to std::stable_partition returns true if the item has not been seen, so std::stable_partition places that item to the left of the partition point. Once done, an iterator to the partition point is returned, and we use this iterator to do one single erasure from that point to the end of the vector. Note that the lambda function updates the unordered_set for each item processed.
The reason why std::stable_partition was used instead of std::remove_if is that std::remove_if is not guaranteed to process the items in order. For example, it could have been possible for an implementation to process the second 1 in that data first, instead of the first 1. So, to be safe, stable_partition is used: it does not erase elements, but simply places them in the correct position, ready for the erasure at the end.
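The stable_partition approach above can be wrapped in a reusable function. This is only a sketch: the name erase_duplicates_stable is made up for illustration, it assumes the element type works with std::hash and operator==, and, like the answer, it assumes the predicate sees the elements front to back.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not from the answer): removes later duplicates from v
// and keeps the first occurrence of each value in its original relative order.
// Assumes T is usable with std::hash and operator==.
template <class T>
void erase_duplicates_stable(std::vector<T>& v)
{
    std::unordered_set<T> seen;
    auto first_dup = std::stable_partition(v.begin(), v.end(),
        [&seen](const T& x) {
            // insert().second is true only the first time x is inserted
            return seen.insert(x).second;
        });
    // Everything from first_dup onward is a repeat; erase it in one go.
    v.erase(first_dup, v.end());
    assert(seen.size() == v.size()); // one survivor per distinct value
}
```

Using insert().second does the lookup and the insertion in one step, instead of the separate count() and insert() calls in the lambda above.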
Way to delete/erase duplicate elements from std::vector while maintaining order?
How about using a temporary container:
std::vector< int >::iterator i , j ;
std::set< int > t_set;
for( i = v.begin() , j = v.begin() ; i != v.end() ; ++i )
    if( t_set.insert( *i ).second)
        *j++ = *i ;
v.erase( j , v.end() );
Using std::remove_copy_if, I can think of this:
std::set<int> t_set;
std::vector<int> res; // Resultant vector; v itself is left unchanged
std::remove_copy_if(v.begin(), v.end(), std::back_inserter(res),
    [&t_set](int x){
        // true (skip the element) when x was already in the set
        return !t_set.insert(x).second;
    } );
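For completeness, here is the remove_copy_if variant as a self-contained function with the headers it needs. The function name without_duplicates is hypothetical; the sketch assumes int elements as in the snippet above and returns the result instead of modifying v.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical wrapper (name chosen for illustration): returns a copy of v
// in which only the first occurrence of each value is kept.
std::vector<int> without_duplicates(const std::vector<int>& v)
{
    std::set<int> seen;
    std::vector<int> res;
    res.reserve(v.size()); // upper bound; avoids reallocation while copying
    std::remove_copy_if(v.begin(), v.end(), std::back_inserter(res),
        [&seen](int x) {
            // true means "do not copy": x was already in the set
            return !seen.insert(x).second;
        });
    assert(res.size() == seen.size()); // one copied element per distinct value
    return res;
}
```

If the original vector should end up deduplicated, assigning the result back (v = without_duplicates(v);) is one way to do it, at the cost of a temporary copy.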
How to remove duplicated elements from vector?
I think the best option is to write a binary predicate to sort the vector with, and then use std::unique afterwards. Keep in mind that the predicate must be a strict weak ordering (in particular, it must be transitive)!
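As a rough illustration of that sort-then-unique idea, the sketch below uses a made-up Point type and made-up function names (point_less, point_equal, sort_and_uniquify). Note that sorting means the original order of the elements is not preserved.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical element type, used only for illustration.
struct Point { int x; int y; };

// Strict weak ordering: lexicographic comparison on (x, y).
bool point_less(const Point& a, const Point& b)
{
    return std::tie(a.x, a.y) < std::tie(b.x, b.y);
}

// Equality consistent with point_less: neither is less than the other.
bool point_equal(const Point& a, const Point& b)
{
    return !point_less(a, b) && !point_less(b, a);
}

// Sorts v and drops duplicates. The original order is NOT preserved.
void sort_and_uniquify(std::vector<Point>& v)
{
    std::sort(v.begin(), v.end(), point_less);
    // After sorting, equal elements are adjacent, which std::unique requires.
    v.erase(std::unique(v.begin(), v.end(), point_equal), v.end());
    assert(std::is_sorted(v.begin(), v.end(), point_less));
}
```

Deriving the equality test from the ordering keeps the two predicates consistent, so std::unique removes exactly the elements that the sort placed next to each other as equivalent.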
If this is not an option, you cannot do anything but use the quadratic algorithm:
std::vector<type> a;
std::vector<type> result;
for (std::size_t i = 0; i < a.size(); ++i) {
    // Check whether a[i] already occurred earlier in a.
    bool repeated = false;
    for (std::size_t j = 0; j < i; ++j) {
        if (a[j] == a[i]) {
            repeated = true;
            break;
        }
    }
    if (!repeated) {
        result.push_back(a[i]);
    }
}
// result stores the unique elements.
What's the most efficient way to erase duplicates and sort a vector?
I agree with R. Pate and Todd Gardner; a std::set might be a good idea here. Even if you're stuck using vectors, if you have enough duplicates, you might be better off creating a set to do the dirty work.
Let's compare three approaches:
Just using vector, sort + unique
sort( vec.begin(), vec.end() );
vec.erase( unique( vec.begin(), vec.end() ), vec.end() );
Convert to set (manually)
set<int> s;
unsigned size = vec.size();
for( unsigned i = 0; i < size; ++i ) s.insert( vec[i] );
vec.assign( s.begin(), s.end() );
Convert to set (using a constructor)
set<int> s( vec.begin(), vec.end() );
vec.assign( s.begin(), s.end() );
Here's how these perform as the number of duplicates changes (the chart from the original answer is not reproduced here):
Summary: when the number of duplicates is large enough, it's actually faster to convert to a set and then dump the data back into a vector.
And for some reason, doing the set conversion manually seems to be faster than using the set constructor -- at least on the toy random data that I used.
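To rerun this kind of comparison yourself, something like the following could serve as a starting point. It is a sketch, not the benchmark behind the numbers above: the function names and the time_ms helper are invented here, and the input data, repeat count and compiler flags are left up to you.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <set>
#include <vector>

// Approach 1: sort + unique on the vector itself.
std::vector<int> via_sort_unique(std::vector<int> vec)
{
    std::sort(vec.begin(), vec.end());
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
    return vec;
}

// Approach 2: insert into a set element by element.
std::vector<int> via_set_manual(std::vector<int> vec)
{
    std::set<int> s;
    for (int x : vec) s.insert(x);
    vec.assign(s.begin(), s.end());
    return vec;
}

// Approach 3: build the set with its range constructor.
std::vector<int> via_set_ctor(std::vector<int> vec)
{
    std::set<int> s(vec.begin(), vec.end());
    vec.assign(s.begin(), s.end());
    return vec;
}

// Hypothetical timing helper: runs f once on a copy of data and returns the
// elapsed wall-clock time in milliseconds.
template <class F>
double time_ms(F f, const std::vector<int>& data)
{
    auto start = std::chrono::steady_clock::now();
    std::vector<int> out = f(data);
    auto stop = std::chrono::steady_clock::now();
    assert(std::is_sorted(out.begin(), out.end())); // sanity check on the result
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as time_ms(via_set_ctor, data) times a single run; for meaningful numbers you would want large inputs with a controlled fraction of duplicates, several repetitions, and an optimized build.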
c++ - Remove duplicates from ordered vector of strings
A simple way is to iterate through the vector while keeping track of the elements encountered, and deleting those that have been encountered before.
Here is a piece of code that does exactly that.
std::unordered_set<std::string> encounters;
for (auto it = container.begin(); it != container.end(); ) {
    if (!encounters.insert(*it).second) {
        // The string was already in encounters
        it = container.erase(it); // erase returns the next valid iterator
    } else {
        ++it;
    }
}
It could probably be optimized, for example by deleting ranges of elements when all are duplicates, or maybe by swapping each new element with the current first duplicate and, at the end, erasing the whole end of the vector that contains all the duplicates.
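One possible shape for that last optimization is sketched below. The function name erase_duplicates_linear is hypothetical, and the sketch move-assigns each new string into place rather than swapping it, which has the same effect here because the tail is erased afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper (not from the answer): keeps the first occurrence of
// each string and removes the rest with a single erase at the end.
void erase_duplicates_linear(std::vector<std::string>& container)
{
    std::unordered_set<std::string> encounters;
    std::size_t write = 0; // next slot for a string seen for the first time
    for (std::size_t read = 0; read < container.size(); ++read) {
        if (encounters.insert(container[read]).second) {
            // First encounter: compact it toward the front.
            if (write != read)
                container[write] = std::move(container[read]);
            ++write;
        }
    }
    assert(write == encounters.size());
    // Everything from 'write' onward is a duplicate or a moved-from leftover.
    container.erase(container.begin() + static_cast<std::ptrdiff_t>(write),
                    container.end());
}
```

Each element is visited once and the vector is shrunk once, so the repeated shifting caused by erasing from the middle of the vector is avoided.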