How to Make Elements of Vector Unique? (Remove Non Adjacent Duplicates)

Remove duplicated members from vector while maintaining order

If your data was not doubles, simply doing a pass with an unordered_set keeping track of what you have already seen via the remove_if-erase idiom would work.

doubles are, however, bad news when checking for equality: two derivations that you may think should produce the same value may generate different results. A set would allow looking for nearby values. Simply use equal_range with a plus minus epsilon instead of find to see if there is another value approximetally equal to yours earlier in the vector, and use the same remove erase idiom.

The remove erase idiom looks like:

vec.erase( std::remove_if( vec.begin(), vec.end(), [&](double x)->bool{
// return true if you want to remove the double x
}, vec.end());

in C++03 it cannot be done inline.

The lambda above will be called for each element in order, like the body of a loop.

How to remove duplicates and keep only unique pointers in list?

If you can change the order of the elements, then first sort the list with list::sort, then remove the duplicates with list::unique.

std::less<Person*> cmp;
persons.sort(cmp);
persons.unique(cmp);

On the other hand, you could use an std::set. It's elements are unique, ordered, and the insert method fails if an element is already present in the set.

Bear in mind that the time complexity of insertion of single elements is logarithmic, whereas adding elements to the front or back of a list is constant time. On the other hand, std::list::sort is N*log(N), and std::unique is linear. So if you intent to perform these duplicate removals frequently, you are better off with using an std::set in the first place. Also note that in C++11 there is std::unordered_set, which has element uniqueness and average constant complexity for insertions and removals.

How to remove non contiguous elements from a vector in c++

Here's my attempt on this removal algorithm.

Assuming the selection vector is sorted and using some (unavoidable ?) pointer arithmetic, this can be done in one line:

template <class T>
inline void erase_selected(std::vector<T>& v, const std::vector<int>& selection)
{
v.resize(std::distance(
v.begin(),
std::stable_partition(v.begin(), v.end(),
[&selection, &v](const T& item) {
return !std::binary_search(
selection.begin(), selection.end(),
static_cast<int>(static_cast<const T*>(&item) - &v[0]));
})));
}

This is based on an idea of Sean Parent (see this C++ Seasoning video) to use std::stable_partition ("stable" keeps elements sorted in the output array) to move all selected items to the end of an array.

The line with pointer arithmetic

static_cast<int>(static_cast<const T*>(&item) - &v[0])

can, in principle, be replaced with STL algorithms and index-free expression

std::distance(std::find(v.begin(), v.end(), item), std::begin(v))

but this way we have to spend O(n) in std::find.

The shortest way to remove non-contiguous elements:

template <class T> void erase_selected(const std::vector<T>& v, const std::vector<int>& selection)
{
std::vector<int> sorted_sel = selection;
std::sort(sorted_sel.begin(), sorted_sel.end());

// 1) Define checker lambda
// 'filter' is called only once for every element,
// all the calls respect the original order of the array
// We manually keep track of the item which is filtered
// and this way we can look this index in 'sorted_sel' array
int itemIndex = 0;
auto filter = [&itemIndex, &sorted_sel](const T& item) {
return !std::binary_search(
sorted_sel.begin(),
sorted_sel.end(),
itemIndex++);
}

// 2) Move all 'not-selected' to the end
auto end_of_selected = std::stable_partition(
v.begin(),
v.end(),
filter);

// 3) Cut off the end of the std::vector
v.resize(std::distance(v.begin(), end_of_selected));
}

Original code & test


If for some reason the code above does not work due to strangely behaving std::stable_partition(), then below is a workaround (wrapping the input array values with selected flags.

I do not assume that inputInfo structure contains the selected flag, so I wrap all the items in the T_withFlag structure which keeps pointers to original items.

#include <algorithm>
#include <iostream>
#include <vector>

template <class T>
std::vector<T> erase_selected(const std::vector<T>& v, const std::vector<int>& selection)
{
std::vector<int> sorted_sel = selection;
std::sort(sorted_sel.begin(), sorted_sel.end());

// Packed (data+flag) array
struct T_withFlag
{
T_withFlag(const T* ref = nullptr, bool sel = false): src(ref), selected(sel) {}

const T* src;
bool selected;
};

std::vector<T_withFlag> v_with_flags;

// should be like
// { {0, true}, {0, true}, {3, false},
// {0, true}, {2, false}, {4, false},
// {5, false}, {0, true}, {7, false} };
// for the input data in main()

v_with_flags.reserve(v.size());

// No "beautiful" way to iterate a vector
// and keep track of element index
// We need the index to check if it is selected
// The check takes O(log(n)), so the loop is O(n * log(n))
int itemIndex = 0;
for (auto& ii: v)
v_with_flags.emplace_back(
T_withFlag(&ii,
std::binary_search(
sorted_sel.begin(),
sorted_sel.end(),
itemIndex++)
));

// I. (The bulk of ) Removal algorithm
// a) Define checker lambda
auto filter = [](const T_withFlag& ii) { return !ii.selected; };
// b) Move every item marked as 'not-selected'
// to the end of an array
auto end_of_selected = std::stable_partition(
v_with_flags.begin(),
v_with_flags.end(),
filter);
// c) Cut off the end of the std::vector
v_with_flags.resize(
std::distance(v_with_flags.begin(), end_of_selected));

// II. Output
std::vector<T> v_out(v_with_flags.size());
std::transform(
// for C++20 you can parallelize this
// with 'std::execution::par' as first parameter
v_with_flags.begin(),
v_with_flags.end(),
v_out.begin(),
[](const T_withFlag& ii) { return *(ii.src); });
return v_out;
}

The test function is

int main()
{
// Obviously, I do not know the structure
// used by the topic starter,
// so I just declare a small structure for a test
// The 'erase_selected' does not assume
// this structure to be 'light-weight'
struct inputInfo
{
int data;
inputInfo(int v = 0): data(v) {}
};

// Source selection indices
std::vector<int> selection { 0, 1, 3, 7 };
// Source data array
std::vector<inputInfo> v{ 0, 0, 3, 0, 2, 4, 5, 0, 7 };

// Output array
auto v_out = erase_selected(v, selection);

for (auto ii : v_out)
std::cout << ii.data << ' ';

std::cout << std::endl;
}

How to get unique elements from vector using stl?

If you want to remove the duplicate while keeping the order of the elements, you can do it (inplace) in O(n log(n)) like this:

std::vector<std::string> v{"one", "two", "three", "two",  "ten", "six", "ten"};

// Occurrence count
std::map<std::string, int> m;

auto it = std::copy_if(v.cbegin(), v.cend(), v.begin(),
[&m](std::string const& s) { return !m[s]++; });
// ^ keep s if it's the first time I encounter it

v.erase(it, v.end());

Explanation: m keeps track of the number of times each string has already been encountered while scanning through v. The copy_if won't copy a string if it has already been found.


If you want to remove all the elements that are duplicated, you can do it (inplace) in O(n log(n)) like this:

// Occurrence count
std::map<std::string, int> m;

// Count occurrences of each string in v
std::for_each(v.cbegin(), v.cend(), [&m](std::string const& s) { ++m[s]; } );

// Only keep strings whose occurrence count is 1.
auto it = std::copy_if(v.cbegin(), v.cend(), v.begin(),
[&m](std::string const& s) { return m[s] == 1; });
v.erase(it, v.end());

How to remove duplicated elements from vector?

I think best option is to write some binary predicate to use for the sorting of the vector and use std::unique afterwards. Keep in mind that the predicate must be transitive!

If this is not an option you can not do anything else but use the quardatic algorithm:

std::vector<type> a
std::vector<type> result;
for (unsigned i = 0; i < a.size(); ++i) {
bool repeated = false;
for (int j = 0; j < i; ++j) {
if (a[j] == a[i]) {
repeated = true;
break;
}
}
if (!repeated) {
result.push_back(a[i]);
}
}

// result stores the unique elements.

C++, fast remove elements from vector unique to another vector

There are two tricks I would use to do this as quickly as possible:

  1. Use some sort of associative container (probably std::unordered_set) to store all of the integers in the second vector to make it dramatically more efficient to look up whether some integer in the first vector should be removed.

  2. Optimize the way in which you delete elements from the initial vector.

More concretely, I'd do the following. Begin by creating a std::unordered_set and adding all of the integers that are the first integer in the pair from the second vector. This gives (expected) O(1) lookup time to check whether or not a specific int exists in the set.

Now that you've done that, use the std::remove_if algorithm to delete everything from the original vector that exists in the hash table. You can use a lambda to do this:

std::unordered_set<int> toRemove = /* ... */
v1.erase(std::remove_if(v1.begin(), v1.end(), [&toRemove] (int x) -> bool {
return toRemove.find(x) != toRemove.end();
}, v1.end());

This first step of storing everything in the unordered_set takes expected O(n) time. The second step does a total of expected O(n) work by bunching all the deletes up to the end and making lookups take small time. This gives a total of expected O(n)-time, O(n) space for the entire process.

If you are allowed to sort the second vector (the pairs), then you could alternatively do this in O(n log n) worst-case time, O(log n) worst-case space by sorting the vector by the key, then using std::binary_search to check whether a particular int from the first vector should be eliminated or not. Each binary search takes O(log n) time, so the total time required is O(n log n) for the sorting, O(log n) time per element in the first vector (for a total of O(n log n)), and O(n) time for the deletion, giving a total of O(n log n).

Hope this helps!

What's the most efficient way to erase duplicates and sort a vector?

I agree with R. Pate and Todd Gardner; a std::set might be a good idea here. Even if you're stuck using vectors, if you have enough duplicates, you might be better off creating a set to do the dirty work.

Let's compare three approaches:

Just using vector, sort + unique

sort( vec.begin(), vec.end() );
vec.erase( unique( vec.begin(), vec.end() ), vec.end() );

Convert to set (manually)

set<int> s;
unsigned size = vec.size();
for( unsigned i = 0; i < size; ++i ) s.insert( vec[i] );
vec.assign( s.begin(), s.end() );

Convert to set (using a constructor)

set<int> s( vec.begin(), vec.end() );
vec.assign( s.begin(), s.end() );

Here's how these perform as the number of duplicates changes:

comparison of vector and set approaches

Summary: when the number of duplicates is large enough, it's actually faster to convert to a set and then dump the data back into a vector.

And for some reason, doing the set conversion manually seems to be faster than using the set constructor -- at least on the toy random data that I used.

Removing duplicates in a vector of strings

#include <algorithm>

template <typename T>
void remove_duplicates(std::vector<T>& vec)
{
std::sort(vec.begin(), vec.end());
vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
}

Note: this require that T has operator< and operator== defined.

Why it work?

std::sort sort the elements using their less-than comparison operator

std::unique removes the duplicate consecutive elements, comparing them using their equal comparison operator

What if i want only the unique elements?

Then you better use std::map

#include <algorithm>
#include <map>

template <typename T>
void unique_elements(std::vector<T>& vec)
{
std::map<T, int> m;
for(auto p : vec) ++m[p];
vec.erase(transform_if(m.begin(), m.end(), vec.begin(),
[](std::pair<T,int> const& p) {return p.first;},
[](std::pair<T,int> const& p) {return p.second==1;}),
vec.end());
}

See: here.



Related Topics



Leave a reply



Submit