Checking for Duplicates in a Vector

Check std::vector has duplicates

Looking at google for std::unique I found this page std::unique. I looked at what it did:

Eliminates all except the first element from every consecutive group of equivalent elements from the range [first, last)

So it looks like it does what you want - removes the duplicates.

I then looked at what it returns...

... returns a past-the-end iterator for the new logical end of the range

So the result from std::unique is a sequence which is not necessary the same as the whole vector.

If nothing was removed, the return value would be the end of the vector.

So you want:

vector<int>::iterator it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Or for C++11:

auto it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Finally for the unique function to work, the vector needs to be sorted, so the complete code would be:

sort(a.begin(), a.end());
auto it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Determine if there are duplicates in vector

The "best" option does not exist. It depends on how you define "best".

Here are some solutions, each with their own advantages and disadvantages:

Using map
template <class T>
auto has_duplicates(const std::vector<T>& v) -> bool
{
std::unordered_map<T, int> m;

for (const auto& e : v)
{
++m[e];

if (m[e] > 1)
return true;
}
return false;
}
Using set
template <class T>
auto has_duplicates(const std::vector<T>& v) -> bool
{
std::unordered_set<int> s;
std::copy(v.begin(), v.end(), std::inserter(s, s.begin());

return v.size() != s.size();
}
Using sort and adjacent_find (mutating range)
template <class T>
auto has_duplicates(std::vector<T>& v) -> bool
{
std::sort(v.begin(), v.end());

return std::adjacent_find(v.begin(), v.end()) != v.last();
}

Manual iteration with std::find
template <class T>
auto has_duplicates(const std::vector<T>& v) -> bool
{
for (auto it = v.begin(); it != v.end(); ++it)
if (std::find(it + 1, v.end(), *it) != v.end())
return true;

return false;
}
Manual iteration
template <class T>
auto has_duplicates(const std::vector<T>& v) -> bool
{
for (auto i1 = v.begin(); i1 != v.end(); ++i1)
for (auto i2 = i1 + 1; i2 != v.end(); ++i2)
if (*i1 == *i2)
return true;

return false;
}

Checking for duplicates in a vector

Use a hash table in which you insert each element. Before you insert an element, check if it's already there. If it is, you have yourself a duplicate. This is O(n) on average, but the worst case is just as bad as your current method.

Alternatively, you can use a set to do the same thing in O(n log n) worst case. This is as good as the sorting solution, except it doesn't change the order of the elements (uses more memory though since you create a set).

Another way is to copy your vector to another vector, sort that and check the adjacent elements there. I'm not sure if this is faster than the set solution, but I think sorting adds less overhead than the balanced search trees a set uses so it should be faster in practice.

Of course, if you don't care about keeping the original order of the elements, just sort the initial vector.

How can i know if a sorted vector has duplicate values or not in C++

Since your vector is sorted, you can check if two adjacent elements are equals :

for (auto it = vec.begin() + 1; it != vec.end(); ++it)
{
if (vec[it] == vec[it - 1])
{
// duplicate
return true;
}
}
// no duplicate
return false;

You could also use std::adjacent_find which returns an iterator to the first element of the first duplicate in your vector:

auto it = std::adjacent_find(vec.begin(), vec.end());
if (it == vec.end())
{
// no duplicate
return false;
}
// duplicate
return true;

Identifying duplicates in a list of character vectors in R

A binary output can be generated with

any(duplicated(unlist(my_list)))
[1] TRUE

As pointed out correctly in comments by @sindri_baldur, if duplicates appear in groups they should be handled with unique, if desired:

any(duplicated(unlist(lapply(my_list, unique))))
[1] TRUE

or another base R alternative

anyDuplicated(unlist(lapply(my_list, unique))) > 1
[1] TRUE

Checking for duplicates in vector and count them c++

You could simply keep a map that counts the number of words.

vector<string> vec{"words", "words", "are", "fun", "fun", "fun"};
map<string, int> words;
for(const auto& x : vec) ++(words[x]);

for(const auto& [k, v] : words)
if(v > 1) cout << k << " - " << v << "\n";

live wandbox example


Note that I'm using a C++17 feature called "structured bindings" to destructure words's pairs to [k, v]. If you don't have a C++17 compiler, you can use const auto& p : words and access the pair members with p.first and p.second.

Check for duplicates in large vector of strings

If you don't mind reordering the vector, then this should do it in O(n*log(n)) time:

std::sort(vector.begin(), vector.end());
vector.erase(std::unique(vector.begin(), vector.end()), vector.end());

To preserve the order, you could instead use a vector of (line-number, string*) pairs: sort by string, uniquify using a comparator that compares string contents, and finally sort by line number, along the lines of:

struct pair {int line, std::string const * string};

struct OrderByLine {
bool operator()(pair const & x, pair const & y) {
return x.line < y.line;
}
};

struct OrderByString {
bool operator()(pair const & x, pair const & y) {
return *x.string < *y.string;
}
};

struct StringEquals {
bool operator()(pair const & x, pair const & y) {
return *x.string == *y.string;
}
};

std::sort(vector.begin(), vector.end(), OrderByString());
vector.erase(std::unique(vector.begin(), vector.end(), StringEquals()), vector.end());
std::sort(vector.begin(), vector.end(), OrderByLine());


Related Topics



Leave a reply



Submit