Find the N Most Common Values in a Vector

Find the n most common values in a vector

I'm sure this is a duplicate, but the answer is simple:

sort(table(variable),decreasing=TRUE)[1:3]

How to select the most common numbers from a vector or on array?(TOP5 toplist) C++

Here's another algorithm, the cost is going to be O(n) for any k, plus a decent cache locality.

1.First store all elements in a unordered_map O(N)

std::unordered_map<int, int> m;
for(const auto ele: numbers) {
  m[ele]++;   
}

2.Dump all elements in a vector of pairs O(N)

std::vector<std::pair<int, int>> temp;  
for(const auto& ele: m) {
 temp.emplace_back(ele.first ,  ele.second);   
}

3.Now use the nth_element to find kth rank O(N)

std::nth_element( temp.begin(), temp.begin()+k ,
temp.end(), [](const auto& p1, const auto& p2) {
        // Strict weak ordering
        if (p1.second > p2.second) {
            return true;
        }  if (p1.second < p2.second) {  
            return false;
        }
        return p1.first > p2.first; // We have to print large element first
    } );

4.Display the output

std::for_each( temp.begin(), temp.begin() +k - 1, [](const auto & p) {
    std::cout << p.first << " ";
});

Demo Here

How to retrieve the most repeated value in a column present in a data frame

tail(names(sort(table(Forbes2000$category))), 1)

How to find common elements from multiple vectors?

There might be a cleverer way to go about this, but

intersect(intersect(a,b),c)

will do the job.

EDIT: More cleverly, and more conveniently if you have a lot of arguments:

Reduce(intersect, list(a,b,c))

Rust - how to find n-th most frequent element in a collection

Your implementation has a time complexity of Ω(n log n), where n is the length of the array. The optimal solution to this problem has a complexity of Ω(n log k) for retrieving the k-th most frequent element. The usual implementation of this optimal solution indeed involves a binary heap, but not in the way you used it.

Here's a suggested implementation of the common algorithm:

use std::cmp::{Eq, Ord, Reverse};
use std::collections::{BinaryHeap, HashMap};
use std::hash::Hash;

pub fn most_frequent<T>(array: &[T], k: usize) -> Vec<(usize, &T)>
where
    T: Hash + Eq + Ord,
{
    let mut map = HashMap::new();
    for x in array {
        *map.entry(x).or_default() += 1;
    }

    let mut heap = BinaryHeap::with_capacity(k + 1);
    for (x, count) in map.into_iter() {
        heap.push(Reverse((count, x)));
        if heap.len() > k {
            heap.pop();
        }
    }
    heap.into_sorted_vec().into_iter().map(|r| r.0).collect()
}

(Playground)

I changed the prototype of the function to return a vector of the k most frequent elements together with their counts, since this is what you need to keep track of anyway. If you only want the k-th most frequent element, you can index the result with [k - 1][1].

The algorithm itself first builds a map of element counts the same way your code does – I just wrote it in a more concise form.

Next, we buid a BinaryHeap for the most frequent elements. After each iteration, this heap contains at most k elements, which are the most frequent ones seen so far. If there are more than k elements in the heap, we drop the least frequent element. Since we always drop the least frequent element seen so far, the heap always retains the k most frequent elements seen so far. We need to use the Reverse wrapper to get a min heap, as documented in the documentation of BinaryHeap.

Finally, we collect the results into a vector. The into_sorted_vec() function basically does this job for us, but we still want to unwrap the items from its Reverse wrapper – that wrapper is an implemenetation detail of our function and should not be returned to the caller.

(In Rust Nightly, we could also use the into_iter_sorted() method, saving one vector allocation.)

The code in this answer makes sure the heap is essentially limited to k elements, so an insertion to the heap has a complexity of Ω(log k). In your code, you push all elements from the array to the heap at once, without limiting the size of the heap, so you end up with a complexity of Ω(log n) for insertions. You essentially use the binary heap to sort a list of counts. Which works, but it's certainly neither the easiest nor the fastest way to achieve that, so there is little justification for going that route.

Find which numbers appears most in a vector

Sort it, then iterate through it and keep a counter that you increment when the current number is the same as the previous number and reset to 0 otherwise. Also keep track of what was the highest value of the counter thus far and what the current number was when that value was reached. This solution is O(n log n) (because of the sort).

Alternatively you can use a hashmap from int to int (or if you know the numbers are within a limited range, you could just use an array) and iterate over the vector, increasing the_hashmap[current_number] by 1 for each number. Afterwards iterate through the hashmap to find its largest value (and the key belonging to it). This requires a hashmap datastructure though (unless you can use arrays which will also be faster), which isn't part of STL.

Counting the number of elements with the values of x in a vector

You can just use table():

> a <- table(numbers)
> a
numbers
  4   5  23  34  43  54  56  65  67 324 435 453 456 567 657 
  2   1   2   2   1   1   2   1   2   1   3   1   1   1   1

Then you can subset it:

> a[names(a)==435]
435 
  3

Or convert it into a data.frame if you're more comfortable working with that:

> as.data.frame(table(numbers))
   numbers Freq
1        4    2
2        5    1
3       23    2
4       34    2
...

R how to find a series of common values in a vector (identifying growing season)

One option could be:

testVec[with(rle(testVec > 0), rep(lengths * values >= 5, lengths))]

[1] 1 3 4 6 7 5 9

Here, the idea is to, first, create runs of values that are smaller or equal to zero and bigger than zero. Second, it checks whether the runs of values bigger than zero are of length 5 or more. Finally, it subsets the original vector for the runs of values bigger than zero with length 5 or more.

Find the N Most Common Values in a Vector