Count Number of Distinct Values in a Vector

Count number of distinct values in a vector

Here are a few ideas, all points towards your solution already being very fast. length(unique(x)) is what I would have used as well:

x <- sample.int(25, 1000, TRUE)

library(microbenchmark)
microbenchmark(length(unique(x)),
nlevels(factor(x)),
length(table(x)),
sum(!duplicated(x)))
# Unit: microseconds
# expr min lq median uq max neval
# length(unique(x)) 24.810 25.9005 27.1350 28.8605 48.854 100
# nlevels(factor(x)) 367.646 371.6185 380.2025 411.8625 1347.343 100
# length(table(x)) 505.035 511.3080 530.9490 575.0880 1685.454 100
# sum(!duplicated(x)) 24.030 25.7955 27.4275 30.0295 70.446 100

Count of number of elements between distinct elements in vector

Or

ave(seq.int(x), x, FUN = function(x) c(NA, diff(x)))
# [1] NA NA 2 NA 2 4 1 4 1 3 1 7 1 1 6 1 1 1 8 6

We calculate the first difference of the indices for each group of x.


A data.table option thanks to @Henrik

library(data.table)
dt = data.table(x)
dt[ , d := .I - shift(.I), x]
dt

count number of unique values in a 128bit avx vector, or detecting if all elements are equal?

For your simplified problem (test all 4 lanes for equality), I would do it slightly differently, here’s how. This way it only takes 3 instructions for the complete test.

// True when the input vector has the same value in all 32-bit lanes
inline bool isSameValue( __m128i v )
{
// Rotate vector by 4 bytes
__m128i v2 = _mm_shuffle_epi32( v, _MM_SHUFFLE( 0, 3, 2, 1 ) );
// The XOR outputs zero for equal bits, 1 for different bits
__m128i xx = _mm_xor_si128( v, v2 );
// Use PTEST instruction from SSE 4.1 set to test the complete vector for all zeros
return (bool)_mm_testz_si128( xx, xx );
}

How do I count the number of unique vectors in a list?

Use match on the entire list vs. the unique list:

my.list <- list(c(1, 1, 0), c(1, 1, 0), c(2, 1, 0))
table(match(my.list,unique(my.list)))

#1 2
#2 1

cbind(
data.frame(id=I(unique(my.list))),
count=as.vector(table(match(my.list,unique(my.list))))
)
# id count
#1 1, 1, 0 2
#2 2, 1, 0 1

Count number of occurences for each unique value

Perhaps table is what you are after?

dummyData = rep(c(1,2, 2, 2), 25)

table(dummyData)
# dummyData
# 1 2
# 25 75

## or another presentation of the same data
as.data.frame(table(dummyData))
# dummyData Freq
# 1 1 25
# 2 2 75

Efficiently count number of distinct values in 16-byte buffer in arm neon

If you're expecting a lot of duplicates, and can efficiently get a horizontal min with vminv_u8, this might be better than scalar. Or not, maybe NEON->ARM stalls for the loop condition kill it. >.< But it should be possible to mitigate that with unrolling (and saving some info in registers to figure out how far you overshot).

// pseudo-code because I'm too lazy to look up ARM SIMD intrinsics, edit welcome
// But I *think* ARM can do these things efficiently,
// except perhaps the loop condition. High latency could be ok, but stalling isn't

int count_dups(uint8x16_t v)
{
int dups = (0xFF == vmax_u8(v)); // count=1 if any elements are 0xFF to start
auto hmin = vmin_u8(v);

while (hmin != 0xff) {
auto min_bcast = vdup(hmin); // broadcast the minimum
auto matches = cmpeq(v, min_bcast);
v |= matches; // min and its dups become 0xFF
hmin = vmin_u8(v);
dups++;
}
return dups;
}

This turns unique values into 0xFF, one set of duplicates at a time.

The loop-carried dep chain through v / hmin stays in vector registers; it's only the loop branch that needs NEON->integer.


Minimizing / hiding NEON->integer/ARM penalties

Unroll by 8 with no branches on hmin, leaving results in 8 NEON registers. Then transfer those 8 values; back-to-back transfers of multiple NEON registers to ARM only incurs one total stall (of 14 cycles on whatever Jake tested on.) Out-of-order execution could also hide some of the penalty for this stall. Then check those 8 integer registers with a fully-unrolled integer loop.

Tune the unroll factor to be large enough that you usually don't need another round of SIMD operations for most input vectors. If almost all of your vectors have at most 5 unique values, then unroll by 5 instead of 8.


Instead of transferring multiple hmin results to integer, count them in NEON. If you can use ARM32 NEON partial-register tricks to put multiple hmin values in the same vector for free, it's only a bit more work to shuffle 8 of them into one vector and compare for not-equal to 0xFF. Then horizontally add that compare result to get a -count.

Or if you have values from different input vectors in different elements of a single vector, you can use vertical operations to add results for multiple input vectors at once without needing horizontal ops.


There's almost certainly room to optimize this, but I don't know ARM that well, or ARM performance details. NEON's hard to use for anything conditional because of the big performance penalty for NEON->integer, totally unlike x86. Glibc has a NEON memchr with NEON->integer in the loop, but I don't know if it uses it or if it's faster than scalar.


Speeding up repeated calls to the scalar ARM version:

Zeroing the 256-byte buffer every time would be expensive, but we don't need to do that. Use a sequence number to avoid needing to reset:

  • Before every new set of elements: ++seq;
  • For each element in the set:

    sum += (histogram[i] == seq);
    histogram[i] = seq; // no data dependency on the load result, unlike ++

You might make the histogram an array of uint16_t or uint32_t to avoid needing to re-zero if a uint8_t seq wraps. But then it takes more cache footprint, so maybe just re-zeroing every 254 sequence numbers makes the most sense.

Clojure - counting unique values from vectors in a seq

In Clojure you can do it almost the same way - first call distinct to get unique values and then use count to count results:

(def vectors (list [ 100 2000 ] [ 100 2000 ] [ 101 2001 ] [ 100 2002 ]))
(defn count-unique [coll]
(count (distinct coll)))
(def result [(count-unique (map first vectors)) (count-unique (map second vectors))])

Note that here you first get list of first and second elements of vectors (map first/second vectors) and then operate on each separately, thus iterating over collection twice. If performance does matter, you may do same thing with iteration (see loop form or tail recursion) and sets, just like you would do in Java. To further improve performance you can also use transients. Though for beginner like you I would recommend first way with distinct.

UPD. Here's version with loop:

(defn count-unique-vec [coll]
(loop [coll coll, e1 (transient #{}), e2 (transient #{})]
(cond (empty? coll) [(count (persistent! e1)) (count (persistent! e2))]
:else (recur (rest coll)
(conj! e1 (first (first coll)))
(conj! e2 (second (first coll)))))))
(count-unique-vec vectors) ==> [2 3]

As you can see, no need in atoms or something like that. First, you pass state to each next iteration (recur call). Second, you use transients to use temporary mutable collections (read more on transients for details) and thus avoid creation of new object each time.

UPD2. Here's version with reduce for extended question (with price):

(defn count-with-price
"Takes input of form ([customer invoice price] [customer invoice price] ...)
and produces vector of 3 elements, where 1st and 2nd are counts of unique
customers and invoices and 3rd is total sum of all prices"
[coll]
(let [[custs invs total]
(reduce (fn [[custs invs total] [cust inv price]]
[(conj! custs cust) (conj! invs inv) (+ total price)])
[(transient #{}) (transient #{}) 0]
coll)]
[(count (persistent! custs)) (count (persistent! invs)) total]))

Here we hold intermediate results in a vector [custs invs total], unpack, process and pack them back to a vector each time. As you can see, implementing such nontrivial logic with reduce is harder (both to write and read) and requires even more code (in looped version it is enough to add one more parameter for price to loop args). So I agree with @ammaloy that for simpler cases reduce is better, but more complex things require more low-level constructs, such as loop/recur pair.

better way of counting unique item

It can depend on what you mean by "better", but there are certainly ways that are simpler, and other ways that are probably faster.

The really simple way would be to insert the items into a std::set or std::unordered_set. When you've inserted all of them, the size of the set will be the number of unique items.

The probably faster method would be to use std::sort and std::unique to find the unique items "in place" instead of copying them. This is pretty much what std::unique_copy will normally do internally anyway, but doing it in place saves a fair amount on allocation and copying.

std::vector<Items> v;

// populate v with data

std::sort(v.begin(), v.end());
int uniqueCount = std::unique(v.begin(), v.end()) - v.begin();

Counting the number of unique values in a vector

    String[] input = {"Hello", "my", "Hello", "apple", "Hello"};
// use hashmap to track the number of strings
HashMap<String, Integer> map = new HashMap<String, Integer>();
// use arraylist to track the sequence of the output
ArrayList<String> list = new ArrayList<String>();
for (String str : input){
if(map.containsKey(str)){
map.put(str, map.get(str)+1);
} else{
map.put(str, 1);
list.add(str); // if the string never occurred before, add it to arraylist
}
}

int[] output = new int[map.size()];
int index = 0;
for (String str : list){
output[index] = map.get(str);
index++;
}

for (int i : output){
System.out.println(i);
}

This should be your answer! Result is in "int[] output"



Related Topics



Leave a reply



Submit