Count Unique Combinations of Values

count unique combinations of values

The count function from the plyr package will do that task:

> df
  ID value.1 value.2 value.3 value.4
1  1       M       D       F       A
2  2       F       M       G       B
3  3       M       D       F       A
4  4       L       D       E       B
> library(plyr)
> count(df[, -1])
  value.1 value.2 value.3 value.4 freq
1       F       M       G       B    1
2       L       D       E       B    1
3       M       D       F       A    2
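For comparison, the same frequency count can be sketched in pandas (1.1+) with DataFrame.value_counts; the sample data below mirrors the R data frame above:

```python
import pandas as pd

# Sample data mirroring the R data frame above
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "value.1": ["M", "F", "M", "L"],
    "value.2": ["D", "M", "D", "D"],
    "value.3": ["F", "G", "F", "E"],
    "value.4": ["A", "B", "A", "B"],
})

# Count each unique row combination, excluding the ID column
freq = df.drop(columns="ID").value_counts().reset_index(name="freq")
print(freq)
```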

Counting unique combinations of values across multiple columns regardless of order?

Assuming the character '/' doesn't show up in any of the offer names, you can do:

select count(distinct offer_combo) as distinct_offers
from (
  select listagg(offer, '/') within group (order by offer) as offer_combo
  from (
    select customer_id, offer_1 as offer from t
    union all select customer_id, offer_2 from t
    union all select customer_id, offer_3 from t
  ) x
  group by customer_id
) y

Result:

DISTINCT_OFFERS
---------------
2

See running example at db<>fiddle.
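The same order-insensitive idea can be sketched in pandas: sort each customer's offers before joining them with '/', so permutations of the same set collapse to one key. The column names follow the SQL above; the data values are made up for illustration:

```python
import pandas as pd

# Hypothetical table with one offer column per slot;
# customers 1 and 2 hold the same offers in different orders
t = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "offer_1": ["A", "C", "A"],
    "offer_2": ["B", "B", "C"],
    "offer_3": ["C", "A", "D"],
})

# Sort each row's offers so order doesn't matter, then join with '/'
combos = (
    t.drop(columns="customer_id")
     .apply(lambda row: "/".join(sorted(row)), axis=1)
)
distinct_offers = combos.nunique()
print(distinct_offers)
```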

Count unique combinations regardless of column order

Another solution, using .groupby:

x = (
df1.groupby(df1.apply(lambda x: tuple(sorted(x)), axis=1))
.agg(A=("A", "first"), B=("B", "first"), count=("B", "size"))
.reset_index(drop=True)
)
print(x)

Prints:

       A      B  count
0    cat  bunny      1
1  bunny  mouse      2
2    dog    cat      3
3  mouse    dog      1
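A self-contained version of the same approach; the sample df1 is an assumption reconstructed from the printed result above:

```python
import pandas as pd

# Sample pairs; (dog, cat) and (cat, dog) should count as one combination
df1 = pd.DataFrame({
    "A": ["cat", "bunny", "mouse", "dog", "cat", "dog", "mouse"],
    "B": ["bunny", "mouse", "bunny", "cat", "dog", "cat", "dog"],
})

# Group by the sorted pair so order within a row doesn't matter;
# keep the first-seen A and B values as representatives of each group
x = (
    df1.groupby(df1.apply(lambda r: tuple(sorted(r)), axis=1))
       .agg(A=("A", "first"), B=("B", "first"), count=("B", "size"))
       .reset_index(drop=True)
)
print(x)
```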

How to count unique combinations of values in selected columns in pandas data frame including frequencies with the value of 0?

Use Series.reindex with MultiIndex.from_product:

s = df.groupby(['Colour', 'TOY_ID']).size()

s = s.reindex(pd.MultiIndex.from_product(s.index.levels), fill_value=0)
print(s)
Colour  TOY_ID
Blue    31490.0       50
        31569.0       50
        50360636.0    20
        50366678.0     0
Green   31490.0       17
        31569.0        0
        50360636.0     0
        50366678.0    10
Yellow  31490.0        0
        31569.0        0
        50360636.0    25
        50366678.0     9
Name: a, dtype: int64
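A smaller runnable sketch of the same reindex trick; the data here is a made-up example in which some Colour/TOY_ID pairs never occur:

```python
import pandas as pd

# Hypothetical example; not every Colour/TOY_ID pair occurs in the data
df = pd.DataFrame({
    "Colour": ["Blue", "Blue", "Green", "Green", "Yellow"],
    "TOY_ID": [31490, 31569, 31490, 31490, 31490],
})

s = df.groupby(["Colour", "TOY_ID"]).size()

# Reindex over the full Cartesian product of the observed levels,
# so pairs that never occur show up with a count of 0
s = s.reindex(pd.MultiIndex.from_product(s.index.levels), fill_value=0)
print(s)
```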

count unique combinations of variable values in an R dataframe column

An option with tidyverse: group by 'id', paste the 'status' values together, and count the results:

library(dplyr)
library(stringr)
df %>%
  group_by(id) %>%
  summarise(status = str_c(status, collapse = "")) %>%
  count(status)
# A tibble: 4 x 2
#   status     n
#   <chr>  <int>
# 1 abc        2
# 2 b          1
# 3 bc         2
# 4 bcd        2
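The same idea can be sketched in pandas: concatenate each id's statuses into one string, then count the distinct strings. The sample data is an assumption shaped like the R example:

```python
import pandas as pd

# Hypothetical long-format data: several status rows per id
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 3],
    "status": ["a", "b", "c", "b", "b", "c"],
})

# Join each id's statuses into one string, then count the combinations
counts = (
    df.groupby("id")["status"]
      .agg("".join)
      .value_counts()
)
print(counts)
```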

Add a count unique combinations across rows in pandas

Yes, you can use:

cols = df.columns.difference(['id']).tolist()
# equivalent to:
# cols = ['cat_1', 'cat_2', 'cat_3', 'cat_4', 'cat_5', 'cat_6', 'cat_7']
df = df.groupby(cols, sort=False).size().reset_index(name='count')
print(df)
   cat_1    cat_2   cat_3 cat_4  count
0  Chips     Null    Null  Null      1
1  Chips  Avocado    Null  Null      1
2  Chips    Pasta    Null  Null      2
3  Chips    Pasta  Cheese  Null      1
4  Chips    Sauce  Cheese  Null      1
5  Pasta     Null    Null  Null      2
6  Pasta    Bread    Null  Null      2
7  Pasta   Cheese    Null  Null      1

Count unique combinations and summarize another column into a new one

We could return the values as a list:

library(data.table)
dt[, .(N = .N, new_col = .(d)), by = .(a, b, c)]
        a      b      c     N     new_col
   <char> <char> <char> <int>      <list>
1:     1a     1b     1c     2       n1,n2
2:     2a     2b     2c     4 n1,n2,n3,n4
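A pandas sketch of the same pattern, counting each (a, b, c) group while gathering the remaining column into a list; the data frame below is an assumption mirroring the data.table example:

```python
import pandas as pd

# Hypothetical data shaped like the data.table example above
dt = pd.DataFrame({
    "a": ["1a", "1a", "2a", "2a", "2a", "2a"],
    "b": ["1b", "1b", "2b", "2b", "2b", "2b"],
    "c": ["1c", "1c", "2c", "2c", "2c", "2c"],
    "d": ["n1", "n2", "n1", "n2", "n3", "n4"],
})

# For each (a, b, c) group: the row count plus the d values as a list
res = (
    dt.groupby(["a", "b", "c"])["d"]
      .agg(N="size", new_col=list)
      .reset_index()
)
print(res)
```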

unique combinations of values in selected columns in pandas data frame and count

You can groupby on cols 'A' and 'B' and call size and then reset_index and rename the generated column:

In [26]:

df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[26]:
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3

update

A little explanation: grouping on the two columns collects the rows where the A and B values match, and calling size returns the number of rows in each group:

In [202]:
df1.groupby(['A','B']).size()

Out[202]:
A    B
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64

So now to restore the grouped columns, we call reset_index:

In [203]:
df1.groupby(['A','B']).size().reset_index()

Out[203]:
     A    B  0
0   no   no  1
1   no  yes  2
2  yes   no  4
3  yes  yes  3

This restores the grouped columns, but the size aggregation becomes a generated column named 0, so we have to rename it:

In [204]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})

Out[204]:
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3

groupby also accepts an as_index argument, which we could set to False so the grouped columns don't become the index; but size still produces a Series here, so you'd still have to restore the index and rename the column:

In [205]:
df1.groupby(['A','B'], as_index=False).size()

Out[205]:
A    B
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64
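As a usage note, the rename step can be folded away: Series.reset_index accepts a name argument for the values column. A minimal self-contained sketch, with sample data chosen to reproduce the counts shown above:

```python
import pandas as pd

# Sample yes/no data; counts chosen to match the output above
df1 = pd.DataFrame({
    "A": ["no", "no", "no", "yes", "yes", "yes", "yes", "yes", "yes", "yes"],
    "B": ["no", "yes", "yes", "no", "no", "no", "no", "yes", "yes", "yes"],
})

# name= on reset_index labels the size column directly, no rename needed
out = df1.groupby(["A", "B"]).size().reset_index(name="count")
print(out)
```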

