Count Unique Combinations of Values - ITCodar

# Count Unique Combinations of Values

## count unique combinations of values

The `count` function in the `plyr` package will do that task.

```r
> df
  ID value.1 value.2 value.3 value.4
1  1       M       D       F       A
2  2       F       M       G       B
3  3       M       D       F       A
4  4       L       D       E       B
> library(plyr)
> count(df[, -1])
  value.1 value.2 value.3 value.4 freq
1       F       M       G       B    1
2       L       D       E       B    1
3       M       D       F       A    2
```
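For comparison, the same tally can be sketched in pandas with `DataFrame.value_counts`; the `df` below is hypothetical data mirroring the R example above:

```python
import pandas as pd

# Hypothetical data mirroring the R example
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "value.1": ["M", "F", "M", "L"],
    "value.2": ["D", "M", "D", "D"],
    "value.3": ["F", "G", "F", "E"],
    "value.4": ["A", "B", "A", "B"],
})

# Drop the ID column and count each unique row combination,
# analogous to plyr::count(df[, -1])
freq = (
    df.drop(columns="ID")
      .value_counts()
      .rename("freq")
      .reset_index()
)
print(freq)
```

`value_counts` sorts by frequency (descending) rather than by the key columns, but the counts per combination are the same.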

## Counting unique combinations of values across multiple columns regardless of order?

Assuming the character `/` doesn't show up in any of the offer names, you can do:

```sql
select count(distinct offer_combo) as distinct_offers
from (
  select listagg(offer, '/') within group (order by offer) as offer_combo
  from (
    select customer_id, offer_1 as offer from t
    union all select customer_id, offer_2 from t
    union all select customer_id, offer_3 from t
  ) x
  group by customer_id
) y
```

Result:

```
DISTINCT_OFFERS
---------------
              2
```

See running example at db<>fiddle.
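The same order-insensitive logic can be sketched in plain Python: sort each customer's offers before comparing them, so permutations of the same offers collapse to one combination. The rows below are hypothetical:

```python
# Hypothetical customer/offer rows; column names follow the SQL above
rows = [
    {"customer_id": 1, "offer_1": "A", "offer_2": "B", "offer_3": "C"},
    {"customer_id": 2, "offer_1": "C", "offer_2": "A", "offer_3": "B"},  # same set as 1
    {"customer_id": 3, "offer_1": "A", "offer_2": "B", "offer_3": "D"},
]

# Sort each customer's offers so order doesn't matter, then count distinct sets
combos = {tuple(sorted((r["offer_1"], r["offer_2"], r["offer_3"]))) for r in rows}
print(len(combos))  # 2 distinct offer combinations
```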

## Count unique combinations regardless of column order

Another solution, using `.groupby`:

```python
x = (
    df1.groupby(df1.apply(lambda x: tuple(sorted(x)), axis=1))
    .agg(A=("A", "first"), B=("B", "first"), count=("B", "size"))
    .reset_index(drop=True)
)
print(x)
```

Prints:

```
       A      B  count
0    cat  bunny      1
1  bunny  mouse      2
2    dog    cat      3
3  mouse    dog      1
```
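A self-contained version of the same idea, with hypothetical data chosen to reproduce the output above (`df1` here is an assumption, not the asker's actual frame):

```python
import pandas as pd

# Hypothetical frame where each row holds an unordered pair
df1 = pd.DataFrame({
    "A": ["cat", "bunny", "dog", "cat", "mouse", "mouse", "dog"],
    "B": ["bunny", "mouse", "cat", "dog", "bunny", "dog", "cat"],
})

# Group rows by the sorted tuple of their values, so ("dog", "cat")
# and ("cat", "dog") fall into the same group
x = (
    df1.groupby(df1.apply(lambda r: tuple(sorted(r)), axis=1))
       .agg(A=("A", "first"), B=("B", "first"), count=("B", "size"))
       .reset_index(drop=True)
)
print(x)
```

The grouping key is a `Series` of sorted tuples, so column order within each row no longer matters.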

## How to count unique combinations of values in selected columns in pandas data frame including frequencies with the value of 0?

Use `Series.reindex` with `MultiIndex.from_product`:

```python
s = df.groupby(['Colour', 'TOY_ID']).size()
s = s.reindex(pd.MultiIndex.from_product(s.index.levels), fill_value=0)
print (s)

Colour  TOY_ID    
Blue    31490.0       50
        31569.0       50
        50360636.0    20
        50366678.0     0
Green   31490.0       17
        31569.0        0
        50360636.0     0
        50366678.0    10
Yellow  31490.0        0
        31569.0        0
        50360636.0    25
        50366678.0     9
Name: a, dtype: int64
```
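Here is a minimal, runnable sketch of the same reindexing trick on a small hypothetical frame, so the zero-filled combinations are easy to see:

```python
import pandas as pd

# Hypothetical frame: not every Colour/TOY_ID pair occurs
df = pd.DataFrame({
    "Colour": ["Blue", "Blue", "Green", "Green"],
    "TOY_ID": [1, 2, 1, 3],
})

s = df.groupby(["Colour", "TOY_ID"]).size()

# Reindex against the full Cartesian product of the observed levels,
# so absent combinations appear with a count of 0
s = s.reindex(pd.MultiIndex.from_product(s.index.levels), fill_value=0)
print(s)
```

`s.index.levels` holds the distinct values of each grouping column, so `from_product` builds every possible pair even if it never occurred.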

## count unique combinations of variable values in an R dataframe column

An option with `tidyverse`: group by 'id', `paste` the 'status' values together, and get the `count`:

```r
library(dplyr)
library(stringr)
df %>%
    group_by(id) %>%
    summarise(status = str_c(status, collapse="")) %>%
    count(status)
# A tibble: 4 x 2
#  status     n
#  <chr>  <int>
#1 abc        2
#2 b          1
#3 bc         2
#4 bcd        2
```
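A rough pandas equivalent of this group-paste-count pattern, using a hypothetical `id`/`status` frame:

```python
import pandas as pd

# Hypothetical id/status frame: ids 1 and 3 share the same status sequence
df = pd.DataFrame({
    "id":     [1,   1,   2,   3,   3,   4,   4,   4],
    "status": ["a", "b", "b", "a", "b", "b", "c", "d"],
})

# Concatenate each id's statuses into one string, then count the strings
out = (
    df.groupby("id")["status"]
      .agg("".join)
      .value_counts()
)
print(out)
```

Note that `groupby` preserves row order within each group, so the concatenated string reflects the original ordering of the statuses.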

## Add a count unique combinations across rows in pandas

Yes, you can use:

```python
cols = df.columns.difference(['id']).tolist()
# should work like
# cols = ['cat_1','cat_2', 'cat_3', 'cat_4', 'cat_5', 'cat_6', 'cat_7']
df = df.groupby(cols, sort=False).size().reset_index(name='count')
print (df)

   cat_1    cat_2   cat_3 cat_4  count
0  Chips     Null    Null  Null      1
1  Chips  Avocado    Null  Null      1
2  Chips    Pasta    Null  Null      2
3  Chips    Pasta  Cheese  Null      1
4  Chips    Sauce  Cheese  Null      1
5  Pasta     Null    Null  Null      2
6  Pasta    Bread    Null  Null      2
7  Pasta   Cheese    Null  Null      1
```
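A minimal runnable sketch of the same approach with hypothetical data (two category columns instead of seven):

```python
import pandas as pd

# Hypothetical order rows; 'Null' is a literal placeholder string as above
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "cat_1": ["Chips", "Pasta", "Chips", "Pasta"],
    "cat_2": ["Null", "Null", "Null", "Null"],
})

# Every column except 'id' becomes part of the grouping key
cols = df.columns.difference(["id"]).tolist()
out = df.groupby(cols, sort=False).size().reset_index(name="count")
print(out)
```

`sort=False` keeps the combinations in first-occurrence order instead of sorting the keys.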

## Count unique combinations in and summarize other columns in new one

We could return the 'd' column as a `list`:

```r
library(data.table)
dt[, .(N = .N, new_col = .(d)), by = .(a, b, c)]
        a      b      c     N     new_col
   <char> <char> <char> <int>      <list>
1:     1a     1b     1c     2       n1,n2
2:     2a     2b     2c     4 n1,n2,n3,n4
```
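A rough pandas equivalent of this data.table idiom, grouping, counting, and collecting the remaining column into a list (the frame below is hypothetical):

```python
import pandas as pd

# Hypothetical frame mirroring the data.table example
dt = pd.DataFrame({
    "a": ["1a", "1a", "2a", "2a", "2a", "2a"],
    "b": ["1b", "1b", "2b", "2b", "2b", "2b"],
    "c": ["1c", "1c", "2c", "2c", "2c", "2c"],
    "d": ["n1", "n2", "n1", "n2", "n3", "n4"],
})

# Count each (a, b, c) combination and collect d into a list,
# like .(N = .N, new_col = .(d))
out = (
    dt.groupby(["a", "b", "c"])["d"]
      .agg(N="size", new_col=list)
      .reset_index()
)
print(out)
```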

## unique combinations of values in selected columns in pandas data frame and count

You can `groupby` on columns 'A' and 'B', call `size`, then `reset_index` and `rename` the generated column:

```python
In [26]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})

Out[26]: 
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3
```

**Update**

A little explanation: by grouping on the 2 columns, rows where the A and B values are the same are collected into one group; we then call `size`, which returns the row count of each group:

```python
In [202]:
df1.groupby(['A','B']).size()

Out[202]: 
A    B  
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64
```

So now to restore the grouped columns, we call `reset_index`:

```python
In [203]:
df1.groupby(['A','B']).size().reset_index()

Out[203]: 
     A    B  0
0   no   no  1
1   no  yes  2
2  yes   no  4
3  yes  yes  3
```

This restores the grouped columns, but the size aggregation lands in a generated column named `0`, so we have to rename it:

```python
In [204]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})

Out[204]: 
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3
```

`groupby` does accept the arg `as_index`, which we could have set to `False` so that the grouped columns don't become the index, but this still generates a `Series`, and you'd still have to restore the columns and rename the result:

```python
In [205]:
df1.groupby(['A','B'], as_index=False).size()

Out[205]: 
A    B  
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64
```
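As a side note, in newer pandas (1.1+) `DataFrame.value_counts` counts unique row combinations directly. A sketch with hypothetical data matching the counts above:

```python
import pandas as pd

# Hypothetical yes/no frame with the same combination counts as df1 above
df1 = pd.DataFrame({
    "A": ["no"] * 3 + ["yes"] * 7,
    "B": ["no", "yes", "yes", "no", "no", "no", "no", "yes", "yes", "yes"],
})

# value_counts over the two columns counts each unique (A, B) pair,
# sorted by frequency descending
out = df1.value_counts(["A", "B"]).reset_index(name="count")
print(out)
```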
