Group Values by Unique Elements

Compare each element in groupby() group to the unique values in that group and get the location of equality

Use pd.factorize inside GroupBy.transform:

df['order1'] = df.groupby(['subject'])['date'].transform(lambda x: pd.factorize(x)[0]) + 1
print (df)
  subject        date  order  order1
0       A  01.01.2020      1       1
1       A  01.01.2020      1       1
2       A  02.01.2020      2       2
3       B  01.01.2020      1       1
4       B  02.01.2020      2       2
5       B  02.01.2020      2       2
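
The factorize approach can be sketched end to end; the frame below is a hypothetical reconstruction of the example data:

```python
import pandas as pd

# hypothetical frame matching the example output above
df = pd.DataFrame({
    'subject': ['A', 'A', 'A', 'B', 'B', 'B'],
    'date': ['01.01.2020', '01.01.2020', '02.01.2020',
             '01.01.2020', '02.01.2020', '02.01.2020'],
})

# pd.factorize assigns 0-based codes in order of first appearance
# within each group, so add 1 to start numbering at 1
df['order1'] = df.groupby(['subject'])['date'].transform(
    lambda x: pd.factorize(x)[0]) + 1
print(df['order1'].tolist())  # [1, 1, 2, 1, 2, 2]
```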

Or you can use GroupBy.rank, but it is necessary to convert the date column to datetimes first:

df['order2'] = df.groupby(['subject'])['date'].rank(method='dense')
print (df)
  subject       date  order  order2
0       A 2020-01-01      1     1.0
1       A 2020-01-01      1     1.0
2       A 2020-02-01      2     2.0
3       B 2020-01-01      1     1.0
4       B 2020-02-01      2     2.0
5       B 2020-02-01      2     2.0
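
A sketch of the rank variant, converting the (assumed day-first) strings to datetimes before ranking:

```python
import pandas as pd

# hypothetical frame matching the example above
df = pd.DataFrame({
    'subject': ['A', 'A', 'A', 'B', 'B', 'B'],
    'date': ['01.01.2020', '01.01.2020', '02.01.2020',
             '01.01.2020', '02.01.2020', '02.01.2020'],
})

# convert first so rank() compares dates rather than strings
df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y')
df['order2'] = df.groupby(['subject'])['date'].rank(method='dense')
print(df['order2'].tolist())  # [1.0, 1.0, 2.0, 1.0, 2.0, 2.0]
```

Note that rank returns floats; cast with .astype(int) if integer labels are needed.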

The two solutions differ when the datetimes are not in increasing order:

print (df)
  subject       date  order
0       A 2020-01-01      1
1       A 2020-03-01      2   <- changed datetime for this sample
2       A 2020-02-01      3
3       B 2020-01-01      1
4       B 2020-02-01      2
5       B 2020-02-01      2

Here order is the expected output, disregarding the temporal order of date.

df['order1'] = df.groupby(['subject'])['date'].transform(lambda x: pd.factorize(x)[0]) + 1
df['order2'] = df.groupby(['subject'])['date'].rank(method='dense')
print (df)
  subject       date  order  order1  order2
0       A 2020-01-01      1       1     1.0
1       A 2020-03-01      1       2     3.0
2       A 2020-02-01      2       3     2.0
3       B 2020-01-01      1       1     1.0
4       B 2020-02-01      2       2     2.0
5       B 2020-02-01      2       2     2.0

In summary: use the factorize solution if the temporal order of date should not be reflected in the output, and the rank solution if it should.

How to get unique values from multiple columns in a pandas groupby

You can do it with apply:

import numpy as np
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x)))

(Note the double brackets: selecting multiple columns with a plain list like ['l1','l2'] is deprecated in recent pandas.)
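
With a small hypothetical frame, the call collapses both columns into one sorted list of unique values per group:

```python
import numpy as np
import pandas as pd

# hypothetical frame: group key 'c' plus two label columns
df = pd.DataFrame({
    'c':  [1, 1, 2],
    'l1': ['a', 'b', 'a'],
    'l2': ['b', 'c', 'a'],
})

# np.unique flattens the two-column sub-frame of each group
# and returns its sorted unique values
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x)))
print(g.to_dict())  # {1: ['a', 'b', 'c'], 2: ['a']}
```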

Count unique values using pandas groupby

I think you can use SeriesGroupBy.nunique:

print (df.groupby('param')['group'].nunique())
param
a    2
b    1
Name: group, dtype: int64
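
A runnable sketch with a hypothetical frame that reproduces those counts; note that missing param values are dropped by groupby automatically:

```python
import pandas as pd

# hypothetical frame: param 'a' occurs in two groups, 'b' in one
df = pd.DataFrame({
    'group': ['g1', 'g1', 'g2', 'g2'],
    'param': ['a', None, 'a', 'b'],
})

# None keys are excluded by groupby; nunique counts distinct groups
counts = df.groupby('param')['group'].nunique()
print(counts.to_dict())  # {'a': 2, 'b': 1}
```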

Another solution: get the unique values per group, build a new DataFrame with DataFrame.from_records, reshape it to a Series with stack, and finally apply value_counts:

a = df[df.param.notnull()].groupby('group')['param'].unique()
print (pd.DataFrame.from_records(a.values.tolist()).stack().value_counts())
a    2
b    1
dtype: int64
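
A runnable sketch of this route with a hypothetical frame; the ragged per-group lists are padded with NaN by from_records, which stack then drops:

```python
import pandas as pd

# hypothetical frame: param 'a' occurs in two groups, 'b' in one
df = pd.DataFrame({
    'group': ['g1', 'g1', 'g2', 'g2'],
    'param': ['a', None, 'a', 'b'],
})

# unique param values per group -> lists of unequal length
a = df[df.param.notnull()].groupby('group')['param'].unique()
# from_records pads the shorter rows with NaN; stack drops them
counts = pd.DataFrame.from_records(a.values.tolist()).stack().value_counts()
print(counts.to_dict())  # {'a': 2, 'b': 1}
```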

How do I get only the new unique values per group?

IIUC, you can use

(~df['user'].duplicated()).groupby(df['Month']).sum()

Demo:

>>> df
  Month     user
0     2  Michael
1     2  Michael
2     3      Lea
3     3  Michael
>>> (~df['user'].duplicated()).groupby(df['Month']).sum()
Month
2    1
3    1

I'm assuming that the 'Month' column is sorted, otherwise the duplicated trick won't work.

edit: your exact output can be produced with

(~df['user'].duplicated()).groupby(df['Month']).sum().reset_index().rename({'user': 'Unique_Count_New_Users'}, axis=1)
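
The duplicated trick can be sketched on the demo data above (with Month sorted, as assumed):

```python
import pandas as pd

df = pd.DataFrame({'Month': [2, 2, 3, 3],
                   'user': ['Michael', 'Michael', 'Lea', 'Michael']})

# ~duplicated() is True only on each user's first appearance in the
# whole frame; summing those flags per month counts new users
new_users = (~df['user'].duplicated()).groupby(df['Month']).sum()
print(new_users.to_dict())  # {2: 1, 3: 1}
```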

Group values by unique elements

First of all, (I assume) this is your vector

a <- c("A110","A110","A110","B220","B220","C330","D440","D440","D440","D440","D440","D440","E550")

As for possible solutions, here are a few (can't find a good dupe right now):

as.integer(factor(a))
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5

Or

cumsum(!duplicated(a))
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5

Or

match(a, unique(a))
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5

rle will also work similarly in your specific scenario:

with(rle(a), rep(seq_along(values), lengths))
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5

Or (which is practically the same)

data.table::rleid(a)
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5

Though be advised that the four solutions each behave differently in some scenarios; consider the following vector

a <- c("B110","B110","B110","A220","A220","C330","D440","D440","B110","B110","E550")

And the results of the 4 different solutions:

1.

as.integer(factor(a))
# [1] 2 2 2 1 1 3 4 4 2 2 5

The factor solution begins with 2 because factor sorts its levels alphabetically, so "A220" receives code 1 even though "B110" appears first in the unsorted vector. Hence this solution is only valid if your vector is sorted; don't use it otherwise.

2.

cumsum(!duplicated(a))
# [1] 1 1 1 2 2 3 4 4 4 4 5

This cumsum/duplicated solution gets confused because "B110" was already present at the beginning, and hence it groups "D440","D440","B110","B110" into the same group.

3.

match(a, unique(a))
# [1] 1 1 1 2 2 3 4 4 1 1 5

This match/unique solution repeats group 1 near the end because it is sensitive to "B110" showing up in more than one run (because of unique), and hence groups all its occurrences into the same group regardless of where they appear.

4.

with(rle(a), rep(seq_along(values), lengths))
# [1] 1 1 1 2 2 3 4 4 5 5 6

This solution only cares about runs, hence the different runs of "B110" were grouped into different groups.

GroupBy and count the unique elements in a List

var list = new List<string> { "Foo1", "Foo2", "Foo3", "Foo2", "Foo3", "Foo3", "Foo1", "Foo1" };

var grouped = list
    .GroupBy(s => s)
    .Select(group => new { Word = group.Key, Count = group.Count() });

How to group rows into a unique row for unique column values?

We can group by 'a' and 'c', then summarise the unique elements of 'b' into a string:

library(dplyr)
df %>%
  group_by(a, c) %>%
  summarise(b = sprintf('[%s]', toString(unique(b))), .groups = 'drop') %>%
  select(names(df))

-output

# A tibble: 3 x 3
#   a     b             c
#   <chr> <chr>     <dbl>
# 1 A1    [a, b, c]     1
# 2 A2    [d, e]        1
# 3 A3    [f]           1

Or if the 'c' values are also changing, use across

df %>%
  group_by(a) %>%
  summarise(across(everything(), ~ sprintf('[%s]', toString(unique(.)))),
            .groups = 'drop')

Or if we need a list

df %>%
  group_by(a) %>%
  summarise(across(everything(), ~ list(unique(.))), .groups = 'drop')

Or using glue

df %>%
  group_by(a, c) %>%
  summarise(b = glue::glue('[{toString(unique(b))}]'), .groups = 'drop')

-output

# A tibble: 3 x 3
#   a         c b
# * <chr> <dbl> <glue>
# 1 A1        1 [a, b, c]
# 2 A2        1 [d, e]
# 3 A3        1 [f]

Python group by and count distinct values in a column and create delimited list

You can use str.len in your code:

df3 = (df.groupby('company')['product']
         .apply(lambda x: list(x.unique()))
         .reset_index()
         .assign(count=lambda d: d['product'].str.len())  ## added line
      )

output:

     company            product  count
0     Amazon           [E-comm]      1
1   Facebook     [Social Media]      1
2     Google  [Search, Android]      2
3  Microsoft        [OS, X-box]      2
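
A self-contained sketch; the input frame below is a hypothetical reconstruction of the data behind that output:

```python
import pandas as pd

# hypothetical input reconstructed from the output above
df = pd.DataFrame({
    'company': ['Google', 'Google', 'Amazon', 'Facebook',
                'Microsoft', 'Microsoft'],
    'product': ['Search', 'Android', 'E-comm', 'Social Media',
                'OS', 'X-box'],
})

df3 = (df.groupby('company')['product']
         .apply(lambda x: list(x.unique()))
         .reset_index()
         # str.len works element-wise on lists as well as strings
         .assign(count=lambda d: d['product'].str.len()))
print(df3)
```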

