Get Rows of Unique Values by Group


data.table handles duplicated a bit differently: with a key set, duplicated works on the key columns rather than all columns. Here's an approach that has come up here before:

dt <- data.table(y=rep(letters[1:2],each=3),x=c(1,2,2,3,2,1),z=1:6) 
setkey(dt, "y", "x")
key(dt)
# [1] "y" "x"
!duplicated(dt)
# [1] TRUE TRUE FALSE TRUE TRUE TRUE
dt[!duplicated(dt)]
# y x z
# 1: a 1 1
# 2: a 2 2
# 3: b 1 6
# 4: b 2 5
# 5: b 3 4
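For comparison, a pandas sketch of the same idea: sorting by the grouping columns mimics setkey, and drop_duplicates keeps the first row of every unique (y, x) pair, like dt[!duplicated(dt)]. The data is copied from the data.table example above.

```python
import pandas as pd

# Same data as the data.table example above
dt = pd.DataFrame({
    "y": list("aaabbb"),
    "x": [1, 2, 2, 3, 2, 1],
    "z": [1, 2, 3, 4, 5, 6],
})

# Sorting on (y, x) mimics setkey(dt, "y", "x"); drop_duplicates
# then keeps the first row of each unique (y, x) pair
unique_rows = dt.sort_values(["y", "x"]).drop_duplicates(["y", "x"])
print(unique_rows)
```

Multi-column sort_values in pandas is stable, so within each (y, x) pair the original row order is preserved, matching the keyed data.table result.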

How to group rows into a single row for unique column values?

We could group by 'a' and 'c', then summarise the unique elements of 'b' into a string:

library(dplyr)
df %>%
  group_by(a, c) %>%
  summarise(b = sprintf('[%s]', toString(unique(b))), .groups = 'drop') %>%
  select(names(df))

Output:

# A tibble: 3 x 3
# a b c
# <chr> <chr> <dbl>
#1 A1 [a, b, c] 1
#2 A2 [d, e] 1
#3 A3 [f] 1

Or if the 'c' values are also changing, use across

df %>%
  group_by(a) %>%
  summarise(across(everything(),
                   ~ sprintf('[%s]', toString(unique(.)))),
            .groups = 'drop')

Or if we need a list

df %>%
  group_by(a) %>%
  summarise(across(everything(), ~ list(unique(.))), .groups = 'drop')

Or using glue

df %>%
  group_by(a, c) %>%
  summarise(b = glue::glue('[{toString(unique(b))}]'), .groups = 'drop')

Output:

# A tibble: 3 x 3
# a c b
#* <chr> <dbl> <glue>
#1 A1 1 [a, b, c]
#2 A2 1 [d, e]
#3 A3 1 [f]
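The same collapse-unique-values-to-a-string pattern can be sketched in pandas. The frame below is reconstructed to be consistent with the tibble output above (the question's actual df isn't shown, so treat the data as an assumption):

```python
import pandas as pd

# A frame consistent with the tibble output above (assumed data)
df = pd.DataFrame({
    "a": ["A1", "A1", "A1", "A2", "A2", "A3"],
    "b": ["a", "b", "c", "d", "e", "f"],
    "c": [1, 1, 1, 1, 1, 1],
})

# Collapse the unique 'b' values per (a, c) group into a bracketed
# string, like sprintf('[%s]', toString(unique(b))) in the dplyr answer
out = (
    df.groupby(["a", "c"], as_index=False)["b"]
      .agg(lambda s: "[" + ", ".join(s.unique()) + "]")
)
print(out)
```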

Get the first row of each group of unique values in another column

Use groupby + first:

firsts = df.groupby('col_B', as_index=False).first()

Output:

>>> firsts
  col_B  col_A
0     x      1
1    xx      2
2     y      4

If the order of the columns is important, use drop_duplicates instead; it keeps the original column order and row index. (Note that df.loc[df.groupby('col_B', as_index=False).first().index] does not work here: first() resets the index, so df.loc just re-selects the first rows of df by position.)

firsts = df.drop_duplicates('col_B')

Output:

>>> firsts
   col_A col_B
0      1     x
1      2    xx
3      4     y

Get rows based on distinct values from one column

Use drop_duplicates, specifying column COL2 to check for duplicates:

df = df.drop_duplicates('COL2')
#same as
#df = df.drop_duplicates('COL2', keep='first')
print(df)
    COL1  COL2
0  a.com    22
1  b.com    45
2  c.com    34
4  f.com    56

You can also keep only the last occurrence of each value:

df = df.drop_duplicates('COL2', keep='last')
print(df)
    COL1  COL2
2  c.com    34
4  f.com    56
5  g.com    22
6  h.com    45

Or remove all duplicates:

df = df.drop_duplicates('COL2', keep=False)
print(df)
    COL1  COL2
2  c.com    34
4  f.com    56
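A self-contained sketch of the three keep modes on a smaller, hypothetical frame (the question's full COL1/COL2 data isn't shown, so the values here are illustrative only):

```python
import pandas as pd

# Hypothetical data: COL2 value 22 appears twice
df = pd.DataFrame({"COL1": ["a.com", "b.com", "c.com"],
                   "COL2": [22, 45, 22]})

print(df.drop_duplicates("COL2"))                # keeps first 22 (a.com)
print(df.drop_duplicates("COL2", keep="last"))   # keeps last 22 (c.com)
print(df.drop_duplicates("COL2", keep=False))    # drops both 22 rows
```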

pandas: how to select unique rows in group

Using .unique()

grouped_df['column_1'].unique()

or, without unique, you could do something like:

grouped_df['column_1'].apply(list).apply(set)
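A minimal runnable sketch of both variants. The column and grouping names are assumptions, since the original frame isn't shown:

```python
import pandas as pd

# Hypothetical frame; 'group' and 'column_1' are assumed names
df = pd.DataFrame({"group": ["g1", "g1", "g2", "g2"],
                   "column_1": ["a", "a", "b", "c"]})
grouped_df = df.groupby("group")

# .unique() returns an array of distinct values per group
uniq = grouped_df["column_1"].unique()
print(uniq)

# the apply(list).apply(set) route yields sets per group instead
sets = grouped_df["column_1"].apply(list).apply(set)
print(sets)
```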

How to get unique values from multiple columns in a pandas groupby

You can do it with apply:

import numpy as np
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x)))

Note the double brackets when selecting the columns: the old df.groupby('c')['l1','l2'] form is no longer accepted in current pandas.
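A runnable version with hypothetical data (the names c, l1, l2 come from the snippet above; the values are made up). np.unique flattens both columns of each group and returns the sorted distinct values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; 'c', 'l1', 'l2' follow the snippet above
df = pd.DataFrame({"c": [1, 1, 2],
                   "l1": ["a", "b", "a"],
                   "l2": ["b", "c", "d"]})

# Per group, np.unique flattens l1 and l2 together and sorts
g = df.groupby("c")[["l1", "l2"]].apply(lambda x: list(np.unique(x)))
print(g)
```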

Count distinct values depending on group

You would use count(distinct):

select "group", count(distinct id)
from t
group by "group";

Note that group is a very poor name for a column because it is a SQL keyword. Hopefully the real column name is something more reasonable.
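The pandas analogue of COUNT(DISTINCT id) is nunique. A small sketch with hypothetical data, reusing the SQL column names:

```python
import pandas as pd

# Hypothetical data; "group" and "id" mirror the SQL columns
t = pd.DataFrame({"group": ["g1", "g1", "g1", "g2"],
                  "id": [1, 1, 2, 3]})

# nunique counts distinct ids per group, like count(distinct id)
counts = t.groupby("group")["id"].nunique()
print(counts)
```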

SQL - Select unique rows from a group of results

You want nested (embedded) queries, which not all SQL dialects support. In T-SQL you'd have something like:

select r.registration, r.recent, t.id, t.unittype
from (
    select registration, max([date]) recent
    from @tmp
    group by registration
) r
left outer join @tmp t
    on r.recent = t.[date]
    and r.registration = t.registration
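The same max-per-group-then-join pattern can be sketched in pandas. The table and column names follow the SQL above; the rows themselves are hypothetical:

```python
import pandas as pd

# Hypothetical data matching the SQL schema above
tmp = pd.DataFrame({
    "registration": ["r1", "r1", "r2"],
    "date": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-01-15"]),
    "id": [1, 2, 3],
    "unittype": ["A", "B", "C"],
})

# Step 1: max date per registration (the inner query)
recent = (tmp.groupby("registration", as_index=False)["date"]
             .max().rename(columns={"date": "recent"}))

# Step 2: join back to the full table to pick up id/unittype
result = recent.merge(tmp, left_on=["registration", "recent"],
                      right_on=["registration", "date"], how="left")
print(result[["registration", "recent", "id", "unittype"]])
```

As with the SQL version, ties on the max date would produce multiple rows per registration after the join.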

