Get rows of unique values by group
data.table is a bit different in how duplicated is used. Here's the approach I've seen around here somewhere before:
dt <- data.table(y=rep(letters[1:2],each=3),x=c(1,2,2,3,2,1),z=1:6)
setkey(dt, "y", "x")
key(dt)
# [1] "y" "x"
!duplicated(dt)
# [1] TRUE TRUE FALSE TRUE TRUE TRUE
dt[!duplicated(dt)]
# y x z
# 1: a 1 1
# 2: a 2 2
# 3: b 1 6
# 4: b 2 5
# 5: b 3 4
How to group rows into a unique row for unique column values?
We could group by 'a' and 'c', then summarise the unique elements of 'b' into a string:
library(dplyr)
df %>%
group_by(a, c) %>%
summarise(b = sprintf('[%s]', toString(unique(b))), .groups = 'drop') %>%
select(names(df))
Output:
# A tibble: 3 x 3
# a b c
# <chr> <chr> <dbl>
#1 A1 [a, b, c] 1
#2 A2 [d, e] 1
#3 A3 [f] 1
Or if the 'c' values are also changing, use across
df %>%
group_by(a) %>%
summarise(across(everything(), ~ sprintf('[%s]', toString(unique(.)))), .groups = 'drop')
Or if we need a list
df %>%
group_by(a) %>%
summarise(across(everything(), ~ list(unique(.))), .groups = 'drop')
Or using glue
df %>%
group_by(a, c) %>%
summarise(b = glue::glue('[{toString(unique(b))}]'), .groups = 'drop')
Output:
# A tibble: 3 x 3
# a c b
#* <chr> <dbl> <glue>
#1 A1 1 [a, b, c]
#2 A2 1 [d, e]
#3 A3 1 [f]
Get the first row of each group of unique values in another column
Use groupby + first:
firsts = df.groupby('col_B', as_index=False).first()
Output:
>>> firsts
col_B col_A
0 x 1
1 xx 2
2 y 4
If the order of the columns is important, reindex the result with the original column order:
firsts = df.groupby('col_B', as_index=False).first()[df.columns]
Output:
>>> firsts
col_A col_B
0 1 x
1 2 xx
2 4 y
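As a runnable sketch: the original df isn't shown, so the frame below is reconstructed from the output above.

```python
import pandas as pd

# Reconstructed frame -- the original df is not shown, so this is
# inferred from the outputs above
df = pd.DataFrame({'col_A': [1, 2, 3, 4],
                   'col_B': ['x', 'xx', 'xx', 'y']})

# First row of each col_B group; as_index=False keeps col_B as a column
firsts = df.groupby('col_B', as_index=False).first()
print(firsts)
#   col_B  col_A
# 0     x      1
# 1    xx      2
# 2     y      4

# Restore the original column order if it matters
firsts = firsts[df.columns]
```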
Get rows based on distinct values from one column
Use drop_duplicates, specifying column COL2 to check for duplicates:
df = df.drop_duplicates('COL2')
#same as
#df = df.drop_duplicates('COL2', keep='first')
print (df)
COL1 COL2
0 a.com 22
1 b.com 45
2 c.com 34
4 f.com 56
You can also keep only last values:
df = df.drop_duplicates('COL2', keep='last')
print (df)
COL1 COL2
2 c.com 34
4 f.com 56
5 g.com 22
6 h.com 45
Or remove all duplicates:
df = df.drop_duplicates('COL2', keep=False)
print (df)
COL1 COL2
2 c.com 34
4 f.com 56
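As a self-contained sketch, here is a frame consistent with the three outputs above (the original df isn't shown, and the COL2 value for 'd.com' is inferred; any duplicate of an earlier value would fit):

```python
import pandas as pd

# Reconstructed from the outputs above; the COL2 value for 'd.com'
# is inferred (any duplicate of an earlier value would fit)
df = pd.DataFrame({'COL1': ['a.com', 'b.com', 'c.com', 'd.com',
                            'f.com', 'g.com', 'h.com'],
                   'COL2': [22, 45, 34, 22, 56, 22, 45]})

first = df.drop_duplicates('COL2')               # keep first occurrence
last = df.drop_duplicates('COL2', keep='last')   # keep last occurrence
none = df.drop_duplicates('COL2', keep=False)    # drop all duplicated values
```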
pandas: how to select unique rows in group
Using .unique()
grouped_df['column_1'].unique()
Or without unique you could do something like:
grouped_df['column_1'].apply(list).apply(set)
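A minimal sketch, assuming grouped_df is an ordinary groupby object (the data below is made up):

```python
import pandas as pd

# Made-up data; the original grouped_df is not shown
df = pd.DataFrame({'group': ['g1', 'g1', 'g2', 'g2'],
                   'column_1': ['a', 'a', 'b', 'c']})
grouped_df = df.groupby('group')

# .unique() returns an array of the distinct values in each group
uniques = grouped_df['column_1'].unique()

# The apply(list).apply(set) variant yields a set per group instead
sets = grouped_df['column_1'].apply(list).apply(set)
```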
How to get unique values from multiple columns in a pandas groupby
You can do it with apply:
import numpy as np
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x)))
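For example, with made-up data (note the double brackets when selecting l1 and l2: list selection with single brackets was removed in recent pandas):

```python
import numpy as np
import pandas as pd

# Made-up frame; the original df is not shown
df = pd.DataFrame({'c':  [1, 1, 2],
                   'l1': ['a', 'b', 'c'],
                   'l2': ['b', 'b', 'd']})

# Pooled unique values across l1 and l2, one sorted list per group of c
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x)))
print(g)
```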
Count distinct values depending on group
You would use count(distinct):
select "group", count(distinct id)
from t
group by "group";
Note that group is a very poor name for a column because it is a SQL keyword. Hopefully the real column name is something more reasonable.
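The same pattern can be demonstrated with sqlite3 from the Python standard library (the table and its data are made up):

```python
import sqlite3

# Made-up table; "group" must stay quoted because it is a SQL keyword
con = sqlite3.connect(':memory:')
con.execute('create table t ("group" text, id integer)')
con.executemany('insert into t values (?, ?)',
                [('g1', 1), ('g1', 1), ('g1', 2), ('g2', 3)])

# Count distinct ids per group: repeated ids are counted once
rows = con.execute(
    'select "group", count(distinct id) from t group by "group"'
).fetchall()
# g1 has two distinct ids (1, 2); g2 has one (3)
```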
SQL - Select unique rows from a group of results
You want a subquery (derived table), which not all SQL dialects support. In T-SQL you'd have something like:
select r.registration, r.recent, t.id, t.unittype
from (
    select registration, max([date]) as recent
    from @tmp
    group by registration
) r
left outer join @tmp t
    on r.recent = t.[date]
    and r.registration = t.registration
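A sketch of the same derived-table pattern using sqlite3 (the table name, columns, and data are invented; @tmp becomes an ordinary table and the [date] bracket quoting is not needed in SQLite):

```python
import sqlite3

# Invented data: registration ABC has two rows, XYZ has one
con = sqlite3.connect(':memory:')
con.execute('create table tmp (id integer, registration text,'
            ' date text, unittype text)')
con.executemany('insert into tmp values (?, ?, ?, ?)', [
    (1, 'ABC', '2020-01-01', 'car'),
    (2, 'ABC', '2020-06-01', 'van'),
    (3, 'XYZ', '2020-03-01', 'truck'),
])

# Derived table r picks the most recent date per registration,
# then joins back to fetch the rest of that row
rows = con.execute('''
    select r.registration, r.recent, t.id, t.unittype
    from (select registration, max(date) as recent
          from tmp
          group by registration) r
    left outer join tmp t
      on r.recent = t.date and r.registration = t.registration
''').fetchall()
```

Note the usual caveat with this pattern: if two rows of the same registration share the max date, both are returned.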