Duplicates in Multiple Columns

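It works if you call duplicated twice, once scanning forward and once with fromLast = TRUE, so that every row in a duplicate group is flagged and dropped: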

df[!(duplicated(df[c("c","d")]) | duplicated(df[c("c","d")], fromLast = TRUE)), ]

a b c d
1 1 2 A 1001
4 4 8 C 1003
7 7 13 E 1005
8 8 14 E 1006
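
For comparison, here is a rough pandas sketch of the same idea: drop every row whose (c, d) pair occurs more than once. The data below is hypothetical, since the original data frame is not shown, and only the column names a, b, c, d are taken from the output above.

```python
import pandas as pd

# Hypothetical data: the original data frame is not shown in the source.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": ["A", "B", "B", "C"],
    "d": [1001, 1002, 1002, 1003],
})

# keep=False flags every member of a duplicate group, so negating the mask
# keeps only the rows whose (c, d) pair is unique.
unique_rows = df[~df.duplicated(subset=["c", "d"], keep=False)]
print(unique_rows)
```

With keep=False a single call covers what the two duplicated calls do in R.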

How do I find duplicates across multiple columns?

To list the id of every row whose (name, city) pair occurs more than once:

select s.id, t.*
from [stuff] s
join (
    select name, city, count(*) as qty
    from [stuff]
    group by name, city
    having count(*) > 1
) t on s.name = t.name and s.city = t.city
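
To try the query without a database server, it can be run against an in-memory SQLite table from Python. This is only a sketch: the rows are made up, and the bracketed [stuff] identifier is written as plain stuff here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
# Hypothetical rows: only ids 1 and 2 share the same (name, city) pair.
conn.executemany(
    "INSERT INTO stuff (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ann", "Oslo"), (2, "Ann", "Oslo"), (3, "Bob", "Rome"), (4, "Ann", "Rome")],
)

query = """
SELECT s.id, t.name, t.city, t.qty
FROM stuff AS s
JOIN (
    SELECT name, city, COUNT(*) AS qty
    FROM stuff
    GROUP BY name, city
    HAVING COUNT(*) > 1
) AS t ON s.name = t.name AND s.city = t.city
ORDER BY s.id
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

The subquery finds the duplicated pairs, and the join back to the table recovers the individual ids.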

Grouping by multiple columns to find duplicate rows pandas

Use duplicated with the subset parameter to name the columns to check and keep=False to mark every occurrence of a duplicate, then filter with the resulting mask by boolean indexing:

df = df[df.duplicated(subset=['val1','val2'], keep=False)]
print (df)
id val1 val2
0 1 1.1 2.2
1 1 1.1 2.2
3 3 8.8 6.2
4 4 1.1 2.2
5 5 8.8 6.2

Detail:

print (df.duplicated(subset=['val1','val2'], keep=False))
0 True
1 True
2 False
3 True
4 True
5 True
dtype: bool
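
A self-contained version of the above, assuming an input frame consistent with the printed output. The values in row 2 are hypothetical, because the source only shows that this row is not a duplicate.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 3, 4, 5],
    "val1": [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],  # row 2 value is made up
    "val2": [2.2, 2.2, 5.5, 6.2, 2.2, 6.2],  # row 2 value is made up
})

# keep=False marks every row that belongs to a duplicated (val1, val2) pair.
mask = df.duplicated(subset=["val1", "val2"], keep=False)
dupes = df[mask]
print(dupes)

# The default keep='first' leaves the first occurrence of each pair unmarked.
later_only = df.duplicated(subset=["val1", "val2"])
print(later_only.tolist())
```

The difference between the two masks is the reason keep=False is needed when all members of a duplicate group should be returned.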

Remove duplicate values based on 2 columns

This keeps the first row for each combination of values in columns 1 and 4 and drops the later repeats:

df[!duplicated(df[c(1, 4)]), ]

Cross comparison of duplicates in multiple columns using COUNTIF / SUMPRODUCT

Use this formula in the column headed Duplicate?

COUNTIFS counts the rows that match the current row in both column A and column B. Wrapping it in IF returns Yes when that count is greater than 1 and No otherwise.

Formula used in cell C2, then filled down:

=IF(COUNTIFS($A:$A,$A2,$B:$B,$B2)>1,"Yes","No")
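
For readers who want to cross-check the spreadsheet result outside Excel, the same count-then-label logic can be sketched in pandas. The column names A and B mirror the sheet columns, and the data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for sheet columns A and B.
df = pd.DataFrame({"A": ["x", "y", "x", "z"], "B": [1, 2, 1, 1]})

# Size of each (A, B) group, broadcast back to every row (the COUNTIFS step).
counts = df.groupby(["A", "B"])["A"].transform("size")

# The IF step: label rows whose pair occurs more than once.
df["Duplicate?"] = np.where(counts > 1, "Yes", "No")
print(df)
```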

Identify duplicate based on multiple columns (may include multiple values) and return Boolean if identified duplicated in python

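Split produce into a list, explode it to one row per item, flag the duplicated (store, station, produce) combinations, then collapse the flags back to the original rows:

df['New'] = (
    df.assign(produce=df['produce'].str.split(', '))
      .explode('produce')
      .duplicated(subset=['store', 'station', 'produce'], keep=False)
      .groupby(level=0).any()  # one flag per original row
)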

Out[160]:
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 False
9 True
10 True
11 False
dtype: bool
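
A small runnable sketch of the split, explode, and collapse steps, using hypothetical data (the original frame is not shown). It uses groupby(level=0).any() for the final step, because the level argument of any was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical data: 'produce' may hold several comma-separated items.
df = pd.DataFrame({
    "store": ["S1", "S1", "S2", "S2"],
    "station": ["north", "north", "east", "east"],
    "produce": ["apple, pear", "pear", "kiwi", "plum, fig"],
})

# One row per produce item; explode repeats the original index labels.
exploded = df.assign(produce=df["produce"].str.split(", ")).explode("produce")

# Flag every item-level row that shares store, station, and produce with another.
dup = exploded.duplicated(subset=["store", "station", "produce"], keep=False)

# Collapse back: an original row is flagged if any of its items is duplicated.
df["New"] = dup.groupby(level=0).any()
print(df)
```

Here rows 0 and 1 are flagged because both list pear for the same store and station.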

Use R to find duplicates in multiple columns at once

We can use unique with the by argument from data.table to keep one row per Surname and Address combination:

library(data.table)
unique(setDT(df), by = c("Surname", "Address"))
# Surname First Name Address
#1: A1 Bobby X1
#2: B5 Joe X2
#3: B5 Mary X3
#4: F2 Lou X4
#5: F3 Sarah X5
#6: G4 Bobby X6
#7: H5 Eric X7
#8: K6 Peter X8

Or with dplyr from the tidyverse:

library(dplyr)
df %>%
distinct(Surname, Address, .keep_all = TRUE)
# Surname First Name Address
#1 A1 Bobby X1
#2 B5 Joe X2
#3 B5 Mary X3
#4 F2 Lou X4
#5 F3 Sarah X5
#6 G4 Bobby X6
#7 H5 Eric X7
#8 K6 Peter X8

Update

Based on the updated post, this returns the Surname and Address groups that contain more than one distinct first name:

setDT(df)[, if (uniqueN(FirstName) > 1) .SD, by = .(Surname, Address)]
# Surname Address FirstName
#1: G4 X6 Bobby
#2: G4 X6 Fred
#3: G4 X6 Anna
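
A comparable filter can be sketched in pandas: count the distinct first names per (Surname, Address) group and keep the groups with more than one. The data is hypothetical apart from the three G4 / X6 rows shown above.

```python
import pandas as pd

# Hypothetical data; only the G4 / X6 rows come from the output above.
df = pd.DataFrame({
    "Surname": ["G4", "G4", "G4", "A1", "B5"],
    "FirstName": ["Bobby", "Fred", "Anna", "Bobby", "Joe"],
    "Address": ["X6", "X6", "X6", "X1", "X2"],
})

# Number of distinct first names in each (Surname, Address) group, per row.
n_names = df.groupby(["Surname", "Address"])["FirstName"].transform("nunique")

# Keep only the groups where more than one first name shares the pair.
shared = df[n_names > 1]
print(shared)
```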

Remove Duplicates in Table Based on Multiple Column

To evaluate two columns together, pass both column indices to the Columns argument as an array. The same pattern applies when removing duplicates from any range or table, and when filtering on multiple values.

In your case, since you want the second and third columns of your table evaluated, you can rewrite your code as:

Sheets("A").ListObjects("Data").Range.RemoveDuplicates Columns:=Array(2,3), Header:=xlYes


