Duplicates in Multiple Columns

duplicates in multiple columns

It works if you use duplicated twice:

df[!(duplicated(df[c("c","d")]) | duplicated(df[c("c","d")], fromLast = TRUE)), ]

  a  b c    d
1 1  2 A 1001
4 4  8 C 1003
7 7 13 E 1005
8 8 14 E 1006

How do I find duplicates across multiple columns?

Duplicated id for pairs name and city:

select s.id, t.* 
from [stuff] s
join (
    select name, city, count(*) as qty
    from [stuff]
    group by name, city
    having count(*) > 1
) t on s.name = t.name and s.city = t.city

Grouping by multiple columns to find duplicate rows pandas

You need duplicated with parameter subset for specify columns for check with keep=False for all duplicates for mask and filter by boolean indexing:

df = df[df.duplicated(subset=['val1','val2'], keep=False)]
print (df)
   id  val1  val2
0   1   1.1   2.2
1   1   1.1   2.2
3   3   8.8   6.2
4   4   1.1   2.2
5   5   8.8   6.2

Detail:

print (df.duplicated(subset=['val1','val2'], keep=False))
0     True
1     True
2    False
3     True
4     True
5     True
dtype: bool

remove duplicate values based on 2 columns

This will give you the desired result:

df [!duplicated(df[c(1,4)]),]

Cross comparison of duplicates in multiple columns using COUNTIF / SUMPRODUCT

You need to use this formula for the column header named Duplicate?

This gives you the count where it is more than once, and wrapping it up within an IF Logic to check if its TRUE to return Yes otherwise No

Formula used in cell C2 & Fill Down

=IF(COUNTIFS($A:$A,$A:$A,$B:$B,$B:$B)>1,"Yes","No")

Sample Image

Identify duplicate based on multiple columns (may include multiple values) and return Boolean if identified duplicated in python

Let us do

df['New'] = df.assign(produce=df['produce'].str.split(', ')).\
               explode('produce').\
               duplicated(subset=['store', 'station', 'produce'], keep=False).any(level=0)

Out[160]: 
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
11    False
dtype: bool

Use R to find duplicates in multiple columns at once

We can use unique with by option from data.table

library(data.table)
unique(setDT(df), by = c("Surname", "Address"))
#    Surname First Name Address
#1:      A1      Bobby      X1
#2:      B5        Joe      X2
#3:      B5       Mary      X3
#4:      F2        Lou      X4
#5:      F3      Sarah      X5
#6:      G4      Bobby      X6
#7:      H5       Eric      X7
#8:      K6      Peter      X8

Or with tidyverse

library(dplyr)
df %>% 
  distinct(Surname, Address, .keep_all = TRUE)
# Surname First Name Address
#1      A1      Bobby      X1
#2      B5        Joe      X2
#3      B5       Mary      X3
#4      F2        Lou      X4
#5      F3      Sarah      X5
#6      G4      Bobby      X6
#7      H5       Eric      X7
#8      K6      Peter      X8

Update

Based on the updated post, perhaps this helps

setDT(df)[, if((uniqueN(FirstName))>1) .SD,.(Surname, Address)]
#   Surname Address FirstName
#1:      G4      X6     Bobby
#2:      G4      X6      Fred
#3:      G4      X6      Anna

Remove Duplicates in Table Based on Multiple Column

To combine the two columns, you have to capture BOTH sets of the data as an array. This applies to removing duplicates on any data set range or table, as well as if you want to Filter on multiple members.

In your case since you want the second and third columns in your table evaluated, you can easily rewrite your code as:

Sheets("A").ListObjects("Data").Range.RemoveDuplicates Columns:=Array(2,3), Header:=xlYes

Duplicates in Multiple Columns