Find Duplicated Rows (Based on 2 Columns) in Data Frame in R

You can always try simply passing those first two columns to the function duplicated:

duplicated(dat[,1:2])

assuming your data frame is called dat. For more information, we can consult the help file for the duplicated function by typing ?duplicated at the console, which gives the following description:

Determines which elements of a vector or data frame are duplicates of
elements with smaller subscripts, and returns a logical vector
indicating which elements (rows) are duplicates.

So duplicated returns a logical vector, which we can then use to extract a subset of dat:

ind <- duplicated(dat[,1:2])
dat[ind,]

or you can skip the separate assignment step and simply use:

dat[duplicated(dat[,1:2]),]
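
To see what this does, here is a minimal sketch with a made-up data frame (the columns x, y, and z are hypothetical):

dat <- data.frame(x = c(1, 1, 2, 2),
                  y = c("a", "a", "b", "c"),
                  z = 1:4)
duplicated(dat[,1:2])
# [1] FALSE  TRUE FALSE FALSE
dat[duplicated(dat[,1:2]),]
#   x y z
# 2 1 a 2

Row 2 repeats the x/y combination of row 1, so it is the only row flagged.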

R - find and list duplicate rows based on two columns

Here is an option using duplicated twice, the second time with fromLast = TRUE, because duplicated on its own returns TRUE only from the second occurrence of a value onwards; combining both directions flags every duplicated row, including the first occurrence:

dupe = data[,c('T.N','ID')] # select columns to check duplicates
data[duplicated(dupe) | duplicated(dupe, fromLast=TRUE),]

#     File T.N ID Col1 Col2
#1 BAI.txt   T  1 sdaf eiri
#3 BBK.txt   T  1  ter  ase
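
If you prefer dplyr, the same "return all duplicated rows" result can be obtained by keeping groups of size greater than one (a sketch assuming the same data and column names as above):

library(dplyr)
data %>%
  group_by(T.N, ID) %>%
  filter(n() > 1) %>%
  ungroup()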

Find duplicate rows in data frame based on multiple columns in r

We can do:

library(data.table)
unique(setDT(data_concern_join2),
by = c('locid', 'stdate', 'sttime', 'charnam', 'valunit'))
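
Note that unique() keeps one row per combination and drops the rest. If you want to inspect the duplicated rows instead, data.table's duplicated() accepts the same by argument; a sketch on the same object:

dt <- setDT(data_concern_join2)
dt[duplicated(dt, by = c('locid', 'stdate', 'sttime', 'charnam', 'valunit'))]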

Find duplicate rows based on 2 columns and keep rows based on the value of a 3rd column in R

You can do:

library(tidyverse)

df %>%
  group_by(id_number, date) %>%
  filter(!(result == 9 & row_number() > 1)) %>%
  ungroup()

# A tibble: 6 x 3
  id_number date       result
      <dbl> <chr>       <dbl>
1         1 2021-11-03      0
2         1 2021-11-19      1
3         2 2021-11-11      0
4         3 2021-11-05      0
5         3 2021-11-16      0
6         4 2021-11-29      9
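
For reference, here is a hypothetical input consistent with that output; the rows dropped by the filter are the result == 9 rows that share an id_number/date with an earlier row:

df <- tibble::tribble(
  ~id_number, ~date,        ~result,
  1,          "2021-11-03", 0,
  1,          "2021-11-03", 9,  # later duplicate with result 9 -> removed
  1,          "2021-11-19", 1,
  2,          "2021-11-11", 0,
  2,          "2021-11-11", 9,  # later duplicate with result 9 -> removed
  3,          "2021-11-05", 0,
  3,          "2021-11-16", 0,
  4,          "2021-11-29", 9   # only row in its group -> kept
)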

remove duplicate values based on 2 columns

This keeps only the first occurrence of each combination of the values in columns 1 and 4:

df[!duplicated(df[c(1, 4)]), ]
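
A quick sketch with made-up data, where columns 1 and 4 are a and d, shows that only the first occurrence of each a/d combination survives:

df <- data.frame(a = c(1, 1, 2),
                 b = c("x", "y", "z"),
                 c = 1:3,
                 d = c("p", "p", "q"))
df[!duplicated(df[c(1, 4)]), ]
#   a b c d
# 1 1 x 1 p
# 3 2 z 3 q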

How to find duplicates based on values in 2 columns but also the groupings by another column in R?

It was a little unclear if you wanted to return:

  1. only the distinct rows
  2. single examples of duplicated rows
  3. all duplicated rows

So here are some options:

library(dplyr)
library(readr)

"ID,a,b
1, 1, 1
1, 1, 1
1, 1, 2
2, 1, 1
2, 1, 2" %>%
read_csv() -> exp_dat

# return only distinct rows
exp_dat %>%
  distinct(ID, a, b)

# # A tibble: 4 x 3
#      ID     a     b
#   <dbl> <dbl> <dbl>
# 1     1     1     1
# 2     1     1     2
# 3     2     1     1
# 4     2     1     2

# return single examples of duplicated rows
exp_dat %>%
  group_by(ID, a, b) %>%
  count() %>%
  filter(n > 1) %>%
  ungroup() %>%
  select(-n)

# # A tibble: 1 x 3
#      ID     a     b
#   <dbl> <dbl> <dbl>
# 1     1     1     1

# return all duplicated rows
exp_dat %>%
  group_by(ID, a, b) %>%
  add_count() %>%
  filter(n > 1) %>%
  ungroup() %>%
  select(-n)

# # A tibble: 2 x 3
#      ID     a     b
#   <dbl> <dbl> <dbl>
# 1     1     1     1
# 2     1     1     1
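
For comparison, the "all duplicated rows" case can also be written in base R with the duplicated()/fromLast trick shown earlier, applied across all three columns:

exp_dat[duplicated(exp_dat) | duplicated(exp_dat, fromLast = TRUE), ]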

Use R to find duplicates in multiple columns at once

We can use unique with the by option from data.table:

library(data.table)
unique(setDT(df), by = c("Surname", "Address"))
#   Surname First Name Address
#1:      A1      Bobby      X1
#2:      B5        Joe      X2
#3:      B5       Mary      X3
#4:      F2        Lou      X4
#5:      F3      Sarah      X5
#6:      G4      Bobby      X6
#7:      H5       Eric      X7
#8:      K6      Peter      X8

Or with tidyverse

library(dplyr)
df %>%
  distinct(Surname, Address, .keep_all = TRUE)
#  Surname First Name Address
#1      A1      Bobby      X1
#2      B5        Joe      X2
#3      B5       Mary      X3
#4      F2        Lou      X4
#5      F3      Sarah      X5
#6      G4      Bobby      X6
#7      H5       Eric      X7
#8      K6      Peter      X8

Update

Based on the updated post, perhaps this helps:

setDT(df)[, if (uniqueN(FirstName) > 1) .SD, .(Surname, Address)]
#   Surname Address FirstName
#1:      G4      X6     Bobby
#2:      G4      X6      Fred
#3:      G4      X6      Anna
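
A dplyr translation of the same idea, in case you are not using data.table (this assumes the first-name column is called FirstName, as in the updated post):

library(dplyr)
df %>%
  group_by(Surname, Address) %>%
  filter(n_distinct(FirstName) > 1) %>%
  ungroup()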

Remove duplicated rows based on 2 columns in R

For the sake of completeness, the unique() function from the data.table package can be used as well:

library(data.table)
unique(setDT(df), by = "IndexA")
   TimeStamp IndexA IndexB     Value
1:  12:00:01      1     NA   Windows
2:  12:00:48     NA      1 Macintosh
3:  12:02:01      2     NA   Windows

This looks for unique values only in IndexA, which is equivalent to Tito Sanz's answer. This approach returns the expected result for the given sample data set, but checking only one column for duplicate entries is oversimplifying IMHO and may fail with production data.

Or, looking for unique combinations of the values in three columns (which is equivalent to www's answer):

unique(setDT(df), by = 2:4)                              # very terse
unique(setDT(df), by = c("IndexA", "IndexB", "Value"))   # explicitly named cols
   TimeStamp IndexA IndexB     Value
1:  12:00:01      1     NA   Windows
2:  12:00:48     NA      1 Macintosh
3:  12:02:01      2     NA   Windows
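
For completeness, the same three-column dedup can also be written with the base R idiom from earlier on this page (it works whether df is a data.frame or a data.table):

df[!duplicated(df[, c("IndexA", "IndexB", "Value")]), ]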

Data

library(data.table)
df <- fread(
"TimeStamp IndexA IndexB Value
12:00:01 1 NA Windows
12:00:05 1 NA Windows
12:00:13 1 NA Windows
12:00:48 NA 1 Macintosh
12:01:30 NA 1 Macintosh
12:01:45 NA 1 Macintosh
12:02:01 2 NA Windows
12:02:13 2 NA Windows")

r filter duplicate rows based on value in column

Here is an option:

df %>%
  group_by(Id) %>%
  filter(Col3 == "A" | n() == 1) %>%
  ungroup()
# A tibble: 3 x 5
#     Id Date       Col1  Col2 Col3
#  <int> <chr>     <int> <int> <chr>
#1     1 1/1/1995     NA     1 A
#2     2 3/10/1992     0     1 B
#3     3 8/15/2002     1     1 B

This keeps rows where Col3 == "A" as well as groups that consist of a single row. PS: I recommend always ending a pipeline with ungroup() to avoid unwanted surprises downstream.
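
A base R equivalent, using ave() to compute the per-group row counts under the same column names:

df[df$Col3 == "A" | ave(seq_along(df$Id), df$Id, FUN = length) == 1, ]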


