How to Find Common Rows Between Two Data Frames in R

How to find common rows between two data frames in R?

common <- intersect(data.frame1$col, data.frame2$col)
data.frame1[data.frame1$col %in% common, ] # common rows in data frame 1
data.frame2[data.frame2$col %in% common, ] # common rows in data frame 2
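A self-contained sketch with hypothetical data (only the column name `col` comes from the answer; the values and the extra columns `v1`/`v2` are made up). Rows are selected by value with `%in%`; indexing with the shared values themselves would look them up as row positions or row names instead.

```r
# Hypothetical data: only the column name `col` comes from the answer above
data.frame1 <- data.frame(col = c("x", "y", "z"), v1 = 1:3, stringsAsFactors = FALSE)
data.frame2 <- data.frame(col = c("z", "x", "w"), v2 = 4:6, stringsAsFactors = FALSE)

# Values of `col` present in both data frames
common <- intersect(data.frame1$col, data.frame2$col)

# Select rows by value, not by row name or position
rows1 <- data.frame1[data.frame1$col %in% common, ]
rows2 <- data.frame2[data.frame2$col %in% common, ]
```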

How to find common rows between two data frames in R and remove them?

We may use anti_join from dplyr, which keeps only the rows of df1 whose name1 value has no match in df2:

library(dplyr)
anti_join(df1, df2, by = c("name1"))

Data:

df1 <- structure(list(name1 = c("a", "b", "c"), name2 = c("a1", "b1", 
"c1"), name3 = c("a2", "b2", "c2")), class = "data.frame", row.names = c(NA,
-3L))

df2 <- structure(list(name1 = c("a", "b"), name2 = c("a3", "b3")), class = "data.frame", row.names = c(NA,
-2L))
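For comparison, a base R sketch of the same filtering on the same data (rebuilt with data.frame() so the block runs on its own). Negating `%in%` on the key column plays the role of the anti join; dropping the `!` gives the rows that are shared (a semi join).

```r
# Same data as above, written with data.frame()
df1 <- data.frame(name1 = c("a", "b", "c"), name2 = c("a1", "b1", "c1"),
                  name3 = c("a2", "b2", "c2"), stringsAsFactors = FALSE)
df2 <- data.frame(name1 = c("a", "b"), name2 = c("a3", "b3"),
                  stringsAsFactors = FALSE)

# Rows of df1 whose name1 does not occur in df2 (anti join on name1)
only_in_df1 <- df1[!df1$name1 %in% df2$name1, ]

# Rows of df1 whose name1 does occur in df2 (semi join on name1)
in_both <- df1[df1$name1 %in% df2$name1, ]
```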

How to find common rows between two data frames?

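You can use merge(), which performs an inner join on the listed columns, so only rows present in both data frames are kept:

x <- data.frame(A = c(4, 6, 7), B = c(5, 9, 8), C = c("T", "T", "F"))
y <- data.frame(A = c(6, 7, 3), B = c(9, 8, 3), C = c("T", "F", "F"))

merge(x, y, by = c("A", "B", "C"))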

Output:

  A B C
1 6 9 T
2 7 8 F
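merge() without a `by` argument joins on every column name the two data frames share, so for this data it returns the same rows. It also sorts the result by the key columns. If you want the common rows in their original order, one option (a sketch, not the only way) is to build a string key per row and test it with %in%. The separator "\r" is an arbitrary choice, assumed not to occur in the data, so that different rows cannot paste to the same key.

```r
x <- data.frame(A = c(4, 6, 7), B = c(5, 9, 8), C = c("T", "T", "F"))
y <- data.frame(A = c(6, 7, 3), B = c(9, 8, 3), C = c("T", "F", "F"))

# Inner join on all shared columns (A, B and C); rows come back sorted by key
common <- merge(x, y)

# One string per row; "\r" is an assumed separator that never appears in the data
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

# Common rows, kept in the order (and with the row names) of x
common_in_x <- x[row_key(x) %in% row_key(y), ]
```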

How can I find common rows between two dataframes based on two different numeric columns?

To my knowledge, a join (merge) on the shared name column is the simplest way to line up the two sets of coordinates so that START and END can be compared row by row.

Using data.table,

require(data.table)
#> Loading required package: data.table
df1 <- setDT(data.frame(name = c("DEC1", "PSA", "DEC2", "AKT"), START = c(9494957, 39689186, 89435677, 78484829), END = c(52521320, 114050940, 100952138, 78486308), STRAND = c("+", "+", "+", "-")))
df2 <- setDT(data.frame(name = c("DEC1", "PSA", "DEC2", "AKT"), START = c(9494557, 37689186, 89435677, 79484829), END = c(52521320, 114050940, 100952138, 78486308), STRAND = c("+", "+", "+", "-")))
df3 <- df1[df2[,.(name,START2=START, END2 = END)], on='name']

df3[abs(START2-START) %between% c(0,500) &
abs(END2-END) %between% c(0,500)]
#>    name    START       END STRAND   START2      END2
#> 1: DEC1 9494957 52521320 + 9494557 52521320
#> 2: DEC2 89435677 100952138 + 89435677 100952138

Or using dplyr,

library(dplyr)
df3 <- inner_join(df1, df2, suffix = c('1', '2'), by = 'name')
df3 %>% filter(abs(START2 - START1) <= 500, abs(END2 - END1) <= 500)

Created on 2022-04-30 by the reprex package (v2.0.1)
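The same join-then-filter idea in base R, as a sketch: merge() lines the rows up by name, and subset() applies the tolerance. The 500 threshold and the requirement that both START and END fall within it are assumptions here; adjust them to your definition of "close".

```r
df1 <- data.frame(name = c("DEC1", "PSA", "DEC2", "AKT"),
                  START = c(9494957, 39689186, 89435677, 78484829),
                  END = c(52521320, 114050940, 100952138, 78486308),
                  STRAND = c("+", "+", "+", "-"), stringsAsFactors = FALSE)
df2 <- data.frame(name = c("DEC1", "PSA", "DEC2", "AKT"),
                  START = c(9494557, 37689186, 89435677, 79484829),
                  END = c(52521320, 114050940, 100952138, 78486308),
                  STRAND = c("+", "+", "+", "-"), stringsAsFactors = FALSE)

# Join on name; the other shared columns get the suffixes 1 and 2
joined <- merge(df1, df2, by = "name", suffixes = c("1", "2"))

# Keep pairs whose START and END both differ by at most the tolerance
tolerance <- 500
close_rows <- subset(joined, abs(START2 - START1) <= tolerance &
                               abs(END2 - END1) <= tolerance)
```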

How to extract common rows between multiple dataframes

Here's an approach with inner_join from dplyr:

First, we join df1 and df2, keeping only the rows that appear in both. This is called an inner join (hence the name of the function). By default, the join uses all columns that have the same name in both data frames, so df1$V1 is matched to df2$V1 and df1$V2 to df2$V2. Next, we join that result with df3 in the same way.

Note that the pipe operator (%>%) passes the output of its left-hand side as the first argument to the function on its right-hand side.

library(dplyr)
inner_join(df1,df2) %>%
inner_join(df3)
# V1 V2
#1 a b

If the columns are named differently in the data frames, you can define the relationship explicitly; in each pair, the name on the left refers to a column of df1 and the name on the right to a column of df2:

inner_join(df1,df2, by = c("V1" = "V1", "V2" = "V2"))
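With more than two data frames, the chain of joins can also be written with base R's Reduce(), which applies merge() pairwise from left to right. The data below is hypothetical, chosen so that only the row (a, b) appears in all three.

```r
df1 <- data.frame(V1 = c("a", "b", "c"), V2 = c("b", "a", "d"), stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("a", "c"), V2 = c("b", "x"), stringsAsFactors = FALSE)
df3 <- data.frame(V1 = c("a", "z"), V2 = c("b", "z"), stringsAsFactors = FALSE)

# Equivalent to merge(merge(df1, df2), df3): each step is an inner join on V1 and V2
common <- Reduce(merge, list(df1, df2, df3))
```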

Finding common rows in two data frames

There are probably better solutions, but you can do:

# Collapse each row into a single string so whole rows can be compared
d1 <- do.call("paste", df1)
d2 <- do.call("paste", df2)
df1[d1 %in% d2, ]
# SURVEY_DATE DATA_COLLECTION_SITE
# 1 2012-07-01 Site 1
# 3 2012-08-10 Site 2
# 6 2012-09-20 Site 1

And for the final result:

df3 <- df1[d1 %in% d2, ]
df4 <- df2[d2 %in% d1, ]

df3$FISHING_SITE <- NA
df4$DATA_COLLECTION_SITE <- NA

rbind(df3, df4)
# SURVEY_DATE DATA_COLLECTION_SITE FISHING_SITE
# 1 2012-07-01 Site 1 <NA>
# 3 2012-08-10 Site 2 <NA>
# 6 2012-09-20 Site 1 <NA>
# 11 2012-07-01 <NA> Site 1
# 31 2012-08-10 <NA> Site 2
# 61 2012-09-20 <NA> Site 1
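One caveat with do.call("paste", df): paste() joins columns with a single space by default, so two different rows can produce the same string, for example ("Site 1", "x") and ("Site", "1 x"). A sketch that uses a separator assumed not to occur in the data avoids that. The data frames here are small hypothetical stand-ins with the same column names as above.

```r
df1 <- data.frame(SURVEY_DATE = c("2012-07-01", "2012-07-02", "2012-08-10"),
                  DATA_COLLECTION_SITE = c("Site 1", "Site 3", "Site 2"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(SURVEY_DATE = c("2012-07-01", "2012-08-10", "2012-09-01"),
                  FISHING_SITE = c("Site 1", "Site 2", "Site 4"),
                  stringsAsFactors = FALSE)

# Collapse each row into one string; "\r" is an assumed never-used separator.
# unname() keeps a column literally called "sep" from being read as paste()'s argument.
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

d1 <- row_key(df1)
d2 <- row_key(df2)

# Column names are ignored, so rows match on their values column by column
shared1 <- df1[d1 %in% d2, ]
shared2 <- df2[d2 %in% d1, ]

# With the default separator, these two different rows collide (TRUE)
collides <- paste("Site 1", "x") == paste("Site", "1 x")
```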

Mark common rows between data frames in R

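You can use %in% to check for matches and is.na to exclude NA. Without the is.na check, NA rows in df1 would be marked as matches whenever df2$id also contains NA, because %in% treats NA as matching NA.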

df1$match <- df1$id %in% df2$id & !is.na(df1$id)
df1

# id match
#1 1 TRUE
#2 2 FALSE
#3 3 TRUE
#4 4 FALSE
#5 NA FALSE
#6 5 FALSE
#7 6 FALSE
#8 NA FALSE
#9 NA FALSE
#10 7 FALSE
#11 8 TRUE
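A sketch showing why the is.na() check matters. df1$id is the id column printed above; df2$id is a hypothetical vector chosen to reproduce that output, including an NA.

```r
df1 <- data.frame(id = c(1, 2, 3, 4, NA, 5, 6, NA, NA, 7, 8))
df2 <- data.frame(id = c(1, 3, 8, NA, 9))  # hypothetical

# %in% matches NA to NA, so every NA id in df1 counts as a match here
naive <- df1$id %in% df2$id

# Excluding NA ids gives the match column shown above
df1$match <- df1$id %in% df2$id & !is.na(df1$id)
```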

Subset common rows from multiple data frames

I have multiple data frames, each with a unique Id for every row. I am trying to find the rows that appear in at least two of the data frames and collect them in a new data frame.

Since no ID appears twice in the same table, we can tabulate the IDs and keep any found at least twice:

library(data.table)

DTs = lapply(list(df1,df2,df3), data.table)

Id_keep = rbindlist(lapply(DTs, `[`, j = "Id"))[, .N, by=Id][N >= 2L, Id]

DT_keep = Reduce(funion, DTs)[Id %in% Id_keep]

# Id a b c
# 1: 2 1 0 0
# 2: 3 0 1 4
# 3: 5 9 1 7

Ideally, your data would be stored in a list like DTs from the start, rather than as a set of separately named objects.

How it works

To get a sense of how it works, examine intermediate objects such as:

  • list(df1,df2,df3)
  • lapply(DTs, `[`, j = "Id")
  • Reduce(funion, DTs)

Also, read the help files, like ?lapply, ?rbindlist, ?funion.
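A base R sketch of the same idea, with hypothetical df1, df2 and df3 whose rows reproduce the output above. As in the data.table answer, it assumes each Id occurs at most once per data frame, so a count of 2 or more means the Id is shared by at least two of them.

```r
df1 <- data.frame(Id = c(1, 2, 3), a = c(5, 1, 0), b = c(2, 0, 1), c = c(3, 0, 4))
df2 <- data.frame(Id = c(2, 4, 5), a = c(1, 6, 9), b = c(0, 0, 1), c = c(0, 2, 7))
df3 <- data.frame(Id = c(3, 5, 6), a = c(0, 9, 8), b = c(1, 1, 3), c = c(4, 7, 1))

dfs <- list(df1, df2, df3)

# Stack all rows, then count how many data frames each Id appears in
all_rows <- do.call(rbind, dfs)
id_counts <- table(all_rows$Id)
id_keep <- as.numeric(names(id_counts)[id_counts >= 2])

# Keep rows with a shared Id; unique() drops exact repeats, much like funion()
keep <- unique(all_rows[all_rows$Id %in% id_keep, ])
rownames(keep) <- NULL
```

Like funion(), unique() only collapses rows that are identical in every column; if one Id carries different values in two data frames, both versions are kept.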

How to find row index of common rows between two matrices in R

For each row of a, this returns the index of the matching row in b, or NA if there is none:

match(do.call(paste, data.frame(a)), do.call(paste, data.frame(b)))

or even:

A <- data.frame(a)
B <- cbind(id = seq(nrow(b)), setNames(data.frame(b), names(A)))
merge(A, B, all.x = TRUE)
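A small sketch with hypothetical matrices a and b. Each row is collapsed to a string and match() returns, for every row of a, the index of the first identical row of b (NA when there is none). Because numbers are converted to text, this compares their printed form, so it is best suited to integers or other exactly stored values.

```r
a <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)  # rows: (1, 4), (2, 5), (3, 6)
b <- matrix(c(3, 9, 1, 6, 9, 4), ncol = 2)  # rows: (3, 6), (9, 9), (1, 4)

# One string per row; "\r" is an assumed separator that does not occur in the values
row_key <- function(m) apply(m, 1, paste, collapse = "\r")

# For each row of a, the index of the matching row in b (NA if none)
idx <- match(row_key(a), row_key(b))

# Row indices of a that also occur in b
common_a <- which(!is.na(idx))
```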

Finding common rows in R

You can use join_all from the plyr package:

require(plyr)
df <- join_all(list(df1,df2,df3,df4, df5), by = 'V1', type = 'inner')

