How to find common rows between two data frames in R?
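common <- intersect(data.frame1$col, data.frame2$col)
# intersect() returns values, not row indices, so filter on the key column
data.frame1[data.frame1$col %in% common, ] # common rows in data frame 1
data.frame2[data.frame2$col %in% common, ] # common rows in data frame 2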
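One thing to watch: intersect() returns the shared values, not row positions, so the values have to be turned back into a row filter. A minimal sketch, assuming two hypothetical data frames that share a key column named col (the data and the extra columns x and y are made up for illustration):

```r
# Hypothetical example data; only the column name `col` follows the snippet above
data.frame1 <- data.frame(col = c("a", "b", "c", "d"), x = 1:4,
                          stringsAsFactors = FALSE)
data.frame2 <- data.frame(col = c("b", "d", "e"), y = c(10, 20, 30),
                          stringsAsFactors = FALSE)

# Values of `col` present in both data frames
common <- intersect(data.frame1$col, data.frame2$col)

# Keep the rows whose key is one of the shared values
rows1 <- data.frame1[data.frame1$col %in% common, ]
rows2 <- data.frame2[data.frame2$col %in% common, ]
```

If the key can repeat within a data frame, this keeps every row carrying a shared value, which may or may not be what you want.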
How to find common rows between two data frames in R and remove them
We may use anti_join
library(dplyr)
anti_join(df1, df2, by = c("name1"))
data
df1 <- structure(list(name1 = c("a", "b", "c"), name2 = c("a1", "b1",
"c1"), name3 = c("a2", "b2", "c2")), class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(name1 = c("a", "b"), name2 = c("a3", "b3")), class = "data.frame", row.names = c(NA,
-2L))
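For comparison, a base R sketch of the same idea without dplyr, assuming name1 is the only key. The data frames repeat the example data in a more compact form; the names in_both, common_rows and only_in_df1 are introduced here for illustration:

```r
df1 <- data.frame(name1 = c("a", "b", "c"),
                  name2 = c("a1", "b1", "c1"),
                  name3 = c("a2", "b2", "c2"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(name1 = c("a", "b"),
                  name2 = c("a3", "b3"),
                  stringsAsFactors = FALSE)

# TRUE for rows of df1 whose key also occurs in df2
in_both <- df1$name1 %in% df2$name1

common_rows <- df1[in_both, ]   # rows of df1 with a match in df2
only_in_df1 <- df1[!in_both, ]  # the rows anti_join(df1, df2, by = "name1") keeps
```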
How to find common rows between two data frames?
You can use the following code:
c<- data.frame(A = c(4,6,7), B = c(5,9,8),C = c("T","T","F"))
d<- data.frame(A = c(6,7,3),B = c(9,8,3),C = c("T","F","F"))
merge(c, d, by= c("A", "B", "C"))
Output:
  A B C
1 6 9 T
2 7 8 F
How can I find common rows between two dataframes based on two different numeric columns?
To my knowledge, merge or join would be the only way to compare two columns.
Using data.table:
require(data.table)
#> Loading required package: data.table
df1 <- setDT(data.frame(name = c("DEC1", "PSA", "DEC2", "AKT"), START = c(9494957, 39689186, 89435677, 78484829), END = c(52521320, 114050940, 100952138, 78486308), STRAND = c("+", "+", "+", "-")))
df2 <- setDT(data.frame(name = c("DEC1", "PSA", "DEC2", "AKT"), START = c(9494557, 37689186, 89435677, 79484829), END = c(52521320, 114050940, 100952138, 78486308), STRAND = c("+", "+", "+", "-")))
df3 <- df1[df2[,.(name,START2=START, END2 = END)], on='name']
# keep the pairs whose START values differ by at most 500
df3[abs(START2 - START) %between% c(0, 500)]
#> name START END STRAND START2 END2
#> 1: DEC1 9494957 52521320 + 9494557 52521320
#> 2: DEC2 89435677 100952138 + 89435677 100952138
Or using dplyr:
library(dplyr)
df3 <- inner_join(df1, df2, suffix = c('1', '2'), by = 'name')
df3 %>% filter(abs(START2 - START1) < 500)
Created on 2022-04-30 by the reprex package (v2.0.1)
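The same pairing and filtering can also be sketched in base R with merge. This is an illustration rather than part of the original answer: the data is trimmed to the name and START columns, and near_rows is a name made up here.

```r
df1 <- data.frame(name = c("DEC1", "PSA", "DEC2", "AKT"),
                  START = c(9494957, 39689186, 89435677, 78484829),
                  stringsAsFactors = FALSE)
df2 <- data.frame(name = c("DEC1", "PSA", "DEC2", "AKT"),
                  START = c(9494557, 37689186, 89435677, 79484829),
                  stringsAsFactors = FALSE)

# Pair rows by name; the clashing START columns get the suffixes "1" and "2"
df3 <- merge(df1, df2, by = "name", suffixes = c("1", "2"))

# Keep the pairs whose start positions differ by at most 500
near_rows <- df3[abs(df3$START2 - df3$START1) <= 500, ]
```

Note that merge sorts the result by the key column, so the row order can differ from the data.table and dplyr versions.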
How to extract common rows between multiple dataframes
Here's an approach with inner_join from dplyr:
First, we join df1 and df2, keeping only rows that are the same between them. This is called an inner join (thus the name of the function). By default, all columns that are named the same are joined. Thus, df1$V1 is joined to df2$V1 and df1$V2 is joined to df2$V2. Next, we repeat the same process, joining the result of df1 and df2 with df3.
Note that the pipe operator (%>%) passes the output of the left-hand side as the first argument to the right-hand side.
library(dplyr)
inner_join(df1,df2) %>%
inner_join(df3)
# V1 V2
#1 a b
Also note that if the columns are named differently in the two data.frames, you can define the relationship explicitly:
inner_join(df1,df2, by = c("V1" = "V1", "V2" = "V2"))
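To make the differently-named case concrete, here is a small sketch. The data frames left and right and the column names X1 and X2 are hypothetical, chosen only to show the named-vector form of by:

```r
library(dplyr)

# Hypothetical data: the key columns carry different names in each table
left  <- data.frame(V1 = c("a", "b", "c"), V2 = c("b", "c", "d"),
                    stringsAsFactors = FALSE)
right <- data.frame(X1 = c("a", "c", "z"), X2 = c("b", "x", "y"),
                    stringsAsFactors = FALSE)

# Match left$V1 to right$X1 and left$V2 to right$X2
matched <- inner_join(left, right, by = c("V1" = "X1", "V2" = "X2"))
```

The result keeps the key names from the left-hand table, so matched has the columns V1 and V2.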
Finding common rows in 2 dataframes
There are probably better solutions, but you can do:
d1 <- do.call("paste", df1)
d2 <- do.call("paste", df2)
> df1[d1%in%d2, ]
  SURVEY_DATE DATA_COLLECTION_SITE
1  2012-07-01               Site 1
3  2012-08-10               Site 2
6  2012-09-20               Site 1
And for the final result :
> df3 <- df1[d1%in%d2, ]
> df4 <- df2[d2%in%d1, ]
>
> df3$FISHING_SITE <- NA
> df4$DATA_COLLECTION_SITE <- NA
>
> rbind(df3, df4)
   SURVEY_DATE DATA_COLLECTION_SITE FISHING_SITE
1   2012-07-01               Site 1         <NA>
3   2012-08-10               Site 2         <NA>
6   2012-09-20               Site 1         <NA>
11  2012-07-01                 <NA>       Site 1
31  2012-08-10                 <NA>       Site 2
61  2012-09-20                 <NA>       Site 1
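A caveat with the paste approach: the default separator is a space, so rows whose values themselves contain spaces can produce the same key by accident. A hedged sketch with hypothetical data, using a separator that is assumed never to occur in the values:

```r
# Hypothetical rows that differ, yet paste to the same string with sep = " "
x <- data.frame(a = "Site 1", b = "North", stringsAsFactors = FALSE)
y <- data.frame(a = "Site",   b = "1 North", stringsAsFactors = FALSE)

default_x <- do.call("paste", x)
default_y <- do.call("paste", y)   # the same string as default_x

# Assumption: the unit separator "\x1f" never appears in the data
safe_x <- do.call("paste", c(x, sep = "\x1f"))
safe_y <- do.call("paste", c(y, sep = "\x1f"))

false_match <- default_x %in% default_y  # the rows look identical
safe_match  <- safe_x %in% safe_y        # the rows are told apart
```

If no separator can be ruled out, merging on all columns avoids the issue altogether.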
Mark common rows between data frames in R
You can use %in% to check for matches and is.na to avoid matches with NA.
df1$match <- df1$id %in% df2$id & !is.na(df1$id)
df1
# id match
#1 1 TRUE
#2 2 FALSE
#3 3 TRUE
#4 4 FALSE
#5 NA FALSE
#6 5 FALSE
#7 6 FALSE
#8 NA FALSE
#9 NA FALSE
#10 7 FALSE
#11 8 TRUE
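The is.na guard matters because %in% treats NA as equal to NA. A minimal sketch: the id values of df1 follow the output above, while df2 is a hypothetical stand-in, since its contents are not shown here.

```r
# id values follow the output above; df2 is hypothetical
df1 <- data.frame(id = c(1, 2, 3, 4, NA, 5, 6, NA, NA, 7, 8))
df2 <- data.frame(id = c(1, 3, 8, NA))

# NA %in% c(..., NA) is TRUE, so a bare %in% would also flag the NA rows
naive <- df1$id %in% df2$id

# Requiring a non-missing id removes those matches
df1$match <- naive & !is.na(df1$id)
```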
Subset common rows from multiple data frames
I have multiple data frames like those mentioned below, with a unique id for each row. I am trying to find the common rows and build a new data frame from the rows that appear in at least two of the data frames.
Since no ID appears twice in the same table, we can tabulate the IDs and keep any found twice:
library(data.table)
DTs = lapply(list(df1,df2,df3), data.table)
Id_keep = rbindlist(lapply(DTs, `[`, j = "Id"))[, .N, by=Id][N >= 2L, Id]
DT_keep = Reduce(funion, DTs)[Id %in% Id_keep]
# Id a b c
# 1: 2 1 0 0
# 2: 3 0 1 4
# 3: 5 9 1 7
Your data should be in an object like DTs to begin with, not a bunch of separate named objects.
How it works
To get a sense of how it works, examine intermediate objects like
list(df1,df2,df3)
lapply(DTs, `[`, j = "Id")
Reduce(funion, DTs)
Also, read the help files, like ?lapply, ?rbindlist, ?funion.
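The same counting idea can be sketched in base R. The three data frames are hypothetical stand-ins, since the question's data is not reproduced here; each is assumed to hold every Id at most once and to share the same columns.

```r
# Hypothetical data frames with a shared layout
dfs <- list(
  data.frame(Id = c(1, 2, 3), a = c(5, 1, 0)),
  data.frame(Id = c(2, 4),    a = c(1, 8)),
  data.frame(Id = c(3, 5),    a = c(0, 9))
)

# Count how many data frames each Id occurs in
id_counts <- table(unlist(lapply(dfs, `[[`, "Id")))
id_keep <- as.numeric(names(id_counts)[id_counts >= 2])

# Stack everything, drop repeated rows, keep the Ids seen at least twice
stacked <- unique(do.call(rbind, dfs))
df_keep <- stacked[stacked$Id %in% id_keep, ]
```

As with funion, a row is only collapsed when it is identical in every column, so an Id with differing values across data frames would appear more than once.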
How to find row index of common rows between two matrices in R
match(do.call(paste, data.frame(a)), do.call(paste, data.frame(b)))
or even:
A <- data.frame(a)
B <- cbind(id = seq(nrow(b)), setNames(data.frame(b), names(A)))
merge(A, B, all.x = TRUE)
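To see what the match call returns, here is a small sketch with two hypothetical matrices a and b (the values are made up; key_a, key_b and idx are names introduced for illustration):

```r
# Hypothetical matrices; rows 1 and 3 of `a` also occur in `b`
a <- matrix(c(1, 2,
              3, 4,
              5, 6), ncol = 2, byrow = TRUE)
b <- matrix(c(5, 6,
              9, 9,
              1, 2), ncol = 2, byrow = TRUE)

# One string key per row, then look each key of `a` up in `b`
key_a <- do.call(paste, data.frame(a))
key_b <- do.call(paste, data.frame(b))
idx <- match(key_a, key_b)   # row of `b` matching each row of `a`, NA if none

rows_of_a <- which(!is.na(idx))  # rows of `a` that have a match in `b`
```

Only the first matching row of b is reported when b contains duplicate rows.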
Finding common rows in R
You can use join_all from the plyr package:
require(plyr)
df <- join_all(list(df1,df2,df3,df4, df5), by = 'V1', type = 'inner')
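If plyr is not available, a base R sketch of the same inner join over a list uses Reduce with merge. The data frames below are hypothetical and assumed to share the key column V1:

```r
# Hypothetical data frames sharing the key column V1
dfs <- list(
  data.frame(V1 = c("a", "b", "c"), x = 1:3, stringsAsFactors = FALSE),
  data.frame(V1 = c("a", "c", "d"), y = 4:6, stringsAsFactors = FALSE),
  data.frame(V1 = c("c", "a"),      z = 7:8, stringsAsFactors = FALSE)
)

# Fold the list with an inner merge on V1; merge() drops unmatched rows by default
common <- Reduce(function(left, right) merge(left, right, by = "V1"), dfs)
```

Only the keys present in every data frame survive, and merge returns them sorted by V1.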