Compare if two dataframe objects in R are equal?
It is not clear what "value equal" should mean for two data frames. If the goal is to test whether the values are the same regardless of names, here is an example of two non-identical data frames with equal values:
a <- data.frame(x = 1:10)
b <- data.frame(y = 1:10)
To test if all values are equal:
all(a == b) # TRUE
To test if objects are identical (they are not, they have different column names):
identical(a,b) # FALSE: class, colnames, rownames must all match.
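One caveat with the element-wise check: if either data frame contains NA, all(a == b) returns NA rather than TRUE or FALSE. Below is a minimal sketch of an NA-tolerant variant. The toy data and the rule that two missing values in the same cell count as equal are assumptions made for illustration, not part of the original answer.

```r
a <- data.frame(x = c(1, NA, 3))
b <- data.frame(y = c(1, NA, 3))

# Plain element-wise comparison: the NA propagates through all()
plain <- all(a == b)

# Assumption: NA in the same cell of both data frames counts as a match
same_cell <- (a == b) | (is.na(a) & is.na(b))
na_safe <- isTRUE(all(same_cell))
```

Both versions require the two data frames to have the same dimensions; == stops with an error otherwise.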
How to check if two data frames are equal
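Look up all.equal. It comes with some caveats (it compares numbers within a tolerance, and when the objects differ it returns a character description of the differences rather than FALSE, so wrap it in isTRUE() when you need a logical), but it might work for you.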
all.equal(df3,df4)
# [1] TRUE
all.equal(df2,df1)
# [1] TRUE
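To make the behaviour of all.equal concrete, here is a small sketch with made-up data frames (df_a, df_b and df_c are hypothetical, not the df1 to df4 from the question). It contrasts identical, which is exact, with all.equal, which allows a small numeric tolerance and describes mismatches in text.

```r
df_a <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
df_b <- data.frame(x = c(1, 2, 3 + 1e-10), y = c("a", "b", "c"))
df_c <- data.frame(x = c(1, 2, 4), y = c("a", "b", "c"))

# Exact comparison: the tiny numeric difference makes the objects differ
exact <- identical(df_a, df_b)

# Comparison within the default numeric tolerance of all.equal
near <- isTRUE(all.equal(df_a, df_b))

# On a real mismatch all.equal returns a character description, not FALSE
report <- all.equal(df_a, df_c)
mismatch <- isTRUE(report)
```

Because the return value is either TRUE or a character vector, using all.equal directly inside if() is fragile; isTRUE() turns it into a plain logical.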
Compare 2 dataframes for equality in R
dplyr's setdiff works on data frames, so I would suggest:
library(dplyr)
nrow(setdiff(a, c)) == 0 & nrow(setdiff(c, a)) == 0
# [1] TRUE
Note that this will not account for the number of duplicate rows (i.e., if a has multiple copies of a row and c has only one copy of that row, it will still return TRUE). Not sure how you want duplicate rows handled...
If you do care about having the same number of duplicates, then I would suggest two possibilities: (a) adding an ID column to differentiate the duplicates and using the approach above, or (b) sorting, resetting the row names (annoyingly), and using identical.
(a) adding an ID column
library(dplyr)
a_id = group_by_all(a) %>% mutate(id = row_number())
c_id = group_by_all(c) %>% mutate(id = row_number())
nrow(setdiff(a_id, c_id)) == 0 & nrow(setdiff(c_id, a_id)) == 0
# [1] TRUE
(b) sorting
a_sort = a[do.call(order, a), ]
row.names(a_sort) = NULL
c_sort = c[do.call(order, c), ]
row.names(c_sort) = NULL
identical(a_sort, c_sort)
# [1] TRUE
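The sorting approach in (b) can be wrapped in a small helper. This is only a sketch in base R: the name same_rows and the toy data are hypothetical, and it assumes both data frames have the same columns in the same order and with the same types (identical treats an integer column and a double column as different).

```r
same_rows <- function(x, y) {
  # Different shape or column names: cannot hold the same rows
  if (!identical(dim(x), dim(y)) || !identical(names(x), names(y))) {
    return(FALSE)
  }
  sort_rows <- function(d) {
    # unname() so column names are not mistaken for arguments of order()
    d <- d[do.call(order, unname(as.list(d))), , drop = FALSE]
    row.names(d) <- NULL
    d
  }
  identical(sort_rows(x), sort_rows(y))
}

a <- data.frame(x = c(1, 2, 2), y = c("p", "q", "q"))
shuffled <- a[c(3, 1, 2), ]
fewer_dups <- data.frame(x = c(1, 1, 2), y = c("p", "p", "q"))

res_shuffled <- same_rows(a, shuffled)     # same rows, different order
res_dups <- same_rows(a, fewer_dups)       # same distinct rows, different counts
```

Unlike the plain setdiff check, this distinguishes data frames that share the same distinct rows but repeat them a different number of times.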
Compare two dataframes in R
One approach could be to convert the data to long format, perform an inner_join, subtract the values, check whether all the differences are in range, and get the data back in wide format.
library(dplyr)
library(tidyr)
df1 %>% pivot_longer(cols = -Letter) %>%
inner_join(df2 %>% pivot_longer(cols = -Letter), by = c("Letter", "name")) %>%
mutate(value = value.x - value.y) %>%
group_by(Letter) %>%
mutate(check = all(between(value, 0, 2))) %>%
select(-value.x, -value.y) %>%
pivot_wider()
# Letter check `2011` `2012` `2013`
# <chr> <lgl> <int> <int> <int>
#1 A TRUE 1 2 1
#2 C FALSE 0 -1 3
data
df1 <- structure(list(Letter = c("A", "B", "C"), `2011` = c(2L, 6L,5L),
`2012` = c(3L, 6L, 4L), `2013` = c(5L, 6L, 8L)), row.names = c(NA, -3L),
class = "data.frame")
df2 <- structure(list(Letter = c("A", "C"), `2011` = c(1L, 5L), `2012` = c(1L,
5L), `2013` = 4:5), row.names = c(NA, -2L), class = "data.frame")
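For comparison, the same steps (match rows by Letter, subtract, check that every difference lies between 0 and 2) can be sketched in base R without reshaping. This is an alternative to the dplyr/tidyr pipeline, not the original answer's method; the intermediate names common, years, diffs and result are made up for the example, and the data is rebuilt so the block runs on its own.

```r
df1 <- data.frame(Letter = c("A", "B", "C"), `2011` = c(2L, 6L, 5L),
                  `2012` = c(3L, 6L, 4L), `2013` = c(5L, 6L, 8L),
                  check.names = FALSE)
df2 <- data.frame(Letter = c("A", "C"), `2011` = c(1L, 5L),
                  `2012` = c(1L, 5L), `2013` = 4:5, check.names = FALSE)

# Letters present in both data frames (the inner join step)
common <- intersect(df1$Letter, df2$Letter)
years <- setdiff(names(df1), "Letter")

# Align the rows of both data frames on the common letters
x <- as.matrix(df1[match(common, df1$Letter), years])
y <- as.matrix(df2[match(common, df2$Letter), years])
diffs <- x - y

# TRUE only if every difference in the row is within [0, 2]
check <- unname(apply(diffs >= 0 & diffs <= 2, 1, all))
result <- data.frame(Letter = common, check = check, diffs,
                     check.names = FALSE, row.names = NULL)
```

The match() calls assume each Letter appears at most once per data frame; with repeated keys a join is the safer choice.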
Compare two data.frames to find the rows in data.frame 1 that are not present in data.frame 2
This doesn't answer your question directly, but it will give you the elements that are in common. This can be done with Paul Murrell's package compare:
library(compare)
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = 1:3, b = letters[1:3])
comparison <- compare(a1,a2,allowAll=TRUE)
comparison$tM
# a b
#1 1 a
#2 2 b
#3 3 c
The function compare gives you a lot of flexibility in terms of what kind of comparisons are allowed (e.g. changing the order of elements of each vector, changing the order and names of variables, shortening variables, changing the case of strings). From this, you should be able to figure out what was missing from one or the other. For example (this is not very elegant):
difference <-
data.frame(lapply(1:ncol(a1),function(i)setdiff(a1[,i],comparison$tM[,i])))
colnames(difference) <- colnames(a1)
difference
# a b
#1 4 d
#2 5 e
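If exact row matching is enough, the rows of a1 that are absent from a2 can also be found in base R by building one key per row. This is a sketch of a different technique than the compare package; the names key1, key2 and only_in_a1 are invented for the example, and it assumes the separator character never occurs in the data.

```r
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = 1:3, b = letters[1:3])

# One string key per row; assumption: "\r" never appears in the values
key1 <- do.call(paste, c(a1, sep = "\r"))
key2 <- do.call(paste, c(a2, sep = "\r"))

# Rows of a1 whose key does not occur among the rows of a2
only_in_a1 <- a1[!key1 %in% key2, , drop = FALSE]
```

Unlike the column-wise setdiff above, this compares whole rows, so a row counts as missing only when the full combination of values is absent from a2.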
How to compare every row of dataframe to dataframe in R?
You can do this. In short, for each row of the data frame, build a new data frame of the same size in which every row is a copy of that row, then compare it element-wise with the original. The rowSums of each comparison give, for every row, the number of matching values, which are the vectors you want.
library(dplyr) # provides %>% and tibble()
# Create the desired output in list
lst <-
lapply(1:nrow(df), function(nr) {
rowSums(replicate(nrow(df), df[nr, ], simplify = FALSE) %>%
do.call("rbind", .) == df)})
# To create the desired dataframe
df %>% tibble(desired_column = I(lst))
In the tibble call on the last line, I() is used to store the list output as a column.
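The same per-row match counts can be computed without replicating the data frame, by comparing each column with its own i-th value and adding up the results. This is a base R sketch of that idea; the toy df is an assumption, since the question's data is not shown here.

```r
df <- data.frame(x = c(1, 1, 2), y = c("a", "b", "a"))

# For row i: compare every column with its i-th value, then sum across columns
lst <- lapply(seq_len(nrow(df)), function(i) {
  Reduce(`+`, lapply(df, function(col) col == col[i]))
})

# Store the list of count vectors as a list column
df$desired_column <- lst
```

Each element of lst has one entry per row of df, giving the number of columns in which that row agrees with row i. As with ==, a missing value in either position yields NA rather than a count.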