Simultaneously merge multiple data.frames in a list
Another question asked specifically how to perform multiple left joins using dplyr in R. The question was marked as a duplicate of this one, so I answer here, using the three sample data frames below:
x <- data.frame(i = c("a","b","c"), j = 1:3, stringsAsFactors=FALSE)
y <- data.frame(i = c("b","c","d"), k = 4:6, stringsAsFactors=FALSE)
z <- data.frame(i = c("c","d","a"), l = 7:9, stringsAsFactors=FALSE)
Update June 2018: I divided the answer into three sections representing three different ways to perform the merge. You probably want to use the purrr way if you are already using the tidyverse packages. For comparison purposes, you'll also find a base R version below using the same sample dataset.
1) Join them with reduce from the purrr package:
The purrr package provides a reduce function which has a concise syntax:
library(tidyverse)
list(x, y, z) %>% reduce(left_join, by = "i")
# A tibble: 3 x 4
# i j k l
# <chr> <int> <int> <int>
# 1 a 1 NA 9
# 2 b 2 4 NA
# 3 c 3 5 7
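reduce() (like base R's Reduce()) is a left fold: with three data frames it computes f(f(x, y), z). A minimal base R sketch that checks this equivalence on the same sample data, using merge() with all.x = TRUE as a stand-in for left_join() so that it runs without extra packages (the helper name left is hypothetical):

```r
x <- data.frame(i = c("a","b","c"), j = 1:3, stringsAsFactors = FALSE)
y <- data.frame(i = c("b","c","d"), k = 4:6, stringsAsFactors = FALSE)
z <- data.frame(i = c("c","d","a"), l = 7:9, stringsAsFactors = FALSE)

# Hypothetical helper: pairwise left join on the key column "i"
left <- function(a, b) merge(a, b, by = "i", all.x = TRUE)

folded <- Reduce(left, list(x, y, z))  # fold over the list
nested <- left(left(x, y), z)          # the same calls written out by hand

identical(folded, nested)
#> [1] TRUE
```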
You can also perform other joins, such as a full_join or inner_join:
list(x, y, z) %>% reduce(full_join, by = "i")
# A tibble: 4 x 4
# i j k l
# <chr> <int> <int> <int>
# 1 a 1 NA 9
# 2 b 2 4 NA
# 3 c 3 5 7
# 4 d NA 6 8
list(x, y, z) %>% reduce(inner_join, by = "i")
# A tibble: 1 x 4
# i j k l
# <chr> <int> <int> <int>
# 1 c 3 5 7
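In base R the same two joins map onto merge() arguments: all = TRUE keeps the keys of every table (like full_join()), while the default keeps only keys present in both tables (like inner_join()). A short sketch on the same sample data:

```r
x <- data.frame(i = c("a","b","c"), j = 1:3, stringsAsFactors = FALSE)
y <- data.frame(i = c("b","c","d"), k = 4:6, stringsAsFactors = FALSE)
z <- data.frame(i = c("c","d","a"), l = 7:9, stringsAsFactors = FALSE)

# all = TRUE: keep keys from both sides at every step (full join)
full  <- Reduce(function(a, b) merge(a, b, by = "i", all = TRUE), list(x, y, z))
# default merge(): keep only keys found in both sides (inner join)
inner <- Reduce(function(a, b) merge(a, b, by = "i"), list(x, y, z))

nrow(full)   # 4: keys a, b, c, d
nrow(inner)  # 1: only key c is in all three tables
```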
2) dplyr::left_join() with base R Reduce():
list(x,y,z) %>%
Reduce(function(dtf1,dtf2) left_join(dtf1,dtf2,by="i"), .)
# i j k l
# 1 a 1 NA 9
# 2 b 2 4 NA
# 3 c 3 5 7
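When a chain of joins gives an unexpected result, Reduce(accumulate = TRUE) returns every intermediate table, which makes it easy to see at which step rows or columns change. A sketch using merge() in place of left_join() so it needs no packages:

```r
x <- data.frame(i = c("a","b","c"), j = 1:3, stringsAsFactors = FALSE)
y <- data.frame(i = c("b","c","d"), k = 4:6, stringsAsFactors = FALSE)
z <- data.frame(i = c("c","d","a"), l = 7:9, stringsAsFactors = FALSE)

# Keep every intermediate result: x, then x joined with y, then with z
steps <- Reduce(function(a, b) merge(a, b, by = "i", all.x = TRUE),
                list(x, y, z), accumulate = TRUE, simplify = FALSE)

length(steps)         # 3
lapply(steps, names)  # columns gained at each step
```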
3) Base R merge() with base R Reduce():
And for comparison purposes, here is a base R version of the left join based on Charles's answer.
Reduce(function(dtf1, dtf2) merge(dtf1, dtf2, by = "i", all.x = TRUE),
list(x,y,z))
# i j k l
# 1 a 1 NA 9
# 2 b 2 4 NA
# 3 c 3 5 7
R - Merge list of three dataframes into single dataframe with ID in first column, next three columns show values
What about something like this using dplyr/purrr:
library(tidyverse)
reduce(lst, full_join, by = "ID")
# ID Value.x Value.y Value
# 1 A 1 1 NA
# 2 B 1 NA 1
# 3 C 1 NA 1
# 4 D NA 1 NA
# 5 E NA 1 NA
Or with the NAs replaced with 0s:
reduce(lst, full_join, by = "ID") %>% replace(., is.na(.), 0)
# ID Value.x Value.y Value
# 1 A 1 1 0
# 2 B 1 0 1
# 3 C 1 0 1
# 4 D 0 1 0
# 5 E 0 1 0
Sample data
options(stringsAsFactors = FALSE)
lst <- list(
data.frame(ID = c("A", "B", "C"), Value = c(1, 1, 1)),
data.frame(ID = c("A", "D", "E"), Value = c(1, 1, 1)),
data.frame(ID = c("B", "C"), Value = c(1, 1)))
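For comparison, a minimal base R sketch of the same full join using merge(all = TRUE). merge() applies the same .x/.y suffixes to the clashing Value columns, so the column names should match the dplyr output above:

```r
lst <- list(
  data.frame(ID = c("A", "B", "C"), Value = c(1, 1, 1)),
  data.frame(ID = c("A", "D", "E"), Value = c(1, 1, 1)),
  data.frame(ID = c("B", "C"), Value = c(1, 1)))

# Full join of all three tables on ID
res <- Reduce(function(a, b) merge(a, b, by = "ID", all = TRUE), lst)
# Replace the NAs introduced by unmatched IDs with 0
res[is.na(res)] <- 0
res
```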
How to join multiple data.frames with different numbers of rows at once in R?
If inner_join does the trick for a pair of data frames, we can use Reduce to apply inner_join to all of them.
I would first put all the data frames in a list (here with mget(ls())), then call Reduce with inner_join as the function to be reduced:
library(dplyr)
Reduce(inner_join, mget(ls(pattern='df\\d+')))
I prefer using tidyverse for everything, so I usually use purrr::reduce here:
library(dplyr)
library(purrr)
reduce(mget(ls(pattern='df\\d+')), inner_join)
As an alternative to name matching with ls(pattern = ), we can use a function to select all data frames from your global environment, either Filter() or purrr::keep():
library(purrr)
library(dplyr)
mget(ls()) %>% keep(is.data.frame) %>% reduce(inner_join)
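The same idea in base R only, with Filter() in place of purrr::keep() and merge() in place of inner_join(). Like inner_join() without by, merge() without by joins on all shared column names. The data frames df1 to df3 are made up for illustration, and the ls() pattern assumes they follow that naming scheme:

```r
# Hypothetical data frames sharing the key column "id"
df1 <- data.frame(id = 1:4, a = letters[1:4])
df2 <- data.frame(id = 2:5, b = LETTERS[2:5])
df3 <- data.frame(id = c(2, 3, 5), c = c(TRUE, FALSE, TRUE))

# Collect the matching objects, keep only data frames, then inner-join them all
dfs <- Filter(is.data.frame, mget(ls(pattern = "^df\\d+$")))
joined <- Reduce(merge, dfs)
joined  # only ids 2 and 3 appear in all three tables
```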
How to merge multiple data.frames with Reduce and get an ordered output?
This happens because of the sort argument of merge:
sort - logical. Should the result be sorted on the by columns?
So instead, you can pass sort = FALSE:
Reduce(function(...) merge(..., all = TRUE, sort = FALSE), L)
# a b c d e
# 1 5 2 4 10 1
# 2 6 7 4 6 1
# 3 7 3 5 5 NA
# 4 5 2 6 5 NA
# 5 4 4 2 8 NA
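The merge() documentation also says that with sort = FALSE the rows come back in an unspecified order, so if you need exactly the row order of the first data frame, one option is to carry an explicit row index through the merge and sort on it afterwards. A minimal sketch of a single left merge; the data and the .row helper column are hypothetical (in a Reduce() you would add .row to the first list element before reducing):

```r
# Hypothetical inputs: rows of `a` are deliberately not in key order
a <- data.frame(key = c("c", "a", "b"), x = 1:3, stringsAsFactors = FALSE)
b <- data.frame(key = c("a", "b", "c"), y = 4:6, stringsAsFactors = FALSE)

# Tag each row of the first table with its original position
a$.row <- seq_len(nrow(a))
m <- merge(a, b, by = "key", all.x = TRUE, sort = FALSE)

# Restore the original order of `a`, then drop the helper column
m <- m[order(m$.row), setdiff(names(m), ".row")]
rownames(m) <- NULL
m
```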
Loop for merging multiple dataframes from list of dataframes in R
I'm not a fan of how this ends up with multiple columns with the same name, but that's what you asked for.
You aren't really asking for a merge: merging on ID would pair every row with every row that has the same ID, giving 3 x 3 = 9 rows, so I used cbind.
(I changed the name of the list of data.frames to df_list to avoid confusion.)
df_list <- list(
data.frame(ID = 1, b = c('x', 'y', 'z'), c = c('y', 'z', 'x'), d = c('z', 'x', 'y')),
data.frame(ID = 1, b = c('x', 'y', 'z'), c = c('y', 'z', 'x'), d = c('z', 'x', 'y')),
data.frame(ID = 2, b = c('x', 'y', 'z'), c = c('y', 'z', 'x'), d = c('z', 'x', 'y'))
)
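# Walk over neighbouring pairs of data frames
# (seq_len() avoids iterating over 1:0 when the list has a single element)
for (i in seq_len(length(df_list) - 1)) {
  # Same number of rows and identical IDs: bind the columns side by side
  if (NROW(df_list[[i]]) == NROW(df_list[[i + 1]]) &&
      all(df_list[[i]]$ID == df_list[[i + 1]]$ID)) {
    df_list[[i]] <- cbind(df_list[[i]], df_list[[i + 1]][, -1])
    df_list[[i + 1]] <- list()  # mark the absorbed element as empty
  }
}
# Drop the emptied elements
df_list <- df_list[!sapply(df_list, function(x) NROW(x) == 0)]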
df_list
[[1]]
ID b c d b c d
1 1 x y z x y z
2 1 y z x y z x
3 1 z x y z x y
[[2]]
ID b c d
1 2 x y z
2 2 y z x
3 2 z x y
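The loop above only combines neighbouring list elements. If data frames with the same ID can appear anywhere in the list, one alternative is to group them first with split() and then cbind() each group. A base R sketch, assuming (as in the example) that every data frame carries a single ID value and that frames sharing an ID have the same number of rows:

```r
df_list <- list(
  data.frame(ID = 1, b = c('x', 'y', 'z'), c = c('y', 'z', 'x'), d = c('z', 'x', 'y')),
  data.frame(ID = 1, b = c('x', 'y', 'z'), c = c('y', 'z', 'x'), d = c('z', 'x', 'y')),
  data.frame(ID = 2, b = c('x', 'y', 'z'), c = c('y', 'z', 'x'), d = c('z', 'x', 'y'))
)

# Group the list elements by the (single) ID value each one carries
groups <- split(df_list, vapply(df_list, function(d) d$ID[1], numeric(1)))

# Keep the first frame of each group whole; add only the non-ID columns of the rest
combined <- lapply(groups, function(g) {
  do.call(cbind, c(g[1], lapply(g[-1], function(d) d[, -1, drop = FALSE])))
})
combined
```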
Merging a data frame from a list of data frames
We could also use Reduce:
Reduce(function(...) merge(..., by = c('month', 'year')), lst)
Using @Jaap's example, if the month/year combinations are not the same in every data frame, use the all = TRUE option of merge:
Reduce(function(...) merge(..., by = c('month', 'year'), all = TRUE), lst)
# month year oracle microsoft google
#1 1 2004 356.0000 NA NA
#2 2 2004 390.0000 339.0000 NA
#3 3 2004 394.4286 357.7143 390.0000
#4 4 2004 391.8571 347.1429 391.8571
#5 5 2004 NA 333.2857 357.7143
#6 6 2004 NA NA 333.2857
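To see what all = TRUE changes when the key combinations only partly overlap, here is a small sketch with made-up data (the value columns a, b and c are hypothetical): without all = TRUE, only the month/year pairs present in every table survive.

```r
# Hypothetical monthly tables with partially overlapping months
lst <- list(
  data.frame(month = 1:4, year = 2004, a = 1:4),
  data.frame(month = 2:5, year = 2004, b = 5:8),
  data.frame(month = 3:6, year = 2004, c = 9:12)
)

inner <- Reduce(function(...) merge(..., by = c("month", "year")), lst)
outer <- Reduce(function(...) merge(..., by = c("month", "year"), all = TRUE), lst)

nrow(inner)  # 2: only months 3 and 4 are in every table
nrow(outer)  # 6: months 1 to 6, with NA where a table has no row
```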
merge multiple data.frames
What about something like this:
l2 <- Reduce(function(x, n) merge(x, l1[[n]], by='nu_pregao', suffixes = c("", n)),
seq(2, length(l1)), init = l1[[1]])
l2
#> nu_pregao pcVar pcVar2 pcVar3
#> 1 2371 7.224848 4.055709 4.011461
#> 2 2372 2.797704 2.944882 3.679907
#> 3 2373 3.947368 3.507937 4.693034
Final touch for names consistency:
names(l2)[match("pcVar", names(l2))] <- "pcVar1"
l2
#> nu_pregao pcVar1 pcVar2 pcVar3
#> 1 2371 7.224848 4.055709 4.011461
#> 2 2372 2.797704 2.944882 3.679907
#> 3 2373 3.947368 3.507937 4.693034
Your data:
l1 <- list(read.table(text = "nu_pregao pcVar
1 2371 7.224848
45 2372 2.797704
89 2373 3.947368", header = TRUE),
read.table(text = "nu_pregao pcVar
2 2371 4.055709
46 2372 2.944882
90 2373 3.507937", header = TRUE),
read.table(text = "nu_pregao pcVar
3 2371 4.011461
47 2372 3.679907
91 2373 4.693034", header = TRUE))
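An alternative to the suffixes trick is to rename the value column of each data frame before merging, so that no names clash and no final rename is needed. A sketch using Map() and setNames() on the same values (the original row names are omitted):

```r
l1 <- list(
  data.frame(nu_pregao = 2371:2373, pcVar = c(7.224848, 2.797704, 3.947368)),
  data.frame(nu_pregao = 2371:2373, pcVar = c(4.055709, 2.944882, 3.507937)),
  data.frame(nu_pregao = 2371:2373, pcVar = c(4.011461, 3.679907, 4.693034))
)

# Give each value column a unique name up front: pcVar1, pcVar2, pcVar3
renamed <- Map(function(d, k) setNames(d, c("nu_pregao", paste0("pcVar", k))),
               l1, seq_along(l1))

# Plain merges now produce unique column names directly
l2 <- Reduce(function(a, b) merge(a, b, by = "nu_pregao"), renamed)
names(l2)  # nu_pregao, pcVar1, pcVar2, pcVar3
```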
Merge R data frames with differing lengths
We can put all data frames into a list and use Reduce and merge from base R:
df <- Reduce(function(...) merge(..., by='ID', all.x=TRUE), list(parent, df1, df2))
df[is.na(df)] <- 0
ID Number1 Number2
1 Ants 6 5
2 Cow 0 7
3 Dog 2 0
4 Hen 7 0
5 Tiger 0 3
Or we can use join_all from plyr:
library(plyr)
library(magrittr)  # plyr does not export the %>% pipe
join_all(list(parent, df1, df2), by='ID', type='left') %>%
replace(is.na(.), 0)
Or with purrr::reduce:
library(tidyverse)
reduce(list(parent, df1, df2), left_join, by = 'ID') %>%
mutate(across(where(is.numeric), ~ replace_na(.x, 0)))
Data
parent <- structure(list(ID = c("Ants", "Cow", "Dog", "Hen", "Tiger")),
class = "data.frame", row.names = c(NA, -5L))
df1 <- structure(list(ID = c("Ants", "Dog", "Hen"),
Number1 = c(6L, 2L, 7L)),
class = "data.frame", row.names = c(NA, -3L))
df2 <- structure(list(ID = c("Ants", "Cow", "Tiger"),
Number2 = c(5L, 7L, 3L)),
class = "data.frame", row.names = c(NA, -3L))
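If the goal is only to add columns to parent while keeping its row order, a lookup with match() avoids merge() altogether. This is a swapped-in base R technique rather than one of the approaches above; it assumes IDs are unique within each lookup table (match() takes the first hit), and it also turns any NAs already present in df1/df2 into 0:

```r
parent <- data.frame(ID = c("Ants", "Cow", "Dog", "Hen", "Tiger"))
df1 <- data.frame(ID = c("Ants", "Dog", "Hen"), Number1 = c(6L, 2L, 7L))
df2 <- data.frame(ID = c("Ants", "Cow", "Tiger"), Number2 = c(5L, 7L, 3L))

res <- parent
for (d in list(df1, df2)) {
  idx <- match(res$ID, d$ID)  # row of d for each ID in res (NA if absent)
  for (col in setdiff(names(d), "ID")) {
    vals <- d[[col]][idx]
    vals[is.na(vals)] <- 0L   # unmatched IDs become 0, as in the answers above
    res[[col]] <- vals
  }
}
res
```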