Simultaneously Merge Multiple Data.Frames in a List

Another question asked specifically how to perform multiple left joins using dplyr in R. That question was marked as a duplicate of this one, so I answer here, using the three sample data frames below:

x <- data.frame(i = c("a","b","c"), j = 1:3, stringsAsFactors=FALSE)
y <- data.frame(i = c("b","c","d"), k = 4:6, stringsAsFactors=FALSE)
z <- data.frame(i = c("c","d","a"), l = 7:9, stringsAsFactors=FALSE)

Update June 2018: I divided the answer into three sections representing three different ways to perform the merge. You probably want to use the purrr way if you are already using the tidyverse packages. For comparison purposes, you'll also find a base R version using the same sample dataset.


1) Join them with reduce from the purrr package:

The purrr package provides a reduce function which has a concise syntax:

library(tidyverse)
list(x, y, z) %>% reduce(left_join, by = "i")
# A tibble: 3 x 4
# i j k l
# <chr> <int> <int> <int>
# 1 a 1 NA 9
# 2 b 2 4 NA
# 3 c 3 5 7

You can also perform other joins, such as a full_join or inner_join:

list(x, y, z) %>% reduce(full_join, by = "i")
# A tibble: 4 x 4
# i j k l
# <chr> <int> <int> <int>
# 1 a 1 NA 9
# 2 b 2 4 NA
# 3 c 3 5 7
# 4 d NA 6 8

list(x, y, z) %>% reduce(inner_join, by = "i")
# A tibble: 1 x 4
# i j k l
# <chr> <int> <int> <int>
# 1 c 3 5 7

2) dplyr::left_join() with base R Reduce():

list(x, y, z) %>%
  Reduce(function(dtf1, dtf2) left_join(dtf1, dtf2, by = "i"), .)

# i j k l
# 1 a 1 NA 9
# 2 b 2 4 NA
# 3 c 3 5 7

3) Base R merge() with base R Reduce():

And for comparison purposes, here is a base R version of the left join based on Charles's answer.

Reduce(function(dtf1, dtf2) merge(dtf1, dtf2, by = "i", all.x = TRUE),
       list(x, y, z))
# i j k l
# 1 a 1 NA 9
# 2 b 2 4 NA
# 3 c 3 5 7
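
The base R pattern above can be wrapped in a small helper so that the join type is chosen through merge()'s own arguments. This is only a sketch: merge_all() is a hypothetical name introduced here for illustration, not a function from base R or any package.

```r
# Sample data copied from the answer above.
x <- data.frame(i = c("a", "b", "c"), j = 1:3, stringsAsFactors = FALSE)
y <- data.frame(i = c("b", "c", "d"), k = 4:6, stringsAsFactors = FALSE)
z <- data.frame(i = c("c", "d", "a"), l = 7:9, stringsAsFactors = FALSE)

# merge_all() is a hypothetical helper name (an assumption of this sketch).
# Extra arguments such as all.x, all, or sort are passed on to merge().
merge_all <- function(dfs, by, ...) {
  Reduce(function(left, right) merge(left, right, by = by, ...), dfs)
}

left  <- merge_all(list(x, y, z), by = "i", all.x = TRUE)  # left join
full  <- merge_all(list(x, y, z), by = "i", all = TRUE)    # full join
inner <- merge_all(list(x, y, z), by = "i")                # inner join
```

With the sample data, left has 3 rows, full has 4, and inner has 1, matching the dplyr results shown earlier.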

R - Merge list of three dataframes into single dataframe with ID in first column, next three columns show values

What about something like this using dplyr/purrr:

library(tidyverse)
reduce(lst, full_join, by = "ID")
# ID Value.x Value.y Value
# 1 A 1 1 NA
# 2 B 1 NA 1
# 3 C 1 NA 1
# 4 D NA 1 NA
# 5 E NA 1 NA

Or with the NAs replaced with 0s:

reduce(lst, full_join, by = "ID") %>% replace(., is.na(.), 0)
# ID Value.x Value.y Value
#1 A 1 1 0
#2 B 1 0 1
#3 C 1 0 1
#4 D 0 1 0
#5 E 0 1 0

Sample data

options(stringsAsFactors = FALSE)
lst <- list(
  data.frame(ID = c("A", "B", "C"), Value = c(1, 1, 1)),
  data.frame(ID = c("A", "D", "E"), Value = c(1, 1, 1)),
  data.frame(ID = c("B", "C"), Value = c(1, 1)))
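
The Value.x / Value.y / Value column names come from the join's automatic suffixes. One way to get predictable names is to rename each Value column before merging. The following is a minimal base R sketch of that idea, not the answer's own method; the names Value1, Value2, Value3 are chosen here purely for illustration.

```r
# Sample data copied from the answer above.
lst <- list(
  data.frame(ID = c("A", "B", "C"), Value = c(1, 1, 1), stringsAsFactors = FALSE),
  data.frame(ID = c("A", "D", "E"), Value = c(1, 1, 1), stringsAsFactors = FALSE),
  data.frame(ID = c("B", "C"), Value = c(1, 1), stringsAsFactors = FALSE)
)

# Rename "Value" to Value1, Value2, ... according to list position
# (illustrative names, an assumption of this sketch).
renamed <- Map(function(df, n) {
  names(df)[names(df) == "Value"] <- paste0("Value", n)
  df
}, lst, seq_along(lst))

# Full join on ID, then replace the NAs with 0.
wide <- Reduce(function(a, b) merge(a, b, by = "ID", all = TRUE), renamed)
wide[is.na(wide)] <- 0
wide
```

Because no two inputs share a non-key column name after renaming, merge() never needs to add suffixes.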

How to join multiple data.frames with different numbers of rows at once in R?

If inner_join does the trick for a pair of data frames, we can use Reduce to apply inner_join to all of the data.frames.
I would first put all the data.frames in a list (here with mget(ls())), then call Reduce with inner_join as the function to be reduced:

library(dplyr)

Reduce(inner_join, mget(ls(pattern='df\\d+')))

I prefer using tidyverse for everything, so I usually use purrr::reduce here:

library(dplyr)
library(purrr)

reduce(mget(ls(pattern='df\\d+')), inner_join)

As an alternative to name matching with ls(pattern = ), we can select all data frames in the global environment by type, using Filter() or purrr::keep:

library(purrr)
library(dplyr)
mget(ls()) %>% keep(is.data.frame) %>% reduce(inner_join)
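
The Filter() variant mentioned above is not shown, so here is a base R sketch of it. It uses a separate environment and made-up data frames (env, df1, df2, df3, and not_a_df are all assumptions for illustration) so that the result does not depend on whatever is in the global environment, and merge() stands in for inner_join.

```r
# A dedicated environment holding three data frames and one other object.
env <- new.env()
env$df1 <- data.frame(id = 1:4, a = letters[1:4], stringsAsFactors = FALSE)
env$df2 <- data.frame(id = 2:5, b = LETTERS[2:5], stringsAsFactors = FALSE)
env$df3 <- data.frame(id = 3:6, c = c(TRUE, FALSE, TRUE, FALSE))
env$not_a_df <- "ignored"

# mget() fetches every object by name; Filter() keeps only the data frames.
dfs <- Filter(is.data.frame, mget(ls(envir = env), envir = env))

# merge() joins on the shared column names by default (here: id) and
# keeps only matching rows, i.e. it behaves like an inner join.
result <- Reduce(merge, dfs)
result
```

In the global environment the same idea would be Filter(is.data.frame, mget(ls())).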

How to merge multiple data.frames with Reduce and get an ordered output?

This happens because of the sort argument of merge, which defaults to TRUE:

sort - logical. Should the result be sorted on the by columns?

So, instead you may use

Reduce(function(...) merge(..., all = TRUE, sort = FALSE), L)
# a b c d e
# 1 5 2 4 10 1
# 2 6 7 4 6 1
# 3 7 3 5 5 NA
# 4 5 2 6 5 NA
# 5 4 4 2 8 NA
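
Note that ?merge describes the row order with sort = FALSE as unspecified, so it may not always equal the input order. If the original order matters, one option is to carry an explicit row index through the merge and sort on it afterwards. The sketch below uses made-up data frames d1 and d2 and a helper column row_id, all assumptions for illustration.

```r
# Hypothetical data; d1 is deliberately not sorted by id.
d1 <- data.frame(id = c("c", "a", "b"), u = 1:3, stringsAsFactors = FALSE)
d2 <- data.frame(id = c("b", "c", "a"), v = 4:6, stringsAsFactors = FALSE)

# Default behaviour: the result is sorted on the by column.
sorted <- merge(d1, d2, by = "id")

# Keep d1's order explicitly: add an index, merge, reorder, drop the index.
d1$row_id <- seq_len(nrow(d1))
kept <- merge(d1, d2, by = "id", sort = FALSE)
kept <- kept[order(kept$row_id), ]
kept$row_id <- NULL
rownames(kept) <- NULL
```

Here sorted$id is a, b, c, while kept$id follows d1's order: c, a, b.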

Loop for merging multiple dataframes from list of dataframes in R

I'm not a fan of how this ends up with multiple columns with the same name, but that's what you wanted.

You aren't really asking for a merge because that would give 3 x 3 = 9 rows, so I used cbind.

(I changed the name of the list of data.frames to df_list to avoid confusion)

df_list <- list(
data.frame(ID = 1, b = c('x', 'y', 'z'), c = c('y', 'z', 'x'), d = c('z', 'x', 'y')),
data.frame(ID = 1, b = c('x', 'y', 'z'), c = c('y', 'z', 'x'), d = c('z', 'x', 'y')),
data.frame(ID = 2, b = c('x', 'y', 'z'), c = c('y', 'z', 'x'), d = c('z', 'x', 'y'))
)

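# seq_len() avoids iterating over 1:0 when the list has a single element
for (i in seq_len(length(df_list) - 1)) {
  # Combine neighbours only if they have the same row count and matching IDs
  if (NROW(df_list[[i]]) == NROW(df_list[[i + 1]]) &&
      all(df_list[[i]]$ID == df_list[[i + 1]]$ID)) {
    df_list[[i]] <- cbind(df_list[[i]], df_list[[i + 1]][, -1])
    df_list[[i + 1]] <- list()  # placeholder, removed below
  }
}
# Drop the empty placeholders
df_list <- df_list[!sapply(df_list, function(x) NROW(x) == 0)]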
df_list
[[1]]
ID b c d b c d
1 1 x y z x y z
2 1 y z x y z x
3 1 z x y z x y

[[2]]
ID b c d
1 2 x y z
2 2 y z x
3 2 z x y

Merging a data frame from a list of data frames

We could also use Reduce

Reduce(function(...) merge(..., by = c('month', 'year')), lst)

Using @Jaap's example, if the month/year values are not the same in every data frame, use the all = TRUE option of merge to keep the unmatched rows.

Reduce(function(...) merge(..., by = c('month', 'year'), all=TRUE), ls)
# month year oracle microsoft google
#1 1 2004 356.0000 NA NA
#2 2 2004 390.0000 339.0000 NA
#3 3 2004 394.4286 357.7143 390.0000
#4 4 2004 391.8571 347.1429 391.8571
#5 5 2004 NA 333.2857 357.7143
#6 6 2004 NA NA 333.2857
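
The input list is not shown in this answer, so here is a small self-contained sketch of the same two calls. The data is invented for illustration (only the column names month, year, oracle, microsoft, and google follow the output above); it shows how the two-column key behaves with and without all = TRUE.

```r
# Invented data: each data frame covers a different pair of months.
lst <- list(
  data.frame(month = 1:2, year = 2004, oracle = c(10, 20)),
  data.frame(month = 2:3, year = 2004, microsoft = c(30, 40)),
  data.frame(month = 3:4, year = 2004, google = c(50, 60))
)

# Inner join: only month/year pairs present in every data frame survive.
inner <- Reduce(function(...) merge(..., by = c("month", "year")), lst)

# Full join: every month/year pair is kept, and gaps become NA.
outer <- Reduce(function(...) merge(..., by = c("month", "year"), all = TRUE), lst)
outer
```

With this data no month appears in all three data frames, so inner has zero rows, while outer has one row for each of the four months.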

merge multiple data.frames [r]

What about something like this:

l2 <- Reduce(function(x, n) merge(x, l1[[n]], by = 'nu_pregao', suffixes = c("", n)),
             seq(2, length(l1)), init = l1[[1]])
l2
#> nu_pregao pcVar pcVar2 pcVar3
#> 1 2371 7.224848 4.055709 4.011461
#> 2 2372 2.797704 2.944882 3.679907
#> 3 2373 3.947368 3.507937 4.693034

A final touch to make the names consistent:

names(l2)[match("pcVar", names(l2))] <- "pcVar1"
l2
#> nu_pregao pcVar1 pcVar2 pcVar3
#> 1 2371 7.224848 4.055709 4.011461
#> 2 2372 2.797704 2.944882 3.679907
#> 3 2373 3.947368 3.507937 4.693034

Your data:

l1 <- list(read.table(text = "nu_pregao    pcVar
1 2371 7.224848
45 2372 2.797704
89 2373 3.947368", header = TRUE),

read.table(text = "nu_pregao pcVar
2 2371 4.055709
46 2372 2.944882
90 2373 3.507937", header = TRUE),

read.table(text = "nu_pregao pcVar
3 2371 4.011461
47 2372 3.679907
91 2373 4.693034", header = TRUE))

Merge R data frames with differing lengths

We can put all dataframes into a list and use Reduce and merge from base R:

df <- Reduce(function(...) merge(..., by='ID', all.x=TRUE), list(parent, df1, df2))
df[is.na(df)] <- 0

ID Number1 Number2
1 Ants 6 5
2 Cow 0 7
3 Dog 2 0
4 Hen 7 0
5 Tiger 0 3

Or we can use join_all from plyr:

library(plyr)
library(magrittr)  # provides %>%, which plyr does not export

join_all(list(parent, df1, df2), by = 'ID', type = 'left') %>%
  replace(is.na(.), 0)

Or with purrr::reduce:

library(tidyverse)

reduce(list(parent, df1, df2), left_join, by = 'ID') %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))

Data

parent <- structure(list(ID = c("Ants", "Cow", "Dog", "Hen", "Tiger")), 
class = "data.frame", row.names = c(NA, -5L))

df1 <- structure(list(ID = c("Ants", "Dog", "Hen"),
Number1 = c(6L, 2L, 7L)),
class = "data.frame", row.names = c(NA, -3L))

df2 <- structure(list(ID = c("Ants", "Cow", "Tiger"),
Number2 = c(5L, 7L, 3L)),
class = "data.frame", row.names = c(NA, -3L))

