Simultaneously Merge Multiple Data.Frames in a List

Simultaneously merge multiple data.frames in a list

Another question asked specifically how to perform multiple left joins using dplyr in R. That question was marked as a duplicate of this one, so I answer here, using the three sample data frames below:

x <- data.frame(i = c("a","b","c"), j = 1:3, stringsAsFactors=FALSE)
y <- data.frame(i = c("b","c","d"), k = 4:6, stringsAsFactors=FALSE)
z <- data.frame(i = c("c","d","a"), l = 7:9, stringsAsFactors=FALSE)

Update June 2018: I divided the answer into three sections representing three different ways to perform the merge. You probably want the purrr way if you are already using the tidyverse packages. For comparison, a base R version using the same sample data is given at the end.

1) Join them with reduce from the purrr package:

The purrr package provides a reduce function which has a concise syntax:

library(tidyverse)
list(x, y, z) %>% reduce(left_join, by = "i")
#  A tibble: 3 x 4
#  i       j     k     l
#  <chr> <int> <int> <int>
# 1 a      1    NA     9
# 2 b      2     4    NA
# 3 c      3     5     7

You can also perform other joins, such as a full_join or inner_join:

list(x, y, z) %>% reduce(full_join, by = "i")
# A tibble: 4 x 4
# i       j     k     l
# <chr> <int> <int> <int>
# 1 a     1     NA     9
# 2 b     2     4      NA
# 3 c     3     5      7
# 4 d     NA    6      8

list(x, y, z) %>% reduce(inner_join, by = "i")
# A tibble: 1 x 4
# i       j     k     l
# <chr> <int> <int> <int>
# 1 c     3     5     7

2) dplyr::left_join() with base R Reduce():

list(x,y,z) %>%
    Reduce(function(dtf1,dtf2) left_join(dtf1,dtf2,by="i"), .)

#   i j  k  l
# 1 a 1 NA  9
# 2 b 2  4 NA
# 3 c 3  5  7

3) Base R merge() with base R Reduce():

And for comparison purposes, here is a base R version of the left join based on Charles's answer.

Reduce(function(dtf1, dtf2) merge(dtf1, dtf2, by = "i", all.x = TRUE),
       list(x, y, z))
#   i j  k  l
# 1 a 1 NA  9
# 2 b 2  4 NA
# 3 c 3  5  7
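
The base R version is not limited to left joins. As a minimal sketch (reusing the sample data above and assuming only base R), merge's all argument gives the counterparts of full_join and inner_join from section 1:

```r
# Same sample data as in the answer above.
x <- data.frame(i = c("a", "b", "c"), j = 1:3, stringsAsFactors = FALSE)
y <- data.frame(i = c("b", "c", "d"), k = 4:6, stringsAsFactors = FALSE)
z <- data.frame(i = c("c", "d", "a"), l = 7:9, stringsAsFactors = FALSE)
dfs <- list(x, y, z)

# all = TRUE keeps unmatched rows from both sides (full join).
full <- Reduce(function(a, b) merge(a, b, by = "i", all = TRUE), dfs)

# merge()'s default (all = FALSE) keeps only keys present in every frame (inner join).
inner <- Reduce(function(a, b) merge(a, b, by = "i"), dfs)

print(full)
print(inner)
```

The full join should contain the four keys a to d, and the inner join only the row for "c", matching the dplyr results shown earlier.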

R - Merge list of three dataframes into single dataframe with ID in first column, next three columns show values

What about something like this using dplyr/purrr:

library(tidyverse)
reduce(lst, full_join, by = "ID")
#   ID Value.x Value.y Value
# 1  A       1       1    NA
# 2  B       1      NA     1
# 3  C       1      NA     1
# 4  D      NA       1    NA
# 5  E      NA       1    NA

Or with the NAs replaced with 0s:

reduce(lst, full_join, by = "ID") %>% replace(., is.na(.), 0)
#  ID Value.x Value.y Value
#1  A       1       1     0
#2  B       1       0     1
#3  C       1       0     1
#4  D       0       1     0
#5  E       0       1     0

Sample data

# Only needed in R < 4.0.0; from R 4.0.0 on, stringsAsFactors defaults to FALSE
options(stringsAsFactors = FALSE)
lst <- list(
    data.frame(ID = c("A", "B", "C"), Value = c(1, 1, 1)),
    data.frame(ID = c("A", "D", "E"), Value = c(1, 1, 1)),
    data.frame(ID = c("B", "C"), Value = c(1, 1)))
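
The Value.x / Value.y / Value column names in the output above come from the automatic suffixes added for clashing names. One possible way around them, sketched here in base R only (the Value1, Value2, ... naming scheme is an assumption for illustration, not part of the original answer), is to rename each Value column before joining:

```r
lst <- list(
  data.frame(ID = c("A", "B", "C"), Value = c(1, 1, 1), stringsAsFactors = FALSE),
  data.frame(ID = c("A", "D", "E"), Value = c(1, 1, 1), stringsAsFactors = FALSE),
  data.frame(ID = c("B", "C"), Value = c(1, 1), stringsAsFactors = FALSE))

# Rename "Value" to "Value<n>", where n is the position in the list
# (hypothetical naming scheme, chosen only for this sketch).
renamed <- Map(function(df, n) {
  names(df)[names(df) == "Value"] <- paste0("Value", n)
  df
}, lst, seq_along(lst))

# Full join on ID, then replace the NAs with 0 as in the answer above.
merged <- Reduce(function(a, b) merge(a, b, by = "ID", all = TRUE), renamed)
merged[is.na(merged)] <- 0
print(merged)
```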

How to join multiple data.frames with different numbers of rows at once in R?

If inner_join does the trick for a pair of data.frames, we can use Reduce to apply inner_join to all of them.
I would first put all the data.frames in a list (here with mget(ls())), then call Reduce with inner_join as the function to be reduced:

library(dplyr)

Reduce(inner_join, mget(ls(pattern='df\\d+')))

I prefer using tidyverse for everything, so I usually use purrr::reduce here:

library(dplyr)
library(purrr)

reduce(mget(ls(pattern='df\\d+')), inner_join)

As an alternative to name matching with ls(pattern = ), we can use a predicate function to select all data.frames in your global environment, with either Filter() or purrr::keep:

library(purrr)
library(dplyr)
mget(ls()) %>% keep(is.data.frame) %>% reduce(inner_join)
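
The Filter() variant mentioned above might look like the following base R sketch. To stay self-contained it collects objects from a dedicated environment instead of the global one; the names env, df1, df2 and note are made up for the example, and merge() stands in for inner_join:

```r
# A scratch environment stands in for the global environment (assumption
# made only so the example does not depend on your workspace).
env <- new.env()
env$df1 <- data.frame(id = 1:3, a = c(10, 20, 30))
env$df2 <- data.frame(id = 2:4, b = c("x", "y", "z"))
env$note <- "not a data frame"

# Collect every object, then keep only the data frames.
objs <- mget(ls(envir = env), envir = env)
dfs <- Filter(is.data.frame, objs)

# merge() with its defaults is an inner join on the shared column names.
joined <- Reduce(function(a, b) merge(a, b), dfs)
print(joined)
```

In an interactive session, mget(ls()) would replace the two-argument mget() call.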

How to merge multiple data.frames with Reduce and get an ordered output?

This happens because of merge's sort argument, which defaults to TRUE:

sort - logical. Should the result be sorted on the by columns?

So you can instead use:

Reduce(function(...) merge(..., all = TRUE, sort = FALSE), L)
#   a b c  d  e
# 1 5 2 4 10  1
# 2 6 7 4  6  1
# 3 7 3 5  5 NA
# 4 5 2 6  5 NA
# 5 4 4 2  8 NA
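
To see the effect of sort in isolation, here is a small sketch with two made-up data frames (a and b are not from the question). Note that the merge documentation describes the row order for sort = FALSE as unspecified, so the example only checks that the same rows come back:

```r
# Hypothetical inputs whose key column is deliberately out of order.
a <- data.frame(id = c(3, 1, 2), x = c("c", "a", "b"))
b <- data.frame(id = c(2, 3, 1), y = c("B", "C", "A"))

# Default: the result is sorted on the by column.
sorted_res <- merge(a, b, by = "id")

# sort = FALSE: same rows, but no sorting on the by column.
unsorted_res <- merge(a, b, by = "id", sort = FALSE)

print(sorted_res)
print(unsorted_res)
```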

Loop for merging multiple dataframes from list of dataframes in R

I'm not a fan of how this ends up with multiple columns with the same name, but that's what you wanted.

You aren't really asking for a merge because that would give 3 x 3 = 9 rows, so I used cbind.

(I changed the name of the list of data.frames to df_list to avoid confusion)

df_list <- list(
  data.frame(ID = 1, b = c('x', 'y', 'z'), c = c('y', 'z', 'x'), d = c('z', 'x', 'y')),
  data.frame(ID = 1, b = c('x', 'y', 'z'), c = c('y', 'z', 'x'), d = c('z', 'x', 'y')),
  data.frame(ID = 2, b = c('x', 'y', 'z'), c = c('y', 'z', 'x'), d = c('z', 'x', 'y'))
  )

for (i in 1:(length(df_list) - 1)) {
  if (NROW(df_list[[i]]) == NROW(df_list[[i + 1]]) &&
      all(df_list[[i]]$ID == df_list[[i + 1]]$ID)) {
    df_list[[i]] <- cbind(df_list[[i]], df_list[[i + 1]][, -1])
    df_list[[i + 1]] <- list()
  }
}
df_list <- df_list[!sapply(df_list, function(x) NROW(x) == 0)]

df_list
[[1]]
  ID b c d b c d
1  1 x y z x y z
2  1 y z x y z x
3  1 z x y z x y

[[2]]
  ID b c d
1  2 x y z
2  2 y z x
3  2 z x y
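
The remark above that a real merge would give 3 x 3 = 9 rows can be checked with a short sketch. The two frames below are cut-down, made-up versions of the list elements, with one value column each:

```r
# Every row shares ID = 1, so each row of `a` matches every row of `b`.
a <- data.frame(ID = 1, b = c("x", "y", "z"))
b <- data.frame(ID = 1, c = c("y", "z", "x"))

# merge() pairs all matching rows: 3 x 3 = 9.
merged <- merge(a, b, by = "ID")

# cbind() simply places the columns side by side: 3 rows.
bound <- cbind(a, b[, -1, drop = FALSE])

print(nrow(merged))
print(nrow(bound))
```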

Merging a data frame from a list of data frames

We could also use Reduce:

Reduce(function(...) merge(..., by = c('month', 'year')), lst)

Using @Jaap's example, if the values are not the same, pass all = TRUE to merge:

Reduce(function(...) merge(..., by = c('month', 'year'), all = TRUE), lst)
#     month year   oracle microsoft   google
#1     1 2004 356.0000        NA       NA
#2     2 2004 390.0000  339.0000       NA
#3     3 2004 394.4286  357.7143 390.0000
#4     4 2004 391.8571  347.1429 391.8571
#5     5 2004       NA  333.2857 357.7143
#6     6 2004       NA        NA 333.2857

merge multiple data.frames [r]

What about something like this:

l2 <- Reduce(function(x, n) merge(x, l1[[n]], by='nu_pregao', suffixes = c("", n)),
             seq(2, length(l1)), init = l1[[1]])
l2
#>   nu_pregao    pcVar   pcVar2   pcVar3
#> 1      2371 7.224848 4.055709 4.011461
#> 2      2372 2.797704 2.944882 3.679907
#> 3      2373 3.947368 3.507937 4.693034

Final touch for names consistency:

names(l2)[match("pcVar", names(l2))] <- "pcVar1"
l2
#>   nu_pregao   pcVar1   pcVar2   pcVar3
#> 1      2371 7.224848 4.055709 4.011461
#> 2      2372 2.797704 2.944882 3.679907
#> 3      2373 3.947368 3.507937 4.693034

Your data:

l1 <- list(read.table(text = "nu_pregao    pcVar
1       2371 7.224848
45      2372 2.797704
89      2373 3.947368", header = TRUE),

read.table(text = "nu_pregao    pcVar
2       2371 4.055709
46      2372 2.944882
90      2373 3.507937", header = TRUE),

read.table(text = "nu_pregao    pcVar
3       2371 4.011461
47      2372 3.679907
91      2373 4.693034", header = TRUE))

Merge R data frames with differing lengths

We can put all dataframes into a list and use Reduce and merge from base R:

df <- Reduce(function(...) merge(..., by='ID', all.x=TRUE), list(parent, df1, df2))
df[is.na(df)] <- 0 

     ID Number1 Number2
1  Ants       6       5
2   Cow       0       7
3   Dog       2       0
4   Hen       7       0
5 Tiger       0       3

Or we can use join_all from plyr:

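library(magrittr)  # provides %>%; plyr itself does not export the pipe
# Call join_all via plyr:: so that attaching plyr does not mask dplyr functions
plyr::join_all(list(parent, df1, df2), by = 'ID', type = 'left') %>%
  replace(is.na(.), 0)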

Or with purrr::reduce:

library(tidyverse)

reduce(list(parent, df1, df2), left_join, by = 'ID') %>% 
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))

Data

parent <- structure(list(ID = c("Ants", "Cow", "Dog", "Hen", "Tiger")), 
                    class = "data.frame", row.names = c(NA, -5L))

df1 <- structure(list(ID = c("Ants", "Dog", "Hen"), 
                      Number1 = c(6L, 2L, 7L)), 
                 class = "data.frame", row.names = c(NA, -3L))

df2 <- structure(list(ID = c("Ants", "Cow", "Tiger"), 
                      Number2 = c(5L, 7L, 3L)), 
                 class = "data.frame", row.names = c(NA, -3L))
