R rbind data.tables with different column using do.call on vector of names
mget()
:
a = data.frame(a = c(1:5), b = c(6:10))
b = data.frame(a = c(11:15), b = c(16:20), c = LETTERS[1:5])
vn <- c("a","b")
rbindlist(mget(vn), fill = TRUE)
# a b c
# 1: 1 6 <NA>
# 2: 2 7 <NA>
# 3: 3 8 <NA>
# 4: 4 9 <NA>
# 5: 5 10 <NA>
# 6: 11 16 A
# 7: 12 17 B
# 8: 13 18 C
# 9: 14 19 D
# 10: 15 20 E
function to rbind list of dataframes different columns and rows
This can be achieved with a for loop (I think it could be achieved with mapply
to, check ?mapply
). The overall strategy is filling each df in the list with NAs (cbind
ing them) and then rbindlist
ing the resulting list:
library(data.table)
cols <- max(sapply(df, ncol))
# This is the length of the NA vectors that make the cbinding dfs:
lengths <- (cols - sapply(df, ncol))*sapply(df, nrow)
newdf <- list()
for (i in 1:length(df)){
if (ncol(df[[i]]) != cols){
newdf[[i]] <- cbind(df[[i]],
as.data.frame(matrix(rep(NA, lengths[i]),
ncol = lengths[i] / nrow(df[[i]]))))
} else {
newdf[[i]] <- df[[i]]
}
}
rbindlist(newdf, use.names = FALSE)
Which results in:
d e V1 V2
1: 4 c <NA> NA
2: 5 d <NA> NA
3: 1 a one NA
4: 2 b two NA
5: 3 c three NA
6: 6 e one 100
7: 7 f two 101
8: 8 g three 102
rbind a list of data frames with different columns
You can use data.table
:
library(data.table)
rbindlist(myList, fill = TRUE)
# x1 x3 x4 x2
#1: 1 2 7 NA
#2: 3 3 8 4
#3: 9 2 9 5
rbindlist data.tables different dimensions
Here's an option:
do.call(function(...) Map(function(...) rbind(..., idcol = T), ...), result)
#$DT1
# .id a b
#1: 1 1 a
#2: 1 2 b
#3: 1 3 c
#4: 2 1 a
#5: 2 2 b
#6: 2 3 c
#
#$DT2
# .id d e
#1: 1 4 d
#2: 1 5 e
#3: 1 6 f
#4: 2 4 d
#5: 2 5 e
#6: 2 6 f
Here's another:
lapply(purrr::transpose(result), rbindlist, idcol = T)
Efficient way to rbind data.frames with different columns
UPDATE: See this updated answer instead.
UPDATE (eddi): This has now been implemented in version 1.8.11 as a fill
argument to rbind
. For example:
DT1 = data.table(a = 1:2, b = 1:2)
DT2 = data.table(a = 3:4, c = 1:2)
rbind(DT1, DT2, fill = TRUE)
# a b c
#1: 1 1 NA
#2: 2 2 NA
#3: 3 NA 1
#4: 4 NA 2
FR #4790 added now - rbind.fill (from plyr) like functionality to merge list of data.frames/data.tables
Note 1:
This solution uses data.table
's rbindlist
function to "rbind" list of data.tables and for this, be sure to use version 1.8.9 because of this bug in versions < 1.8.9.
Note 2:
rbindlist
when binding lists of data.frames/data.tables, as of now, will retain the data type of the first column. That is, if a column in first data.frame is character and the same column in the 2nd data.frame is "factor", then, rbindlist
will result in this column being a character. So, if your data.frame consisted of all character columns, then, your solution with this method will be identical to the plyr method. If not, the values will still be the same, but some columns will be character instead of factor. You'll have to convert to "factor" yourself after. Hopefully this behaviour will change in the future.
And now here's using data.table
(and benchmarking comparison with rbind.fill
from plyr
):
require(data.table)
rbind.fill.DT <- function(ll) {
# changed sapply to lapply to return a list always
all.names <- lapply(ll, names)
unq.names <- unique(unlist(all.names))
ll.m <- rbindlist(lapply(seq_along(ll), function(x) {
tt <- ll[[x]]
setattr(tt, 'class', c('data.table', 'data.frame'))
data.table:::settruelength(tt, 0L)
invisible(alloc.col(tt))
tt[, c(unq.names[!unq.names %chin% all.names[[x]]]) := NA_character_]
setcolorder(tt, unq.names)
}))
}
rbind.fill.PLYR <- function(ll) {
rbind.fill(ll)
}
require(microbenchmark)
microbenchmark(t1 <- rbind.fill.DT(ll), t2 <- rbind.fill.PLYR(ll), times=10)
# Unit: seconds
# expr min lq median uq max neval
# t1 <- rbind.fill.DT(ll) 10.8943 11.02312 11.26374 11.34757 11.51488 10
# t2 <- rbind.fill.PLYR(ll) 121.9868 134.52107 136.41375 184.18071 347.74724 10
# for comparison change t2 to data.table
setattr(t2, 'class', c('data.table', 'data.frame'))
data.table:::settruelength(t2, 0L)
invisible(alloc.col(t2))
setcolorder(t2, unique(unlist(sapply(ll, names))))
identical(t1, t2) # [1] TRUE
It should be noted that plyr
's rbind.fill
edges past this particular data.table
solution until list size of about 500.
Benchmarking plot:
Here's the plot on runs with list length of data.frames with seq(1000, 10000, by=1000)
. I've used microbenchmark
with 10 reps on each of these different list lengths.
Benchmarking gist:
Here's the gist for benchmarking, in case anyone wants to replicate the results.
How to rbind the data with different number of column in R?
You could first cbind
the rownames
in dtf2
and rbind
it with dtf1
.
rbind(dtf1, cbind(aa = rownames(dtf2), dtf2))
# aa bb cc dd ee ff gg hh ii
#a 1 2 3.0 4 5 6 6 7 7
#b 11 3 6.5 7 8 9 3 2 5
#c c 1 2.0 3 4 5 6 7 8
#d d 10 9.0 8 7 6 5 4 3
Using dplyr
, it doesn't directly coerce the column to character so you need to explicitly mention it.
library(dplyr)
bind_rows(dtf1 %>% mutate(aa = as.character(aa)),
bind_cols(aa = rownames(dtf2), dtf2))
Combine two data frames by rows (rbind) when they have different sets of columns
rbind.fill
from the package plyr
might be what you are looking for.
Related Topics
Remove Columns of Dataframe Based on Conditions in R
Cast String Directly to Idatetime
Shiny Rcharts Multiple Chart Output
Extract Hyperlink from Excel File in R
Sum Specific Columns Among Rows
Date-Time Differences Between Rows in R
How to Preserve Continuous (1,2,3,...N) Ranking Notation When Ranking in R
Create Several Dummy Variables from One String Variable
Blend of Na.Omit and Na.Pass Using Aggregate
Using Variable Value as Column Name in Data.Frame or Cbind
R Group By, Counting Non-Na Values
Text Mining R Package & Regex to Handle Replace Smart Curly Quotes
Regex; Eliminate All Punctuation Except