Cbind a Dataframe With an Empty Dataframe - Cbind.Fill

cbind a dataframe with an empty dataframe - cbind.fill?

Here's a cbind fill:

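cbind.fill <- function(...) {
  # Coerce every input to a matrix so nrow() and ncol() behave uniformly
  nm <- lapply(list(...), as.matrix)
  # Target height: the largest row count among the inputs
  n <- max(sapply(nm, nrow))
  # Pad each matrix with NA rows up to n, then bind the columns
  do.call(cbind, lapply(nm, function(x)
    rbind(x, matrix(NA, n - nrow(x), ncol(x)))))
}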

Let's try it:

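x <- matrix(1:10, 5, 2)
y <- matrix(1:16, 4, 4)
z <- matrix(1:12, 2, 6)

cbind.fill(x, y)                     # 5 x 6; row 5 of y's columns is NA
cbind.fill(x, y, z)                  # 5 x 12
cbind.fill(mtcars, mtcars[1:10, ])   # 32 x 22; rows 11 to 32 of the second block are NA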
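
To tie this back to the question in the title, here is a minimal sketch of the same helper applied to empty data frames. The inputs (a three-row slice of mtcars, an empty data frame with one column, and a completely empty data.frame()) are illustrative choices, not part of the original answer. Note that the helper returns a matrix rather than a data frame, so mixed column types would be coerced to a common type.

```r
# Same helper as above, repeated so the snippet runs on its own
cbind.fill <- function(...) {
  nm <- lapply(list(...), as.matrix)
  n <- max(sapply(nm, nrow))
  do.call(cbind, lapply(nm, function(x)
    rbind(x, matrix(NA, n - nrow(x), ncol(x)))))
}

full   <- mtcars[1:3, 1:2]                # 3 rows, 2 columns
empty1 <- data.frame(col1 = numeric(0))   # 0 rows, 1 column (illustrative)
empty0 <- data.frame()                    # 0 rows, 0 columns (illustrative)

res1 <- cbind.fill(full, empty1)  # the empty column is padded with NA
res0 <- cbind.fill(full, empty0)  # no columns to add, only the original ones remain

dim(res1)
dim(res0)
```

If a data frame is needed afterwards, wrapping the result in as.data.frame() is one option.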

I think I stole this from somewhere.

EDIT STOLE FROM HERE: LINK

cbind error when merging list containing empty data frames

If some list elements are empty data frames, we can drop those positions from both lists before binding:

i1 <- (sapply(b, nrow) > 0) & (sapply(a, nrow) > 0)
Map(cbind, b[i1], a[i1])

Or put the condition inside Map itself. Positions where either element is empty come back as NULL:

out <- Map(function(x, y) if (nrow(x) > 0 && nrow(y) > 0) cbind(x, y), b, a)

This assumes that the remaining pairs of list elements have the same number of rows.

If we need to keep the original dataset when its partner is empty, we can do

Map(function(x, y) {
  if (is.null(x) || NROW(x) == 0) {
    y
  } else if (is.null(y) || NROW(y) == 0) {
    x
  } else {
    cbind(x, y)
  }
}, b, a)

data

a <- list(head(mtcars), data.frame(col1 = numeric(0)), head(iris))
b <- list(data.frame(col1 = numeric(0)), head(iris), head(mtcars))
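
As a quick check of the last variant, the sketch below runs it on the sample data and inspects the dimensions of each result. The helper name bind_or_keep is hypothetical and only introduced here for readability.

```r
a <- list(head(mtcars), data.frame(col1 = numeric(0)), head(iris))
b <- list(data.frame(col1 = numeric(0)), head(iris), head(mtcars))

# Hypothetical helper name: bind two data frames, falling back to the
# non-empty one when the other is NULL or has no rows
bind_or_keep <- function(x, y) {
  if (is.null(x) || NROW(x) == 0) {
    y
  } else if (is.null(y) || NROW(y) == 0) {
    x
  } else {
    cbind(x, y)
  }
}

out <- Map(bind_or_keep, b, a)

# Rows and columns of each result, one line per list position
t(sapply(out, dim))
```

The first two positions keep whichever data frame is non-empty, and only the third is an actual cbind of both.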

cbind dataframe in R with placeholders

Something like this maybe?

Assuming ls() returns

# [1] "data.frame1" "data.frame2" "data.frame3"

as.data.frame(Reduce("cbind", sapply(ls(), function(i) get(i))))

Based on @akrun's comment, this can be simplified to

as.data.frame(Reduce("cbind", mget(ls())))
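
One caveat is that ls() returns every object in the workspace, not just the data frames. A minimal sketch, assuming the data frames can live in their own environment (the environment and the three small data frames are illustrative, not from the original answer):

```r
# Illustrative data frames kept in their own environment, so that ls()
# returns only the objects we want to bind
env <- new.env()
env$data.frame1 <- data.frame(a = 1:3)
env$data.frame2 <- data.frame(b = 4:6)
env$data.frame3 <- data.frame(c = 7:9)

# mget() returns a named list of the objects; ls() sorts names alphabetically
dfs <- mget(ls(env), envir = env)

combined <- as.data.frame(Reduce(cbind, dfs))
combined
```

In the global workspace, something like ls(pattern = "^data\\.frame") would be another way to restrict the names that are picked up.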

Replacement of plyr::cbind.fill in dplyr?

Here's a way with some purrr and dplyr functions. First rename each data frame's column after its list name. Since each data frame has only one column, setNames makes this easy; with more columns you could use dplyr::rename. Then do a full join across the whole list on the original row names, and fill the NAs with 0.

library(dplyr)
library(purrr)

l1 %>%
  imap(~ setNames(.x, .y)) %>%
  map(tibble::rownames_to_column) %>%
  reduce(full_join, by = "rowname") %>%
  # replace NA with 0 in every column except rowname
  mutate(across(-rowname, ~ tidyr::replace_na(.x, 0)))
#>   rowname df1 df2 df3 df4
#> 1       A   1   0   9   4
#> 2       B   2   2   0   0
#> 3       C   3   0   3   0
#> 4       D   0   6   6   0
#> 5       E   0   0   0  12
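
The input list l1 comes from the question and is not shown in this answer. The sketch below uses a hypothetical l1, reconstructed only so that it reproduces the output above, and writes the fill step with across(), which assumes dplyr 1.0 or later.

```r
library(dplyr)
library(purrr)

# Hypothetical input, reconstructed to match the output shown above:
# a named list of one-column data frames with letters as row names
l1 <- list(
  df1 = data.frame(x = c(1, 2, 3), row.names = c("A", "B", "C")),
  df2 = data.frame(x = c(2, 6),    row.names = c("B", "D")),
  df3 = data.frame(x = c(9, 3, 6), row.names = c("A", "C", "D")),
  df4 = data.frame(x = c(4, 12),   row.names = c("A", "E"))
)

out <- l1 %>%
  imap(~ setNames(.x, .y)) %>%              # name each column after its list element
  map(tibble::rownames_to_column) %>%       # move row names into a "rowname" column
  reduce(full_join, by = "rowname") %>%     # full join across the whole list
  mutate(across(-rowname, ~ tidyr::replace_na(.x, 0)))

out
```

The row order (A to E) follows from full_join keeping the rows of the left table first and appending unmatched rows from the right.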

cbind 2 dataframes with different number of rows

I think you should instead use merge:

merge(df1, df2, by = "year", all = TRUE)

For your data:

df1 = data.frame(matrix(0, 7, 4))
names(df1) = c("year", "avg", "hr", "sal")
df1$year = 2010:2016
df1$avg = c(.3, .29, .275, .280, .295, .33, .315)
df1$hr = c(31, 30, 14, 24, 18, 26, 40)
df1$sal = c(2000, 4000, 600, 800, 1000, 7000, 9000)
df2 = data.frame(matrix(0, 5, 3))
names(df2) = c("year", "pos", "fld")
df2$year = c(2010, 2011, 2013, 2014, 2015)
df2$pos = c('A', 'B', 'C', 'B', 'D')
df2$fld = c(.99,.995,.97,.98,.99)

cbind is meant to column-bind two data frames that are compatible in every sense, in particular with the same number of rows. What you are after is an actual merge: no rows from either data frame are discarded, and values missing on one side are filled with NA.
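
To make the difference concrete, here is a short sketch using the same data, built in one step with data.frame(). The variable names merged and left are introduced here for illustration.

```r
df1 <- data.frame(
  year = 2010:2016,
  avg  = c(.3, .29, .275, .280, .295, .33, .315),
  hr   = c(31, 30, 14, 24, 18, 26, 40),
  sal  = c(2000, 4000, 600, 800, 1000, 7000, 9000)
)
df2 <- data.frame(
  year = c(2010, 2011, 2013, 2014, 2015),
  pos  = c("A", "B", "C", "B", "D"),
  fld  = c(.99, .995, .97, .98, .99)
)

# all = TRUE keeps every year from both sides (a full outer join)
merged <- merge(df1, df2, by = "year", all = TRUE)

# Years present in df1 but missing from df2 get NA in pos and fld
merged[is.na(merged$pos), c("year", "pos", "fld")]

# all.x = TRUE would keep every row of df1 only (a left join)
left <- merge(df1, df2, by = "year", all.x = TRUE)
```

Here both calls return seven rows, because every year in df2 also appears in df1; the two would differ if df2 contained a year that df1 lacks.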

Binding dataframes of different length (no cbind, no merge)

Edit: If there are more than two data frames, do this:

  • Create a list of all the data frames except one, say the first
  • Use purrr::reduce to join them all together
  • Pass the first data frame in the .init argument

library(dplyr)
library(purrr)

df2 <- data.frame(m = 7:10, n = sample(LETTERS[6:9], 4))
df <- data.frame(v = 1:5, x = sample(LETTERS[1:5], 5))
df3 <- data.frame(bb = 101:110, cc = sample(letters, 10))

# Each data frame gets a row-number id, which serves as the join key
reduce(list(df2, df3),
       ~ full_join(.x, .y %>% mutate(id = row_number()), by = "id"),
       .init = df %>% mutate(id = row_number())) %>%
  select(-id)

    v    x  m    n  bb cc
1   1    A 10    I 101  u
2   2    C  9    H 102  v
3   3    D  8    G 103  n
4   4    E  7    F 104  w
5   5    B NA <NA> 105  s
6  NA <NA> NA <NA> 106  y
7  NA <NA> NA <NA> 107  g
8  NA <NA> NA <NA> 108  i
9  NA <NA> NA <NA> 109  p
10 NA <NA> NA <NA> 110  h

Earlier answer: create a dummy id column in both data frames and use full_join

full_join(df %>% mutate(id = row_number()), df2 %>% mutate(id = row_number()), by = "id") %>%
  select(-id)

  v x  m    n
1 1 A 10    I
2 2 C  9    H
3 3 D  8    G
4 4 E  7    F
5 5 B NA <NA>

The results differ from the expected output because of a different random number seed.


Or in base R

merge(transform(df, id = seq_len(nrow(df))), transform(df2, id = seq_len(nrow(df2))), all = TRUE)

  id v x  m    n
1  1 1 A 10    I
2  2 2 C  9    H
3  3 3 D  8    G
4  4 4 E  7    F
5  5 5 B NA <NA>

Remove the extra id column by subsetting with [-1]

merge(transform(df, id = seq_len(nrow(df))), transform(df2, id = seq_len(nrow(df2))), all = TRUE)[-1]

  v x  m    n
1 1 A 10    I
2 2 C  9    H
3 3 D  8    G
4 4 E  7    F
5 5 B NA <NA>
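
The base R approach can also be extended to more than two data frames. This is a rough sketch that swaps purrr::reduce for base Reduce; the helper add_id is a hypothetical name, and the data are fixed values rather than sample() draws so that the result is reproducible.

```r
# Fixed, illustrative data (no sample(), so the output is reproducible)
df  <- data.frame(v = 1:5, x = LETTERS[1:5])
df2 <- data.frame(m = 7:10, n = LETTERS[6:9])
df3 <- data.frame(bb = 101:110, cc = letters[1:10])

# Hypothetical helper: add a row-number key to a data frame
add_id <- function(d) {
  d$id <- seq_len(nrow(d))
  d
}

# Full outer join on the key, applied pairwise across the list
joined <- Reduce(
  function(x, y) merge(x, y, by = "id", all = TRUE),
  lapply(list(df, df2, df3), add_id)
)

# merge() puts the key column first; drop it once everything is joined
joined <- joined[-1]
joined
```

The result has as many rows as the longest input, with NA wherever a shorter data frame has run out of rows.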

