Column Binding in R

Column binding in R

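To illustrate: cbind() on two data frames keeps duplicate column names as they are, while wrapping the result in data.frame() makes the names unique by appending .1, after which the columns can be reordered by name: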

> d1 <- data.frame(a = 1:5,b = 1:5)
> d2 <- data.frame(a = letters[1:5],b = letters[1:5])
> cbind(d1,d2)
  a b a b
1 1 1 a a
2 2 2 b b
3 3 3 c c
4 4 4 d d
5 5 5 e e

> data.frame(cbind(d1,d2))
  a b a.1 b.1
1 1 1   a   a
2 2 2   b   b
3 3 3   c   c
4 4 4   d   d
5 5 5   e   e

> x <- data.frame(cbind(d1,d2))
> sort(colnames(x))
[1] "a" "a.1" "b" "b.1"
> x[,order(colnames(x))]
  a a.1 b b.1
1 1   a 1   a
2 2   b 2   b
3 3   c 3   c
4 4   d 4   d
5 5   e 5   e
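The renaming comes from data.frame(), whose check.names = TRUE default makes names unique by suffixing repeats, the same scheme make.unique() uses; cbind() on data frames calls data.frame() with check.names = FALSE, which is why the duplicates survive there. A small base R sketch reusing d1 and d2 from above:

```r
d1 <- data.frame(a = 1:5, b = 1:5)
d2 <- data.frame(a = letters[1:5], b = letters[1:5])

# cbind() keeps the duplicated names as they are
names(cbind(d1, d2))

# data.frame() de-duplicates them (check.names = TRUE is the default)
x <- data.frame(cbind(d1, d2))
names(x)

# The same suffixing scheme, applied directly to a character vector
make.unique(c("a", "b", "a", "b"))

# Reorder the columns alphabetically by name
x[, order(names(x))]
```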

Column bind in R and name the column

You can specify the new column name in the call to cbind:

mydf <- cbind(mydf, newcolumn=mydf[,"c"])
mydf
#      a b c newcolumn
# [1,] 1 2 6         6
# [2,] 1 3 4         4

Data (constructed with the same approach):

mydf <- cbind(a=c(1, 1), b=c(2, 3), c=c(6, 4))

If you had a data frame instead of a matrix, you could simply do mydf$newcolumn <- mydf$c.
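A side-by-side sketch of both cases: with a matrix, the name given to the cbind() argument becomes the column name (and $ does not work on a matrix); with a data frame, $<- adds the column directly. mydf_df is a hypothetical data frame copy of the same data:

```r
# Matrix, built as in the data above; named cbind() arguments become column names
mydf <- cbind(a = c(1, 1), b = c(2, 3), c = c(6, 4))
mydf <- cbind(mydf, newcolumn = mydf[, "c"])
print(mydf)

# Data frame equivalent (hypothetical copy of the same data): $<- adds a column by name
mydf_df <- data.frame(a = c(1, 1), b = c(2, 3), c = c(6, 4))
mydf_df$newcolumn <- mydf_df$c
print(mydf_df)
```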

Column bind several list elements based on id variable

library(dplyr)  # full_join()
library(purrr)  # reduce()
reduce(my_list, full_join, by = 'id')
id x y
1 1 1 6
2 2 2 5
3 3 3 4

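If there are only two data frames, you can pass both list elements to full_join() in one call. invoke() is deprecated as of purrr 1.0.0; base do.call() does the same job:

invoke(full_join, my_list, by = 'id')
do.call(full_join, c(my_list, by = 'id'))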
id x y
1 1 1 6
2 2 2 5
3 3 3 4

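In base R, Reduce() works for any number of data frames, while do.call() works only for exactly two, because the list elements become merge()'s x and y arguments. merge() does an inner join by default; add all = TRUE to keep unmatched ids the way full_join() does:

Reduce(function(x, y) merge(x, y, by = 'id', all = TRUE), my_list)
do.call(merge, c(my_list, by = 'id', all = TRUE))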
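A runnable base R sketch of the Reduce() approach, using merge() with all = TRUE so that it behaves like a full join. my_list is hypothetical, built to reproduce the output shown above; the third data frame is also made up, to show what happens when one input lacks an id:

```r
# Hypothetical input, chosen to reproduce the output shown above
my_list <- list(
  data.frame(id = 1:3, x = 1:3),
  data.frame(id = 1:3, y = 6:4)
)

# Fold merge() pairwise over the list; all = TRUE keeps ids missing from either side
full_merge <- function(x, y) merge(x, y, by = "id", all = TRUE)
merged <- Reduce(full_merge, my_list)
print(merged)

# Hypothetical third data frame without id 2: Reduce() handles any list length
my_list3 <- c(my_list, list(data.frame(id = c(1L, 3L), z = c(10, 30))))
merged3 <- Reduce(full_merge, my_list3)
print(merged3)
```

merge() sorts the result by the key (sort = TRUE by default), so the rows come out in id order and the missing id gets NA in column z.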

How to column bind 2 tables but remove the same column

We can use setdiff() to get the columns of dt2 that are not found in dt1:

nm1 <- setdiff(names(dt2), names(dt1))
out <- cbind(dt1, dt2[nm1])

If we have more than two datasets, place them in a list, get the column names common to all of them with Reduce(intersect, ...), drop those names from every dataset except the first with setdiff(), and cbind() the results:

lst1 <- list(dt1, dt2, dt3)
nm2 <- Reduce(intersect, lapply(lst1, names))
cbind(lst1[[1]], do.call(cbind,
    lapply(lst1[-1], function(dat) dat[setdiff(names(dat), nm2)])))
# a b c d e f g
#1 1 11 21 31 41 51 61
#2 2 12 22 32 42 52 62
#3 3 13 23 33 43 53 63
#4 4 14 24 34 44 54 64
#5 5 15 25 35 45 55 65
#6 6 16 26 36 46 56 66
#7 7 17 27 37 47 57 67
#8 8 18 28 38 48 58 68
#9 9 19 29 39 49 59 69
#10 10 20 30 40 50 60 70

Or using a for loop

out1 <- dt1  # initialize with the first data frame
for (i in 2:length(lst1)) {
  out1 <- cbind(out1, lst1[[i]][setdiff(names(lst1[[i]]), nm2)])
}
out1

Data:

dt1 <- as.data.frame(matrix(1:50, 10, 5, dimnames = list(NULL, letters[1:5])))
dt2 <- as.data.frame(matrix(11:60, 10, 5, dimnames = list(NULL, letters[c(1:4, 6)])))
dt3 <- as.data.frame(matrix(21:70, 10, 5, dimnames = list(NULL, letters[c(1:4, 7)])))
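A compact variant, not from the original answer: fold over the list with Reduce() and append only the columns whose names are not already in the accumulated result. Note the rule differs slightly from the intersect-based one: this keeps only the first occurrence of every name, whereas the version above drops only the names shared by all datasets. keep_new_cols() is a hypothetical helper:

```r
# Data as defined above
dt1 <- as.data.frame(matrix(1:50, 10, 5, dimnames = list(NULL, letters[1:5])))
dt2 <- as.data.frame(matrix(11:60, 10, 5, dimnames = list(NULL, letters[c(1:4, 6)])))
dt3 <- as.data.frame(matrix(21:70, 10, 5, dimnames = list(NULL, letters[c(1:4, 7)])))
lst1 <- list(dt1, dt2, dt3)

# Hypothetical helper: add only columns not yet present in the accumulator
keep_new_cols <- function(acc, dat) cbind(acc, dat[setdiff(names(dat), names(acc))])

# Reduce() starts from dt1 and folds in dt2, then dt3
out2 <- Reduce(keep_new_cols, lst1)
names(out2)
```

For this data both rules give the same result, columns a to g.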

Binding dataframes with different column names by row

You can use bind_rows() and then keep the first non-NA value with coalesce():

library(dplyr)

bind_rows(my_ls) %>% mutate(C = coalesce(C, D)) %>% select(A:C)

# A B C
# <dbl> <chr> <lgl>
#1 1 X TRUE
#2 2 Y FALSE
#3 3 Z FALSE
#4 3 U TRUE
#5 4 V TRUE
#6 5 W FALSE
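For comparison, a base R sketch of the same idea, swapping in plain rbind() and ifelse() for bind_rows() and coalesce(). my_ls is hypothetical, reconstructed from the output above on the assumption that the second data frame stores the logical column as D instead of C. Because base rbind() needs matching columns, every data frame is first padded with NA columns:

```r
# Hypothetical input reconstructed from the output above
my_ls <- list(
  data.frame(A = c(1, 2, 3), B = c("X", "Y", "Z"), C = c(TRUE, FALSE, FALSE)),
  data.frame(A = c(3, 4, 5), B = c("U", "V", "W"), D = c(TRUE, TRUE, FALSE))
)

# Union of all column names, in order of first appearance
all_cols <- unique(unlist(lapply(my_ls, names)))

# Add any missing column as NA so every data frame has the same columns
pad <- function(d) {
  for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
  d[all_cols]
}
stacked <- do.call(rbind, lapply(my_ls, pad))

# Base R stand-in for coalesce(C, D): use D wherever C is NA
stacked$C <- ifelse(is.na(stacked$C), stacked$D, stacked$C)
result <- stacked[c("A", "B", "C")]
print(result)
```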

Binding rows of multiple data frames containing Interval columns from the lubridate package

If you run do.call(rbind, test) with dplyr loaded and get this warning:

Warning messages:
1: In bind_rows_(x, .id) :
  Vectorizing 'Interval' elements may not preserve their attributes

that is a hint that dplyr::bind_rows() is actually being called instead of base::rbind(), and the Interval attributes are dropped. This seems to occur when the objects are tibbles (class tbl or tbl_df).

You can avoid this by using rbind.data.frame() instead:

do.call(rbind.data.frame, test)
# A tibble: 2 x 2
# Groups: participant_code [1]
participant_code interval_1
* <chr> <Interval>
1 BGC119AP01 2016-11-18 UTC--2017-12-18 UTC
2 BGC119AP02 2016-11-18 UTC--2017-12-18 UTC

Binding two data frames with different names of columns and rows

The code works by:

  1. Looping through each unique state name in your dataset
  2. Creating a new workbook each time
  3. Adding a worksheet to that workbook that is the name of the state
  4. Filtering df_1 down to just the state of interest's data
  5. Adding the state's data to the worksheet
  6. Writing df_2 to each sheet as well. In the code, df_2 starts in column A (startCol = 1) and df_i in column E (startCol = 5); both positions were chosen arbitrarily, so you can write the tables anywhere you want.

You can also control column widths with openxlsx, for example with setColWidths().

library(openxlsx)
library(dplyr)  # only filter() and %>% are needed, so the whole tidyverse is not required

for (i in unique(df_1$states)) {
  wb <- createWorkbook()                  # one new workbook per state
  addWorksheet(wb, sheetName = i)         # sheet named after the state
  df_i <- df_1 %>%
    filter(states == i)                   # keep only this state's rows
  writeData(wb, sheet = i, x = df_i, startCol = 5)
  writeData(wb, sheet = i, x = df_2, startCol = 1)
  saveWorkbook(wb, paste0('YOURPATHWAY', i, ".xlsx"), overwrite = TRUE)
}

Automatically coerce all column types of one data frame to the type of another prior to binding

We could use type.convert() on ds_b before binding.

Explanation, following a comment from the OP:

type.convert() does not touch ds_a. You can check this by comparing glimpse(ds_a) with glimpse() of the resulting data frame.

Note that the columns of ds_a keep the same classes in result.

> # compare classes
> glimpse(ds_a)
Rows: 6
Columns: 4
$ x <int> 1, 2, 3, 4, 5, 6
$ y <dbl> 5, 5, 5, 5, 5, 5
$ z <chr> "4", "4", "4", "4", "4", "4"
$ l <dbl> 2, 2, 2, 2, 2, 2
> glimpse(ds_b)
Rows: 6
Columns: 3
$ x <fct> 1, 2, 3, 4, 5, 6
$ y <chr> "5", "5", "5", "5", "5", "5"
$ p <dbl> 2, 2, 2, 2, 2, 2
> glimpse(result)
Rows: 12
Columns: 5
$ x <int> 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6
$ y <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5
$ z <chr> "4", "4", "4", "4", "4", "4", NA, NA, NA, NA, NA, NA
$ l <dbl> 2, 2, 2, 2, 2, 2, NA, NA, NA, NA, NA, NA
$ p <int> NA, NA, NA, NA, NA, NA, 2, 2, 2, 2, 2, 2

What type.convert() does here:

  1. It applies the best-fitting class to each column of ds_b only (note that the %>% is inside bind_rows()). All values of ds_b$x are whole numbers, so the factor column ds_b$x becomes integer.
  2. ds_b$y is character but holds whole numbers, so it becomes integer as well. This can look misleading, because ds_a$y is double while ds_b$y is now integer, but it is not a problem: when bind_rows() combines them, double takes precedence and the result is double.
> # showing what type.convert does to ds_b
> ds_b$x <- as.integer(ds_b$x)
> ds_b$y <- as.integer(ds_b$y)
> ds_b %>%
+ as_tibble()
# A tibble: 6 x 3
x y p
<int> <int> <dbl>
1 1 5 2
2 2 5 2
3 3 5 2
4 4 5 2
5 5 5 2
6 6 5 2
> bind_rows(ds_a, ds_b) %>%
+ as_tibble()
# A tibble: 12 x 5
x y z l p
<int> <dbl> <chr> <dbl> <dbl>
1 1 5 4 2 NA
2 2 5 4 2 NA
3 3 5 4 2 NA
4 4 5 4 2 NA
5 5 5 4 2 NA
6 6 5 4 2 NA
7 1 5 NA NA 2
8 2 5 NA NA 2
9 3 5 NA NA 2
10 4 5 NA NA 2
11 5 5 NA NA 2
12 6 5 NA NA 2

  3. It converts ds_b$p from double to integer, because all of its values are whole numbers.

Solution:

library(dplyr)
bind_rows(ds_a, ds_b %>% type.convert(as.is=TRUE))

Output:

   x y    z  l  p
1  1 5    4  2 NA
2  2 5    4  2 NA
3  3 5    4  2 NA
4  4 5    4  2 NA
5  5 5    4  2 NA
6  6 5    4  2 NA
7  1 5 <NA> NA  2
8  2 5 <NA> NA  2
9  3 5 <NA> NA  2
10 4 5 <NA> NA  2
11 5 5 <NA> NA  2
12 6 5 <NA> NA  2
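The class changes described above can be checked with base R alone, without dplyr. ds_a and ds_b below are hypothetical reconstructions from the glimpse() output; type.convert() with as.is = TRUE re-guesses the class of every column of ds_b (text stays character rather than becoming factor), and combining integer with double yields double:

```r
# Hypothetical data rebuilt from the glimpse() output above
ds_a <- data.frame(x = 1:6, y = rep(5, 6), z = rep("4", 6), l = rep(2, 6))
ds_b <- data.frame(x = factor(1:6), y = rep("5", 6), p = rep(2, 6))

# Re-guess the class of every column of ds_b only; ds_a is left untouched
ds_b_conv <- type.convert(ds_b, as.is = TRUE)
sapply(ds_b_conv, class)

# integer combined with double gives double, which is why y ends up <dbl>
typeof(c(ds_a$y, ds_b_conv$y))
```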

