Column binding in R
To illustrate the points from my comment:
> d1 <- data.frame(a = 1:5,b = 1:5)
> d2 <- data.frame(a = letters[1:5],b = letters[1:5])
> cbind(d1,d2)
  a b a b
1 1 1 a a
2 2 2 b b
3 3 3 c c
4 4 4 d d
5 5 5 e e
> data.frame(cbind(d1,d2))
  a b a.1 b.1
1 1 1   a   a
2 2 2   b   b
3 3 3   c   c
4 4 4   d   d
5 5 5   e   e
> x <- data.frame(cbind(d1,d2))
> sort(colnames(x))
[1] "a" "a.1" "b" "b.1"
> x[,order(colnames(x))]
  a a.1 b b.1
1 1   a 1   a
2 2   b 2   b
3 3   c 3   c
4 4   d 4   d
5 5   e 5   e
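As a minimal sketch of why the names differ between the two calls: cbind() on data frames keeps duplicated column names, while wrapping the result in data.frame() applies its default check.names = TRUE, which makes the names unique. The object names both and fixed are only illustrative.

```r
d1 <- data.frame(a = 1:5, b = 1:5)
d2 <- data.frame(a = letters[1:5], b = letters[1:5])

both <- cbind(d1, d2)        # duplicated column names are kept
names(both)

fixed <- data.frame(both)    # check.names = TRUE makes the names unique
names(fixed)

# make.names() with unique = TRUE is what yields the ".1" suffixes
make.names(names(both), unique = TRUE)
```

Unique names matter as soon as you select by name: both$a or both[["a"]] can only ever reach the first column called a.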
Column bind in R and name the column
You can specify the new column name in the call to cbind:
mydf <- cbind(mydf, newcolumn=mydf[,"c"])
mydf
# a b c newcolumn
# [1,] 1 2 6 6
# [2,] 1 3 4 4
Data (constructed with the same approach):
mydf <- cbind(a=c(1, 1), b=c(2, 3), c=c(6, 4))
If you had a data frame instead of a matrix, you could simply do mydf$newcolumn <- mydf$c.
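For the data frame case, a small sketch might look like this. The data frame mydf_df is a hypothetical data frame version of the matrix above, not something from the question.

```r
# Hypothetical data frame with the same values as the matrix example
mydf_df <- data.frame(a = c(1, 1), b = c(2, 3), c = c(6, 4))

# Option 1: assign the new column directly
mydf_df$newcolumn <- mydf_df$c

# Option 2: name the column inside cbind(), as with the matrix
mydf_df2 <- cbind(mydf_df[c("a", "b", "c")], newcolumn = mydf_df$c)

names(mydf_df)
names(mydf_df2)
```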
Column bind several list elements based on id variable
library(dplyr)  # full_join()
library(purrr)  # reduce(), invoke()
reduce(my_list, full_join, by='id')
  id x y
1  1 1 6
2  2 2 5
3  3 3 4
If there are only two data frames:
invoke(full_join, my_list, by='id')
  id x y
1  1 1 6
2  2 2 5
3  3 3 4
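If you are using base R, either of the following should work (note that merge() performs an inner join on the shared columns by default):
Reduce(merge, my_list)   # works for any number of data frames
do.call(merge, my_list)  # only when my_list holds exactly two data frames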
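Here is a base R sketch that mirrors the full join. The contents of my_list are an assumption, rebuilt by hand to match the output shown above, since the original list is not given.

```r
# Assumed input: two data frames sharing an "id" column
my_list <- list(
  data.frame(id = 1:3, x = 1:3),
  data.frame(id = 1:3, y = 6:4)
)

# all = TRUE keeps unmatched ids from either side (a full join);
# Reduce() applies the merge pairwise across the whole list
merged <- Reduce(function(x, y) merge(x, y, by = "id", all = TRUE), my_list)
merged
```

Wrapping merge() in an anonymous function is what lets you pass by and all through Reduce().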
How to column bind 2 tables but remove the same column
We can use setdiff to get the columns in one dataset that are not found in the other:
nm1 <- setdiff(names(dt2), names(dt1))
out <- cbind(dt1, dt2[nm1])
If we have multiple datasets, place them in a list, get the intersecting column names (names that are common to all), take the setdiff of those from the column names of each individual dataset in the list, and cbind:
lst1 <- list(dt1, dt2, dt3)
nm2 <- Reduce(intersect, lapply(lst1, names))
cbind(lst1[[1]], do.call(cbind,
lapply(lst1[-1], function(dat) dat[setdiff(names(dat), nm2)] )))
# a b c d e f g
#1 1 11 21 31 41 51 61
#2 2 12 22 32 42 52 62
#3 3 13 23 33 43 53 63
#4 4 14 24 34 44 54 64
#5 5 15 25 35 45 55 65
#6 6 16 26 36 46 56 66
#7 7 17 27 37 47 57 67
#8 8 18 28 38 48 58 68
#9 9 19 29 39 49 59 69
#10 10 20 30 40 50 60 70
Or using a for loop:
out1 <- dt1 # initialize with the first data
for(i in 2:length(lst1)) {
out1 <- cbind(out1, lst1[[i]][setdiff(names(lst1[[i]]), nm2)])
}
out1
data
dt1 <- as.data.frame(matrix(1:50, 10, 5, dimnames = list(NULL, letters[1:5])))
dt2 <- as.data.frame(matrix(11:60, 10, 5, dimnames = list(NULL, letters[c(1:4, 6)])))
dt3 <- as.data.frame(matrix(21:70, 10, 5,
dimnames = list(NULL, letters[c(1:4, 7)])))
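A different technique from the setdiff approach above, shown only as a sketch: bind everything first and then drop columns whose names repeat. It assumes that same-named columns are redundant and that keeping the first occurrence of each name is what you want; the names all_cols and out2 are illustrative.

```r
dt1 <- as.data.frame(matrix(1:50, 10, 5, dimnames = list(NULL, letters[1:5])))
dt2 <- as.data.frame(matrix(11:60, 10, 5, dimnames = list(NULL, letters[c(1:4, 6)])))
dt3 <- as.data.frame(matrix(21:70, 10, 5, dimnames = list(NULL, letters[c(1:4, 7)])))

# Bind all columns; duplicated names are kept at this point
all_cols <- do.call(cbind, list(dt1, dt2, dt3))

# Keep only the first column for each name
out2 <- all_cols[!duplicated(names(all_cols))]
names(out2)
```

Unlike the intersect-based version, this also drops names shared by only some of the datasets, so check that this matches your intent.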
Binding dataframes with different column names by row
You can use bind_rows and then select the non-NA value using coalesce:
library(dplyr)
bind_rows(my_ls) %>% mutate(C = coalesce(C, D)) %>% select(A:C)
# A B C
# <dbl> <chr> <lgl>
#1 1 X TRUE
#2 2 Y FALSE
#3 3 Z FALSE
#4 3 U TRUE
#5 4 V TRUE
#6 5 W FALSE
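For a self-contained run, here is a sketch with an assumed my_ls. The two data frames are reconstructed from the output above (the first has column C, the second has column D), so they may not match the original question exactly.

```r
library(dplyr)

# Assumed input, reconstructed from the printed result
my_ls <- list(
  data.frame(A = c(1, 2, 3), B = c("X", "Y", "Z"), C = c(TRUE, FALSE, FALSE)),
  data.frame(A = c(3, 4, 5), B = c("U", "V", "W"), D = c(TRUE, TRUE, FALSE))
)

# bind_rows() fills the missing column with NA in each half;
# coalesce() then takes the first non-NA value of C and D per row
res <- bind_rows(my_ls) %>%
  mutate(C = coalesce(C, D)) %>%
  select(A:C)
res
```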
Binding rows of multiple data frames containing columns of class interval from lubridate package
When you run do.call(rbind, test) with dplyr loaded, the warning itself gives a hint:
Warning messages:
1: In bind_rows_(x, .id) :
Vectorizing 'Interval' elements may not preserve their attributes
It shows that dplyr::bind_rows() is actually being called, not base::rbind(), and the interval attributes are dropped. This seems to occur when the objects are tibbles (tbl or tbl_df class).
You can avoid this by using rbind.data.frame() instead:
do.call(rbind.data.frame, test)
# A tibble: 2 x 2
# Groups: participant_code [1]
participant_code interval_1
* <chr> <Interval>
1 BGC119AP01 2016-11-18 UTC--2017-12-18 UTC
2 BGC119AP02 2016-11-18 UTC--2017-12-18 UTC
Binding two data frames with different names of columns and rows
For this I am:
- Looping through each unique state name in your dataset
- Creating a new workbook each time
- Adding a worksheet to that workbook that is the name of the state
- Filtering df_1 down to just the state of interest's data
- Adding the state's data to the worksheet
- Then writing df_2 to each. In the code below, the state's data starts at column E (startCol = 5) and df_2 starts at column A (startCol = 1); these positions are arbitrary, and you can write them anywhere you want.
You can also specify column widths with the formatting commands in openxlsx (for example, setColWidths()).
library(openxlsx)
library(tidyverse) #probably don't need this whole library but I usually just have this loaded
for(i in unique(states)){
wb<- createWorkbook()
addWorksheet(wb, sheetName = i)
df_i = df_1 %>%
filter(states == i)
writeData(wb, sheet = i, x = df_i, startCol = 5)
writeData(wb, sheet = i, df_2, startCol = 1)
saveWorkbook(wb, paste0('YOURPATHWAY',i,".xlsx"), overwrite = TRUE)
}
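As a sketch of the column-width point, the following writes one small sheet and lets openxlsx size the columns automatically. The data frame df_demo, the sheet name, and the temporary output path are all placeholders, not part of the original question.

```r
library(openxlsx)

# Placeholder data standing in for one state's rows
df_demo <- data.frame(state = c("Ohio", "Ohio"), value = c(1, 2))

wb <- createWorkbook()
addWorksheet(wb, sheetName = "Ohio")
writeData(wb, sheet = "Ohio", x = df_demo)

# widths = "auto" sizes each column to fit its contents
setColWidths(wb, sheet = "Ohio", cols = 1:ncol(df_demo), widths = "auto")

out_file <- tempfile(fileext = ".xlsx")
saveWorkbook(wb, out_file, overwrite = TRUE)
```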
Automatically coerce all column types of one data frame to the type of another prior to binding
We could use type.convert().
Explanation (added after a comment from the OP):
type.convert does not consider ds_a. You can check this by comparing glimpse(ds_a) with glimpse of the resulting data frame. Note that the columns of ds_a have the same classes as in result.
> # compare classes
> glimpse(ds_a)
Rows: 6
Columns: 4
$ x <int> 1, 2, 3, 4, 5, 6
$ y <dbl> 5, 5, 5, 5, 5, 5
$ z <chr> "4", "4", "4", "4", "4", "4"
$ l <dbl> 2, 2, 2, 2, 2, 2
> glimpse(ds_b)
Rows: 6
Columns: 3
$ x <fct> 1, 2, 3, 4, 5, 6
$ y <chr> "5", "5", "5", "5", "5", "5"
$ p <dbl> 2, 2, 2, 2, 2, 2
> glimpse(result)
Rows: 12
Columns: 5
$ x <int> 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6
$ y <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5
$ z <chr> "4", "4", "4", "4", "4", "4", NA, NA, NA, NA, NA, NA
$ l <dbl> 2, 2, 2, 2, 2, 2, NA, NA, NA, NA, NA, NA
$ p <int> NA, NA, NA, NA, NA, NA, 2, 2, 2, 2, 2, 2
What type.convert does is:
- It applies the best-fitting class to the data of ds_b (notice that the %>% is inside bind_rows). All values of ds_b$x are integers, so R converts ds_b$x from class factor to class integer.
- All values of ds_b$y are of class character but integer in nature, so R converts them from character to integer. This can be misleading: ds_a$y is now double and ds_b$y is integer. That is no problem for R and bind_rows, because here double overrides integer.
> # showing what type.convert does to ds_b
> ds_b$x <- as.integer(ds_b$x)
> ds_b$y <- as.integer(ds_b$y)
> ds_b %>%
+ as_tibble()
# A tibble: 6 x 3
x y p
<int> <int> <dbl>
1 1 5 2
2 2 5 2
3 3 5 2
4 4 5 2
5 5 5 2
6 6 5 2
> bind_rows(ds_a, ds_b) %>%
+ as_tibble()
# A tibble: 12 x 5
x y z l p
<int> <dbl> <chr> <dbl> <dbl>
1 1 5 4 2 NA
2 2 5 4 2 NA
3 3 5 4 2 NA
4 4 5 4 2 NA
5 5 5 4 2 NA
6 6 5 4 2 NA
7 1 5 NA NA 2
8 2 5 NA NA 2
9 3 5 NA NA 2
10 4 5 NA NA 2
11 5 5 NA NA 2
12 6 5 NA NA 2
- It also converts ds_b$p, which is class double, to class integer because the data are integer in nature.
Solution:
library(dplyr)
bind_rows(ds_a, ds_b %>% type.convert(as.is=TRUE))
output:
x y z l p
1 1 5 4 2 NA
2 2 5 4 2 NA
3 3 5 4 2 NA
4 4 5 4 2 NA
5 5 5 4 2 NA
6 6 5 4 2 NA
7 1 5 <NA> NA 2
8 2 5 <NA> NA 2
9 3 5 <NA> NA 2
10 4 5 <NA> NA 2
11 5 5 <NA> NA 2
12 6 5 <NA> NA 2
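To see the conversion in isolation, here is a small base R sketch. The ds_b below is rebuilt by hand from the glimpse() output above, so treat it as an assumption about the original data.

```r
# Assumed reconstruction of ds_b: factor, character and double columns
ds_b <- data.frame(
  x = factor(1:6),
  y = rep("5", 6),
  p = rep(2, 6),
  stringsAsFactors = FALSE
)

sapply(ds_b, class)          # classes before conversion

# as.is = TRUE keeps character columns as character instead of factor
converted <- type.convert(ds_b, as.is = TRUE)
sapply(converted, class)     # classes after conversion
```

Only ds_b is passed through type.convert(), which is why the classes of ds_a are left as they were.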