Merging more than 2 dataframes in R by rownames
Three lines of code will give you the exact same result:
dat2 <- cbind(df1, df2, df3, df4)
colnames(dat2)[-(1:7)] <- paste(paste('V', rep(1:100, 2),sep = ''),
rep(c('x', 'y'), each = 100), sep = c('.'))
all.equal(dat,dat2)
Ah I see, now I understand why you are getting into so much pain. Using the old for
loop surely does the trick. Maybe there are even more clever solutions
rn <- rownames(df1)
l <- list(df1, df2, df3, df4)
dat <- l[[1]]
for(i in 2:length(l)) {
dat <- merge(dat, l[[i]], by= "row.names", all.x= F, all.y= F) [,-1]
rownames(dat) <- rn
}
Merge Multiple Data Frames by Row Names
Merging by row.names
does weird things - it creates a column called Row.names, which makes subsequent merges hard.
To avoid that issue you can instead create a column with the row names (which is generally a better idea anyway - row names are very limited and hard to manipulate). One way of doing that with the data as given in OP (not the most optimal way, for more optimal and easier ways of dealing with rectangular data I recommend getting to know data.table
instead):
Reduce(merge, lapply(l, function(x) data.frame(x, rn = row.names(x))))
Merge list of uneven dataframes by rownames
We create a row names column and then do the join. We loop through the list
with map
, create a row names column with rownames_to_column
and reduce
to a single dataset by doing a full_join
by
the row names and rename
the column names if needed
library(tidyverse)
l %>%
map( ~ .x %>%
rownames_to_column('rn')) %>%
reduce(full_join, by = 'rn') %>%
rename_at(2:6, ~ names(l))
# rn V W X Y Z
#1 A 1 1 1 0 0
#2 B 1 0 1 NA 0
#3 C 1 NA 1 0 0
#4 D NA 0 NA 0 1
Or another option is to bind_rows
and then spread
l %>%
map(rownames_to_column, 'rn') %>%
bind_rows(.id = 'grp') %>%
spread(grp, answer)
# rn V W X Y Z
#1 A 1 1 1 0 0
#2 B 1 0 1 NA 0
#3 C 1 NA 1 0 0
#4 D NA 0 NA 0 1
Merge many R data frames by row.names with differing lengths
We could get all the datasets into a list
and use merge
with Reduce
specifying the by
as a new column created from the row names
lst1 <- lapply(mget(ls(pattern = '^df\\d+$')), \(x)
transform(x, rn =row.names(x)))
out <- Reduce(function(...) merge(..., by = 'rn', all = TRUE),
lst1)
row.names(out) <- out[[1]]
out <- out[-1]
-output
out
v1 v2 v3
chr1 10 6 NA
chr2 43 64 20
chr3 1 NA 30
chr4 44 21 40
chr5 598 98 50
chr6 NA 10 60
chr7 NA 20 70
Or using tidyverse
with full_join
after creating a row names column with rownames_to_column
(from tibble
)
library(dplyr)
library(tibble)
library(purrr)
mget(ls(pattern = '^df\\d+$')) %>%
map(~ .x %>%
rownames_to_column('rn')) %>%
reduce(full_join, by = 'rn') %>%
column_to_rownames("rn")
v1 v2 v3
chr1 10 6 NA
chr2 43 64 20
chr3 1 NA 30
chr4 44 21 40
chr5 598 98 50
chr6 NA 10 60
chr7 NA 20 70
Merge Dataframes with different number of rows
Your dataset is,
dat1 = data.frame("Arable and Horticulture" = c(100, 90,23, 3, 56, 299),
row.names = c("Acer", "Achillea", "Aesculus", "Alliaria", "Allium", "Anchusa"))
dat2 = data.frame("Improved Grassland" = c(12, 3, 50, 23, 299, 29),
row.names = c("Acer", "Achillea", "Allium", "Brassica", "Calystegia", "Campanula"))
As @Vinícius Félix suggested first convert rownames to column.
library(tibble)
dat1 = rownames_to_column(dat1, "Plants")
dat2 = rownames_to_column(dat2, "Plants")
Then lets join both the datasets,
library(dplyr)
dat = full_join(dat1, dat2, )
And replace the NA
with 0
dat = dat %>% replace(is.na(.), 0)
Plants Arable.and.Horticulture Improved.Grassland
1 Acer 100 12
2 Achillea 90 3
3 Aesculus 23 0
4 Alliaria 3 0
5 Allium 56 50
6 Anchusa 299 0
7 Brassica 0 23
8 Calystegia 0 299
9 Campanula 0 29
merge 2 dataframes in r with same row names
With merge
, we can use the by
as row.names
out <- merge(df1, df2, by = 'row.names')
If we need to plot, either we can use base R
barplot
barplot(`row.names<-`(as.matrix(out[-1]),
out$Row.names), col = c('blue', 'green', 'red'), legend = TRUE)
Or with tidyverse
library(ggplot2)
library(dplyr)
library(tidyr)
merge(df1, df2, by = 'row.names') %>%
rename(nm = 'Row.names') %>% # // rename the column name
type.convert(as.is = TRUE) %>% # // some columns were not of the correct type
pivot_longer(cols = -nm) %>% # // reshape to 'long' format
ggplot(aes(x = name, y = value, fill = nm)) + # // plot as bar
geom_col() +
theme_bw()
-output
Merge or combine by rownames
Use match
to return your desired vector, then cbind
it to your matrix
cbind(t, z[, "symbol"][match(rownames(t), rownames(z))])
[,1] [,2] [,3] [,4]
GO.ID "GO:0002009" "GO:0030334" "GO:0015674" NA
LEVEL "8" "6" "7" NA
Annotated "342" "343" "350" NA
Significant "1" "1" "1" NA
Expected "0.07" "0.07" "0.07" NA
resultFisher "0.679" "0.065" "0.065" NA
ILMN_1652464 "0" "0" "1" "PLAC8"
ILMN_1651838 "0" "0" "0" "RND1"
ILMN_1711311 "1" "1" "0" NA
ILMN_1653026 "0" "0" "0" "GRA"
PS. Be warned that t
is base R function that is used to transpose matrices. By creating a variable called t, it can lead to confusion in your downstream code.
rbind data frames in R, possible to add more than a number to duplicated rownames?
Instead of storing important information in rownames you can have them in separate column. Use make.unique
to have a unique name.
library(dplyr)
library(tibble)
res <- df1 %>%
rownames_to_column() %>%
bind_rows(df2 %>% rownames_to_column()) %>%
mutate(rowname = make.unique(rowname, sep = '_'))
res
# rowname A B
#1 A1 1 1
#2 B1 2 2
#3 C1 3 3
#4 C1_1 1 1
#5 C2 2 2
#6 C3 3 3
If you need the values back as rownames use column_to_rownames
.
res %>% column_to_rownames()
# A B
#A1 1 1
#B1 2 2
#C1 3 3
#C1_1 1 1
#C2 2 2
#C3 3 3
Combine several data.frames into one (keep every rownames)
If you have all your data frames in an environment, you can get them into a named list then use package reshape2 to reshape the list. If desired, you can then set the first column as the row names.
library(reshape2)
dcast(melt(Filter(is.data.frame, mget(ls()))), L1 ~ Name)
# L1 ABIDING ABLE ABROAD ACHIEVE ACROSS ACT ACTION ACTIVISM ACTS ADDICTION ADVANCE ADVANCING ADVANTAGE
# 1 Apple 1 1 1 NA 4 2 NA NA NA NA NA NA 1
# 2 Berry NA NA NA NA NA 2 2 1 1 1 1 NA NA
# 3 Orange NA NA NA 1 3 1 1 NA NA NA 1 1 NA
Note: This assumes all your data is in the global environment and that no other data frames are present except the ones to be used here.
Related Topics
Dplyr Replacing Na Values in a Column Based on Multiple Conditions
How to Perform Pairwise Operation Like '%In%' and Set Operations for a List of Vectors
Combining New Lines and Italics in Facet Labels with Ggplot2
Combining Duplicated Rows in R and Adding New Column Containing Ids of Duplicates
Mutate Multiple Variable to Create Multiple New Variables
Find All Date Ranges for Overlapping Start and End Dates in R
Flip Ordering of Legend Without Altering Ordering in Plot
Dt[!(X == .)] and Dt[X != .] Treat Na in X Inconsistently
Submit Form with No Submit Button in Rvest
Selecting a Subset of Columns in a Data.Table
How to Use the 'Sweep' Function