Merging More Than 2 Dataframes in R by Rownames

Merging more than 2 dataframes in R by rownames

Three lines of code will give you the exact same result:

dat2 <- cbind(df1, df2, df3, df4)
colnames(dat2)[-(1:7)] <- paste(paste('V', rep(1:100, 2), sep = ''),
                                rep(c('x', 'y'), each = 100), sep = '.')
all.equal(dat, dat2)
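Note that cbind pairs rows by position, not by name, so it only reproduces a merge when every frame has the same row names in the same order. A minimal sketch of checking and aligning first, assuming two small hypothetical frames (`first` and `second` are illustrative, not the asker's data):

```r
# Hypothetical frames with the same row names in a different order.
first  <- data.frame(x = 1:3, row.names = c("a", "b", "c"))
second <- data.frame(y = 4:6, row.names = c("c", "a", "b"))

# Check whether the rows already line up by name.
same_rows <- identical(rownames(first), rownames(second))

# Reorder the second frame to the first frame's row order before binding.
second_aligned <- second[rownames(first), , drop = FALSE]
both <- cbind(first, second_aligned)
both
```

If `same_rows` is FALSE and the frames are bound without reordering, values end up next to the wrong row names.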

Ah, I see. Now I understand why this is causing you so much pain. A plain for loop does the trick, though there may well be cleverer solutions:

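l <- list(df1, df2, df3, df4)
dat <- l[[1]]
for (i in 2:length(l)) {
  # inner join on the row names; merge() returns them in a Row.names column
  m <- merge(dat, l[[i]], by = "row.names", all = FALSE)
  # merge() sorts by the key, so restore row names from the key column itself
  rownames(m) <- m$Row.names
  dat <- m[, -1, drop = FALSE]
}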

Merge Multiple Data Frames by Row Names

Merging by row.names has an awkward side effect: it creates a column called Row.names, which makes subsequent merges hard.

To avoid that issue, you can instead create a column holding the row names, which is generally a better idea anyway because row names are limited and hard to manipulate. One way of doing that with the data as given in the OP is shown below; it is not the most efficient approach, and for easier handling of rectangular data I recommend getting to know data.table:

Reduce(merge, lapply(l, function(x) data.frame(x, rn = row.names(x))))
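When merge is called without by, it joins on every column name the frames have in common, so it can help to name the key explicitly. A minimal sketch, assuming three small hypothetical frames (`d1`, `d2`, `d3` and their values are illustrative only):

```r
# Hypothetical example data, not taken from the original question.
d1 <- data.frame(x = 1:3, row.names = c("r1", "r2", "r3"))
d2 <- data.frame(y = 4:6, row.names = c("r1", "r2", "r3"))
d3 <- data.frame(z = 7:9, row.names = c("r1", "r2", "r3"))
l <- list(d1, d2, d3)

# Copy the row names into an ordinary key column called rn.
keyed <- lapply(l, function(d) data.frame(rn = row.names(d), d))

# Merge pairwise on rn only, so other shared column names cannot become keys.
merged <- Reduce(function(a, b) merge(a, b, by = "rn"), keyed)
merged
```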

Merge list of uneven dataframes by rownames

We create a row names column and then do the join. We loop through the list with map, add a row names column with rownames_to_column, and reduce to a single dataset with a full_join on that column. If needed, the value columns can then be renamed:

library(tidyverse)
l %>%
  map(~ .x %>%
        rownames_to_column('rn')) %>%
  reduce(full_join, by = 'rn') %>%
  rename_at(2:6, ~ names(l))
#   rn  V  W  X  Y Z
# 1  A  1  1  1  0 0
# 2  B  1  0  1 NA 0
# 3  C  1 NA  1  0 0
# 4  D NA  0 NA  0 1

Another option is to use bind_rows and then spread the result back to wide format:

l %>%
  map(rownames_to_column, 'rn') %>%
  bind_rows(.id = 'grp') %>%
  spread(grp, answer)
#   rn  V  W  X  Y Z
# 1  A  1  1  1  0 0
# 2  B  1  0  1 NA 0
# 3  C  1 NA  1  0 0
# 4  D NA  0 NA  0 1

Merge many R data frames by row.names with differing lengths

We could collect all the datasets into a list and use merge with Reduce, specifying by as a new column created from the row names:

lst1 <- lapply(mget(ls(pattern = '^df\\d+$')), \(x)
               transform(x, rn = row.names(x)))
out <- Reduce(function(...) merge(..., by = 'rn', all = TRUE),
              lst1)
row.names(out) <- out[[1]]
out <- out[-1]

-output

out
      v1 v2 v3
chr1  10  6 NA
chr2  43 64 20
chr3   1 NA 30
chr4  44 21 40
chr5 598 98 50
chr6  NA 10 60
chr7  NA 20 70
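The input frames are not shown in this answer. The sketch below reconstructs plausible inputs from the printed output (an assumption, not the asker's actual data) so that the base R approach can be run end to end. It also uses an explicit list instead of mget(ls(...)), which avoids picking up unrelated objects from the workspace:

```r
# Assumed inputs: reconstructed from the printed output above.
df1 <- data.frame(v1 = c(10, 43, 1, 44, 598),
                  row.names = paste0("chr", 1:5))
df2 <- data.frame(v2 = c(6, 64, 21, 98, 10, 20),
                  row.names = paste0("chr", c(1, 2, 4, 5, 6, 7)))
df3 <- data.frame(v3 = c(20, 30, 40, 50, 60, 70),
                  row.names = paste0("chr", 2:7))

# Add the row names as a key column rn to each frame.
lst1 <- lapply(list(df1, df2, df3),
               function(x) transform(x, rn = row.names(x)))

# all = TRUE keeps rows that are missing from some frames (full outer join).
out <- Reduce(function(a, b) merge(a, b, by = "rn", all = TRUE), lst1)

# Move the key back into the row names and drop the key column.
row.names(out) <- out$rn
out <- out[-1]
out
```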

Or using tidyverse with full_join after creating a row names column with rownames_to_column (from tibble)

library(dplyr)
library(tibble)
library(purrr)
mget(ls(pattern = '^df\\d+$')) %>%
  map(~ .x %>%
        rownames_to_column('rn')) %>%
  reduce(full_join, by = 'rn') %>%
  column_to_rownames("rn")

-output

      v1 v2 v3
chr1  10  6 NA
chr2  43 64 20
chr3   1 NA 30
chr4  44 21 40
chr5 598 98 50
chr6  NA 10 60
chr7  NA 20 70

Merge Dataframes with different number of rows

Your dataset is,

dat1 = data.frame("Arable and Horticulture" = c(100, 90,23, 3, 56, 299), 
row.names = c("Acer", "Achillea", "Aesculus", "Alliaria", "Allium", "Anchusa"))

dat2 = data.frame("Improved Grassland" = c(12, 3, 50, 23, 299, 29),
row.names = c("Acer", "Achillea", "Allium", "Brassica", "Calystegia", "Campanula"))

As @Vinícius Félix suggested, first convert the row names to a column.

library(tibble)
dat1 = rownames_to_column(dat1, "Plants")
dat2 = rownames_to_column(dat2, "Plants")

Then let's join the two datasets on that column:

library(dplyr)
dat = full_join(dat1, dat2, by = "Plants")

Finally, replace the NA values with 0:

dat = dat %>% replace(is.na(.), 0)

      Plants Arable.and.Horticulture Improved.Grassland
1       Acer                     100                 12
2   Achillea                      90                  3
3   Aesculus                      23                  0
4   Alliaria                       3                  0
5     Allium                      56                 50
6    Anchusa                     299                  0
7   Brassica                       0                 23
8 Calystegia                       0                299
9  Campanula                       0                 29
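The same result can be sketched in base R without tibble or dplyr. This is an alternative to the approach above, not the answerer's method; it uses merge with all = TRUE as the full join and reuses the dat1 and dat2 definitions from the question:

```r
dat1 <- data.frame("Arable and Horticulture" = c(100, 90, 23, 3, 56, 299),
                   row.names = c("Acer", "Achillea", "Aesculus",
                                 "Alliaria", "Allium", "Anchusa"))
dat2 <- data.frame("Improved Grassland" = c(12, 3, 50, 23, 299, 29),
                   row.names = c("Acer", "Achillea", "Allium",
                                 "Brassica", "Calystegia", "Campanula"))

# Full outer join on the row names; merge() returns them in a Row.names column.
dat <- merge(dat1, dat2, by = "row.names", all = TRUE)
names(dat)[1] <- "Plants"
dat$Plants <- as.character(dat$Plants)

# Replace missing counts with 0 (only sensible if "absent" really means zero).
dat[is.na(dat)] <- 0
dat
```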

merge 2 dataframes in r with same row names

With merge, we can set by to 'row.names':

out <- merge(df1, df2, by = 'row.names')
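To make the effect concrete, here is a minimal sketch with two small hypothetical frames (`left` and `right` are illustrative, not the asker's data). It shows the Row.names key column that merge adds and one way to move it back into the row names:

```r
# Hypothetical frames that share the same row names.
left  <- data.frame(a = c(1, 2, 3), row.names = c("g1", "g2", "g3"))
right <- data.frame(b = c(10, 20, 30), row.names = c("g1", "g2", "g3"))

m <- merge(left, right, by = "row.names")
key_names <- names(m)   # the first column is the key, named "Row.names"

# Optionally move the key column back into the row names.
rownames(m) <- m$Row.names
m$Row.names <- NULL
m
```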

If we need to plot, we can either use base R barplot

barplot(`row.names<-`(as.matrix(out[-1]),
out$Row.names), col = c('blue', 'green', 'red'), legend = TRUE)

Or with tidyverse

library(ggplot2)
library(dplyr)
library(tidyr)
merge(df1, df2, by = 'row.names') %>%
  rename(nm = 'Row.names') %>%                    # rename the key column
  type.convert(as.is = TRUE) %>%                  # some columns were not of the correct type
  pivot_longer(cols = -nm) %>%                    # reshape to 'long' format
  ggplot(aes(x = name, y = value, fill = nm)) +   # plot as bars
  geom_col() +
  theme_bw()

Merge or combine by rownames

Use match to build the vector you want in the row order of your matrix, then cbind it to the matrix:

cbind(t, z[, "symbol"][match(rownames(t), rownames(z))])

             [,1]         [,2]         [,3]         [,4]
GO.ID        "GO:0002009" "GO:0030334" "GO:0015674" NA
LEVEL        "8"          "6"          "7"          NA
Annotated    "342"        "343"        "350"        NA
Significant  "1"          "1"          "1"          NA
Expected     "0.07"       "0.07"       "0.07"       NA
resultFisher "0.679"      "0.065"      "0.065"      NA
ILMN_1652464 "0"          "0"          "1"          "PLAC8"
ILMN_1651838 "0"          "0"          "0"          "RND1"
ILMN_1711311 "1"          "1"          "0"          NA
ILMN_1653026 "0"          "0"          "0"          "GRA"

PS. Be warned that t is a base R function used to transpose matrices. Creating a variable called t can lead to confusion in your downstream code.
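A self-contained sketch of the same match-then-cbind idea, assuming two tiny hypothetical matrices (`m`, `z` and the gene names are made up for illustration, and `m` is used instead of `t` to avoid masking the transpose function):

```r
# Hypothetical stand-ins for the matrices in the question.
m <- matrix(c("0", "1", "0"), ncol = 1,
            dimnames = list(c("id1", "id2", "id3"), "flag"))
z <- matrix(c("GENE_A", "GENE_C"), ncol = 1,
            dimnames = list(c("id1", "id3"), "symbol"))

# For each row of m, find the position of its row name in z (NA if absent).
idx <- match(rownames(m), rownames(z))

# Index the symbol column by position; rows with no match become NA.
combined <- cbind(m, symbol = z[, "symbol"][idx])
combined
```

Rows of `m` with no counterpart in `z` keep their place and simply get NA in the new column.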

rbind data frames in R, possible to add more than a number to duplicated rownames?

Instead of storing important information in row names, you can keep it in a separate column. Use make.unique to make the names unique.

library(dplyr)
library(tibble)

res <- df1 %>%
  rownames_to_column() %>%
  bind_rows(df2 %>% rownames_to_column()) %>%
  mutate(rowname = make.unique(rowname, sep = '_'))

res

#   rowname A B
# 1      A1 1 1
# 2      B1 2 2
# 3      C1 3 3
# 4    C1_1 1 1
# 5      C2 2 2
# 6      C3 3 3

If you need the values back as rownames use column_to_rownames.

res %>% column_to_rownames()

#      A B
# A1   1 1
# B1   2 2
# C1   3 3
# C1_1 1 1
# C2   2 2
# C3   3 3
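For comparison, a base R sketch of the same idea. The input frames are not shown in this answer, so `df1` and `df2` below are assumptions chosen to be consistent with the printed output:

```r
# Assumed inputs: two frames whose row names overlap on "C1".
df1 <- data.frame(A = 1:3, B = 1:3, row.names = c("A1", "B1", "C1"))
df2 <- data.frame(A = 1:3, B = 1:3, row.names = c("C1", "C2", "C3"))

# Collect the original row names before binding.
rn <- c(rownames(df1), rownames(df2))

# Bind without carrying row names over, then assign unique names ourselves.
res <- rbind(df1, df2, make.row.names = FALSE)
rownames(res) <- make.unique(rn, sep = "_")
res
```

Passing sep to make.unique is what controls the suffix style, so a different separator can be swapped in if "_" is not wanted.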

Combine several data.frames into one (keep every rownames)

If you have all your data frames in an environment, you can get them into a named list then use package reshape2 to reshape the list. If desired, you can then set the first column as the row names.

library(reshape2)
dcast(melt(Filter(is.data.frame, mget(ls()))), L1 ~ Name)
# L1 ABIDING ABLE ABROAD ACHIEVE ACROSS ACT ACTION ACTIVISM ACTS ADDICTION ADVANCE ADVANCING ADVANTAGE
# 1 Apple 1 1 1 NA 4 2 NA NA NA NA NA NA 1
# 2 Berry NA NA NA NA NA 2 2 1 1 1 1 NA NA
# 3 Orange NA NA NA 1 3 1 1 NA NA NA 1 1 NA

Note: This assumes all your data is in the global environment and that no other data frames are present except the ones to be used here.


