Merge Data Frames Based on Rownames in R

Merge data frames based on rownames in R

See ?merge:

the name "row.names" or the number 0 specifies the row names.

Example:

R> de <- merge(d, e, by=0, all=TRUE)  # merge by row names (by=0 or by="row.names")
R> de[is.na(de)] <- 0 # replace NA values
R> de
  Row.names   a   b   c   d   e   f   g   h   i  j  k  l  m  n  o  p  q  r  s  t
1         1 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10 11 12 13 14 15 16 17 18 19 20
2         2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9  1  0  0  0  0  0  0  0  0  0  0
3         3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0  0 21 22 23 24 25 26 27 28 29 30
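The answer does not show d and e. Below is a minimal reproducible sketch that uses small hypothetical data frames, not the OP's. It confirms that by = 0 and by = "row.names" give identical results, and it shows where the NA values being replaced come from:

```r
# Hypothetical data: row name "1" is in both, "2" only in d, "3" only in e
d <- data.frame(a = c(1, 0.1), b = c(2, 0.2), row.names = c("1", "2"))
e <- data.frame(k = c(11, 21), l = c(12, 22), row.names = c("1", "3"))

# Full outer join on row names; by = 0 and by = "row.names" mean the same thing
de <- merge(d, e, by = 0, all = TRUE)
de_rn <- merge(d, e, by = "row.names", all = TRUE)
stopifnot(identical(de, de_rn))

# Rows found in only one input get NA in the other input's columns; set them to 0
de[is.na(de)] <- 0
print(de)
```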

Merge or combine by rownames

Use match() to build the desired vector in the row order of your matrix, then cbind() it to the matrix:

cbind(t, z[, "symbol"][match(rownames(t), rownames(z))])

[,1] [,2] [,3] [,4]
GO.ID "GO:0002009" "GO:0030334" "GO:0015674" NA
LEVEL "8" "6" "7" NA
Annotated "342" "343" "350" NA
Significant "1" "1" "1" NA
Expected "0.07" "0.07" "0.07" NA
resultFisher "0.679" "0.065" "0.065" NA
ILMN_1652464 "0" "0" "1" "PLAC8"
ILMN_1651838 "0" "0" "0" "RND1"
ILMN_1711311 "1" "1" "0" NA
ILMN_1653026 "0" "0" "0" "GRA"

PS: be aware that t() is a base R function used to transpose matrices. Creating a variable called t can lead to confusion in your downstream code.
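Here is a self-contained sketch of the match() + cbind() idea on hypothetical data. The matrix tab and the data frame z are made up, and tab is deliberately not called t:

```r
# Hypothetical inputs: a character matrix and a lookup table of gene symbols
tab <- matrix(c("0", "0", "GO:0002009", "1", "0", "GO:0030334"), nrow = 3,
              dimnames = list(c("ILMN_1652464", "ILMN_1651838", "GO.ID"),
                              c("V1", "V2")))
z <- data.frame(symbol = c("RND1", "PLAC8"),
                row.names = c("ILMN_1651838", "ILMN_1652464"),
                stringsAsFactors = FALSE)

# For each row name of tab, match() gives its position in z (NA if absent)
idx <- match(rownames(tab), rownames(z))
res <- cbind(tab, symbol = z[, "symbol"][idx])
print(res)
```

match() returns NA for row names that do not occur in z. That is why rows such as GO.ID end up with NA in the new column.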

Merge Multiple Data Frames by Row Names

Merging by row.names has an awkward side effect. It creates a column called Row.names and resets the row names to 1, 2, ..., which makes subsequent merges hard.

To avoid that, create an explicit column holding the row names instead. This is generally a better idea anyway, because row names are very limited and hard to manipulate. Here is one way to do it with the data as given in the question. It is not the most efficient way; for faster and easier handling of rectangular data, I recommend getting to know data.table:

Reduce(merge, lapply(l, function(x) data.frame(x, rn = row.names(x))))
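To see what the one-liner does, here it is applied to a small hypothetical list l with made-up data. Each data frame gets an ordinary rn column, so merge() joins on that shared column automatically. Reduce() then chains the merges without a Row.names column getting in the way:

```r
# Hypothetical list of data frames whose row names overlap only partly
l <- list(
  data.frame(x = 1:3, row.names = c("A", "B", "C")),
  data.frame(y = c(10, 20), row.names = c("A", "C")),
  data.frame(z = c("p", "q", "r"), row.names = c("C", "A", "B"),
             stringsAsFactors = FALSE)
)

# Copy row names into a column rn; merge() joins on the only shared name (rn)
res <- Reduce(merge, lapply(l, function(x) data.frame(x, rn = row.names(x))))
print(res)
```

By default this is an inner join, so only rows present in every data frame survive (here A and C).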

How to merge/left_join multiple data frames based on row names

You need to name the join column in by. The suffixes argument controls how duplicated column names (here Aaa) are renamed:

Reduce(function(x, y) merge(x, y, all = TRUE, by = "rn", suffixes = c("", ".2")),
       lapply(list(dat1, dat2, dat3),
              function(x) data.frame(x, rn = row.names(x))))

# rn Xxx Yyy Aaa Rrr Aaa.2 Ggg
#1 A 0.033 0.23300000 0.1 0.100 0.20 0.20
#2 B 0.066 0.03333333 0.0 0.033 NA NA
#3 C NA NA NA NA 0.02 0.03
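Here is a minimal sketch using hypothetical dat1, dat2 and dat3 with made-up values, not the question's data. It shows how all = TRUE keeps every row and how suffixes = c("", ".2") renames the second Aaa column:

```r
# Hypothetical inputs: dat2 and dat3 both contain a column named Aaa
dat1 <- data.frame(Xxx = c(1, 2), row.names = c("A", "B"))
dat2 <- data.frame(Aaa = c(3, 4), row.names = c("A", "B"))
dat3 <- data.frame(Aaa = c(5, 6), row.names = c("A", "C"))

# Full outer joins on rn; the clashing Aaa from the later frame becomes Aaa.2
res <- Reduce(
  function(x, y) merge(x, y, all = TRUE, by = "rn", suffixes = c("", ".2")),
  lapply(list(dat1, dat2, dat3), function(x) data.frame(x, rn = row.names(x)))
)
print(res)
```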

Merging more than 2 dataframes in R by rownames

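If all four data frames already have the same row names in the same order, three lines of code give exactly the same result. Note that cbind() does not match rows by name; it simply places the columns side by side: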

dat2 <- cbind(df1, df2, df3, df4)
colnames(dat2)[-(1:7)] <- paste(paste0("V", rep(1:100, 2)),
                                rep(c("x", "y"), each = 100), sep = ".")
all.equal(dat,dat2)

Ah, I see now why this is so painful for you. A plain for loop does the trick, although there may well be cleverer solutions:

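l <- list(df1, df2, df3, df4)
dat <- l[[1]]
for (i in 2:length(l)) {
  # merge() moves the row names into a Row.names column and sorts rows by it
  dat <- merge(dat, l[[i]], by = "row.names")
  # restore the row names from that column, so they stay on the right rows,
  # then drop the helper column
  rownames(dat) <- dat$Row.names
  dat$Row.names <- NULL
}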
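The same pairwise merge can also be folded with Reduce(). The helper merge_rn() below is hypothetical, not part of base R. After each step it copies the Row.names column that merge() creates back into the row names, so labels stay attached to the right rows even though merge() sorts them. The data are made up:

```r
# Hypothetical data frames with the same row names in different orders
df1 <- data.frame(a = 1:3, row.names = c("r1", "r2", "r3"))
df2 <- data.frame(b = 4:6, row.names = c("r3", "r1", "r2"))
df3 <- data.frame(c = 7:9, row.names = c("r2", "r3", "r1"))

# Hypothetical helper: inner join on row names (by = 0), then move the
# Row.names column back into the row names
merge_rn <- function(x, y) {
  m <- merge(x, y, by = 0)
  rownames(m) <- m$Row.names
  m$Row.names <- NULL
  m
}

dat <- Reduce(merge_rn, list(df1, df2, df3))
print(dat)
```

With these inputs, cbind(df1, df2, df3) would pair the wrong rows, because cbind() ignores row names.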

Sum data from two data frames matched by rowname

You can merge the two data frames by row names and then add the corresponding columns:

transform(merge(df1, df2, by = 0), sum = Data1 + Data2)


# Row.names Data1 Data2 sum
#1 2019-03-01 0.011 0.033 0.044
#2 2019-04-01 0.021 0.017 0.038
#3 2019-05-01 0.013 0.055 0.068
#4 2019-06-01 0.032 0.032 0.064
#5 2019-07-01 NA 0.029 NA
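Here is a reproducible sketch with hypothetical df1 and df2 containing made-up values. It shows two properties of this approach. First, the default inner join drops dates present in only one data frame. Second, an NA in either column makes the sum NA:

```r
# Hypothetical data keyed by date row names; 2019-08-01 exists only in df2
df1 <- data.frame(Data1 = c(0.011, 0.021, NA),
                  row.names = c("2019-03-01", "2019-04-01", "2019-07-01"))
df2 <- data.frame(Data2 = c(0.033, 0.017, 0.029, 0.050),
                  row.names = c("2019-03-01", "2019-04-01", "2019-07-01",
                                "2019-08-01"))

# by = 0 joins on row names; unmatched dates are dropped (inner join)
res <- transform(merge(df1, df2, by = 0), sum = Data1 + Data2)
print(res)
```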

Or, similarly, with dplyr:

library(dplyr)
library(tibble)

inner_join(df1 %>% rownames_to_column(),
df2 %>% rownames_to_column(), by = "rowname") %>%
mutate(Result = Data1 + Data2)

merging rows within dataframe based on row.names

Assuming your data frame is called SODF, build a vector from the row names with the trailing "t" plus digits stripped off, and use it as the aggregation variable:

> aggvar <- gsub("(t[0-9]+$)", "", rownames(SODF))
> aggregate(. ~ aggvar, SODF, sum)
aggvar Treatment1 Treatment2 Treatment3 Treatment4 Treatment5
1 Nasvi2EG000001 28 43 33 25 64
2 Nasvi2EG000002 0 3 0 0 4
3 Nasvi2EG000004 1 0 0 0 0
4 Nasvi2EG000009 0 4 2 0 4
5 Nasvi2EG000013 21 8 17 19 7
6 Nasvi2EG000014 0 7 0 0 7
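Here is a small reproducible sketch with a hypothetical SODF, using made-up counts and only two treatment columns:

```r
# Hypothetical counts: two replicates (t1, t2) of one gene, one of another
SODF <- data.frame(
  Treatment1 = c(1, 2, 5),
  Treatment2 = c(0, 3, 4),
  row.names = c("Nasvi2EG000001t1", "Nasvi2EG000001t2", "Nasvi2EG000002t1")
)

# Strip the trailing "t<digits>" to get one grouping key per gene
aggvar <- gsub("t[0-9]+$", "", rownames(SODF))
# Sum every column of SODF within each key
res <- aggregate(. ~ aggvar, SODF, sum)
print(res)
```

Note that aggvar is not a column of SODF. The formula interface finds it in the environment where the formula was created.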

merging data frames based on multiple nearest matches in R

Without knowing exactly how you want the result formatted, you can do this with the data.table rolling join with roll="nearest" that you mentioned.

In this case I've melted both sets of data to long datasets so that the matching can be done in a single join.

library(data.table)
setDT(df1)
setDT(df2)

df1[
  match(
    melt(df1, id.vars = "julian")[
      melt(df2, measure.vars = names(df2)),
      on = c("variable", "value"), roll = "nearest"
    ]$julian,
    julian
  ),
]
# julian a b c d
#1: 9 12.02948 13.54714 7.659482 6.784113
#2: 20 28.74620 20.24871 18.523935 17.801711
#3: 10 13.00511 14.57352 8.296155 6.942622
#4: 24 30.26931 24.20554 20.253149 22.017714

If you want a separate table for each join instead, you could do something like:

lapply(names(df2), \(var)  df1[df2, on=var, roll="nearest", .SD, .SDcols=names(df1)] )
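If data.table is not available, the per-column nearest-value lookup can be sketched in base R. This is a different, brute-force technique, not data.table's rolling join. For each column of df2 it scans every row of df1, so the cost grows with nrow(df1) times nrow(df2). Ties go to the first row, which may not match data.table's tie handling. The data below are hypothetical:

```r
# Hypothetical data: df1 has a julian day plus measurement columns,
# df2 holds target values to look up in the same-named columns of df1
df1 <- data.frame(julian = c(9, 10, 20),
                  a = c(12, 13, 28),
                  b = c(13.5, 14.5, 20))
df2 <- data.frame(a = c(12.9, 27), b = c(13.6, 19))

# For each df2 column and target value, take the df1 row whose value in that
# column is closest (which.min picks the first row on ties)
nearest_rows <- lapply(names(df2), function(v) {
  idx <- vapply(df2[[v]],
                function(target) which.min(abs(df1[[v]] - target)),
                integer(1))
  df1[idx, , drop = FALSE]
})
names(nearest_rows) <- names(df2)
print(nearest_rows)
```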

