Merge data frames based on rownames in R
See ?merge: the name "row.names" or the number 0 specifies the row names.
Example:
R> de <- merge(d, e, by=0, all=TRUE) # merge by row names (by=0 or by="row.names")
R> de[is.na(de)] <- 0 # replace NA values
R> de
Row.names a b c d e f g h i j k l m n o p q r s
1 1 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10 11 12 13 14 15 16 17 18 19
2 2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0 0 0 0 0 0 0 0
3 3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 21 22 23 24 25 26 27 28 29
t
1 20
2 0
3 30
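The objects d and e are not defined in the answer above. A minimal reproducible sketch, with two small hypothetical data frames standing in for them, might look like this:

```r
# Hypothetical stand-ins for d and e; only row "2" is shared
d <- data.frame(a = c(1, 0.1), b = c(2, 0.2), row.names = c("1", "2"))
e <- data.frame(c = c(0.3, 21), row.names = c("2", "3"))

# by = 0 (or by = "row.names") joins on row names;
# all = TRUE keeps rows that appear in only one input
de <- merge(d, e, by = 0, all = TRUE)

# Unmatched cells come back as NA; replace them with 0
de[is.na(de)] <- 0
de
```

Note that the row names end up in an ordinary column called Row.names, and the result itself gets fresh row names 1, 2, 3.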
Merge or combine by rownames
Use match to return your desired vector, then cbind it to your matrix:
cbind(t, z[, "symbol"][match(rownames(t), rownames(z))])
[,1] [,2] [,3] [,4]
GO.ID "GO:0002009" "GO:0030334" "GO:0015674" NA
LEVEL "8" "6" "7" NA
Annotated "342" "343" "350" NA
Significant "1" "1" "1" NA
Expected "0.07" "0.07" "0.07" NA
resultFisher "0.679" "0.065" "0.065" NA
ILMN_1652464 "0" "0" "1" "PLAC8"
ILMN_1651838 "0" "0" "0" "RND1"
ILMN_1711311 "1" "1" "0" NA
ILMN_1653026 "0" "0" "0" "GRA"
PS. Be warned that t is a base R function used to transpose matrices. Creating a variable called t can lead to confusion in your downstream code.
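To make the match/cbind idea easier to try out, here is a small self-contained sketch. The matrices and their row names are hypothetical, and the first matrix is called res rather than t so that base::t is not masked:

```r
# Hypothetical matrices; 'res' is used instead of 't' to avoid masking base::t
res <- matrix(c("0", "1", "0", "0", "1", "0"), nrow = 3,
              dimnames = list(c("id1", "id2", "id3"), NULL))
z <- matrix(c("PLAC8", "RND1"), ncol = 1,
            dimnames = list(c("id3", "id1"), "symbol"))

# match() gives, for each row name of 'res', its position in rownames(z),
# or NA when the name is absent; indexing with NA yields NA
idx <- match(rownames(res), rownames(z))
combined <- cbind(res, symbol = z[, "symbol"][idx])
combined
```

Rows of res without a counterpart in z (here id2) get NA in the new column, and the row order of res is preserved.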
Merge Multiple Data Frames by Row Names
Merging by row.names does weird things: it creates a column called Row.names, which makes subsequent merges hard.
To avoid that issue, create a column holding the row names instead (generally a better idea anyway, since row names are very limited and hard to manipulate). One way of doing that with the data as given in the OP (not the best way; for easier ways of dealing with rectangular data I recommend getting to know data.table instead):
Reduce(merge, lapply(l, function(x) data.frame(x, rn = row.names(x))))
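Since the list l comes from the original question and is not shown here, the following sketch uses a hypothetical list of three small data frames to show what the one-liner does:

```r
# Hypothetical list of data frames that share row names
l <- list(
  data.frame(x = 1:3, row.names = c("a", "b", "c")),
  data.frame(y = 4:6, row.names = c("a", "b", "c")),
  data.frame(z = 7:9, row.names = c("c", "b", "a"))
)

# Copy row names into an ordinary column 'rn' so every merge uses the same key
with_rn <- lapply(l, function(x) data.frame(x, rn = row.names(x)))

# merge() joins on the columns its inputs have in common, here only 'rn'
merged <- Reduce(merge, with_rn)
merged
```

Because the default by is the set of shared column names, this only works as intended when rn is the single column the data frames have in common.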
How to merge/left_join multiple data frames based on row names
You will need to specify your join columns in the by argument:
Reduce(function(x, y) merge(x, y, all=TRUE, by="rn", suffixes=c("", ".2")),
lapply(list(dat1, dat2, dat3),
function(x) data.frame(x, rn = row.names(x))))
# rn Xxx Yyy Aaa Rrr Aaa.2 Ggg
#1 A 0.033 0.23300000 0.1 0.100 0.20 0.20
#2 B 0.066 0.03333333 0.0 0.033 NA NA
#3 C NA NA NA NA 0.02 0.03
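The call above is a full outer join (all=TRUE). If a left join is what you are after, a sketch along the same lines could use all.x=TRUE instead. The data frames and column names below are hypothetical, as is the helper add_rn:

```r
# Hypothetical inputs with partially overlapping row names
dat1 <- data.frame(Xxx = c(1, 2), row.names = c("A", "B"))
dat2 <- data.frame(Yyy = c(3, 4), row.names = c("A", "C"))
dat3 <- data.frame(Zzz = c(5, 6), row.names = c("B", "C"))

# Hypothetical helper: copy row names into a regular column 'rn'
add_rn <- function(x) data.frame(x, rn = row.names(x))

# all.x = TRUE keeps every row of the accumulated left table and drops
# rows that exist only in the table being joined
left <- Reduce(function(x, y) merge(x, y, by = "rn", all.x = TRUE),
               lapply(list(dat1, dat2, dat3), add_rn))
left
```

Only the rows of the first data frame (A and B) survive; row C, which appears only in the later tables, is dropped.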
Merging more than 2 dataframes in R by rownames
Three lines of code will give you the exact same result:
dat2 <- cbind(df1, df2, df3, df4)
colnames(dat2)[-(1:7)] <- paste(paste('V', rep(1:100, 2),sep = ''),
rep(c('x', 'y'), each = 100), sep = c('.'))
all.equal(dat,dat2)
Ah I see, now I understand why you are having so much trouble. A plain old for loop does the trick, though there may be more clever solutions:
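rn <- rownames(df1)
l <- list(df1, df2, df3, df4)
dat <- l[[1]]
for (i in 2:length(l)) {
  # inner join on row names; merge() sorts the result by the key
  dat <- merge(dat, l[[i]], by = "row.names")
  # take the row names from the key column, then restore the original
  # order and drop Row.names (assumes every data frame has all rows in rn)
  rownames(dat) <- as.character(dat$Row.names)
  dat <- dat[rn, -1]
}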
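One thing to watch when merging by row names is that merge() sorts its result by the key, so the output rows are not necessarily in the input order. A small sketch with two hypothetical data frames whose row names are deliberately unsorted:

```r
# Hypothetical data frames with unsorted row names
df1 <- data.frame(x = 1:3, row.names = c("b", "c", "a"))
df2 <- data.frame(y = 4:6, row.names = c("b", "c", "a"))

m <- merge(df1, df2, by = "row.names")
sorted_keys <- as.character(m$Row.names)  # key order after merge()
sorted_keys

# Take row names from the key column, then restore the input order
rownames(m) <- sorted_keys
m <- m[rownames(df1), -1]
m
```

Assigning the original row names directly to the merged result, without reordering, would attach them to the wrong rows whenever the input is not already sorted.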
Sum data from two data frames matched by rowname
You can merge the two dataframes by rownames and then add the corresponding columns:
transform(merge(df1, df2, by = 0), sum = Data1 + Data2)
# Row.names Data1 Data2 sum
#1 2019-03-01 0.011 0.033 0.044
#2 2019-04-01 0.021 0.017 0.038
#3 2019-05-01 0.013 0.055 0.068
#4 2019-06-01 0.032 0.032 0.064
#5 2019-07-01 NA 0.029 NA
Or similarly with dplyr:
library(dplyr)
library(tibble)
inner_join(df1 %>% rownames_to_column(),
df2 %>% rownames_to_column(), by = "rowname") %>%
mutate(Result = Data1 + Data2)
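If you would rather not merge at all, another option is to index both data frames by their shared row names. This is only a sketch; the values below are hypothetical and not the data from the question:

```r
# Hypothetical inputs; row names overlap only partially and differ in order
df1 <- data.frame(Data1 = c(1, 2, 3),
                  row.names = c("2019-03-01", "2019-04-01", "2019-05-01"))
df2 <- data.frame(Data2 = c(10, 20, 30),
                  row.names = c("2019-04-01", "2019-03-01", "2019-07-01"))

# Row names present in both data frames, in df1's order
common <- intersect(rownames(df1), rownames(df2))

# Indexing by name aligns the rows regardless of their original order
out <- data.frame(Data1 = df1[common, "Data1"],
                  Data2 = df2[common, "Data2"],
                  row.names = common)
out$sum <- out$Data1 + out$Data2
out
```

Like the inner joins above, this keeps only the rows found in both data frames, but the row names stay as row names instead of moving into a column.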
merging rows within dataframe based on row.names
Assuming your data.frame is called "SODF", create a vector from the row.names that strips out the "t+some digit" from the end of the row.names and use that as your aggregation variable.
> aggvar <- gsub("(t[0-9]+$)", "", rownames(SODF))
> aggregate(. ~ aggvar, SODF, sum)
aggvar Treatment1 Treatment2 Treatment3 Treatment4 Treatment5
1 Nasvi2EG000001 28 43 33 25 64
2 Nasvi2EG000002 0 3 0 0 4
3 Nasvi2EG000004 1 0 0 0 0
4 Nasvi2EG000009 0 4 2 0 4
5 Nasvi2EG000013 21 8 17 19 7
6 Nasvi2EG000014 0 7 0 0 7
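For anyone who wants to run this without the original data, here is a minimal sketch with a hypothetical SODF whose row names follow the same "name + t + digits" pattern:

```r
# Hypothetical data: two replicates of GeneA and one of GeneB
SODF <- data.frame(
  Treatment1 = c(1, 2, 5),
  Treatment2 = c(10, 20, 50),
  row.names = c("GeneAt1", "GeneAt2", "GeneBt1")
)

# Strip the trailing "t<digits>" to get the grouping key
aggvar <- gsub("(t[0-9]+$)", "", rownames(SODF))

# Sum every column of SODF within each group
res <- aggregate(. ~ aggvar, SODF, sum)
res
```

The $ in the pattern anchors the match to the end of the string, so a "t" followed by digits elsewhere in a name is left alone.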
merging data frames based on multiple nearest matches in R
Without knowing exactly how you want the result formatted, you can do this with the data.table rolling join with roll="nearest" that you mentioned.
In this case I've melted both sets of data to long datasets so that the matching can be done in a single join.
library(data.table)
setDT(df1)
setDT(df2)
df1[
match(
melt(df1, id.vars="julian")[
melt(df2, measure.vars=names(df2)),
on=c("variable","value"), roll="nearest"]$julian,
julian),
]
# julian a b c d
#1: 9 12.02948 13.54714 7.659482 6.784113
#2: 20 28.74620 20.24871 18.523935 17.801711
#3: 10 13.00511 14.57352 8.296155 6.942622
#4: 24 30.26931 24.20554 20.253149 22.017714
If you want separate tables for each join instead you could do something like:
lapply(names(df2), \(var) df1[df2, on=var, roll="nearest", .SD, .SDcols=names(df1)] )
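To see what roll="nearest" does in isolation, here is a stripped-down sketch on a single column. The tables dt1 and query and their values are hypothetical, and it assumes the data.table package is installed:

```r
library(data.table)

# Hypothetical lookup table and query values
dt1 <- data.table(julian = c(9, 10, 20, 24), a = c(12.0, 13.0, 28.7, 30.3))
query <- data.table(a = c(12.4, 29.0))

# For each row of 'query', pick the row of dt1 whose 'a' is closest
nearest <- dt1[query, on = "a", roll = "nearest"]
nearest$julian
```

Here 12.4 is closest to 12.0 (julian 9) and 29.0 is closest to 28.7 (julian 20), so those are the rows returned.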