How to Merge Two Data Frames on Common Columns in R with Sum of Others

How to merge two data frames on common columns in R with sum of others?

You can use ddply in package plyr and combine it with merge:

library(plyr)
ddply(merge(data_A, data_B, all.x = TRUE),
      .(USER_A, USER_B), summarise, ACTION = sum(ACTION))

Note that merge is called with all.x=TRUE, which keeps every row of the first data frame passed to merge (data_A), whether or not it has a match in data_B:

  USER_A USER_B ACTION
1      1     11   0.30
2      1     13   0.25
3      1     16   0.63
4      1     17   0.26
5      2     11   0.14
6      2     14   0.28
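
If plyr is not available, a similar result can be sketched in base R by joining explicitly on the two key columns and adding the ACTION values. This is a variation on the approach above rather than the same call. The sample data are made up for illustration; only the column names USER_A, USER_B and ACTION come from the question.

```r
# Hypothetical sample data (not from the original question)
data_A <- data.frame(USER_A = c(1, 1, 2),
                     USER_B = c(11, 13, 11),
                     ACTION = c(0.10, 0.25, 0.14))
data_B <- data.frame(USER_A = c(1, 3),
                     USER_B = c(11, 12),
                     ACTION = c(0.20, 0.50))

# Left join on the key columns; all.x = TRUE keeps every row of data_A
merged <- merge(data_A, data_B,
                by = c("USER_A", "USER_B"),
                all.x = TRUE,
                suffixes = c(".A", ".B"))

# Rows of data_A without a match in data_B get NA; treat that as 0
merged$ACTION.B[is.na(merged$ACTION.B)] <- 0

# Sum the two ACTION columns and keep only the final layout
merged$ACTION <- merged$ACTION.A + merged$ACTION.B
result <- merged[c("USER_A", "USER_B", "ACTION")]
result
```

The pair (3, 12) exists only in data_B, so it is dropped, which is the behaviour all.x = TRUE is meant to give.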

How to merge and sum two data frames

With dplyr,

library(dplyr)

# add row names as a column in each data frame and bind rows
# (rownames_to_column() replaces the deprecated add_rownames())
bind_rows(df1 %>% tibble::rownames_to_column(),
          df2 %>% tibble::rownames_to_column()) %>%
  # evaluate the following calls for each value in the rowname column
  group_by(rowname) %>%
  # sum all non-grouping variables
  summarise(across(everything(), sum))

## # A tibble: 7 x 4
## rowname x y z
## <chr> <int> <int> <int>
## 1 A 1 2 3
## 2 B 2 3 4
## 3 C 4 6 8
## 4 D 6 8 10
## 5 E 8 10 12
## 6 F 4 5 6
## 7 G 5 6 7
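
For comparison, the same idea (stack the rows, then sum by row name) can be sketched in base R with rowsum(). The two small data frames below are invented for illustration and are not the df1 and df2 from the question.

```r
# Hypothetical data frames keyed by row name
df1 <- data.frame(x = c(1, 2, 3), y = c(2, 3, 4),
                  row.names = c("A", "B", "C"))
df2 <- data.frame(x = c(1, 5), y = c(2, 6),
                  row.names = c("C", "D"))

# rbind() makes duplicated row names unique, so build the grouping
# vector from the original row names before stacking
groups  <- c(rownames(df1), rownames(df2))
stacked <- rbind(df1, df2)

# rowsum() adds up the rows that share a group label
totals <- rowsum(stacked, group = groups)
totals
```

Row "C" appears in both inputs, so its values are added; rows "A", "B" and "D" pass through unchanged.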

How to sum multiple columns in two data frames in R

Here's a base R option :

tmp <- cbind(df1, df2)
data.frame(sapply(split.default(tmp, names(tmp)), rowSums))

# V1 V2 V3 V4 V5
#1 4 8 5 5 4
#2 6 10 7 7 0

data

df1 <- structure(list(V1 = 2:3, V2 = 4:5, V3 = c(5L, 7L)),
class = "data.frame", row.names = c(NA, -2L))

df2 <- structure(list(V1 = 2:3, V5 = c(4L, 0L), V2 = 4:5, V4 = c(5L,
7L)), class = "data.frame", row.names = c(NA, -2L))
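
To see why the one-liner works, it may help to unpack it into steps. The sketch below uses the same df1 and df2 as above (rebuilt with data.frame() for brevity); the intermediate variable names are only illustrative.

```r
df1 <- data.frame(V1 = 2:3, V2 = 4:5, V3 = c(5L, 7L))
df2 <- data.frame(V1 = 2:3, V5 = c(4L, 0L), V2 = 4:5, V4 = c(5L, 7L))

# cbind() keeps duplicated column names, so V1 and V2 appear twice
tmp <- cbind(df1, df2)
names(tmp)

# split.default() treats the data frame as a list of columns and
# groups the columns that share a name
groups <- split.default(tmp, names(tmp))
names(groups)

# rowSums() on each group adds the same-named columns row by row
result <- data.frame(sapply(groups, rowSums))
result
```

Columns that exist in only one data frame (V3, V4, V5) form a group of one, so rowSums() simply returns them unchanged.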

How to merge two data frames on common columns in R with sum of others using dplyr package

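With dplyr this takes a join followed by a mutate. Note that the join columns are passed through the by argument:

library(dplyr)

df1 %>% left_join(df2, by = c('var1', 'var2')) %>%
  mutate(sum = var2 + var3)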
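
One thing to watch with a left join is that rows without a match get NA in the joined columns, so the sum becomes NA too. A minimal sketch, assuming made-up data and that a missing var3 should count as 0 (an assumption, not something stated in the question):

```r
library(dplyr)

# Hypothetical data: row "c" has no partner in df2
df1 <- data.frame(var1 = c("a", "b", "c"), var2 = c(1, 2, 3))
df2 <- data.frame(var1 = c("a", "b"), var2 = c(1, 2), var3 = c(10, 20))

out <- df1 %>%
  left_join(df2, by = c("var1", "var2")) %>%
  # coalesce() swaps the NA from the unmatched row for 0 before adding
  mutate(sum = var2 + coalesce(var3, 0))
out
```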

Merge data frames whilst summing common columns in R

If I understand correctly, you want a flexible method that does not require knowing which columns exist in each table aside from the columns you want to merge by and the columns you want to preserve. This may not be the most elegant solution, but here is an example function to suit your exact needs:

merge_Sum <- function(.df1, .df2, .id_Columns, .match_Columns){
  merged_Columns <- unique(c(names(.df1), names(.df2)))
  merged_df1 <- data.frame(matrix(nrow = nrow(.df1), ncol = length(merged_Columns)))
  names(merged_df1) <- merged_Columns
  for (column in merged_Columns){
    if (column %in% .id_Columns || !column %in% names(.df2)){
      # id column, or column only present in .df1: copy as-is
      merged_df1[, column] <- .df1[, column]
    } else if (!column %in% names(.df1)){
      # column only present in .df2: look up the matching rows
      merged_df1[, column] <- .df2[match(.df1[, .match_Columns], .df2[, .match_Columns]), column]
    } else {
      # column present in both: add, treating unmatched rows as 0
      df1_Values <- .df1[, column]
      df2_Values <- .df2[match(.df1[, .match_Columns], .df2[, .match_Columns]), column]
      df2_Values[is.na(df2_Values)] <- 0
      merged_df1[, column] <- df1_Values + df2_Values
    }
  }
  return(merged_df1)
}

This function assumes you have a table '.df1' that is a master of sorts, and you want to merge data from a second table '.df2' that has rows that match one or more of the rows in '.df1'. The columns to preserve from the master table '.df1' are passed as a character vector '.id_Columns', and the columns that provide the match for merging the two tables are passed as '.match_Columns'.

For your example, it would work like this:

merge_Sum(table1, table2, c("Date","Time"), "Date")

# Date Time ColumnA ColumnB ColumnC
# 1 01/01/2013 08:00 110 330 1
# 2 01/01/2013 08:30 115 325 1
# 3 01/01/2013 09:00 120 320 1
# 4 02/01/2013 08:00 225 415 2
# 5 02/01/2013 08:30 230 410 2
# 6 02/01/2013 09:00 235 405 2

In plain language, this function first collects the unique column names across both tables and makes an empty data frame in the shape of the master table '.df1' to later hold the merged data. Then, for the '.id_Columns', the data is copied from '.df1' into the new merged data frame. For the other columns, any data that exists in '.df1' is added to any existing data in '.df2', where the rows in '.df2' are matched based on the '.match_Columns'.
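
The core of the function is the match() lookup. A minimal sketch of that single step, using invented tables (master, lookup) and the column name ColumnA for illustration only:

```r
# Hypothetical tables: the third master row has no partner in lookup
master <- data.frame(Date    = c("01/01/2013", "01/01/2013", "03/01/2013"),
                     ColumnA = c(10, 15, 20))
lookup <- data.frame(Date    = c("01/01/2013", "02/01/2013"),
                     ColumnA = c(100, 200))

# For each master row, find the position of its Date in lookup
# (NA when there is no match)
idx <- match(master$Date, lookup$Date)

# Pull the matching values; unmatched rows come back as NA
to_add <- lookup$ColumnA[idx]

# Treat "no match" as zero so the master value is kept unchanged
to_add[is.na(to_add)] <- 0

summed <- master$ColumnA + to_add
summed
```

Because match() returns only the first matching position, this step handles many-to-one and many-to-none relationships, but not several matching rows in the second table.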

There is probably some package out there that does something similar, but most of them require knowledge of all the existing columns and how to treat them. As I said before, this is not the most elegant solution, but it is flexible and accurate.

Update: The original function assumed a many-to-one relationship between table1 and table2, and the OP requested the allowance of a many-to-none relationship, also. The code has been updated with a slightly less efficient but 100% more flexible logic.

How to sum values of matching columns while merging two dataframes in R

We can place the datasets in a list, stack them with rbindlist (fill = TRUE pads columns missing from one dataset with NA), and then, grouped by 'ship_no', take the sum of the other columns:

library(data.table)
rbindlist(list(df1, df2), fill = TRUE)[,lapply(.SD, sum, na.rm = TRUE) , ship_no]
# ship_no bay_1 bay_2 bay_3 bay_5 bay_6 bay_7
#1: ABC 10 20 15 20 30 10
#2: DEF 20 30 0 40 20 0
#3: ERT 0 10 0 20 0 0

Another option would be dplyr

library(dplyr)
bind_rows(df1, df2) %>%
group_by(ship_no) %>%
summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))
# A tibble: 3 x 7
# ship_no bay_1 bay_2 bay_3 bay_5 bay_6 bay_7
# <chr> <int> <int> <int> <int> <int> <int>
#1 ABC 10 20 15 20 30 10
#2 DEF 20 30 0 40 20 0
#3 ERT 0 10 0 20 0 0
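
If neither package is at hand, roughly the same result can be sketched in base R: pad each data frame with the columns it lacks, stack them, and aggregate. The helper pad() and the sample data below are hypothetical and only mimic the shape of the question's data.

```r
# Hypothetical data with partly different bay columns
df1 <- data.frame(ship_no = c("ABC", "DEF"), bay_1 = c(10, 20))
df2 <- data.frame(ship_no = c("ABC", "ERT"), bay_2 = c(5, 7))

all_cols <- union(names(df1), names(df2))

# Illustrative helper: add any missing column as NA, then fix column order
pad <- function(d) {
  for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
  d[all_cols]
}

stacked <- rbind(pad(df1), pad(df2))

# na.action = na.pass keeps rows containing NA in the data;
# na.rm = TRUE is handed on to sum() so the NAs count as 0
out <- aggregate(. ~ ship_no, data = stacked, FUN = sum,
                 na.rm = TRUE, na.action = na.pass)
out
```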

How to add elements of columns shared between two dataframes in R

This should work:

overlap = intersect(names(df1), names(df2))
df1[overlap] = df1[overlap] + df2[overlap]

It assumes the data frames have the same number of rows, in the same order, and that the shared columns are numeric.
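
A small worked example of the two lines above, with made-up data (the column names id, a, b and c are illustrative):

```r
# Hypothetical data: columns a and b are shared, id and c are not
df1 <- data.frame(id = 1:3, a = c(1, 2, 3), b = c(10, 20, 30))
df2 <- data.frame(a = c(4, 5, 6), b = c(1, 1, 1), c = c(7, 8, 9))

# Names present in both data frames
overlap <- intersect(names(df1), names(df2))

# Element-wise addition, row i of df1 with row i of df2
df1[overlap] <- df1[overlap] + df2[overlap]
df1
```

Columns outside the overlap are left untouched: id stays in df1 as it was, and c from df2 is not carried over.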

R: Sum column wise value of two/more data frames having same variables (column names) and take Date column as reference

You can consider the following base R approach.

df3 <- cbind(df1[1], df1[-1] + df2[-1])
df3
Date V1 V2 V3
1 2017/01/01 3 7 11
2 2017/02/01 8 12 14

Or the dplyr approach.

library(dplyr)
df3 <- bind_rows(df1, df2) %>%
group_by(Date) %>%
summarise(across(everything(), sum))
df3
Date V1 V2 V3
<chr> <int> <int> <int>
1 2017/01/01 3 7 11
2 2017/02/01 8 12 14

Or the data.table approach.

library(data.table)
df_bind <- rbindlist(list(df1, df2))
df3 <- df_bind[, lapply(.SD, sum), by = Date]
df3
Date V1 V2 V3
1: 2017/01/01 3 7 11
2: 2017/02/01 8 12 14

Data:

df1 <- read.table(text = "Date    V1    V2    V3  
'2017/01/01' 2 4 5
'2017/02/01' 3 5 7",
header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = "Date V1 V2 V3
'2017/01/01' 1 3 6
'2017/02/01' 5 7 7",
header = TRUE, stringsAsFactors = FALSE)
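
A base R variant that groups by Date, and so does not depend on the two data frames having their rows in the same order, can be sketched with aggregate(). It uses the same values as the data above, rebuilt with data.frame() for brevity.

```r
df1 <- data.frame(Date = c("2017/01/01", "2017/02/01"),
                  V1 = c(2L, 3L), V2 = c(4L, 5L), V3 = c(5L, 7L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Date = c("2017/01/01", "2017/02/01"),
                  V1 = c(1L, 5L), V2 = c(3L, 7L), V3 = c(6L, 7L),
                  stringsAsFactors = FALSE)

# Stack the rows, then sum every other column within each Date
df3 <- aggregate(. ~ Date, data = rbind(df1, df2), FUN = sum)
df3
```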

Combine data.frames summing up values of identical columns in R

I'd use plyr's rbind.fill like this:

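library(plyr)

# stack the data frames (missing columns become NA) and keep the
# original row names as a 'names' column
pp <- cbind(names = c(rownames(df1), rownames(df2), rownames(df3)),
            rbind.fill(list(df1, df2, df3)))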

# names Sp1 Sp2 Sp3 Sp4 Sp5 Sp6
# 1 site1 1 2 3 1 NA NA
# 2 site2 0 2 0 1 NA NA
# 3 site3 1 1 1 1 NA NA
# 4 site1 0 1 NA 2 NA NA
# 5 site2 1 2 NA 0 NA NA
# 6 site3 1 1 NA 1 NA NA
# 7 site1 0 1 NA NA 1 1
# 8 site2 1 1 NA NA 1 5
# 9 site3 2 0 NA NA 0 0

Then, aggregate with plyr's ddply as follows:

ddply(pp, .(names), function(x) colSums(x[,-1], na.rm = TRUE))
# names Sp1 Sp2 Sp3 Sp4 Sp5 Sp6
# 1 site1 1 4 3 3 1 1
# 2 site2 2 5 0 1 1 5
# 3 site3 4 2 1 2 0 0

