How to merge two data frames on common columns in R with sum of others?
You can use ddply in package plyr and combine it with merge:
library(plyr)
ddply(merge(data_A, data_B, all.x=TRUE),
.(USER_A, USER_B), summarise, ACTION=sum(ACTION))
Notice that merge is called with the parameter all.x=TRUE; this keeps all of the rows in the first data.frame passed to merge, i.e. data_A:
USER_A USER_B ACTION
1 1 11 0.30
2 1 13 0.25
3 1 16 0.63
4 1 17 0.26
5 2 11 0.14
6 2 14 0.28
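By default merge() joins on every column name the two frames share, so in the call above ACTION itself is part of the join key. If the goal is instead to add data_B's ACTION values to the matching user pairs in data_A, one possible base R sketch sums first and then keeps only data_A's pairs. This is an alternative to the ddply approach, not the answer's method, and the data values below are hypothetical.

```r
# Hypothetical data: data_A holds the user pairs to keep
data_A <- data.frame(USER_A = c(1, 1, 2), USER_B = c(11, 13, 11),
                     ACTION = c(0.30, 0.25, 0.14))
data_B <- data.frame(USER_A = c(1, 3), USER_B = c(11, 12),
                     ACTION = c(0.10, 0.50))

# Sum ACTION over both tables for every (USER_A, USER_B) pair
totals <- aggregate(ACTION ~ USER_A + USER_B,
                    data = rbind(data_A, data_B), FUN = sum)

# Inner-join on the key columns only, so pairs found only in data_B are dropped
result <- merge(unique(data_A[c("USER_A", "USER_B")]), totals,
                by = c("USER_A", "USER_B"))
result
```

Here the pair (1, 11) gets 0.30 + 0.10, while the pair (3, 12), which exists only in data_B, does not appear in the result.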
How to merge and sum two data frames
With dplyr:
library(dplyr)
library(tibble)
# add rownames as a column in each data.frame and bind rows
bind_rows(df1 %>% rownames_to_column(),
          df2 %>% rownames_to_column()) %>%
  # evaluate following calls for each value in the rowname column
  group_by(rowname) %>%
  # sum all non-grouping variables
  summarise(across(everything(), sum))
## # A tibble: 7 x 4
## rowname x y z
## <chr> <int> <int> <int>
## 1 A 1 2 3
## 2 B 2 3 4
## 3 C 4 6 8
## 4 D 6 8 10
## 5 E 8 10 12
## 6 F 4 5 6
## 7 G 5 6 7
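For comparison, a base R sketch of the same idea uses rowsum(), which adds up all rows that share a grouping value. The two small data frames are hypothetical (the original df1 and df2 are not shown), and it assumes all value columns are numeric and present in both frames.

```r
# Hypothetical example data: row names act as the matching key
df1 <- data.frame(x = 1:3, y = 2:4, z = 3:5, row.names = c("A", "B", "C"))
df2 <- data.frame(x = c(3L, 5L), y = c(4L, 6L), z = c(5L, 7L),
                  row.names = c("C", "D"))

# Stack both tables; rbind() renames duplicate row names, so keep the keys separately
stacked <- rbind(df1, df2)
keys <- c(rownames(df1), rownames(df2))

# rowsum() sums all rows sharing a key and returns them sorted by key
summed <- rowsum(stacked, keys)
summed
```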
How to sum multiple columns in two data frames in r
Here's a base R option:
tmp <- cbind(df1, df2)
data.frame(sapply(split.default(tmp, names(tmp)), rowSums))
# V1 V2 V3 V4 V5
#1 4 8 5 5 4
#2 6 10 7 7 0
Data:
df1 <- structure(list(V1 = 2:3, V2 = 4:5, V3 = c(5L, 7L)),
class = "data.frame", row.names = c(NA, -2L))
df2 <- structure(list(V1 = 2:3, V5 = c(4L, 0L), V2 = 4:5, V4 = c(5L,
7L)), class = "data.frame", row.names = c(NA, -2L))
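A slightly more explicit variant, sketched here as an alternative: give each data frame every column name it is missing, filled with zeros, then add the two frames element-wise. pad_columns is a hypothetical helper, and like the cbind() version it assumes both frames have the same number of rows in the same order.

```r
df1 <- structure(list(V1 = 2:3, V2 = 4:5, V3 = c(5L, 7L)),
                 class = "data.frame", row.names = c(NA, -2L))
df2 <- structure(list(V1 = 2:3, V5 = c(4L, 0L), V2 = 4:5, V4 = c(5L, 7L)),
                 class = "data.frame", row.names = c(NA, -2L))

# Every column name that appears in either frame
all_cols <- union(names(df1), names(df2))

# Hypothetical helper: add each missing column as zeros, then reorder columns
pad_columns <- function(df) {
  for (col in setdiff(all_cols, names(df))) df[[col]] <- 0L
  df[all_cols]
}

# Rows are paired by position, so both frames must share the same row order
summed_cols <- pad_columns(df1) + pad_columns(df2)
summed_cols
```

The columns come out in first-seen order (V1, V2, V3, V5, V4) rather than sorted as in the split.default() version.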
How to merge two data frames on common columns in R with sum of others using dplyr package
How easy is that, my friend?
library(dplyr)
# join on the shared key columns (the argument is `by`, not `key`)
df1 %>% left_join(df2, by = c('var1', 'var2')) %>%
  mutate(sum = var2 + var3)
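When both tables carry a value column with the same name, left_join() keeps both copies with .x and .y suffixes, and rows without a match get NA on the right-hand side. A minimal sketch, assuming hypothetical tables tbl_a and tbl_b keyed on id and year, that sums the two copies and treats missing matches as 0:

```r
library(dplyr)

# Hypothetical data: both tables share the keys id and year plus a value column
tbl_a <- data.frame(id = c(1, 1, 2), year = c(2020, 2021, 2020), value = c(10, 20, 30))
tbl_b <- data.frame(id = c(1, 2), year = c(2020, 2021), value = c(5, 7))

joined <- tbl_a %>%
  left_join(tbl_b, by = c("id", "year")) %>%         # value becomes value.x / value.y
  mutate(total = value.x + coalesce(value.y, 0)) %>% # unmatched rows add 0
  select(id, year, total)
joined
```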
Merge data frames whilst summing common columns in R
If I understand correctly, you want a flexible method that does not require knowing which columns exist in each table aside from the columns you want to merge by and the columns you want to preserve. This may not be the most elegant solution, but here is an example function to suit your exact needs:
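merge_Sum <- function(.df1, .df2, .id_Columns, .match_Columns){
  # every column that appears in either table
  merged_Columns <- unique(c(names(.df1), names(.df2)))
  # empty frame with one row per row of the master table .df1
  merged_df1 <- data.frame(matrix(nrow = nrow(.df1), ncol = length(merged_Columns)))
  names(merged_df1) <- merged_Columns
  # row of .df2 matching each row of .df1 (NA where there is none);
  # match() compares plain vectors, so this expects a single match column
  df2_Rows <- match(.df1[, .match_Columns], .df2[, .match_Columns])
  for (column in merged_Columns){
    if (column %in% .id_Columns || !column %in% names(.df2)){
      # id columns and columns found only in .df1 are copied unchanged
      merged_df1[, column] <- .df1[, column]
    } else if (!column %in% names(.df1)){
      # columns found only in .df2 are looked up through the matched rows
      merged_df1[, column] <- .df2[df2_Rows, column]
    } else {
      # shared columns are summed, treating unmatched rows of .df2 as 0
      df1_Values <- .df1[, column]
      df2_Values <- .df2[df2_Rows, column]
      df2_Values[is.na(df2_Values)] <- 0
      merged_df1[, column] <- df1_Values + df2_Values
    }
  }
  return(merged_df1)
}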
This function assumes you have a table '.df1' that is a master of sorts, and you want to merge data from a second table '.df2' that has rows matching one or more of the rows in '.df1'. The columns to preserve from the master table '.df1' are passed as a character vector '.id_Columns', and the column that provides the match for merging the two tables is passed as '.match_Columns' (in practice a single column name, because match() compares plain vectors).
For your example, it would work like this:
merge_Sum(table1, table2, c("Date","Time"), "Date")
# Date Time ColumnA ColumnB ColumnC
# 1 01/01/2013 08:00 110 330 1
# 2 01/01/2013 08:30 115 325 1
# 3 01/01/2013 09:00 120 320 1
# 4 02/01/2013 08:00 225 415 2
# 5 02/01/2013 08:30 230 410 2
# 6 02/01/2013 09:00 235 405 2
In plain language, this function first collects the unique column names across both tables and makes an empty data frame with one row per row of the master table '.df1' to hold the merged data. Then, for the '.id_Columns', the data is copied from '.df1' into the new merged data frame. Columns that exist in only one of the tables are carried over as they are. For the shared columns, the data in '.df1' is added to the matching data in '.df2', where the rows in '.df2' are matched based on the '.match_Columns'.
There is probably some package out there that does something similar, but most of them require knowledge of all the existing columns and how to treat them. As I said before, this is not the most elegant solution, but it is flexible and accurate.
Update: The original function assumed a many-to-one relationship between table1 and table2, and the OP asked for many-to-none relationships to be allowed as well. The code has been updated with slightly less efficient but much more flexible logic.
How to sum values of matching columns while merging two dataframes in r
We can place the datasets in a list, use rbindlist to rbind them (fill = TRUE adds NA for columns a dataset lacks), and then, grouped by 'ship_no', get the sum of the other columns with na.rm = TRUE:
library(data.table)
rbindlist(list(df1, df2), fill = TRUE)[,lapply(.SD, sum, na.rm = TRUE) , ship_no]
# ship_no bay_1 bay_2 bay_3 bay_5 bay_6 bay_7
#1: ABC 10 20 15 20 30 10
#2: DEF 20 30 0 40 20 0
#3: ERT 0 10 0 20 0 0
Another option would be dplyr:
library(dplyr)
bind_rows(df1, df2) %>%
  group_by(ship_no) %>%
  # sum every other column, ignoring the NAs bind_rows() adds for missing columns
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))
# A tibble: 3 x 7
# ship_no bay_1 bay_2 bay_3 bay_5 bay_6 bay_7
# <chr> <int> <int> <int> <int> <int> <int>
#1 ABC 10 20 15 20 30 10
#2 DEF 20 30 0 40 20 0
#3 ERT 0 10 0 20 0 0
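Without data.table or dplyr, a possible base R sketch pads each frame with zero-filled columns for the bays it lacks, stacks them, and uses aggregate() to sum per ship. The data below is hypothetical (the question's df1 and df2 are not shown), and filling with 0 instead of NA is an assumption that matches summing with na.rm = TRUE.

```r
# Hypothetical data in the question's shape: ship_no plus bay_* counts
df1 <- data.frame(ship_no = c("ABC", "DEF"), bay_1 = c(10, 20), bay_2 = c(20, 30))
df2 <- data.frame(ship_no = c("ABC", "ERT"), bay_2 = c(0, 10), bay_5 = c(20, 20))

# Add any missing bay_* columns as zeros so both frames share one layout
all_cols <- union(names(df1), names(df2))
for (col in setdiff(all_cols, names(df1))) df1[[col]] <- 0
for (col in setdiff(all_cols, names(df2))) df2[[col]] <- 0
stacked <- rbind(df1[all_cols], df2[all_cols])

# Sum every other column per ship; aggregate() returns groups sorted by ship_no
totals <- aggregate(. ~ ship_no, data = stacked, FUN = sum)
totals
```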
How to add elements of columns shared between two dataframes in R
This should work:
overlap = intersect(names(df1), names(df2))
df1[overlap] = df1[overlap] + df2[overlap]
It assumes the data frames have the same number of rows, in the same order, since rows are added by position.
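A small worked example of the two lines above, with hypothetical data frames, plus an explicit row-count check so a mismatch fails loudly instead of recycling silently:

```r
# Hypothetical data: b and c are shared, a exists only in df1, d only in df2
df1 <- data.frame(a = 1:3, b = 4:6, c = 7:9)
df2 <- data.frame(b = c(10L, 20L, 30L), c = c(1L, 1L, 1L), d = 0L)

# Rows are paired by position, so insist on equal row counts
stopifnot(nrow(df1) == nrow(df2))

overlap <- intersect(names(df1), names(df2))  # "b" "c"
df1[overlap] <- df1[overlap] + df2[overlap]   # a is untouched, d is ignored
df1
```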
R: Sum column wise value of two/more data frames having same variables (column names) and take Date column as reference
You can consider the following base R approach.
df3 <- cbind(df1[1], df1[-1] + df2[-1])
df3
Date V1 V2 V3
1 2017/01/01 3 7 11
2 2017/02/01 8 12 14
Or the dplyr approach:
library(dplyr)
df3 <- bind_rows(df1, df2) %>%
  group_by(Date) %>%
  summarise(across(everything(), sum))
df3
Date V1 V2 V3
<chr> <int> <int> <int>
1 2017/01/01 3 7 11
2 2017/02/01 8 12 14
Or the data.table approach:
library(data.table)
df_bind <- rbindlist(list(df1, df2))
df3 <- df_bind[, lapply(.SD, sum), by = Date]
df3
Date V1 V2 V3
1: 2017/01/01 3 7 11
2: 2017/02/01 8 12 14
Data:
df1 <- read.table(text = "Date V1 V2 V3
'2017/01/01' 2 4 5
'2017/02/01' 3 5 7",
header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = "Date V1 V2 V3
'2017/01/01' 1 3 6
'2017/02/01' 5 7 7",
header = TRUE, stringsAsFactors = FALSE)
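The cbind() version pairs rows by position, so it assumes both frames list the same dates in the same order. A sketch that aligns rows on Date with merge() before adding, using the example data above; note that merge() here is an inner join, so dates present in only one frame are dropped.

```r
df1 <- read.table(text = "Date V1 V2 V3
'2017/01/01' 2 4 5
'2017/02/01' 3 5 7",
header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = "Date V1 V2 V3
'2017/01/01' 1 3 6
'2017/02/01' 5 7 7",
header = TRUE, stringsAsFactors = FALSE)

# Join on Date; shared value columns get the default .x / .y suffixes
merged <- merge(df1, df2, by = "Date")
value_cols <- setdiff(names(df1), "Date")

# Add the matching .x and .y columns and restore the original names
sums <- merged[paste0(value_cols, ".x")] + merged[paste0(value_cols, ".y")]
names(sums) <- value_cols
df3 <- cbind(merged["Date"], sums)
df3
```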
Combine data.frames summing up values of identical columns in R
I'd use plyr's rbind.fill like this:
library(plyr)
pp <- cbind(names=c(rownames(df1), rownames(df2), rownames(df3)),
rbind.fill(list(df1, df2, df3)))
# names Sp1 Sp2 Sp3 Sp4 Sp5 Sp6
# 1 site1 1 2 3 1 NA NA
# 2 site2 0 2 0 1 NA NA
# 3 site3 1 1 1 1 NA NA
# 4 site1 0 1 NA 2 NA NA
# 5 site2 1 2 NA 0 NA NA
# 6 site3 1 1 NA 1 NA NA
# 7 site1 0 1 NA NA 1 1
# 8 site2 1 1 NA NA 1 5
# 9 site3 2 0 NA NA 0 0
Then, aggregate with plyr's ddply as follows:
ddply(pp, .(names), function(x) colSums(x[,-1], na.rm = TRUE))
# names Sp1 Sp2 Sp3 Sp4 Sp5 Sp6
# 1 site1 1 4 3 3 1 1
# 2 site2 2 5 0 1 1 5
# 3 site3 4 2 1 2 0 0
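The same result can be sketched in base R without plyr: give every frame the full set of species columns (zeros where a species is absent, mirroring na.rm = TRUE), stack them, and let rowsum() add rows that share a site name. The three data frames below are reconstructed from the rbind.fill output shown above.

```r
# Data reconstructed from the rbind.fill output above
sites <- c("site1", "site2", "site3")
df1 <- data.frame(Sp1 = c(1, 0, 1), Sp2 = c(2, 2, 1), Sp3 = c(3, 0, 1),
                  Sp4 = c(1, 1, 1), row.names = sites)
df2 <- data.frame(Sp1 = c(0, 1, 1), Sp2 = c(1, 2, 1), Sp4 = c(2, 0, 1),
                  row.names = sites)
df3 <- data.frame(Sp1 = c(0, 1, 2), Sp2 = c(1, 1, 0), Sp5 = c(1, 1, 0),
                  Sp6 = c(1, 5, 0), row.names = sites)

frames <- list(df1, df2, df3)
all_cols <- unique(unlist(lapply(frames, names)))

# Give every frame all species columns, filling absent ones with 0
padded <- lapply(frames, function(df) {
  for (col in setdiff(all_cols, names(df))) df[[col]] <- 0
  df[all_cols]
})

# Stack, then sum the rows that share a site name
site_keys <- unlist(lapply(frames, rownames))
totals <- rowsum(do.call(rbind, padded), site_keys)
totals
```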