Pandas DataFrame merge summing column
In [41]: pd.merge(df1, df2, on=['id', 'name']).set_index(['id', 'name']).sum(axis=1)
Out[41]:
id name
2 B 25
3 C 20
dtype: int64
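The input frames for the call above are not shown, so here is a minimal sketch with hypothetical data (the column names value1 and value2 and all values are assumptions) chosen to give the same result. Note that pd.merge defaults to an inner join, so only the id/name pairs present in both frames survive.

```python
import pandas as pd

# Hypothetical frames: only ids 2 and 3 appear in both
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"], "value1": [5, 10, 15]})
df2 = pd.DataFrame({"id": [2, 3, 4], "name": ["B", "C", "D"], "value2": [15, 5, 20]})

# Inner join on the key columns, then sum the remaining columns row-wise
merged = pd.merge(df1, df2, on=["id", "name"])
totals = merged.set_index(["id", "name"]).sum(axis=1)
print(totals)
```

If rows that exist in only one frame should be kept, passing how="outer" to pd.merge is one option; sum(axis=1) skips the resulting NaN values by default.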
how to merge two dataframes and sum the values of columns
I think you need set_index on both DataFrames, then add, and finally reset_index:
df = df1.set_index('Name').add(df2.set_index('Name'), fill_value=0).reset_index()
print (df)
Name class value
0 Ram 2.0 8.0
1 Sri 2.0 10.0
2 viv 7.0 8.0
If the values in Name are not unique, use groupby and aggregate with sum first:
df = df1.groupby('Name').sum().add(df2.groupby('Name').sum(), fill_value=0).reset_index()
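The original inputs are not shown, so the following is a sketch with hypothetical frames (all values are assumptions) in which Ram appears twice in df1. Summing per Name first gives each frame a unique index, so that add can align the two unambiguously.

```python
import pandas as pd

# Hypothetical data: "Ram" is duplicated in df1, "viv" exists only in df2
df1 = pd.DataFrame({"Name": ["Ram", "Ram", "Sri"], "class": [1, 1, 2], "value": [3, 5, 4]})
df2 = pd.DataFrame({"Name": ["Sri", "viv"], "class": [0, 7], "value": [6, 8]})

# Collapse duplicates so each Name is a unique index label
s1 = df1.groupby("Name").sum()
s2 = df2.groupby("Name").sum()

# Align on Name; a Name missing from one side counts as 0
df = s1.add(s2, fill_value=0).reset_index()
print(df)
```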
Summing Entries in Multiple Unequally-Sized Data Frames With Some (but not All) Rows and Columns the Same
I think this should work. With row AND column names and one data type, I prefer matrices to data frames, but you can convert the final matrix back to a data frame if you need.
# put things in a list
df_list = list(df1, df2, df3)
# get the complete set of row and column names
all_rows = unique(unlist(lapply(df_list, rownames)))
all_cols = unique(unlist(lapply(df_list, colnames)))
# initialize a final matrix to NA
final_mat = matrix(NA, nrow = length(all_rows), ncol = length(all_cols))
rownames(final_mat) = all_rows
colnames(final_mat) = all_cols
# go through each df in the list
for(i in seq_along(df_list)) {
# set any NAs in the selection to 0
final_mat[rownames(df_list[[i]]), colnames(df_list[[i]])][is.na(final_mat[rownames(df_list[[i]]), colnames(df_list[[i]])])] = 0
# add the data frame to the selection
final_mat[rownames(df_list[[i]]), colnames(df_list[[i]])] = final_mat[rownames(df_list[[i]]), colnames(df_list[[i]])] + as.matrix(df_list[[i]])
}
final_mat
# A B D C E F
# row1 1 7 4 1 2 NA
# row2 2 4 5 NA NA NA
# row3 15 28 6 2 3 2
# row4 4 6 7 NA NA NA
# row5 5 13 8 3 4 NA
# row6 6 8 9 NA NA NA
# row7 7 16 10 4 5 NA
# row8 8 10 11 NA NA NA
# row9 19 27 12 NA NA 4
# row10 21 29 13 NA NA 3
# row11 NA 8 NA 5 6 NA
# row12 13 19 NA NA NA 1
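For comparison, a similar result can be sketched in pandas (this is not the approach used in the R answer above, and the frames below are hypothetical). It folds DataFrame.add with fill_value=0 over the list, so row and column labels are unioned automatically and a cell that is absent from every frame stays NaN.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames with partially overlapping row and column labels
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["row1", "row2"])
df2 = pd.DataFrame({"B": [10, 20], "C": [5, 6]}, index=["row2", "row3"])
df3 = pd.DataFrame({"A": [100]}, index=["row3"])

# Add the frames one by one; a cell missing on one side counts as 0
total = reduce(lambda acc, df: acc.add(df, fill_value=0), [df1, df2, df3])
print(total)
```

Here the cell (row1, C) is never supplied by any frame, so it remains NaN, much like the NA entries in the matrix output above.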
Pandas- merging two dataframe by sum the values of columns and index
You can use df1.add(df2, fill_value=0). It adds df2 to df1 element-wise and treats a value that is NaN in only one of the two frames as 0.
>>> import numpy as np
>>> import pandas as pd
>>> df2 = pd.DataFrame([(10,9),(8,4),(7,np.nan)], columns=['a','b'])
>>> df1 = pd.DataFrame([(1,2),(3,4),(5,6)], columns=['a','b'])
>>> df1.add(df2, fill_value=0)
a b
0 11 11.0
1 11 8.0
2 12 6.0
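One detail worth checking is what happens when a cell is missing in both frames. The sketch below uses hypothetical data to show it: fill_value only fills a gap when the other side has a value, so a cell that is NaN in both inputs stays NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 of column "b" is NaN in both frames
df1 = pd.DataFrame({"a": [1.0, np.nan], "b": [2.0, np.nan]})
df2 = pd.DataFrame({"a": [10.0, 5.0], "b": [np.nan, np.nan]})

out = df1.add(df2, fill_value=0)
print(out)
```

If a zero is wanted there as well, calling fillna(0) on the result is one way to get it.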
Merge data frames and sum columns with the same name
One way would be:
library(dplyr)
bind_rows(df1, df2) %>%
#mutate_if(is.numeric, tidyr::replace_na, 0) %>% #in case of having NAs
group_by(country) %>%
summarise_all(., sum, na.rm = TRUE)
# # A tibble: 4 x 3
# country year1 year2
# <chr> <dbl> <dbl>
# 1 a 2 2
# 2 b 4 4
# 3 c 3 3
# 4 d 3 3
Or a base R solution:
aggregate(. ~ country, rbind(df1, df2), sum, na.rm = TRUE, na.action = NULL)
which would generate the same output.
How to merge and sum two data frames
With dplyr,
library(dplyr)
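# add rownames as a column in each data.frame and bind rows
# (tibble::rownames_to_column() replaces the deprecated add_rownames())
bind_rows(df1 %>% tibble::rownames_to_column(),
df2 %>% tibble::rownames_to_column()) %>%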
# evaluate following calls for each value in the rowname column
group_by(rowname) %>%
# add all non-grouping variables
summarise_all(sum)
## # A tibble: 7 x 4
## rowname x y z
## <chr> <int> <int> <int>
## 1 A 1 2 3
## 2 B 2 3 4
## 3 C 4 6 8
## 4 D 6 8 10
## 5 E 8 10 12
## 6 F 4 5 6
## 7 G 5 6 7
How to merge multiple data.frames and sum and average columns at the same time in R
I think your second approach is the way to go, and you can do that with data.table or dplyr.
Here are a few steps using data.table. First, if your data frames are abc, def, ... do:
DF <- do.call(rbind, list(abc,def,...))
Now you can convert the result into a data.table:
DT <- data.table(DF)
and then simply do something like:
DTres <- DT[,.(A=sum(A, na.rm=T), B=sum(B, na.rm=T), C=mean(C,na.rm=T)),by=name]
Check the data.table vignettes to get a better idea of how that package works.
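The same stack-then-aggregate idea can be sketched in pandas with named aggregation. This is an analogue rather than the data.table code above; the frame contents are hypothetical, and df_def stands in for def, which is a reserved word in Python.

```python
import pandas as pd

# Hypothetical inputs sharing the columns name, A, B, C
abc = pd.DataFrame({"name": ["x", "y"], "A": [1, 2], "B": [10, 20], "C": [0.5, 1.5]})
df_def = pd.DataFrame({"name": ["x", "z"], "A": [3, 4], "B": [30, 40], "C": [1.5, 2.0]})

# Stack the rows, then sum A and B and average C within each name
stacked = pd.concat([abc, df_def], ignore_index=True)
res = stacked.groupby("name", as_index=False).agg(
    A=("A", "sum"),
    B=("B", "sum"),
    C=("C", "mean"),
)
print(res)
```

Both sum and mean skip NaN by default in pandas, which plays the role of na.rm=T in the R version.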
How to sum same column of different data frames in R
How about
df5 <- df1
df5$pred1 <- df1$pred1 + df2$pred1 + df3$pred1 + df4$pred1
df5$pred2 <- df1$pred2 + df2$pred2 + df3$pred2 + df4$pred2
Based on Gregor's suggestions, you could also create a vector to store the columns to be added (in case there are a lot), and then add those together as with
cols = c("pred1", "pred2")
df5[, cols] = df1[, cols] + df2[, cols] + df3[, cols] + df4[, cols]
akrun also provides a suggestion which I don't follow, but seems like it would work well with arbitrarily many dataframes as well (just expand 1:4 to 1:n, where n is the number of the last df).
Reduce("+", lapply(mget(paste0('df', 1:4)), "[", c("pred1", "pred2")))
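For readers working in pandas, the Reduce idea translates roughly as follows. This is a sketch with hypothetical frames (the id column and all values are assumptions) and it assumes every frame has the same row order and index, as the R code above does.

```python
import operator
from functools import reduce

import pandas as pd

# Four hypothetical frames with identical shape and index
frames = [
    pd.DataFrame({"id": [1, 2], "pred1": [i, i + 1], "pred2": [10 * i, 10 * i + 1]})
    for i in range(1, 5)
]

cols = ["pred1", "pred2"]

# Start from a copy of the first frame and overwrite only the summed columns
df5 = frames[0].copy()
df5[cols] = reduce(operator.add, (f[cols] for f in frames))
print(df5)
```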
Combine multiple dataframes by summing certain columns in Pandas
One possible solution is to sum the numeric columns and join the unique values of the string columns per group, using GroupBy.agg after concat on the list of DataFrames:
f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ','.join(x.unique())
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
A B C
0 8 10 dog
1 2 8 dog
If different values such as cat and dog are possible:
df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4], 'C': 'dog'})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3], 'C': 'dog'})
df3 = pd.DataFrame({'A': [2, 1], 'B': [5, 1], 'C': ['cat','dog']})
f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ','.join(x.unique())
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
A B C
0 8 10 dog,cat
1 2 8 dog
If you need lists:
f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else x.unique().tolist()
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
A B C
0 8 10 [dog, cat]
1 2 8 [dog]
And to mix lists with scalars for the non-numeric values, use a custom function:
def f(x):
    if np.issubdtype(x.dtype, np.number):
        return x.sum()
    else:
        u = x.unique().tolist()
        if len(u) == 1:
            return u[0]
        else:
            return u
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
A B C
0 8 10 [dog, cat]
1 2 8 dog
merge or combine pandas dataframes while summing up elements in common columns
Combine pd.concat + groupby.
This is a general approach that can accommodate any number of dataframes within the list:
pd.concat(
[df1, df2], ignore_index=True
).groupby(['year', 'month'], as_index=False).sum()
year month hits outs
0 2001 01 4 4
1 2001 02 6 0
2 2001 03 10 4
3 2002 01 24 2
4 2002 02 10 0
5 2003 01 0 1
6 2003 02 0 4
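The input frames for this answer are not shown, so the sketch below uses hypothetical data to illustrate where the zeros in such an output can come from. When a column (here outs) is missing from one frame, concat fills it with NaN, and the grouped sum skips NaN, so a group with no values at all sums to 0.

```python
import pandas as pd

# Hypothetical frames: df1 has no "outs" column at all
df1 = pd.DataFrame({"year": [2001, 2001], "month": ["01", "02"], "hits": [2, 3]})
df2 = pd.DataFrame(
    {"year": [2001, 2002], "month": ["01", "01"], "hits": [2, 24], "outs": [4, 2]}
)

# Stack rows, then sum every remaining column within each (year, month) pair
out = (
    pd.concat([df1, df2], ignore_index=True)
    .groupby(["year", "month"], as_index=False)
    .sum()
)
print(out)
```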