How to Sum and Combine Two Data Frames

Pandas DataFrame merge summing column

In [41]: pd.merge(df1, df2, on=['id', 'name']).set_index(['id', 'name']).sum(axis=1)
Out[41]:
id  name
2   B       25
3   C       20
dtype: int64
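Here is a self-contained version of the call above. The frames df1 and df2 and their value columns x and y are hypothetical, chosen only to reproduce the output shown. merge() defaults to an inner join, so ids that appear in only one frame are dropped before anything is summed.

```python
import pandas as pd

# Hypothetical inputs: the "x" and "y" value columns are invented for illustration
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"], "x": [1, 10, 5]})
df2 = pd.DataFrame({"id": [2, 3, 4], "name": ["B", "C", "D"], "y": [15, 15, 7]})

# merge() is an inner join by default, so only (2, B) and (3, C) survive
merged = pd.merge(df1, df2, on=["id", "name"])

# Move the keys into the index so sum(axis=1) only adds the value columns
result = merged.set_index(["id", "name"]).sum(axis=1)
print(result)
```

If rows that exist in only one frame should be kept, pass how="outer" to merge and expect NaN in the missing value columns (sum(axis=1) skips NaN by default).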

How to merge two DataFrames and sum the values of columns

I think you need to call set_index on both DataFrames, then add, and finally reset_index:

df = df1.set_index('Name').add(df2.set_index('Name'), fill_value=0).reset_index()
print (df)
  Name  class  value
0  Ram    2.0    8.0
1  Sri    2.0   10.0
2  viv    7.0    8.0

If the values in Name are not unique, first use groupby and aggregate with sum:

df = df1.groupby('Name').sum().add(df2.groupby('Name').sum(), fill_value=0).reset_index()
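A runnable sketch of the groupby variant, using small hypothetical frames in which "Ram" appears twice in df1. Grouping first collapses the duplicates into one row per Name, so the subsequent add() aligns one row against one row.

```python
import pandas as pd

# Hypothetical inputs: "Ram" is duplicated in df1, so set_index("Name")
# alone would produce a non-unique index
df1 = pd.DataFrame({"Name": ["Ram", "Ram", "Sri"], "value": [1, 2, 10]})
df2 = pd.DataFrame({"Name": ["Ram", "viv"], "value": [5, 8]})

# Collapse duplicates per Name, then align on Name and add;
# fill_value=0 treats a Name missing from one side as 0
df = (
    df1.groupby("Name").sum()
    .add(df2.groupby("Name").sum(), fill_value=0)
    .reset_index()
)
print(df)
```

Because alignment can introduce missing cells before fill_value is applied, the summed column comes back as float, which is why the output above shows 2.0 and 8.0 rather than integers.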

Summing Entries in Multiple Unequally-Sized Data Frames With Some (but not All) Rows and Columns the Same

I think this should work. With row AND column names and a single data type, I prefer matrices to data frames, but you can convert the final matrix back to a data frame if you need to.

# put things in a list
df_list = list(df1, df2, df3)

# get the complete set of row and column names
all_rows = unique(unlist(lapply(df_list, rownames)))
all_cols = unique(unlist(lapply(df_list, colnames)))

# initialize a final matrix to NA
final_mat = matrix(NA, nrow = length(all_rows), ncol = length(all_cols))
rownames(final_mat) = all_rows
colnames(final_mat) = all_cols

# go through each df in the list
for (i in seq_along(df_list)) {
  rows = rownames(df_list[[i]])
  cols = colnames(df_list[[i]])
  # set any NAs in the selection to 0
  sel = final_mat[rows, cols, drop = FALSE]
  sel[is.na(sel)] = 0
  # add the data frame to the selection
  final_mat[rows, cols] = sel + as.matrix(df_list[[i]])
}

final_mat
#        A  B  D  C  E  F
# row1   1  7  4  1  2 NA
# row2   2  4  5 NA NA NA
# row3  15 28  6  2  3  2
# row4   4  6  7 NA NA NA
# row5   5 13  8  3  4 NA
# row6   6  8  9 NA NA NA
# row7   7 16 10  4  5 NA
# row8   8 10 11 NA NA NA
# row9  19 27 12 NA NA  4
# row10 21 29 13 NA NA  3
# row11 NA  8 NA  5  6 NA
# row12 13 19 NA NA NA  1
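For comparison, a pandas sketch of the same idea: fold DataFrame.add over the list with fill_value=0. The three frames are hypothetical. As in the matrix version, the result covers the union of row and column names, and a cell that no input covers stays missing (NaN). NaN values stored inside an input are treated as 0 whenever the other operand has a value there, so check that behaviour if your inputs contain missing values.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames that share some, but not all, rows and columns
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["row1", "row2"])
df2 = pd.DataFrame({"B": [10, 20], "C": [5, 6]}, index=["row2", "row3"])
df3 = pd.DataFrame({"A": [7]}, index=["row3"])

# add() aligns on both row and column labels; fill_value=0 treats a cell
# missing on only one side as 0, while a cell missing on both sides stays NaN
total = reduce(lambda left, right: left.add(right, fill_value=0), [df1, df2, df3])
print(total)
```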

Pandas: merging two DataFrames by summing the values of columns and index

You can use df1.add(df2, fill_value=0). It adds df2 to df1 element-wise, aligning on index and columns, and treats a NaN on one side as 0 before adding.

>>> import numpy as np
>>> import pandas as pd
>>> df2 = pd.DataFrame([(10,9),(8,4),(7,np.nan)], columns=['a','b'])
>>> df1 = pd.DataFrame([(1,2),(3,4),(5,6)], columns=['a','b'])
>>> df1.add(df2, fill_value=0)
    a     b
0  11  11.0
1  11   8.0
2  12   6.0
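fill_value only applies when one side has a value. If a cell is NaN in both frames, the result is still NaN, as this small hypothetical example shows; chaining fillna(0) turns those cells into 0 as well.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, np.nan]})
df2 = pd.DataFrame({"a": [np.nan, np.nan]})

# Row 0: NaN on one side only -> treated as 0, so 1.0 + 0 = 1.0
# Row 1: NaN on both sides -> the result stays NaN
out = df1.add(df2, fill_value=0)
print(out)

# Replace the remaining NaN if missing-everywhere cells should also be 0
print(out.fillna(0))
```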

Merge data frames and sum columns with the same name

One way would be:

library(dplyr)

bind_rows(df1, df2) %>%
  #mutate_if(is.numeric, tidyr::replace_na, 0) %>% #in case of having NAs
  group_by(country) %>%
  summarise_all(., sum, na.rm = TRUE)


# # A tibble: 4 x 3
# country year1 year2
# <chr> <dbl> <dbl>
# 1 a 2 2
# 2 b 4 4
# 3 c 3 3
# 4 d 3 3

Or a base R solution:

aggregate(. ~ country, rbind(df1, df2), sum, na.rm = TRUE, na.action = NULL)

which would generate the same output.
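For comparison, a pandas sketch of the same bind-rows-then-group pattern. The inputs are hypothetical, constructed so that the result matches the tibble above; one year2 value is NaN to show that groupby().sum() skips missing values by default, much like na.rm = TRUE.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs constructed to reproduce the output above
df1 = pd.DataFrame({"country": ["a", "b", "c"], "year1": [2, 1, 3], "year2": [2, np.nan, 3]})
df2 = pd.DataFrame({"country": ["b", "d"], "year1": [3, 3], "year2": [4, 3]})

# Stack the rows, then sum every other column per country (NaN is skipped)
res = pd.concat([df1, df2], ignore_index=True).groupby("country", as_index=False).sum()
print(res)
```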

How to merge and sum two data frames

With dplyr,

library(dplyr)

# add rownames as a column in each data.frame and bind rows
bind_rows(df1 %>% tibble::rownames_to_column(),
          df2 %>% tibble::rownames_to_column()) %>%
  # evaluate following calls for each value in the rowname column
  group_by(rowname) %>%
  # add all non-grouping variables
  summarise_all(sum)

## # A tibble: 7 x 4
## rowname x y z
## <chr> <int> <int> <int>
## 1 A 1 2 3
## 2 B 2 3 4
## 3 C 4 6 8
## 4 D 6 8 10
## 5 E 8 10 12
## 6 F 4 5 6
## 7 G 5 6 7
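In pandas the row labels can stay in the index, so there is no need to turn them into a column first: concatenate and group on index level 0. The frames below are hypothetical, built so the sums match the output above (df1 has rows A to E, df2 has rows C to G).

```python
import pandas as pd

# Hypothetical inputs: same columns, overlapping row labels C, D, E
values = {"x": [1, 2, 3, 4, 5], "y": [2, 3, 4, 5, 6], "z": [3, 4, 5, 6, 7]}
df1 = pd.DataFrame(values, index=list("ABCDE"))
df2 = pd.DataFrame(values, index=list("CDEFG"))

# The index plays the role of the "rowname" column: group on it and sum
res = pd.concat([df1, df2]).groupby(level=0).sum()
print(res)
```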

How to merge multiple data.frames and sum and average columns at the same time in R

I think your second approach is the way to go, and you can do that with data.table or dplyr.

Here are a few steps using data.table. First, if your data frames are abc, def, ..., do:

DF <- do.call(rbind, list(abc,def,...))

Now you can convert the result into a data.table:

library(data.table)
DT <- data.table(DF)

and then do something like:

DTres <- DT[, .(A = sum(A, na.rm = TRUE), B = sum(B, na.rm = TRUE), C = mean(C, na.rm = TRUE)), by = name]

Check the data.table vignettes to get a better idea of how the package works.
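For comparison, the same "sum some columns, average others" step in pandas can be written with named aggregation in GroupBy.agg. The frames df_abc and df_def are hypothetical stand-ins for abc, def, ...; NaN values are skipped by sum and mean by default, which mirrors na.rm = TRUE.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the abc, def, ... frames
df_abc = pd.DataFrame({"name": ["x", "y"], "A": [1, 2], "B": [3, 4], "C": [10.0, 20.0]})
df_def = pd.DataFrame({"name": ["x", "y"], "A": [5, 6], "B": [7, np.nan], "C": [30.0, np.nan]})

DF = pd.concat([df_abc, df_def], ignore_index=True)

# Named aggregation: output column = (input column, aggregation function)
res = DF.groupby("name").agg(A=("A", "sum"), B=("B", "sum"), C=("C", "mean"))
print(res)
```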

How to sum same column of different data frames in R

How about

df5 <- df1
df5$pred1 <- df1$pred1 + df2$pred1 + df3$pred1 + df4$pred1
df5$pred2 <- df1$pred2 + df2$pred2 + df3$pred2 + df4$pred2

Based on Gregor's suggestions, you could also store the columns to be added in a vector (useful when there are many of them) and then add those columns together:

cols = c("pred1", "pred2")
df5[, cols] = df1[, cols] + df2[, cols] + df3[, cols] + df4[, cols]

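akrun also suggested the following, which should work with arbitrarily many data frames (just change 1:4 to 1:n, where n is the number of the last data frame):

Reduce("+", lapply(mget(paste0('df', 1:4)), "[", c("pred1", "pred2")))

mget() collects df1 to df4 by name, lapply() extracts the two columns from each one with single-bracket "[" (which returns a data frame; "[[" with a vector of two names would index recursively instead), and Reduce("+", ...) adds the resulting data frames element-wise.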
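For comparison, a pandas sketch of the same Reduce pattern using functools.reduce. The four frames and their "id" column are hypothetical. Unlike R, pandas aligns on the index when adding, so this assumes the frames share the same row labels.

```python
import operator
from functools import reduce

import pandas as pd

# Hypothetical frames standing in for df1 ... df4
frames = [
    pd.DataFrame({"id": [1, 2], "pred1": [1, 2], "pred2": [1, 1]}),
    pd.DataFrame({"id": [1, 2], "pred1": [3, 4], "pred2": [2, 2]}),
    pd.DataFrame({"id": [1, 2], "pred1": [5, 6], "pred2": [3, 3]}),
    pd.DataFrame({"id": [1, 2], "pred1": [7, 8], "pred2": [4, 4]}),
]
cols = ["pred1", "pred2"]

# Select the same columns from every frame and add them pairwise
summed = reduce(operator.add, [f[cols] for f in frames])

# Copy the first frame and overwrite the prediction columns with the totals
df5 = frames[0].copy()
df5[cols] = summed
print(df5)
```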

Combine multiple dataframes by summing certain columns in Pandas

One possible solution: concat the list of DataFrames, then in GroupBy.agg sum the numeric columns and join the unique values per group for the string columns:

import numpy as np
import pandas as pd

f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ','.join(x.unique())
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
   A   B    C
0  8  10  dog
1  2   8  dog

If the string column can contain different values, like cat and dog:

df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4], 'C': 'dog'})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3], 'C': 'dog'})
df3 = pd.DataFrame({'A': [2, 1], 'B': [5, 1], 'C': ['cat','dog']})

f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ','.join(x.unique())
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
   A   B        C
0  8  10  dog,cat
1  2   8      dog

If you need lists instead:

f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else x.unique().tolist()
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
   A   B           C
0  8  10  [dog, cat]
1  2   8       [dog]

To return a scalar when a group has a single unique non-numeric value and a list otherwise, use a custom function:

def f(x):
    if np.issubdtype(x.dtype, np.number):
        return x.sum()
    else:
        u = x.unique().tolist()
        if len(u) == 1:
            return u[0]
        else:
            return u

df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
   A   B           C
0  8  10  [dog, cat]
1  2   8         dog
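A self-contained variant of the comma-joining version, using the same df1, df2 and df3. It swaps np.issubdtype for pandas.api.types.is_numeric_dtype, an assumption on my part that is meant to be more robust when string columns use a pandas extension dtype rather than object. Since each frame keeps its own 0..n-1 index, grouping on level 0 of the plain concat lines rows up by position, just like keys=range(3) with level=1.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df1 = pd.DataFrame({"A": [5, 0], "B": [2, 4], "C": "dog"})
df2 = pd.DataFrame({"A": [1, 1], "B": [3, 3], "C": "dog"})
df3 = pd.DataFrame({"A": [2, 1], "B": [5, 1], "C": ["cat", "dog"]})


def combine(x: pd.Series):
    # Sum numeric columns; join the distinct values (in order of appearance) otherwise
    if is_numeric_dtype(x):
        return x.sum()
    return ",".join(x.unique())


# Rows with the same original position across the frames form one group
df = pd.concat([df1, df2, df3]).groupby(level=0).agg(combine)
print(df)
```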

Merge or combine pandas DataFrames while summing up elements in common columns

Combine pd.concat with groupby.
This is a general approach that can accommodate any number of DataFrames in the list.

pd.concat(
[df1, df2], ignore_index=True
).groupby(['year', 'month'], as_index=False).sum()

   year month  hits  outs
0  2001    01     4     4
1  2001    02     6     0
2  2001    03    10     4
3  2002    01    24     2
4  2002    02    10     0
5  2003    01     0     1
6  2003    02     0     4
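A runnable sketch with small hypothetical inputs that reproduce the first rows of the output above. Keys present in only one frame, such as (2001, "02"), are carried over unchanged, which is the main difference from an inner merge. month is kept as a zero-padded string, matching the output shown.

```python
import pandas as pd

# Hypothetical inputs
df1 = pd.DataFrame({"year": [2001, 2001], "month": ["01", "02"], "hits": [1, 6], "outs": [4, 0]})
df2 = pd.DataFrame({"year": [2001, 2002], "month": ["01", "01"], "hits": [3, 24], "outs": [0, 2]})

frames = [df1, df2]  # any number of frames can go in this list

# Stack all rows, then sum the remaining columns per (year, month) key
out = pd.concat(frames, ignore_index=True).groupby(["year", "month"], as_index=False).sum()
print(out)
```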

