Get the Mean Across Multiple Pandas Dataframes

Assuming the two dataframes have the same columns, you could just concatenate them and compute your summary stats on the concatenated frame:

import numpy as np
import pandas as pd

# some random data frames
df1 = pd.DataFrame(dict(x=np.random.randn(100), y=np.random.randint(0, 5, 100)))
df2 = pd.DataFrame(dict(x=np.random.randn(100), y=np.random.randint(0, 5, 100)))

# concatenate them
df_concat = pd.concat((df1, df2))

print(df_concat.mean())
# x   -0.163044
# y    2.120000
# dtype: float64

print(df_concat.median())
# x   -0.192037
# y    2.000000
# dtype: float64

Update

If you want to compute stats across each set of rows with the same index in the two datasets, you can use .groupby() to group the data by row index, then apply the mean, median etc.:

by_row_index = df_concat.groupby(df_concat.index)
df_means = by_row_index.mean()

print(df_means.head())
#           x    y
# 0 -0.850794  1.5
# 1  0.159038  1.5
# 2  0.083278  1.0
# 3 -0.540336  0.5
# 4  0.390954  3.5

This method works even when your dataframes have unequal numbers of rows: if a particular row index is missing in one of the two dataframes, the mean/median is computed on the single existing row.
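For example, a minimal sketch with frames of different lengths (the data below is made up purely for illustration):

# df_short has 80 rows, df_long has 100
df_short = pd.DataFrame(dict(x=np.random.randn(80), y=np.random.randint(0, 5, 80)))
df_long = pd.DataFrame(dict(x=np.random.randn(100), y=np.random.randint(0, 5, 100)))

df_uneven = pd.concat((df_short, df_long))
by_row_index = df_uneven.groupby(df_uneven.index)

# rows 0-79 average two values per cell; rows 80-99 just carry df_long's values
print(by_row_index.mean().tail())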

Compute Average/Mean across Dataframes in Python Pandas

Perhaps I misunderstood what you asked.

The solution is simple: you just need to concat along the correct axis.

dummy data

rows, columns = 3, 2  # sizes inferred from the output below

df1 = pd.DataFrame(index=range(rows), columns=range(columns),
                   data=[[10 + i * j for j in range(columns)] for i in range(rows)])
df2 = pd.DataFrame(index=range(rows), columns=range(columns),
                   data=[[i + j for j in range(columns)] for i in range(rows)])

PS: providing the dummy data should really be your job as the OP.

pd.concat

df_concat0 = pd.concat((df1, df2), axis=1)

puts all the dataframes next to each other:

    0   1  0  1
0  10  10  0  1
1  10  11  1  2
2  10  12  2  3

If we want to do a groupby now, we first need to stack, groupby, and unstack again:

df_concat0.stack().groupby(level=[0,1]).mean().unstack()

     0    1
0  5.0  5.5
1  5.5  6.5
2  6.0  7.5

If we do

df_concat = pd.concat((df1, df2))

This puts all the dataframes on top of each other:

    0   1
0  10  10
1  10  11
2  10  12
0   0   1
1   1   2
2   2   3

Now we just need to group by the index, like you did:

df_concat.groupby(level=0).mean()

     0    1
0  5.0  5.5
1  5.5  6.5
2  6.0  7.5

and then use ExcelWriter as a context manager:

with pd.ExcelWriter(filepath, engine='openpyxl') as writer:
    result.to_excel(writer)

or just plain

result.to_excel(filepath, engine='openpyxl') 

if you can overwrite whatever is at filepath.
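Putting it together, a minimal sketch, assuming result is the groupby output from above and that 'averages.xlsx' is just a placeholder path (openpyxl must be installed):

result = df_concat.groupby(level=0).mean()

with pd.ExcelWriter('averages.xlsx', engine='openpyxl') as writer:
    result.to_excel(writer)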

Average of multiple dataframes with the same columns and indices

You can use groupby.mean on the index level after concatenating the dataframes:

pd.concat([v1, v2, v3]).groupby(level=0).mean()

            c1        c2        c3
id
ind1  1.333333  2.333333  2.666667
ind2  3.666667  2.333333  3.666667
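The frames v1, v2 and v3 are not shown in the question; a minimal sketch with assumed data (index named id, columns c1 to c3, values chosen so the result matches the output above):

import pandas as pd

idx = pd.Index(['ind1', 'ind2'], name='id')
v1 = pd.DataFrame({'c1': [1, 3], 'c2': [2, 2], 'c3': [2, 3]}, index=idx)
v2 = pd.DataFrame({'c1': [1, 4], 'c2': [2, 2], 'c3': [3, 4]}, index=idx)
v3 = pd.DataFrame({'c1': [2, 4], 'c2': [3, 3], 'c3': [3, 4]}, index=idx)

# element-wise mean across the three frames, aligned on the 'id' index
print(pd.concat([v1, v2, v3]).groupby(level=0).mean())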

Obtaining mean from multiple dataframes and inserting result into new columns

Fortunately, pandas provides ways to aggregate your data without having to construct for-loops. (If you do need to work with individual rows, DataFrame.iterrows() yields (index, Series) pairs.)

The approach is to combine the data, group by name, then calculate each mean.

First, let's create some data to work with...

name1 = pd.DataFrame([[1,2],[3,4],[5,6],[7,8],[9,0]], columns=['col1', 'col2'])
name2 = name1 * 2
name3 = name1 + 3

name1['Name'] = 'name1'
name2['Name'] = 'name2'
name3['Name'] = 'name3'

df = pd.concat([name1, name2, name3])

Now we use pandas aggregation:

df.groupby('Name').mean()

       col1  col2
Name
name1     5     4
name2    10     8
name3     8     7

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

Mean and standard deviation with multiple dataframes

Use concat, remove D with DataFrame.query, and aggregate with GroupBy.agg using named aggregations:

df = (pd.concat([df1, df2, df3])
        .query('ID != "D"')
        .groupby('ID')
        .agg(avg=('Amount', 'mean'), std=('Amount', 'std')))
print(df)
    avg       std
ID
A     5  3.605551
B     1  1.000000
C     2  1.000000

Or remove D in last step by DataFrame.drop:

df = (pd.concat([df1, df2, df3])
        .groupby('ID')
        .agg(avg=('Amount', 'mean'), std=('Amount', 'std'))
        .drop('D'))
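The input frames are not shown in the question; a minimal sketch with assumed ID/Amount frames (values chosen so the aggregation reproduces the output above):

import pandas as pd

df1 = pd.DataFrame({'ID': ['A', 'B', 'C', 'D'], 'Amount': [2, 0, 1, 10]})
df2 = pd.DataFrame({'ID': ['A', 'B', 'C', 'D'], 'Amount': [4, 1, 2, 20]})
df3 = pd.DataFrame({'ID': ['A', 'B', 'C', 'D'], 'Amount': [9, 2, 3, 30]})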

Weighted average across multiple dataframes

If you really want to disregard the string column, and you are certain the two dataframes are the same shape, then you can do this:

sel = ['b', 'c']  # numeric columns
df3 = df1.copy()
df3[sel] = 2/3 * df1[sel] + 1/3 * df2[sel]

On your data, df3 is:

       a    b         c
0  hello  2.0  1.333333
1  hello  1.0  1.000000
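For reference, one pair of assumed frames consistent with that df3 (the question's actual data is not shown):

df1 = pd.DataFrame([["hello", 2, 1], ["hello", 1, 1]], columns=["a", "b", "c"])
df2 = pd.DataFrame([["hello", 2, 2], ["hello", 1, 1]], columns=["a", "b", "c"])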

However, in the more general case, you may have different sizes and your a column may be relevant. Here is an example:

df1 = pd.DataFrame([["hello", 2, 1], ["world", 1, 1]], columns=["a", "b", "c"])
df2 = pd.DataFrame([["world", 2, 2], ["hello", 1, 1]], columns=["a", "b", "c"])

(2/3 * df1.set_index('a').stack() +
 1/3 * df2.set_index('a').stack()).groupby(level=[0, 1]).mean().unstack().reset_index()

# gives:
       a         b         c
0  hello  1.666667  1.000000
1  world  1.333333  1.333333

How do I average two data frames in pandas?

I think you need concat with an aggregate mean on the first 4 columns, which is necessary if there are duplicate rows in the first 4 columns of df1 or df2:

df = pd.concat([df1, df2]).groupby(df1.columns.tolist()[:4]).mean()

If there are no duplicates, use set_index with add and divide by 2:

a = df1.set_index(df1.columns.tolist()[:4])
b = df2.set_index(df2.columns.tolist()[:4])
c = a.add(b).div(2).reset_index()
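A minimal sketch with assumed frames (four key columns followed by one numeric column; all names here are placeholders):

import pandas as pd

cols = ['k1', 'k2', 'k3', 'k4', 'value']
df1 = pd.DataFrame([['a', 'b', 'c', 'd', 10],
                    ['e', 'f', 'g', 'h', 20]], columns=cols)
df2 = pd.DataFrame([['a', 'b', 'c', 'd', 30],
                    ['e', 'f', 'g', 'h', 40]], columns=cols)

# concat + groupby on the first four columns
print(pd.concat([df1, df2]).groupby(df1.columns.tolist()[:4]).mean())

# or align on the keys and average directly
a = df1.set_index(df1.columns.tolist()[:4])
b = df2.set_index(df2.columns.tolist()[:4])
print(a.add(b).div(2).reset_index())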

Group pandas dataframe and calculate mean for multiple columns

df.groupby("category", as_index=False).mean()

Finding the average and std across multiple Pandas series

You need to join them together with concat and use DataFrame.agg:

s1 = pd.Series(range(3))
s2 = pd.Series([4,5,7])
s3 = pd.Series([7,5,2])

L = [s1, s2, s3]

df = pd.concat(L, axis=1).agg(['mean', 'std'], axis=1)
print(df)
       mean       std
0  3.666667  3.511885
1  3.666667  2.309401
2  3.666667  2.886751


print(df['mean'])
0    3.666667
1    3.666667
2    3.666667
Name: mean, dtype: float64

print(df['std'])
0    3.511885
1    2.309401
2    2.886751
Name: std, dtype: float64

Pandas Mean Across Two Data Frames on Similar Columns only

Use DataFrame.add with Key as the index:

df1.set_index('Key').add(df2.set_index('Key')).dropna(axis=1) / 2

     A  B  C  D
Key
K1   3  5  4  6
K2   3  5  4  6
K3   3  5  4  6
K4   3  5  4  6

Alternatively, with concat + groupby:

pd.concat([df1, df2], axis=0).dropna(axis=1).groupby('Key').mean()

     A  B  C  D
Key
K1   3  5  4  6
K2   3  5  4  6
K3   3  5  4  6
K4   3  5  4  6
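A minimal sketch with assumed frames (both share Key and columns A-D; each also has an extra column present in only one frame, which dropna(axis=1) removes):

import pandas as pd

keys = ['K1', 'K2', 'K3', 'K4']
df1 = pd.DataFrame({'Key': keys, 'A': 2, 'B': 4, 'C': 3, 'D': 5, 'E': 1})
df2 = pd.DataFrame({'Key': keys, 'A': 4, 'B': 6, 'C': 5, 'D': 7, 'F': 9})

# columns missing from either frame become NaN after add() and are dropped
print(df1.set_index('Key').add(df2.set_index('Key')).dropna(axis=1) / 2)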

