Get the mean across multiple Pandas DataFrames
Assuming the two dataframes have the same columns, you could just concatenate them and compute your summary stats on the concatenated frames:
import numpy as np
import pandas as pd
# some random data frames
df1 = pd.DataFrame(dict(x=np.random.randn(100), y=np.random.randint(0, 5, 100)))
df2 = pd.DataFrame(dict(x=np.random.randn(100), y=np.random.randint(0, 5, 100)))
# concatenate them
df_concat = pd.concat((df1, df2))
print(df_concat.mean())
# x -0.163044
# y 2.120000
# dtype: float64
print(df_concat.median())
# x -0.192037
# y 2.000000
# dtype: float64
Update
If you want to compute stats across each set of rows with the same index in the two datasets, you can use .groupby() to group the data by row index, then apply the mean, median, etc.:
by_row_index = df_concat.groupby(df_concat.index)
df_means = by_row_index.mean()
print(df_means.head())
# x y
# 0 -0.850794 1.5
# 1 0.159038 1.5
# 2 0.083278 1.0
# 3 -0.540336 0.5
# 4 0.390954 3.5
This method will work even when your dataframes have unequal numbers of rows - if a particular row index is missing in one of the two dataframes, the mean/median will be computed on the single existing row.
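To make the unequal-length case concrete, here is a minimal sketch. The frames `shorter` and `longer` and their values are invented for illustration; they are not from the original question.

```python
import pandas as pd

# Hypothetical frames: `shorter` has 2 rows, `longer` has 3,
# so row index 2 exists only in `longer`.
shorter = pd.DataFrame({"x": [1.0, 3.0], "y": [10, 20]})
longer = pd.DataFrame({"x": [3.0, 5.0, 7.0], "y": [30, 40, 50]})

stacked = pd.concat([shorter, longer])

# Group rows that share an index label and average each group.
means = stacked.groupby(stacked.index).mean()
print(means)
```

Rows 0 and 1 are averaged over both frames, while row 2 simply carries the values from `longer`, because that is the only frame containing it.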
Compute Average/Mean across Dataframes in Python Pandas
Perhaps I misunderstood what you asked
The solution is simple. You just need to concat along the correct axis
dummy data
rows, columns = 3, 2  # sizes inferred from the output shown below
df1 = pd.DataFrame(index=range(rows), columns=range(columns), data=[[10 + i * j for j in range(columns)] for i in range(rows)])
df2 = pd.DataFrame(index=range(rows), columns=range(columns), data=[[i + j for j in range(columns)] for i in range(rows)])
ps. this should be your job as OP
pd.concat
df_concat0 = pd.concat((df1, df2), axis=1)
puts all the dataframes next to each other.
0 1 0 1
0 10 10 0 1
1 10 11 1 2
2 10 12 2 3
If we want to do a groupby now, we first need to stack, group by both index levels, and then unstack again
df_concat0.stack().groupby(level=[0,1]).mean().unstack()
0 1
0 5.0 5.5
1 5.5 6.5
2 6.0 7.5
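If you would rather not stack a frame that has repeated column labels, one possible alternative is to transpose, group on the (now row) labels, and transpose back. This is only a sketch: the sizes `rows = 3` and `columns = 2` are assumptions inferred from the printed output, and `side_by_side` is a name made up here.

```python
import pandas as pd

rows, columns = 3, 2  # assumed sizes, inferred from the printed tables
df1 = pd.DataFrame([[10 + i * j for j in range(columns)] for i in range(rows)])
df2 = pd.DataFrame([[i + j for j in range(columns)] for i in range(rows)])

# Column labels are now 0, 1, 0, 1.
side_by_side = pd.concat((df1, df2), axis=1)

# After transposing, the duplicate column labels become the index,
# so groupby(level=0) averages the columns that share a label.
means = side_by_side.T.groupby(level=0).mean().T
print(means)
```

This should give the same 3x2 table of means as the stack/unstack version.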
If we do
df_concat = pd.concat((df1, df2))
This puts all the dataframes on top of each other
0 1
0 10 10
1 10 11
2 10 12
0 0 1
1 1 2
2 2 3
now we need to just groupby the index, like you did
df_concat.groupby(level=0).mean()
0 1
0 5.0 5.5
1 5.5 6.5
2 6.0 7.5
Then write the result to Excel, using ExcelWriter as a context manager:
with pd.ExcelWriter(filepath, engine='openpyxl') as writer:
    result.to_excel(writer)
or just plain
result.to_excel(filepath, engine='openpyxl')
if you can overwrite whatever is at filepath
Average of multiple dataframes with the same columns and indices
You can use groupby.mean on the index level after concatenating the dataframes:
pd.concat([v1, v2, v3]).groupby(level=0).mean()
c1 c2 c3
id
ind1 1.333333 2.333333 2.666667
ind2 3.666667 2.333333 3.666667
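The question's `v1`, `v2`, and `v3` are not shown here, so the sketch below uses invented frames with the same index (`id`) and column names. It also compares the result with a plain sum divided by the number of frames, which should agree when every frame has identical index and columns.

```python
import pandas as pd

# Hypothetical stand-ins for v1, v2, v3 (values invented for illustration).
idx = pd.Index(["ind1", "ind2"], name="id")
cols = ["c1", "c2", "c3"]
v1 = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=idx, columns=cols)
v2 = pd.DataFrame([[3, 2, 1], [6, 5, 4]], index=idx, columns=cols)
v3 = pd.DataFrame([[2, 5, 2], [2, 2, 2]], index=idx, columns=cols)

frames = [v1, v2, v3]

# Stack the frames vertically, then average rows sharing an index label.
by_group = pd.concat(frames).groupby(level=0).mean()

# Element-wise alternative, valid only when index and columns match exactly.
by_sum = sum(frames) / len(frames)

print(by_group)
```

The groupby version is the more forgiving of the two: it still works when a label is missing from some of the frames.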
Obtaining mean from multiple dataframes and inserting result into new columns
Fortunately, pandas provides ways to aggregate your data without having to construct for-loops. (If you do need to work with individual rows, DataFrame.iterrows() is a generator that yields (index, Series) pairs.)
The approach is to combine the data, groupby their name, then calculate each mean.
First, let's create some data to work with...
name1 = pd.DataFrame([[1,2],[3,4],[5,6],[7,8],[9,0]], columns=['col1', 'col2'])
name2 = name1 * 2
name3 = name1 + 3
name1['Name'] = 'name1'
name2['Name'] = 'name2'
name3['Name'] = 'name3'
df = pd.concat([name1, name2, name3])
now we use pandas aggregation
df.groupby('Name').mean()
col1 col2
Name
name1 5 4
name2 10 8
name3 8 7
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
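Since the question asks about inserting the result into new columns, one way to do that is `groupby(...).transform("mean")`, which returns a value for every original row. This is a sketch built on the example data above; the column names `col1_mean` and `col2_mean` are made up for illustration.

```python
import pandas as pd

name1 = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8], [9, 0]], columns=["col1", "col2"])
name2 = name1 * 2
name3 = name1 + 3
for label, frame in [("name1", name1), ("name2", name2), ("name3", name3)]:
    frame["Name"] = label

df = pd.concat([name1, name2, name3], ignore_index=True)

# transform("mean") broadcasts each group's mean back onto its rows.
group_means = df.groupby("Name")[["col1", "col2"]].transform("mean")
df["col1_mean"] = group_means["col1"]  # hypothetical new column name
df["col2_mean"] = group_means["col2"]  # hypothetical new column name
print(df.head())
```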
Mean and standard deviation with multiple dataframes
Use concat, remove the D rows with DataFrame.query, and aggregate with GroupBy.agg using named aggregations:
df = (pd.concat([df1, df2, df3])
.query('ID != "D"')
.groupby('ID')
.agg(avg=('Amount', 'mean'), std=('Amount', 'std')))
print (df)
avg std
ID
A 5 3.605551
B 1 1.000000
C 2 1.000000
Or remove D in the last step with DataFrame.drop:
df = (pd.concat([df1, df2, df3])
.groupby('ID')
.agg(avg=('Amount', 'mean'), std=('Amount', 'std'))
.drop('D'))
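The input frames are not shown above, so here is a self-contained sketch with invented `ID`/`Amount` data. It runs both variants and checks that they agree; named aggregation as used here needs pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical input frames (values invented for illustration).
df1 = pd.DataFrame({"ID": ["A", "B", "C", "D"], "Amount": [1, 0, 1, 9]})
df2 = pd.DataFrame({"ID": ["A", "B", "C"], "Amount": [6, 1, 2]})
df3 = pd.DataFrame({"ID": ["A", "B", "C", "D"], "Amount": [8, 2, 3, 4]})

# Variant 1: filter out D before grouping.
stats_query = (pd.concat([df1, df2, df3])
               .query('ID != "D"')
               .groupby('ID')
               .agg(avg=('Amount', 'mean'), std=('Amount', 'std')))

# Variant 2: aggregate everything, then drop the D row.
stats_drop = (pd.concat([df1, df2, df3])
              .groupby('ID')
              .agg(avg=('Amount', 'mean'), std=('Amount', 'std'))
              .drop('D'))

print(stats_query)
```

Note that `std` is the sample standard deviation (ddof=1), which is the pandas default.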
Weighted average across multiple dataframes
If you really want to disregard the string column, and you are certain the two dataframes have the same shape, then you can do this:
sel = ['b', 'c'] # numeric columns
df3 = df1.copy()
df3[sel] = 2/3 * df1[sel] + 1/3 * df2[sel]
On your data, df3 is:
a b c
0 hello 2.0 1.333333
1 hello 1.0 1.000000
However, in the more general case, you may have different sizes and your a column may be relevant. Here is an example:
df1 = pd.DataFrame([["hello", 2, 1], ["world", 1, 1]], columns=["a", "b", "c"])
df2 = pd.DataFrame([["world", 2, 2], ["hello", 1, 1]], columns=["a", "b", "c"])
(2/3 * df1.set_index('a').stack() +
 1/3 * df2.set_index('a').stack()).groupby(level=[0,1]).mean().unstack().reset_index()
# gives:
a b c
0 hello 1.666667 1.000000
1 world 1.333333 1.333333
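When the values in `a` are unique within each frame, a shorter variant is to rely on index alignment directly, without stacking. This is a sketch under that assumption; `w1`, `w2`, and `weighted` are names introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([["hello", 2, 1], ["world", 1, 1]], columns=["a", "b", "c"])
df2 = pd.DataFrame([["world", 2, 2], ["hello", 1, 1]], columns=["a", "b", "c"])

w1, w2 = 2 / 3, 1 / 3  # weights, assumed to sum to 1

# Arithmetic between DataFrames aligns on index and column labels,
# so rows are matched by their value in "a" regardless of row order.
weighted = (w1 * df1.set_index("a") + w2 * df2.set_index("a")).reset_index()
print(weighted)
```

A key present in only one frame would come out as NaN with this approach, so it fits best when both frames cover the same keys.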
How do I average two data frames in pandas?
I think you need concat with an aggregated mean, grouping by the first 4 columns. This is necessary if there are duplicate rows in the first 4 columns of df1 or df2:
df = pd.concat([df1, df2]).groupby(df1.columns.tolist()[:4]).mean()
If not, use set_index with add and divide by 2:
a = df1.set_index(df1.columns.tolist()[:4])
b = df2.set_index(df1.columns.tolist()[:4])
c = a.add(b).div(2).reset_index()
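Both approaches can be tried on a small example. The frames below are invented: four hypothetical key columns (`k1` to `k4`) and one value column (`val`), with the rows of `df2` in a different order to show that matching is done on the keys.

```python
import pandas as pd

keys = ["k1", "k2", "k3", "k4"]  # hypothetical key columns
df1 = pd.DataFrame({"k1": ["a", "b"], "k2": [1, 2], "k3": ["x", "y"],
                    "k4": [0, 0], "val": [10.0, 20.0]})
df2 = pd.DataFrame({"k1": ["b", "a"], "k2": [2, 1], "k3": ["y", "x"],
                    "k4": [0, 0], "val": [40.0, 30.0]})

# Approach 1: stack the frames and average within each key combination.
via_groupby = pd.concat([df1, df2]).groupby(keys, as_index=False).mean()

# Approach 2: align on the keys, add, and halve (keys must be unique per frame).
via_add = df1.set_index(keys).add(df2.set_index(keys)).div(2).reset_index()

print(via_groupby)
print(via_add)
```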
Group pandas dataframe and calculate mean for multiple columns
df.groupby("category", as_index=False).mean()
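A small worked example, with invented data, may help here. If the frame also contains non-numeric columns, recent pandas versions may raise an error when averaging them, so this sketch passes `numeric_only=True`; the column names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: one grouping column, one text column, two numeric columns.
df = pd.DataFrame({
    "category": ["a", "a", "b"],
    "label": ["p", "q", "r"],   # non-numeric, excluded from the mean
    "v1": [1.0, 3.0, 5.0],
    "v2": [10, 20, 30],
})

# as_index=False keeps "category" as a regular column in the result.
means = df.groupby("category", as_index=False).mean(numeric_only=True)
print(means)
```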
finding the average and std across multiple Pandas series
You need to join them together with concat and use DataFrame.agg:
s1 = pd.Series(range(3))
s2 = pd.Series([4,5,7])
s3 = pd.Series([7,5,2])
L = [s1, s2, s3]
df = pd.concat(L, axis=1).agg(['mean','std'], axis=1)
print (df)
mean std
0 3.666667 3.511885
1 3.666667 2.309401
2 3.666667 2.886751
print (df['mean'])
0 3.666667
1 3.666667
2 3.666667
Name: mean, dtype: float64
print (df['std'])
0 3.511885
1 2.309401
2 2.886751
Name: std, dtype: float64
Pandas Mean Across Two Data Frames on Similar Columns only
Use DataFrame.add with Key as the index:
df1.set_index('Key').add(df2.set_index('Key')).dropna(axis=1) / 2
A B C D
Key
K1 3 5 4 6
K2 3 5 4 6
K3 3 5 4 6
K4 3 5 4 6
Alternative with concat + groupby:
pd.concat([df1, df2], axis=0).dropna(axis=1).groupby('Key').mean()
A B C D
Key
K1 3 5 4 6
K2 3 5 4 6
K3 3 5 4 6
K4 3 5 4 6
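The input frames are not shown above, so the sketch below uses two invented frames that share `Key`, `A`, and `B` but each carry one extra column (`E` and `F`, both hypothetical). It runs both approaches so the results can be compared.

```python
import pandas as pd

# Hypothetical frames: A and B are shared, E and F each appear in only one.
df1 = pd.DataFrame({"Key": ["K1", "K2"], "A": [2, 2], "B": [4, 4], "E": [9, 9]})
df2 = pd.DataFrame({"Key": ["K1", "K2"], "A": [4, 4], "B": [6, 6], "F": [1, 1]})

# add() aligns on Key and on column labels; columns missing from either
# frame become all-NaN and are removed by dropna(axis=1).
via_add = df1.set_index("Key").add(df2.set_index("Key")).dropna(axis=1) / 2

# concat() fills the non-shared columns with NaN, which dropna(axis=1) removes.
via_concat = pd.concat([df1, df2], axis=0).dropna(axis=1).groupby("Key").mean()

print(via_add)
print(via_concat)
```

One caveat: `dropna(axis=1)` drops any column containing a NaN, so a shared column with genuinely missing values would be removed too.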