compute column average based on conditions pandas
You can use .groupby() and .mean(), followed by renaming the column with .rename(), as follows:
df2 = df.groupby(['names', 'subject'], as_index=False)['value'].mean().rename({'value': 'average'}, axis=1)
Result:
print(df2)
names subject average
0 A X 10.000000
1 A Y 15.666667
2 B P 12.250000
3 B Q 10.000000
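For a version that can be run end to end, here is a minimal sketch. The input frame is hypothetical (only the column names names, subject and value come from the answer above), so the numbers differ from the result shown.

```python
import pandas as pd

# Hypothetical data; only the column names are taken from the answer above.
df = pd.DataFrame({
    "names":   ["A", "A", "A", "B", "B"],
    "subject": ["X", "Y", "Y", "P", "P"],
    "value":   [10, 14, 16, 12, 13],
})

# Average 'value' per (names, subject) pair and rename the result column.
df2 = (
    df.groupby(["names", "subject"], as_index=False)["value"]
      .mean()
      .rename(columns={"value": "average"})
)
print(df2)
```

Passing as_index=False keeps names and subject as ordinary columns, so no reset_index() is needed afterwards.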
How to calculate the average of a column where the row meets a certain condition in Pandas
Simply use groupby + agg:
agg = df.groupby('number')['time'].agg(['count', 'mean']).reset_index()
Output:
>>> agg
number count mean
0 1 5 37.4
1 2 3 26.0
2 4 4 30.5
3 6 2 53.0
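A small runnable sketch of the same groupby + agg pattern follows. The data is made up for illustration; only the column names number and time come from the answer.

```python
import pandas as pd

# Hypothetical data; column names follow the answer above.
df = pd.DataFrame({
    "number": [1, 1, 2, 2, 2, 4],
    "time":   [30, 40, 20, 25, 30, 50],
})

# One row per 'number', with the row count and the mean of 'time'.
stats = df.groupby("number")["time"].agg(["count", "mean"]).reset_index()
print(stats)
```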
Average certain columns based on values in other columns
Something very relevant (support for int column names): https://github.com/theislab/anndata/issues/31
Due to this bug/issue, I converted the column names to type string:
test_df = pd.DataFrame({'1': [1600, 1600, 1600, 1700, 1800],
                        '2': [1500, 2000, 1400, 1500, 2000],
                        '3': [2000, 2000, 2000, 2000, 2000],
                        '51': [65, 80, 75, 80, 75],
                        '52': [63, 82, 85, 85, 75],
                        '53': [83, 80, 75, 76, 78]})
Create a new DataFrame, new_df, to meet our requirements:
new_df = test_df[['1', '2', '3']].where(test_df[['1','2','3']]<1700).notnull()
new_df now looks like this
1 2 3
0 True True False
1 True False False
2 True True False
3 False True False
4 False False False
Then simply rename the column and check using 'where'
new_df = new_df.rename(columns={"1": "51", "2":"52", "3":"53"})
test_df['mean_value'] = test_df[['51', '52', '53']].where(new_df).mean(axis=1)
This should give you the desired output -
1 2 3 51 52 53 mean_value
0 1600 1500 2000 65 63 83 64.0
1 1600 2000 2000 80 82 80 80.0
2 1600 1400 2000 75 85 75 80.0
3 1700 1500 2000 80 85 76 85.0
4 1800 2000 2000 75 75 78 NaN
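As a possible variation (not part of the original answer), the rename step can be skipped by passing the condition to where as a plain NumPy array, so that it is applied by position rather than by column label. This sketch assumes the condition columns and the value columns are listed in matching order.

```python
import pandas as pd

test_df = pd.DataFrame({
    "1": [1600, 1600, 1600, 1700, 1800],
    "2": [1500, 2000, 1400, 1500, 2000],
    "3": [2000, 2000, 2000, 2000, 2000],
    "51": [65, 80, 75, 80, 75],
    "52": [63, 82, 85, 85, 75],
    "53": [83, 80, 75, 76, 78],
})

# Boolean array: True where the condition column is below 1700.
mask = (test_df[["1", "2", "3"]] < 1700).to_numpy()

# Keep a value only where the positionally matching condition holds, then average per row.
mean_value = test_df[["51", "52", "53"]].where(mask).mean(axis=1)
print(mean_value)
```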
Calculate the average of sections of a column with condition met to create new dataframe
Keywords: groupby, shift, mean
Code:
df_result=df.groupby((df['B'].shift(1,fill_value=0)!= df['B']).cumsum()).mean()
df_result=df_result[df_result['B']!=0]
df_result
A B
1 2.0 1.0
3 3.0 1.0
As you might have noticed, you first need to identify the blocks of consecutive rows that have the same value.
One way to do so is to shift B by one row and compare the result with B itself.
df['B_shifted']=df['B'].shift(1,fill_value=0) # fill_value=0 keeps the dtype int by filling the first row with 0 instead of NaN
df['A'] =[2, 3, 1, 2, 4, 1, 5, 3, 1, 7, 5]
df['B'] =[0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
df['B_shifted'] =[0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0]
(df['B_shifted'] != df['B'])=[F, T, F, F, T, F, T, F, F, T, F]
Each T above marks a row where B changes, i.e. the first row of a new block.
Now we can use the groupby pandas method as follows:
df_grouped=df.groupby((df['B_shifted'] != df['B']).cumsum())
Now if we loop over the DataFrameGroupBy object df_grouped, we'll see the following tuples:
(0, A B B_shifted
0 2 0 0)
(1, A B B_shifted
1 3 1 0
2 1 1 1
3 2 1 1)
(2, A B B_shifted
4 4 0 1
5 1 0 0)
(3, A B B_shifted
6 5 1 0
7 3 1 1
8 1 1 1)
(4, A B B_shifted
9 7 0 1
10 5 0 0)
We can now simply calculate the mean and filter out the zero values as follows:
df_result=df_grouped.mean()
df_result=df_result[df_result['B']!=0][['A','B']]
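Putting the steps together, the following sketch uses the A and B values listed in the walkthrough. The name block_id and the index label "block" are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [2, 3, 1, 2, 4, 1, 5, 3, 1, 7, 5],
    "B": [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0],
})

# A new block starts wherever B differs from the previous row;
# the cumulative sum turns those change points into block labels.
block_id = (df["B"] != df["B"].shift(1, fill_value=0)).cumsum().rename("block")

# Mean per block, keeping only the blocks where B is 1.
df_result = df.groupby(block_id).mean()
df_result = df_result[df_result["B"] != 0]
print(df_result)
```

Computing block_id as a separate Series avoids adding the helper column B_shifted to df.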
Pandas - Calculate average of columns with condition based on values in other columns
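If your columns cover the same range of numbers for both '_a' and '_c', you can simply loop through them:
import numpy as np
r = range(1, 4)
for i in r:
    # Blank out the value wherever the matching flag column is not 1
    df.loc[df["{}_a".format(i)] != 1, "{}_c".format(i)] = np.nan
# Row-wise mean of the remaining values (NaN is skipped)
df['NEW'] = df[['{}_c'.format(i) for i in r]].mean(axis=1)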
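Note that the loop overwrites the '_c' columns. If the original values should be kept, a vectorized variant with where is one option. The sketch below is an alternative to the loop, not the answer's own method, and its data is hypothetical; it assumes the flag columns are named 1_a to 3_a and the value columns 1_c to 3_c.

```python
import pandas as pd

# Hypothetical data: N_a is a 0/1 flag, N_c the value it guards.
df = pd.DataFrame({
    "1_a": [1, 0, 1], "1_c": [10.0, 20.0, 30.0],
    "2_a": [1, 1, 0], "2_c": [20.0, 40.0, 60.0],
    "3_a": [0, 1, 0], "3_c": [90.0, 60.0, 90.0],
})

r = range(1, 4)
flag_cols = [f"{i}_a" for i in r]
value_cols = [f"{i}_c" for i in r]

# Boolean array aligned by position: True where the flag equals 1.
mask = df[flag_cols].eq(1).to_numpy()

# Average only the values whose flag is 1; the '_c' columns stay unchanged.
df["NEW"] = df[value_cols].where(mask).mean(axis=1)
print(df)
```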
How to average certain values of a column based on other columns condition in pandas
You probably want groupby and transform (though I'm not sure why, in your desired output, type B for 01/02/2010 is 13.5; I think it should be 18.5, i.e. the average of 17 and 20):
df['Value2'] = df.groupby(['Type','Date']).Value.transform('mean')
>>> df
Index Date Type Value Value2
0 0 01/01/2010 A 10 11.0
1 1 01/01/2010 B 15 20.0
2 2 01/01/2010 B 25 20.0
3 3 01/01/2010 A 12 11.0
4 4 01/02/2010 A 9 8.5
5 5 01/02/2010 B 17 18.5
6 6 01/02/2010 B 20 18.5
7 7 01/02/2010 A 8 8.5
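To reproduce this, the frame can be rebuilt from the Date, Type and Value columns printed above, as in this short sketch:

```python
import pandas as pd

# Data copied from the printed table above.
df = pd.DataFrame({
    "Date":  ["01/01/2010"] * 4 + ["01/02/2010"] * 4,
    "Type":  ["A", "B", "B", "A", "A", "B", "B", "A"],
    "Value": [10, 15, 25, 12, 9, 17, 20, 8],
})

# transform returns one value per original row, so the group mean
# can be assigned straight back as a new column.
df["Value2"] = df.groupby(["Type", "Date"])["Value"].transform("mean")
print(df)
```

Unlike mean() on the groupby, which returns one row per group, transform keeps the original row count.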
Pandas calculate mean using another column as condition
You can extract the time from the datetime column and group by time only. If a time slot has fewer than 3 observations, its mean is set to NaN:
t = pd.date_range("2022-01-01", periods=48, freq="30min").time  # the 48 half-hour slots of a day, midnight listed once
grp = df.groupby(df["observation_time"].dt.time)
result = (
    grp["temperature"].mean()       # mean temperature for each 30-minute slot
    .mask(grp.size() < 3, np.nan)   # slots with fewer than 3 observations become NaN
    .reindex(t)                     # make sure every slot of the day is present
    .reset_index()
)
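A self-contained sketch of this approach follows. The observations are hypothetical; the column names observation_time and temperature come from the answer.

```python
from datetime import time

import pandas as pd

# Hypothetical observations: three days at 00:00, only two days at 00:30.
df = pd.DataFrame({
    "observation_time": pd.to_datetime([
        "2022-01-01 00:00", "2022-01-02 00:00", "2022-01-03 00:00",
        "2022-01-01 00:30", "2022-01-02 00:30",
    ]),
    "temperature": [10.0, 12.0, 14.0, 20.0, 22.0],
})

# The 48 half-hour slots of a day as datetime.time objects.
slots = pd.date_range("2022-01-01", periods=48, freq="30min").time

grp = df.groupby(df["observation_time"].dt.time)
result = (
    grp["temperature"].mean()   # mean per time of day
    .mask(grp.size() < 3)       # fewer than 3 observations -> NaN
    .reindex(slots)             # add the slots that have no data at all
)
print(result.head(3))
```

Here 00:00 has three observations and keeps its mean, while 00:30 has only two and is masked.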
Python column mean based on other column conditions
You can select the matching rows with a boolean mask and then take the mean of the column:
df.loc[df['count'] > 0, 'pay'].mean()
# 10.515
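A minimal runnable sketch of the boolean-mask approach, with hypothetical data (only the column names count and pay come from the answer):

```python
import pandas as pd

# Hypothetical data; column names follow the answer above.
df = pd.DataFrame({
    "count": [0, 2, 1, 0],
    "pay":   [5.0, 10.0, 11.0, 7.0],
})

# Mean of 'pay' over the rows where 'count' is positive.
avg_pay = df.loc[df["count"] > 0, "pay"].mean()
print(avg_pay)
```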
Group by columns under conditions to calculate average
Use DataFrame.pivot_table with a helper column new that is a copy of ColB, then flatten the resulting MultiIndex columns and join the output to a new DataFrame created by aggregating Counter with sum:
df1 = (df.assign(new=df['ColB'])
         .pivot_table(index=['ColA', 'ColB'],
                      columns='new',
                      values=['interval', 'duration'],
                      fill_value=0,
                      aggfunc='mean'))
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')
df = (df.groupby(['ColA', 'ColB'])['Counter']
        .sum()
        .to_frame(name='SumCounter')
        .join(df1).reset_index())
print(df)
ColA ColB SumCounter durationSD durationUD intervalSD intervalUD
0 A SD 3 2.5 0.0 3.5 0
1 A UD 10 0.0 2.0 0.0 1
2 B SD 32 2.0 0.0 3.5 0
3 B UD 4 0.0 1.5 0.0 2
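The question's input is not shown here, so the sketch below uses a hypothetical frame chosen so that the printed output above can be reproduced. The variable names wide and summary are illustrative.

```python
import pandas as pd

# Hypothetical input, chosen to be consistent with the output above.
df = pd.DataFrame({
    "ColA":     ["A", "A", "A", "B", "B"],
    "ColB":     ["SD", "SD", "UD", "SD", "UD"],
    "Counter":  [1, 2, 10, 32, 4],
    "interval": [3.0, 4.0, 1.0, 3.5, 2.0],
    "duration": [2.0, 3.0, 2.0, 2.0, 1.5],
})

# 'new' duplicates ColB so it can be used as the pivot columns
# while ColB itself stays in the index.
wide = (
    df.assign(new=df["ColB"])
      .pivot_table(index=["ColA", "ColB"], columns="new",
                   values=["interval", "duration"],
                   fill_value=0, aggfunc="mean")
)
# Flatten ('duration', 'SD') into 'durationSD', and so on.
wide.columns = wide.columns.map(lambda x: f"{x[0]}{x[1]}")

summary = (
    df.groupby(["ColA", "ColB"])["Counter"].sum()
      .to_frame(name="SumCounter")
      .join(wide)
      .reset_index()
)
print(summary)
```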
How to calculate average of all rows by excluding a few columns based on a condition
Here's one way using melt. The idea is to filter the relevant part of the DataFrame and melt it; then filter out the customers who answered 0 to every option of a question (the filter is built by applying all through groupby.transform); then, for the remaining customers, find the mean response using groupby.mean. Finally, assign these means back to the out DataFrame as a new row:
out = df.loc[:, ~df.columns.str.endswith('_0')].copy()
df1 = out.melt('Customer_id')
df1 = df1.join(df1.pop('variable').str.split('_', expand=True))
percentages = (df1.loc[~df1['value'].eq(0).groupby([df1['Customer_id'], df1[0]]).transform('all')]
.groupby([ 0, 1])['value'].mean() * 100)
percentages.index = percentages.index.map('_'.join)
out.loc[len(out)] = percentages
out.loc[len(out)-1, 'Customer_id'] = 'Average'
Output:
Customer_id q1_a q1_b q2_a q2_b q2_c
0 asdsd 0.000000 0.000000 0.000000 0.000000 1.000000
1 aasww 1.000000 0.000000 0.000000 1.000000 0.000000
2 aaswe 0.000000 1.000000 0.000000 0.000000 0.000000
3 aaswt 0.000000 1.000000 1.000000 0.000000 0.000000
4 Average 33.333333 66.666667 33.333333 33.333333 33.333333
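The core filtering step can be checked in isolation with the sketch below. The input is hypothetical: it is rebuilt from the customer rows printed above (without the '_0' columns, which the answer drops anyway), and the names melted, parts, question and option are illustrative.

```python
import pandas as pd

# Hypothetical input, rebuilt from the customer rows printed above.
out = pd.DataFrame({
    "Customer_id": ["asdsd", "aasww", "aaswe", "aaswt"],
    "q1_a": [0, 1, 0, 0],
    "q1_b": [0, 0, 1, 1],
    "q2_a": [0, 0, 0, 1],
    "q2_b": [0, 1, 0, 0],
    "q2_c": [1, 0, 0, 0],
})

# Long format: one row per (customer, question option).
melted = out.melt("Customer_id")
parts = melted.pop("variable").str.split("_", expand=True)
parts.columns = ["question", "option"]
melted = melted.join(parts)

# True for every row of a (customer, question) pair whose answers are all 0.
all_zero = (
    melted["value"].eq(0)
    .groupby([melted["Customer_id"], melted["question"]])
    .transform("all")
)

# Mean response per option among the customers who answered the question.
percentages = (
    melted.loc[~all_zero].groupby(["question", "option"])["value"].mean() * 100
)
percentages.index = percentages.index.map("_".join)
print(percentages.round(2))
```

Customer asdsd is excluded from q1 and aaswe from q2, which is why each average is taken over three customers rather than four.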