pandas get column average/mean
If you only want the mean of the weight column, select the column (which is a Series) and call .mean():
In [479]: df
Out[479]:
ID birthyear weight
0 619040 1962 0.123123
1 600161 1963 0.981742
2 25602033 1963 1.312312
3 624870 1987 0.942120
In [480]: df["weight"].mean()
Out[480]: 0.83982437500000007
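As a self-contained sketch of the same call, the frame below is rebuilt from the values printed above. Those values are rounded for display, so the result can differ from Out[480] in the last digits. The second call is an extra illustration, not part of the original answer.

```python
import pandas as pd

# Values copied from the printed frame above (display-rounded to 6 decimals).
df = pd.DataFrame({
    "ID": [619040, 600161, 25602033, 624870],
    "birthyear": [1962, 1963, 1963, 1987],
    "weight": [0.123123, 0.981742, 1.312312, 0.942120],
})

# Selecting one column gives a Series; .mean() returns a single float.
weight_mean = df["weight"].mean()
print(weight_mean)

# Selecting several columns gives a DataFrame; .mean() returns one value per column.
column_means = df[["birthyear", "weight"]].mean()
print(column_means)
```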
How to average column values every n rows in pandas
If I understand correctly, use DataFrame.melt, then take the mean for each site with GroupBy.mean:
# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.melt('site').groupby('site')['value'].mean()
Or:
# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.set_index('site').stack().groupby(level=0).mean()
# df_tmp.set_index('site').stack().mean(level=0)  # .mean(level=0) was deprecated and is removed in pandas 2.0
Output
site
1 3.333333
2 7.333333
Name: value, dtype: float64
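The question's input frame is not shown here, so the following is only a sketch with a hypothetical df_tmp (columns a, b and c are made up) chosen so that the per-site means match the output above. It runs both variants and shows they agree.

```python
import pandas as pd

# Hypothetical input: one row per site, three numeric measurement columns.
df_tmp = pd.DataFrame({
    "site": [1, 2],
    "a": [2, 6],
    "b": [3, 7],
    "c": [5, 9],
})

# melt turns the wide columns into (site, variable, value) rows,
# so a plain groupby on 'site' averages every measurement of that site.
by_melt = df_tmp.melt("site").groupby("site")["value"].mean()

# Same result via stack: index level 0 (site) identifies the group.
by_stack = df_tmp.set_index("site").stack().groupby(level=0).mean()

print(by_melt)
print(by_stack)
```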
compute column average based on conditions pandas
You can use .groupby() and .mean(), then rename the resulting column with .rename(), as follows:
df2 = df.groupby(['names', 'subject'], as_index=False)['value'].mean().rename({'value': 'average'}, axis=1)
Result:
print(df2)
names subject average
0 A X 10.000000
1 A Y 15.666667
2 B P 12.250000
3 B Q 10.000000
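The original df is not shown, so here is a minimal sketch with hypothetical input values, picked so that the group means reproduce the result above. It uses rename(columns=...), which is equivalent to rename(..., axis=1).

```python
import pandas as pd

# Hypothetical input; only the column names come from the answer above.
df = pd.DataFrame({
    "names":   ["A", "A", "A", "A", "B", "B", "B"],
    "subject": ["X", "Y", "Y", "Y", "P", "P", "Q"],
    "value":   [10.0, 12.0, 15.0, 20.0, 10.0, 14.5, 10.0],
})

df2 = (
    df.groupby(["names", "subject"], as_index=False)["value"]  # keep keys as columns
      .mean()                                                  # one mean per (names, subject)
      .rename(columns={"value": "average"})                    # same as rename(..., axis=1)
)
print(df2)
```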
Calculating mean of column based on the occurrence of a number in another column Pandas dataframe Python
Try this, which keeps only the rows where s1 equals 5 and then averages s2:
df[df['s1']==5]['s2'].mean()
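A short sketch of the same filter written with .loc, which selects rows and column in one step. The data is hypothetical; only the column names s1 and s2 come from the answer. If no row matches, the mean of the empty selection is NaN.

```python
import pandas as pd

# Hypothetical data: s1 is the column to match on, s2 the column to average.
df = pd.DataFrame({
    "s1": [5, 3, 5, 7, 5],
    "s2": [10.0, 20.0, 30.0, 40.0, 50.0],
})

mask = df["s1"] == 5                 # boolean Series, True where s1 equals 5
mean_s2 = df.loc[mask, "s2"].mean()  # average s2 over the matching rows only
print(mean_s2)  # 30.0
```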
pandas get column average for rows with a certain value?
Use pandas.core.groupby.GroupBy.mean:
df.groupby("city")["timeDiff"].mean()
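For illustration, a sketch with made-up rows (only the column names city and timeDiff come from the answer). The grouped result is a Series indexed by city, so the average for one particular value can be looked up with .loc.

```python
import pandas as pd

# Hypothetical data; the city names and numbers are invented for the example.
df = pd.DataFrame({
    "city": ["Paris", "Berlin", "Paris", "Berlin", "Rome"],
    "timeDiff": [4.0, 10.0, 6.0, 20.0, 7.0],
})

city_means = df.groupby("city")["timeDiff"].mean()  # Series indexed by city
print(city_means)

# Look up the average for one specific city.
paris_mean = city_means.loc["Paris"]
print(paris_mean)  # 5.0
```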
how to get the average of dataframe column values
Simply using df.mean() will Do The Right Thing(tm) with respect to NaNs, because missing values are skipped by default (skipna=True):
>>> df
A B
DATE
2013-05-01 473077 71333
2013-05-02 35131 62441
2013-05-03 727 27381
2013-05-04 481 1206
2013-05-05 226 1733
2013-05-06 NaN 4064
2013-05-07 NaN 41151
2013-05-08 NaN 8144
2013-05-09 NaN 23
2013-05-10 NaN 10
>>> df.mean(axis=1)
DATE
2013-05-01 272205.0
2013-05-02 48786.0
2013-05-03 14054.0
2013-05-04 843.5
2013-05-05 979.5
2013-05-06 4064.0
2013-05-07 41151.0
2013-05-08 8144.0
2013-05-09 23.0
2013-05-10 10.0
dtype: float64
You can use df[["A", "B"]].mean(axis=1) if there are other columns to ignore.
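To make the NaN handling concrete, here is a small sketch on three of the rows above. It compares the default behaviour with skipna=False, where a missing value makes the whole result NaN.

```python
import numpy as np
import pandas as pd

# Three rows taken from the frame above; the last one has A missing.
df = pd.DataFrame({
    "A": [481.0, 226.0, np.nan],
    "B": [1206, 1733, 4064],
})

col_means = df.mean()                       # one mean per column, NaN skipped
row_means = df.mean(axis=1)                 # one mean per row, NaN skipped
row_strict = df.mean(axis=1, skipna=False)  # NaN propagates instead of being skipped

print(col_means, row_means, row_strict, sep="\n\n")
```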
How to calculate mean of specific rows in python dataframe?
Avoid iterating over the rows of a dataframe as much as possible, because it is very inefficient.
groupby is the way to go when you want to apply the same processing to various groups of rows identified by their values in one or more columns. Here what you want is (*):
df.groupby('TagName')['Sample_value'].mean().reset_index()
It gives, as expected:
TagName Sample_value
0 Steam 1.081447e+06
1 Utilities 3.536931e+05
(*) Details on the magic words:
groupby: identifies the column(s) used to group the rows (same values)
['Sample_value']: restricts the groupby object to the column of interest
mean(): computes the mean per group
reset_index(): by default the grouping columns go into the index, which is fine for the mean operation. reset_index turns them back into normal columns
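The index behaviour described above can be seen in a short sketch. The sample values are hypothetical; only the column names come from the answer. It also shows as_index=False, which is an alternative way to keep the grouping column as a normal column.

```python
import pandas as pd

# Hypothetical samples for two tags.
df = pd.DataFrame({
    "TagName": ["Steam", "Utilities", "Steam", "Utilities"],
    "Sample_value": [1.0, 10.0, 3.0, 30.0],
})

grouped = df.groupby("TagName")["Sample_value"].mean()
print(grouped.index.name)  # TagName: the grouping column is the index here

flat = grouped.reset_index()  # TagName becomes an ordinary column again
print(flat)

# Alternative: ask groupby not to move the key into the index at all.
same = df.groupby("TagName", as_index=False)["Sample_value"].mean()
print(same)
```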
Compute row average in pandas
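You can specify a new column. You also need to compute the mean along the rows, so use axis=1. Pass numeric_only=True so that the non-numeric Region column is skipped; recent pandas versions raise an error otherwise.
df['mean'] = df.mean(axis=1, numeric_only=True)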
>>> df
Y1961 Y1962 Y1963 Y1964 Y1965 Region mean
0 82.567307 83.104757 83.183700 83.030338 82.831958 US 82.943612
1 2.699372 2.610110 2.587919 2.696451 2.846247 US 2.688020
2 14.131355 13.690028 13.599516 13.649176 13.649046 US 13.743824
3 0.048589 0.046982 0.046583 0.046225 0.051750 US 0.048026
4 0.553377 0.548123 0.582282 0.577811 0.620999 US 0.576518
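A runnable sketch on the first two rows of the frame above. It shows two ways to keep the text column Region out of the row mean: numeric_only=True, or selecting the year columns explicitly. The "starts with Y" rule is only an assumption about how the year columns are named.

```python
import pandas as pd

# First two rows of the frame printed above, rebuilt by hand.
df = pd.DataFrame({
    "Y1961": [82.567307, 2.699372],
    "Y1962": [83.104757, 2.610110],
    "Y1963": [83.183700, 2.587919],
    "Y1964": [83.030338, 2.696451],
    "Y1965": [82.831958, 2.846247],
    "Region": ["US", "US"],
})

# Option 1: let pandas ignore the non-numeric columns.
df["mean"] = df.mean(axis=1, numeric_only=True)

# Option 2: pick the year columns explicitly (assumed naming rule: starts with "Y").
year_cols = [c for c in df.columns if c.startswith("Y")]
df["mean_years"] = df[year_cols].mean(axis=1)

print(df[["Region", "mean", "mean_years"]])
```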
For a column in pandas dataframe, calculate mean of column values in previous 4th, 8th and 12th row from the present row?
.shift() is your missing part. We can use it to access previous rows from the existing row in a Pandas dataframe.
Let's use .groupby(), .apply() and .shift() as follows:
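# group_keys=False keeps the original row index on the result, so it aligns with df on pandas 2.0+
df['New column'] = df.groupby((df['Row number'] - 1) // 13, group_keys=False)['Existing column'].apply(lambda x: (x.shift(4) + x.shift(8) + x.shift(12)) / 3)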
Here, rows are partitioned into groups of 13 rows by grouping them under different group numbers set by (df['Row number'] - 1) // 13.
Then within each group, we use .apply() on the column Existing column and use .shift() to get the previous 4th, 8th and 12th entries within the group.
Test Run
import numpy as np
import pandas as pd
data = {'Row number' : np.arange(1, 40), 'Existing column': np.arange(11, 50) }
df = pd.DataFrame(data)
print(df)
Row number Existing column
0 1 11
1 2 12
2 3 13
3 4 14
4 5 15
5 6 16
6 7 17
7 8 18
8 9 19
9 10 20
10 11 21
11 12 22
12 13 23
13 14 24
14 15 25
15 16 26
16 17 27
17 18 28
18 19 29
19 20 30
20 21 31
21 22 32
22 23 33
23 24 34
24 25 35
25 26 36
26 27 37
27 28 38
28 29 39
29 30 40
30 31 41
31 32 42
32 33 43
33 34 44
34 35 45
35 36 46
36 37 47
37 38 48
38 39 49
df['New column'] = df.groupby((df['Row number'] - 1) // 13, group_keys=False)['Existing column'].apply(lambda x: (x.shift(4) + x.shift(8) + x.shift(12)) / 3)
print(df)
Row number Existing column New column
0 1 11 NaN
1 2 12 NaN
2 3 13 NaN
3 4 14 NaN
4 5 15 NaN
5 6 16 NaN
6 7 17 NaN
7 8 18 NaN
8 9 19 NaN
9 10 20 NaN
10 11 21 NaN
11 12 22 NaN
12 13 23 15.0
13 14 24 NaN
14 15 25 NaN
15 16 26 NaN
16 17 27 NaN
17 18 28 NaN
18 19 29 NaN
19 20 30 NaN
20 21 31 NaN
21 22 32 NaN
22 23 33 NaN
23 24 34 NaN
24 25 35 NaN
25 26 36 28.0
26 27 37 NaN
27 28 38 NaN
28 29 39 NaN
29 30 40 NaN
30 31 41 NaN
31 32 42 NaN
32 33 43 NaN
33 34 44 NaN
34 35 45 NaN
35 36 46 NaN
36 37 47 NaN
37 38 48 NaN
38 39 49 41.0
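As an alternative sketch (a swap of .apply() for .transform(), not what the answer above uses), the same column can be built with .transform(), which always returns a result aligned to the original index and so does not depend on how .apply() handles group keys. The name block is introduced here only for readability.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Row number": np.arange(1, 40),
    "Existing column": np.arange(11, 50),
})

# Same block numbering as above: rows 1-13 -> 0, 14-26 -> 1, 27-39 -> 2.
block = (df["Row number"] - 1) // 13

# transform returns a Series aligned to df's index, so it can be assigned directly.
df["New column"] = (
    df.groupby(block)["Existing column"]
      .transform(lambda x: (x.shift(4) + x.shift(8) + x.shift(12)) / 3)
)

# Only the 13th row of each block has all three earlier values available.
print(df.dropna())
```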