How to Calculate the Mean of a Column

pandas get column average/mean

If you only want the mean of the weight column, select the column (which is a Series) and call .mean():

In [479]: df
Out[479]:
         ID  birthyear    weight
0    619040       1962  0.123123
1    600161       1963  0.981742
2  25602033       1963  1.312312
3    624870       1987  0.942120

In [480]: df["weight"].mean()
Out[480]: 0.83982437500000007

calculate mean of a column in a data frame when it initially is a character

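Try the following. If V1 is still stored as character, convert it first (for example with as.numeric()), because mean() returns NA for character input: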

mean(good$V1, na.rm=TRUE)

or

colMeans(good[sapply(good, is.numeric)], na.rm=TRUE)

How do I calculate the mean of a column

Awk:

awk '{ total += $2 } END { print total/NR }' yourFile.whatever

Read as:

  • For each line, add column 2 to a variable 'total'.
  • At the end of the file, print 'total' divided by the number of records.
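
For comparison, the same two steps can be sketched in Python. This is only an illustration, not a replacement for the awk one-liner: the function name column_mean and the sample lines are made up, and the sketch assumes every line has a numeric value in the chosen column.

```python
def column_mean(lines, column=2):
    """Mean of a whitespace-separated column (1-based, like awk's $2)."""
    total = 0.0
    count = 0
    for line in lines:
        fields = line.split()
        total += float(fields[column - 1])  # add this line's value to the running total
        count += 1                          # plays the role of awk's NR
    return total / count


# Hypothetical input: three records with the value in the second column
sample = ["a 1.0", "b 2.0", "c 6.0"]
result = column_mean(sample)
print(result)
```

To run it on a file, pass an open file object instead of the list, since iterating over a file yields its lines.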

During the calculation of mean of a column in dataframe that contain missing values

According to the official documentation of pandas.DataFrame.mean, the "skipna" parameter excludes NA/null values. If they were excluded from the numerator but not from the denominator, this would be explicitly mentioned in the documentation. You can verify for yourself that they are excluded from the denominator by running a simple experiment with a dummy dataframe, such as the one you gave as an example in the question.

The reason NA/null values should be excluded from the denominator is statistical correctness. The mean is the sum of the values divided by the number of values. If a value could not be added to the sum, there is no point in counting it in the denominator. Counting it in the denominator is equivalent to treating the NA/null value as 0. However, the value is not 0; it is unknown, unobserved or hidden.
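
A small experiment along these lines might look as follows. The series values are made up for illustration; the point is only to compare the default mean with a manual sum/count and with a zero-filled mean.

```python
import numpy as np
import pandas as pd

# Dummy data with one missing value
s = pd.Series([1.0, 2.0, np.nan, 3.0])

mean_skip = s.mean()                 # skipna=True is the default
mean_manual = s.sum() / s.count()    # count() ignores NaN, so the denominator is 3
mean_zero = s.fillna(0).mean()       # treats the missing value as 0, denominator is 4
mean_noskip = s.mean(skipna=False)   # NaN propagates when missing values are not skipped

print(mean_skip, mean_manual, mean_zero, mean_noskip)
```

Here mean_skip matches mean_manual and differs from mean_zero, which is what you would expect if the missing value is left out of both the numerator and the denominator.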

If you know the nature of the distribution in practice, you could interpolate or fill the NA/null values accordingly and then take the mean of all the values. For instance, if you know that the feature in question behaves linearly, you could interpolate the missing values with the "linear" method.
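
As a rough sketch of that idea, assuming a made-up series whose values are evenly spaced in the index, Series.interpolate with method="linear" fills the gap before the mean is taken:

```python
import numpy as np
import pandas as pd

# Hypothetical feature with one missing observation
s = pd.Series([1.0, np.nan, 3.0, 10.0])

# Linear interpolation fills the gap with the midpoint of its neighbours
filled = s.interpolate(method="linear")

mean_filled = filled.mean()   # mean over all four values after filling
mean_skipped = s.mean()       # mean over the three observed values only

print(filled.tolist(), mean_filled, mean_skipped)
```

Whether the filled mean is more meaningful than the skipped one depends entirely on whether the linear assumption holds for your data.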

Calculate the mean in pandas while a column has a string

You can replace "None" with numpy.nan instead of using 0.

Something like this should do the trick:

import numpy as np

# Replace the string "None" with NaN so that mean() skips those entries
dur_temp = duration.replace("None", np.nan)
descricao_duration = dur_temp.mean()
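
If the numbers themselves are also stored as strings, a possible alternative is pandas.to_numeric with errors="coerce", which converts what it can and turns everything else into NaN. The data below is hypothetical and only stands in for the duration column from the question.

```python
import pandas as pd

# Hypothetical stand-in for the 'duration' column: numbers stored as text plus "None"
duration = pd.Series(["10", "None", "20", "30"])

# Unparseable entries such as "None" become NaN instead of raising an error
numeric = pd.to_numeric(duration, errors="coerce")

result = numeric.mean()   # NaN is skipped by default
print(result)
```

Note that errors="coerce" silently turns any unparseable entry into NaN, so it is worth checking numeric.isna().sum() to see how many values were dropped.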

How to calculate mean of column, then paste mean value as row value in another data frame in R?

We can collect the datasets in a named list, bind them by row, create a 'Year' column from the list names, and then compute the mean by group:

library(dplyr)
library(stringr)
lst(`1980_df`, `1981_df`, `1982_df`) %>%
  bind_rows(.id = 'Year') %>%
  group_by(Year = str_remove(Year, '_df')) %>%
  summarise(Avg_bottom_temp = mean(bottom_temp))

-output

# A tibble: 3 × 2
Year Avg_bottom_temp
<chr> <dbl>
1 1980 11.6
2 1981 11.9
3 1982 11.6

data

`1980_df` <- structure(list(lon = c(-75.61, -75.6, -75.59, -75.58), lat = c(39.1, 
39.1, 39.1, 39.1), bottom_temp = c(11.6, 11.5, 11.6, 11.7)), class = "data.frame", row.names = c(NA,
-4L))
`1981_df` <- structure(list(lon = c(-75.57, -75.56, -75.55, -75.54), lat = c(39.1,
39.1, 39.1, 39.1), bottom_temp = c(11.9, 11.9, 12, 11.8)), class = "data.frame", row.names = c(NA,
-4L))
`1982_df` <- structure(list(lon = c(-75.57, -75.56, -75.55, -75.54), lat = c(39.1,
39.1, 39.1, 39.1), bottom_temp = c(11.6, 11.7, 11.9, 11.2)), class = "data.frame", row.names = c(NA,
-4L))

How to calculate mean of specific rows in python dataframe?

You should avoid iterating over the rows of a dataframe as much as possible, because it is very inefficient.

groupby is the way to go when you want to apply the same processing to various groups of rows identified by their values in one or more columns. Here, what you want is:

df.groupby('TagName')['Sample_value'].mean().reset_index()

It gives, as expected:

     TagName  Sample_value
0      Steam  1.081447e+06
1  Utilities  3.536931e+05

Details on the magic words:

  • groupby: identifies the column(s) used to group the rows (same values)
  • ['Sample_value']: restricts the groupby object to the column of interest
  • mean(): computes the mean per group
  • reset_index(): by default the grouping columns go into the index, which is fine for the mean operation. reset_index() turns them back into normal columns
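
The pieces above can be tried on a small, made-up dataframe. The column names match the answer, but the values are invented for illustration. The sketch also shows as_index=False, a groupby option that keeps the grouping column as a regular column and so makes the reset_index() call unnecessary.

```python
import pandas as pd

# Hypothetical data using the column names from the answer
df = pd.DataFrame({
    "TagName": ["Steam", "Steam", "Utilities", "Utilities"],
    "Sample_value": [1.0, 3.0, 10.0, 20.0],
})

# Group, select the column, average, then move TagName out of the index
with_reset = df.groupby("TagName")["Sample_value"].mean().reset_index()

# Alternative: ask groupby not to put TagName into the index in the first place
as_columns = df.groupby("TagName", as_index=False)["Sample_value"].mean()

print(with_reset)
print(as_columns)
```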

