How to Get the Average (Mean) of Selected Columns

How can I get the average (mean) of selected columns

Here are some examples:

> z$mean <- rowMeans(subset(z, select = c(x, y)), na.rm = TRUE)
> z
  w x  y mean
1 5 1  1    1
2 6 2  2    2
3 7 3  3    3
4 8 4 NA    4

weighted mean

> z$y <- rev(z$y)
> z
  w x  y mean
1 5 1 NA    1
2 6 2  3    2
3 7 3  2    3
4 8 4  1    4
>
> weight <- c(1, 2) # x * 1/3 + y * 2/3
> z$wmean <- apply(subset(z, select = c(x, y)), 1, function(d) weighted.mean(d, weight, na.rm = TRUE))
> z
  w x  y mean    wmean
1 5 1 NA    1 1.000000
2 6 2  3    2 2.666667
3 7 3  2    3 2.333333
4 8 4  1    4 2.000000
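
For readers following the pandas sections further down, the same weighted row mean can be sketched in Python. This is only an illustration under assumptions: the frame z is rebuilt by hand from the R output above, and the NA handling (dropping a missing value together with its weight) is written out manually with NumPy rather than taken from a dedicated pandas function.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the R data frame above (y already reversed).
z = pd.DataFrame({
    "w": [5, 6, 7, 8],
    "x": [1, 2, 3, 4],
    "y": [np.nan, 3, 2, 1],
})

cols = ["x", "y"]
weights = np.array([1.0, 2.0])  # x * 1/3 + y * 2/3 when both values are present

values = z[cols].to_numpy(dtype=float)
present = ~np.isnan(values)

# Drop each missing value together with its weight, which mirrors
# weighted.mean(..., na.rm = TRUE) in the R code.
weighted_sum = np.nansum(values * weights, axis=1)
weight_total = (present * weights).sum(axis=1)
z["wmean"] = weighted_sum / weight_total

print(z)
```

In the first row only x is present, so the weighted mean falls back to x itself, matching the 1.000000 in the R output.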

How to calculate row mean from selected columns

Subset each row to the values that exceed that row's mean (taken over the columns listed in w), then calculate the mean of the remaining values.

w <- c("01-01-2018", "02-01-2018", "03-01-2018")  ## define columns

apply(data[, w], 1, function(x) mean(x[x > mean(x)]))
# [1] 3.40 2.75 4.90 -0.10 1.15

Another way is to replace data points that don't exceed the row means with NA's before calculating rowMeans. This is about 30 times faster.

rowMeans(replace(data, data <= rowMeans(data[, w]), NA), na.rm=TRUE)
# [1] 3.40 2.75 4.90 -0.10 1.15

Data:

data <- structure(list(`01-01-2018` = c(1.2, 3.1, 0.7, -0.3, 2), `02-01-2018` = c(-0.1, 
2.4, 4.9, -3.3, -2.7), `03-01-2018` = c(3.4, -2.6, -1.8, 0.1,
0.3)), class = "data.frame", row.names = c(NA, -5L))
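
A rough pandas counterpart of the replace-then-rowMeans idea, for comparison with the pandas answers later on this page. It is a sketch under the assumption that the data are rebuilt as a DataFrame with the same column names; the variable names row_means, above and result are made up for illustration.

```python
import pandas as pd

# Same values as the R data frame above.
data = pd.DataFrame({
    "01-01-2018": [1.2, 3.1, 0.7, -0.3, 2.0],
    "02-01-2018": [-0.1, 2.4, 4.9, -3.3, -2.7],
    "03-01-2018": [3.4, -2.6, -1.8, 0.1, 0.3],
})
w = ["01-01-2018", "02-01-2018", "03-01-2018"]  # selected columns

# Mean of each row over the selected columns.
row_means = data[w].mean(axis=1)

# True where a value is strictly greater than its own row's mean.
above = data[w].gt(row_means, axis=0)

# Values at or below the row mean become NaN, and mean() skips NaN by default.
result = data[w].where(above).mean(axis=1)

print(result)
```

A row whose values are all equal has nothing above its mean, so its result would be NaN.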

Get mean of multiple selected columns in a pandas dataframe

You have two options that I know of:

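For mean(), min(), and max() you can take the mean of the column means, the min of the column mins, or the max of the column maxes. This yields the mean, min, or max of all the elements of A, C, E. For the mean, this holds as long as none of the columns contains missing values.

So you can use:
For mean():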

df1[['A','C','E']].apply(np.mean).mean()
df1[['A','C','E']].values.mean()

Any one of the above should give you the mean of all the elements of columns A, C, E.

for min():

df1[['A','C','E']].apply(np.min).min()
df1[['A','C','E']].values.min()

For max():

df1[['A','C','E']].apply(np.max).max()
df1[['A','C','E']].values.max()

For std()

df1[['A','C','E']].apply(np.std).std()    ## this will not give an error, but gives a value that is not what you want.
df1[['A','C','E']].values.std() # this gives the std of all the elements of columns A, C, E.

std of std will not give the std of all the elements.
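
A small check of the point above, as a sketch on made-up data (the frame df1 and its values are hypothetical). It compares the "statistic of column statistics" with the statistic over all elements, using population standard deviation (ddof=0) on both sides so the comparison is like for like.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical data; only columns A, C and E are used.
df1 = pd.DataFrame({
    "A": [1.0, 2.0, 3.0],
    "B": [9.0, 9.0, 9.0],
    "C": [10.0, 20.0, 30.0],
    "E": [100.0, 200.0, 300.0],
})
block = df1[["A", "C", "E"]]

# Mean: both routes agree because every column has the same number of values.
overall_mean = block.to_numpy().mean()
mean_of_means = block.mean().mean()

# Std: the two routes disagree.
overall_std = block.to_numpy().std()         # std of all nine values
std_of_stds = block.std(ddof=0).std(ddof=0)  # std of the three column stds

print(overall_mean, mean_of_means)
print(overall_std, std_of_stds)
```

Note that .values.mean() returns NaN as soon as one element is missing, while the pandas mean() skips missing values, so the two routes can also diverge for the mean when the data contain NaN.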

Calculate mean for selected rows for selected columns in pandas data frame

To select rows of your dataframe by position you can use iloc. You can then select the columns you want using square brackets.

For example:

 df = pd.DataFrame(data=[[1,2,3]]*5, index=range(3, 8), columns = ['a','b','c'])

gives the following dataframe:

   a  b  c
3  1  2  3
4  1  2  3
5  1  2  3
6  1  2  3
7  1  2  3

To select only the third and fifth rows you can do:

df.iloc[[2,4]]

which returns:

   a  b  c
5  1  2  3
7  1  2  3

if you then want to select only columns b and c you use the following command:

df[['b', 'c']].iloc[[2,4]]

which yields:

   b  c
5  2  3
7  2  3

To get the mean of this subset of your dataframe, call mean on it. Specify axis=0 if you want the mean of each column, or axis=1 if you want the mean of each row.

thus:

df[['b', 'c']].iloc[[2,4]].mean(axis=0)

returns:

b    2.0
c    3.0
dtype: float64

As we should expect from the input dataframe.

For your code you can then do:

 df[column_list].iloc[row_index_list].mean(axis=0)
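
To make the axis argument concrete, here is a short sketch on the same example dataframe. The names subset, col_means, row_means and same_subset are only illustrative. It also shows the label-based alternative with loc, since iloc counts positions (0, 1, 2, ...) rather than index labels (3, 4, 5, ...).

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 2, 3]] * 5, index=range(3, 8), columns=["a", "b", "c"])

# Positions 2 and 4 correspond to index labels 5 and 7.
subset = df[["b", "c"]].iloc[[2, 4]]

col_means = subset.mean(axis=0)  # one value per column (b and c)
row_means = subset.mean(axis=1)  # one value per selected row (labels 5 and 7)

# The same selection by index label instead of position.
same_subset = df.loc[[5, 7], ["b", "c"]]

print(col_means)
print(row_means)
```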

EDIT after comment:
New question in comment:
I have to store these means in another df/matrix. I have L1, L2, L3, L4...LX lists which tells me the index whose mean I need for columns C[1, 2, 3]. For ex: L1 = [0, 2, 3] , means I need mean of rows 0,2,3 and store it in 1st row of a new df/matrix. Then L2 = [1,4] for which again I will calculate mean and store it in 2nd row of the new df/matrix. Similarly till LX, I want the new df to have X rows and len(C) columns. Columns for L1..LX will remain same. Could you help me with this?

Answer:

If I understand correctly, the following code should do the trick (same df as above; as columns I took 'a' and 'b'):

First, loop over all the lists of rows, collecting the means as pd.Series. Then concatenate the resulting list of Series along axis=1 and take the transpose to get the right format.

dfs = list()
for l in L:  # L is the list of row-position lists [L1, L2, ..., LX]
    dfs.append(df[['a', 'b']].iloc[l].mean(axis=0))

mean_matrix = pd.concat(dfs, axis=1).T
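
A self-contained version of the same idea, as a sketch. The dataframe values and the two row lists are made up for illustration (the question does not give concrete data), and varying values are used so that the per-list means differ.

```python
import pandas as pd

# Hypothetical data with varying values.
df = pd.DataFrame({
    "a": [0, 1, 2, 3, 4],
    "b": [10, 11, 12, 13, 14],
    "c": [20, 21, 22, 23, 24],
})

# Hypothetical row-position lists standing in for L1, L2, ..., LX.
L = [[0, 2, 3], [1, 4]]
columns = ["a", "b"]

# One Series of column means per row list.
means = [df[columns].iloc[rows].mean(axis=0) for rows in L]

# Series become columns under axis=1; transposing gives one row per list.
mean_matrix = pd.concat(means, axis=1).T

print(mean_matrix)
```

The result has len(L) rows and len(columns) columns, with row i holding the means for the i-th list.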

Calculating new column as mean of selected columns in R data frame

Since you wanted rowwise mean, this will work:

dall$mJan15to19 = rowMeans(dall[,c("Jan.15","Jan.16","Jan.17","Jan.18","Jan.19")])

How can I get the average (mean) of selected columns and impute the NA's

One option is na.aggregate from zoo, which imputes the missing values (NA) with the mean of that column. We loop through the selected columns of the dataset (lapply(df1[4:8], ...)), apply the function, and then assign the result back to the same columns on the left-hand side of <-.

library(zoo)
df1[4:8] <- lapply(df1[4:8], na.aggregate)

If we need the median, use the FUN as median (by default it is mean)

df1[4:8] <- lapply(df1[4:8], na.aggregate, FUN = median)
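
For comparison with the pandas answers on this page, column-wise imputation can be sketched with fillna. This is not a port of na.aggregate, only a similar effect under assumptions: the frame df1 and the column names x and y are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in the selected columns.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "x": [1.0, np.nan, 3.0, 8.0],
    "y": [np.nan, 2.0, 4.0, 6.0],
})
cols = ["x", "y"]  # columns to impute

# Fill each column's NaN with that column's mean.
filled_mean = df1.copy()
filled_mean[cols] = df1[cols].fillna(df1[cols].mean())

# Same idea with the median instead of the mean.
filled_median = df1.copy()
filled_median[cols] = df1[cols].fillna(df1[cols].median())

print(filled_mean)
print(filled_median)
```

Passing a Series to fillna matches its index against the column names, so each column gets its own fill value.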

Create mean column for specific columns depending on group in R

library(tidyverse)

tribble(
  ~group, ~first, ~second, ~third,
  0, 3, 2, 4,
  0, 0, NA, 5,
  0, 2, 7, 1,
  1, 3, 1, 6,
  1, 4, 0, NA,
  1, 2, 3, 3,
  0, 5, 5, 0,
  0, 6, 2, 2,
  1, NA, 1, 3
) |>
  rowwise() |>
  mutate(mean = if_else(group == 0, mean(c_across(c(first, second)), na.rm = TRUE),
                        mean(c_across(c(first, third)), na.rm = TRUE)))

#> # A tibble: 9 × 5
#> # Rowwise:
#>   group first second third  mean
#>   <dbl> <dbl>  <dbl> <dbl> <dbl>
#> 1     0     3      2     4   2.5
#> 2     0     0     NA     5   0
#> 3     0     2      7     1   4.5
#> 4     1     3      1     6   4.5
#> 5     1     4      0    NA   4
#> 6     1     2      3     3   2.5
#> 7     0     5      5     0   5
#> 8     0     6      2     2   4
#> 9     1    NA      1     3   3

Created on 2022-06-08 by the reprex package (v2.0.1)
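
The same group-dependent row mean can be sketched in pandas, in keeping with the pandas answers on this page. It is an illustration, not a translation of the dplyr pipeline: both candidate means are computed for every row and np.where then picks one per row based on group.

```python
import numpy as np
import pandas as pd

# Same values as the tribble above.
df = pd.DataFrame({
    "group":  [0, 0, 0, 1, 1, 1, 0, 0, 1],
    "first":  [3, 0, 2, 3, 4, 2, 5, 6, np.nan],
    "second": [2, np.nan, 7, 1, 0, 3, 5, 2, 1],
    "third":  [4, 5, 1, 6, np.nan, 3, 0, 2, 3],
})

# Row means over each candidate column pair; NaN is skipped by default.
mean_group0 = df[["first", "second"]].mean(axis=1)
mean_group1 = df[["first", "third"]].mean(axis=1)

# Pick the pair that belongs to each row's group.
df["mean"] = np.where(df["group"] == 0, mean_group0, mean_group1)

print(df)
```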

pandas get column average/mean

If you only want the mean of the weight column, select the column (which is a Series) and call .mean():

In [479]: df
Out[479]:
         ID  birthyear    weight
0    619040       1962  0.123123
1    600161       1963  0.981742
2  25602033       1963  1.312312
3    624870       1987  0.942120

In [480]: df["weight"].mean()
Out[480]: 0.83982437500000007

Append one row with average values for selected columns and counting percent for another based on conditions

What about tibble::add_row:

df %>% 
  add_row(name = "total",
          status = as.character(mean(df$status[df$status != "-"] == "Pass")),
          real = mean(df$real),
          pred1 = mean(df$pred1, na.rm = TRUE),
          pred2 = mean(df$pred2, na.rm = TRUE))

   name status real pred1 pred2
1     A   Pass   10 50.00  12.0
2     B   Fail   NA 20.00  12.0
3     C      -    8    NA   8.0
4     D   Pass    9 14.00    NA
5     E   Pass    4 11.00   6.0
6 total   0.75   NA 23.75   9.5

Explanation of as.character(mean(df$status[df$status != "-"] == "Pass")):

  • df$status[df$status != "-"] is the vector of df$status without the element equal to "-" (so only Pass and Fail).
  • df$status[df$status != "-"] == "Pass" is TRUE if df$status is "Pass", FALSE otherwise.
  • mean(...) is possible because TRUE and FALSE values are coerced to numeric when the mean is computed.
  • as.character(...) is needed because df$status is a character variable.
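
The steps in the list above (filter out "-", compare with "Pass", average the booleans, convert to text) behave the same way in pandas. A minimal sketch, assuming the status values from the output table; the variable names are made up for illustration.

```python
import pandas as pd

# Status column as shown in the table above, without the added total row.
status = pd.Series(["Pass", "Fail", "-", "Pass", "Pass"])

# Keep only graded entries, i.e. drop the "-" placeholder.
graded = status[status != "-"]

# Boolean Series: True where the entry is "Pass".
is_pass = graded == "Pass"

# True counts as 1 and False as 0, so the mean is the share of passes.
pass_rate = is_pass.mean()

# Converted to text so it can sit in a column of strings.
pass_rate_text = str(float(pass_rate))

print(pass_rate_text)
```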

