How can I get the average (mean) of selected columns
Here are some examples:
> z$mean <- rowMeans(subset(z, select = c(x, y)), na.rm = TRUE)
> z
w x y mean
1 5 1 1 1
2 6 2 2 2
3 7 3 3 3
4 8 4 NA 4
Weighted mean:
> z$y <- rev(z$y)
> z
w x y mean
1 5 1 NA 1
2 6 2 3 2
3 7 3 2 3
4 8 4 1 4
>
> weight <- c(1, 2) # x * 1/3 + y * 2/3
> z$wmean <- apply(subset(z, select = c(x, y)), 1, function(d) weighted.mean(d, weight, na.rm = TRUE))
> z
w x y mean wmean
1 5 1 NA 1 1.000000
2 6 2 3 2 2.666667
3 7 3 2 3 2.333333
4 8 4 1 4 2.000000
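For readers working in pandas rather than R, the same NA-aware weighted row mean can be sketched roughly as follows. The frame z and the weights mirror the example above; the helper names vals and present are illustrative only and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same x and y columns as the R example (after y was reversed).
z = pd.DataFrame({"x": [1, 2, 3, 4], "y": [np.nan, 3, 2, 1]})
weight = np.array([1.0, 2.0])  # x * 1/3 + y * 2/3

vals = z[["x", "y"]].to_numpy(dtype=float)
present = ~np.isnan(vals)  # True where a value is available

# Drop missing values together with their weights, as
# weighted.mean(..., na.rm = TRUE) does in R.
weighted_sum = (np.where(present, vals, 0.0) * weight).sum(axis=1)
weight_total = (present * weight).sum(axis=1)
z["wmean"] = weighted_sum / weight_total
```

Rows with a missing y fall back to x alone, because the weight of the missing value is dropped from both the numerator and the denominator.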
How to calculate row mean from selected columns
For each row, keep only the values in the columns w that exceed that row's mean,
then calculate the mean of what is left.
w <- c("01-01-2018", "02-01-2018", "03-01-2018") ## define columns
apply(data[, w], 1, function(x) mean(x[x > mean(x)]))
# [1] 3.40 2.75 4.90 -0.10 1.15
Another way is to replace data points that don't exceed the row means with NAs before calculating rowMeans. This is about 30 times faster.
rowMeans(replace(data, data <= rowMeans(data[, w]), NA), na.rm=TRUE)
# [1] 3.40 2.75 4.90 -0.10 1.15
Data:
data <- structure(list(`01-01-2018` = c(1.2, 3.1, 0.7, -0.3, 2), `02-01-2018` = c(-0.1,
2.4, 4.9, -3.3, -2.7), `03-01-2018` = c(3.4, -2.6, -1.8, 0.1,
0.3)), class = "data.frame", row.names = c(NA, -5L))
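A possible pandas counterpart of the NA-replacement approach, using the same data, is sketched below. It assumes the dates are plain column labels; where() and gt() play the role of replace() and the <= comparison in the R version.

```python
import pandas as pd

data = pd.DataFrame({
    "01-01-2018": [1.2, 3.1, 0.7, -0.3, 2.0],
    "02-01-2018": [-0.1, 2.4, 4.9, -3.3, -2.7],
    "03-01-2018": [3.4, -2.6, -1.8, 0.1, 0.3],
})
w = ["01-01-2018", "02-01-2018", "03-01-2018"]  # selected columns

row_means = data[w].mean(axis=1)
# Keep only values strictly above their own row mean; the rest become NaN.
above = data[w].where(data[w].gt(row_means, axis=0))
# mean() skips NaN by default, like rowMeans(..., na.rm = TRUE).
result = above.mean(axis=1)
```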
Get mean of multiple selected columns in a pandas dataframe
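You have two options that I know of.
For mean(), min() and max() you can take the mean of the column means, the min of the column mins, or the max of the column maxes. This yields the mean, min and max of all the elements of A, C and E. For the mean, this only holds when every column has the same number of non-missing values.
So you can use:
For mean():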
df1[['A','C','E']].apply(np.mean).mean()
df1[['A','C','E']].values.mean()
Any one of the above should give you the mean of all the elements of columns A, C, E.
for min():
df1[['A','C','E']].apply(np.min).min()
df1[['A','C','E']].values.min()
For max():
df1[['A','C','E']].apply(np.max).max()
df1[['A','C','E']].values.max()
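For std():
df1[['A','C','E']].apply(np.std).std() ## this will not raise an error, but gives a value that is not what you want.
df1[['A','C','E']].values.std() # this gives the std of all the elements of columns A, C, E.
The std of the column stds will not give the std of all the elements. Note also that NumPy's std() defaults to the population standard deviation (ddof=0), whereas pandas' .std() defaults to the sample standard deviation (ddof=1).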
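The difference between the two options can be checked on a small made-up frame. The data below is hypothetical and chosen only so that the numbers are easy to follow; the second half illustrates how a single NaN already makes the mean of the column means drift away from the mean of all elements.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only columns A, C and E are used.
df1 = pd.DataFrame({
    "A": [1.0, 2.0, 3.0],
    "B": [10.0, 20.0, 30.0],
    "C": [4.0, 5.0, 6.0],
    "E": [7.0, 8.0, 9.0],
})
cols = ["A", "C", "E"]
values = df1[cols].to_numpy()

overall_mean = values.mean()             # mean of all nine elements
mean_of_means = df1[cols].mean().mean()  # same result here: equal column sizes

overall_std = values.std()                       # population std of all elements
std_of_stds = df1[cols].std(ddof=0).std(ddof=0)  # spread of the column stds only

# With a missing value the column sizes differ, and so do the two means.
df_nan = df1.copy()
df_nan.loc[0, "A"] = np.nan
nan_mean_of_means = df_nan[cols].mean().mean()
nan_overall_mean = np.nanmean(df_nan[cols].to_numpy())
```

Here the three columns have identical spread, so std_of_stds is (numerically) zero even though the pooled elements clearly vary.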
Calculate mean for selected rows for selected columns in pandas data frame
To select the rows of your dataframe by position you can use iloc; you can then select the columns you want using square brackets.
For example:
df = pd.DataFrame(data=[[1,2,3]]*5, index=range(3, 8), columns = ['a','b','c'])
gives the following dataframe:
a b c
3 1 2 3
4 1 2 3
5 1 2 3
6 1 2 3
7 1 2 3
To select only the third and fifth rows you can do:
df.iloc[[2,4]]
which returns:
a b c
5 1 2 3
7 1 2 3
if you then want to select only columns b and c you use the following command:
df[['b', 'c']].iloc[[2,4]]
which yields:
b c
5 2 3
7 2 3
To then get the mean of this subset of your dataframe you can use the df.mean function. If you want the means of the columns, specify axis=0; if you want the means of the rows, specify axis=1.
Thus:
df[['b', 'c']].iloc[[2,4]].mean(axis=0)
returns:
b 2.0
c 3.0
dtype: float64
As we should expect from the input dataframe.
For your code you can then do:
df[column_list].iloc[row_index_list].mean(axis=0)
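Because the example index runs from 3 to 7, positions and labels differ. The sketch below uses the same df to contrast position-based selection (iloc) with label-based selection (loc); the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 2, 3]] * 5, index=range(3, 8), columns=["a", "b", "c"])

# Positions 2 and 4 are the third and fifth rows, which carry labels 5 and 7.
by_position = df[["b", "c"]].iloc[[2, 4]].mean(axis=0)

# The same rows selected by their index labels instead.
by_label = df.loc[[5, 7], ["b", "c"]].mean(axis=0)

# axis=1 gives one mean per selected row rather than one per column.
row_means = df[["b", "c"]].iloc[[2, 4]].mean(axis=1)
```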
EDIT after comment:
New question in comment:
I have to store these means in another df/matrix. I have L1, L2, L3, L4...LX lists which tells me the index whose mean I need for columns C[1, 2, 3]. For ex: L1 = [0, 2, 3] , means I need mean of rows 0,2,3 and store it in 1st row of a new df/matrix. Then L2 = [1,4] for which again I will calculate mean and store it in 2nd row of the new df/matrix. Similarly till LX, I want the new df to have X rows and len(C) columns. Columns for L1..LX will remain same. Could you help me with this?
Answer:
If I understand correctly, the following code should do the trick (same df as above; as columns I took 'a' and 'b'):
First you loop over all the lists of rows, collecting the means as pd.Series. Then you concatenate the resulting list of Series along axis=1 and take the transpose to get it in the right format.
# L is the list of row-position lists, i.e. L = [L1, L2, ..., LX]
dfs = list()
for l in L:
    dfs.append(df[['a', 'b']].iloc[l].mean(axis=0))
mean_matrix = pd.concat(dfs, axis=1).T
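A self-contained run of that loop, with made-up data and the two index lists from the question, might look like this. The frame contents are hypothetical; only the structure follows the answer.

```python
import pandas as pd

# Hypothetical data with distinct values so the means are easy to verify.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [10, 20, 30, 40, 50],
    "c": [0, 0, 0, 0, 0],
})
L = [[0, 2, 3], [1, 4]]  # L1 and L2 from the question

# One Series of column means per list of row positions.
means = [df[["a", "b"]].iloc[rows].mean(axis=0) for rows in L]

# Concatenate side by side, then transpose: one row per list, one column per variable.
mean_matrix = pd.concat(means, axis=1).T
```

The result has len(L) rows and one column per selected variable, which matches the X rows by len(C) columns layout asked for.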
Calculating new column as mean of selected columns in R data frame
Since you wanted rowwise mean, this will work:
dall$mJan15to19 = rowMeans(dall[,c("Jan.15","Jan.16","Jan.17","Jan.18","Jan.19")])
How can I get the average (mean) of selected columns and impute the NA's
One option is na.aggregate from zoo to impute the missing values (NA) with the mean value of that column. We loop through the selected columns of the dataset (lapply(df1[4:8], ...)), apply the function, and then update the columns on the lhs of <-.
library(zoo)
df1[4:8] <- lapply(df1[4:8], na.aggregate)
If we need the median, use the FUN as median (by default it is mean):
df1[4:8] <- lapply(df1[4:8], na.aggregate, FUN = median)
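In pandas, a comparable column-wise imputation can be sketched with fillna. The frame and the column names v1 and v2 below are hypothetical stand-ins for the selected columns; this is not a port of na.aggregate, only the same idea.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; v1 and v2 stand in for the selected columns.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "v1": [1.0, np.nan, 3.0, 8.0],
    "v2": [np.nan, 10.0, 20.0, 60.0],
})
cols = ["v1", "v2"]

# Fill each column's NaN with that column's own mean.
filled_mean = df1.copy()
filled_mean[cols] = df1[cols].fillna(df1[cols].mean())

# Same idea with the median instead of the mean.
filled_median = df1.copy()
filled_median[cols] = df1[cols].fillna(df1[cols].median())
```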
Create mean column for specific columns depending on group in R
library(tidyverse)
tribble(
~group, ~first, ~second, ~third,
0, 3, 2, 4,
0, 0, NA, 5,
0, 2, 7, 1,
1, 3, 1, 6,
1, 4, 0, NA,
1, 2, 3, 3,
0, 5, 5, 0,
0, 6, 2, 2,
1, NA, 1, 3
) |>
rowwise() |>
mutate(mean = if_else(group == 0, mean(c_across(c(first, second)), na.rm = TRUE),
mean(c_across(c(first, third)), na.rm = TRUE)))
#> # A tibble: 9 × 5
#> # Rowwise:
#> group first second third mean
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 3 2 4 2.5
#> 2 0 0 NA 5 0
#> 3 0 2 7 1 4.5
#> 4 1 3 1 6 4.5
#> 5 1 4 0 NA 4
#> 6 1 2 3 3 2.5
#> 7 0 5 5 0 5
#> 8 0 6 2 2 4
#> 9 1 NA 1 3 3
Created on 2022-06-08 by the reprex package (v2.0.1)
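For comparison, one way to express the same group-dependent row mean in pandas is sketched below with the same data. It computes both candidate means for every row and then picks one per row with np.where; the names mean_g0 and mean_g1 are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group":  [0, 0, 0, 1, 1, 1, 0, 0, 1],
    "first":  [3, 0, 2, 3, 4, 2, 5, 6, np.nan],
    "second": [2, np.nan, 7, 1, 0, 3, 5, 2, 1],
    "third":  [4, 5, 1, 6, np.nan, 3, 0, 2, 3],
})

# Row means over each candidate column pair; NaN is skipped by default.
mean_g0 = df[["first", "second"]].mean(axis=1)
mean_g1 = df[["first", "third"]].mean(axis=1)

# Group 0 uses first/second, every other group uses first/third.
df["mean"] = np.where(df["group"] == 0, mean_g0, mean_g1)
```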
pandas get column average/mean
If you only want the mean of the weight column, select the column (which is a Series) and call .mean():
In [479]: df
Out[479]:
ID birthyear weight
0 619040 1962 0.123123
1 600161 1963 0.981742
2 25602033 1963 1.312312
3 624870 1987 0.942120
In [480]: df["weight"].mean()
Out[480]: 0.83982437500000007
Append one row with average values for selected columns and counting percent for another based on conditions
What about tibble::add_row:
df %>%
add_row(name = "total",
status = as.character(mean(df$status[df$status != "-"] == "Pass")),
real = mean(df$real),
pred1 = mean(df$pred1, na.rm = TRUE),
pred2 = mean(df$pred2, na.rm = TRUE))
name status real pred1 pred2
1 A Pass 10 50.00 12.0
2 B Fail NA 20.00 12.0
3 C - 8 NA 8.0
4 D Pass 9 14.00 NA
5 E Pass 4 11.00 6.0
6 total 0.75 NA 23.75 9.5
Explanation of as.character(mean(df$status[df$status != "-"] == "Pass")):
df$status[df$status != "-"] is the vector of df$status without the elements equal to "-" (so only Pass and Fail).
df$status[df$status != "-"] == "Pass" is TRUE if df$status is "Pass", FALSE otherwise.
mean(...) is possible because TRUE and FALSE values are coerced to numeric when the mean is computed.
as.character(...) is needed because df$status is a character variable.
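The same trick of averaging a logical vector works in pandas. The sketch below rebuilds only the status column from the example and is meant as an illustration of the coercion, not as a replacement for add_row.

```python
import pandas as pd

status = pd.Series(["Pass", "Fail", "-", "Pass", "Pass"])

graded = status[status != "-"]  # drop the "-" entries
is_pass = graded == "Pass"      # boolean Series: True for "Pass"
pass_rate = is_pass.mean()      # True/False are averaged as 1/0
pass_rate_text = str(pass_rate) # text form, to fit a string column
```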