Computing row average of columns with same name in pandas
Try the level parameter (note that axis=1 in groupby is deprecated in recent pandas versions):
df_mean=df.groupby(level=0,axis=1).mean()
Another possible way:
df_mean = df.T.groupby(df.columns).mean().T
Output of df_mean:
a b c
0 2 1 3
1 5 4 4
2 8 7 5
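For a runnable check of the transpose approach, here is a minimal sketch. The input frame is an assumption (the question's data is not shown here); its values were picked so that the result has the same values as the output above, printed as floats.

```python
import pandas as pd

# Hypothetical input: three pairs of columns that share a name.
df = pd.DataFrame(
    [[1, 3, 0, 2, 2, 4],
     [4, 6, 3, 5, 3, 5],
     [7, 9, 6, 8, 4, 6]],
    columns=["a", "a", "b", "b", "c", "c"],
)

# Transpose so the duplicated names become index labels, average the rows
# that share a label, then transpose back to the original orientation.
df_mean = df.T.groupby(level=0).mean().T
print(df_mean)
```

Grouping on the transposed frame avoids the axis=1 argument, so it should behave the same on older and newer pandas versions.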
How to average columns with the same name and ignore columns that are factors
For a base R solution by extending what you have,
df <-
  as.data.frame(matrix(c(1,3,3,2,2,5,3,2,3,6,3,2,4,7,3,2,5,4,5,2,6,3,5,2),
                       ncol = 6,
                       dimnames = list(NULL, c("A.1", "B.1", "C.1", "B.2", "A.2", "C.2"))))
char <- c("Apple", "banana", "cat", "rainbow")
df <- cbind(char, df)
## removes the ".<digit>" suffix from your groups (the dot is escaped so it matches a literal ".")
names(df) <- gsub('\\.\\d', '', grep('[a-zA-Z]', names(df), value = TRUE))
res <-
  data.frame(
    factor = df$char,
    sapply(setdiff(unique(names(df)), 'char'), function(col)
      rowMeans(df[, names(df) == col]))
  )
> res
factor A B C
1 Apple 3.0 3 4.5
2 banana 3.5 6 4.5
3 cat 4.0 3 4.0
4 rainbow 2.0 2 2.0
How to grep columns matching a pattern and calculate the row means of those columns and add the mean values as a new column to the data frame in r?
An option is to remove the digits at the end (\\d+$) with sub, use that to split the dataset into a list of data.frames, get the rowMeans, and assign them to new columns in the dataset:
nm1 <- sub("\\d+$", "", names(df))
grp <- split.default(df, nm1)  # groups come back in sorted order, not column order
df[paste0(names(grp), "_mean")] <- lapply(grp, rowMeans)
Want to mutate columns that average columns together based on column names, but also excludes certain columns from the calculation?
In base R, you can find the columns that have 'stat' in their names, then use lapply to drop each one in turn and take the row-wise mean of the remaining stat columns.
cols <- grep('stat', names(df))
new_cols <- paste0('remove_', names(df)[cols])
df[new_cols] <- lapply(cols, function(x) rowMeans(df[, -c(1, x)], na.rm = TRUE))
df
# Team stat1 stat2 stat3 stat4 remove_stat1 remove_stat2 remove_stat3 remove_stat4
#1 ARI 3 NA 4 6 5.0 4.333333 4.500000 3.5
#2 BAL NA 2 NA 1 1.5 1.000000 1.500000 2.0
#3 CAR 5 4 6 2 4.0 4.333333 3.666667 5.0
Calculate mean of a column in a data frame when it initially is a character
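Try the following once the column has been converted to numeric, for example with as.numeric (mean() on a character vector returns NA with a warning):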
mean(good$V1, na.rm=TRUE)
or
colMeans(good[sapply(good, is.numeric)], na.rm=TRUE)
Compute mean value of rows that has the same column value in Pandas
This?
import pandas as pd
df = pd.read_excel('test.xlsx')
df1 = df.groupby(['category']).mean()
print(df)
print(df1)
output:
C D category
0 71 44 cat_C
1 5 88 cat_C
2 8 78 cat_C
3 31 27 cat_C
4 42 48 cat_B
5 18 18 cat_B
6 84 23 cat_A
7 94 23 cat_A
C D
category
cat_A 89.00 23.00
cat_B 30.00 33.00
cat_C 28.75 59.25
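If you don't have the Excel file at hand, the same result can be reproduced from an in-memory frame. This is a sketch that copies the values from the printed output above; selecting the numeric columns explicitly is my own addition, since newer pandas versions do not silently drop non-numeric columns when averaging.

```python
import pandas as pd

# Values copied from the printed frame above (stands in for test.xlsx).
df = pd.DataFrame({
    "C": [71, 5, 8, 31, 42, 18, 84, 94],
    "D": [44, 88, 78, 27, 48, 18, 23, 23],
    "category": ["cat_C"] * 4 + ["cat_B"] * 2 + ["cat_A"] * 2,
})

# Average C and D within each category; the group keys become the index.
df1 = df.groupby("category")[["C", "D"]].mean()
print(df1)
```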
Calculate new column as the mean of other columns in pandas
An easy way to solve this problem is shown below:
col = df.loc[:, "salary_1":"salary_3"]
where "salary_1" is the start column name and "salary_3" is the end column name (label slices with loc include both endpoints).
df['salary_mean'] = col.mean(axis=1)
df
This adds a new column to the dataframe holding the row-wise mean of the selected columns.
This approach is helpful when you have a large number of columns, or when you need the mean of only some selected columns rather than all of them.
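Here is a small self-contained sketch of the slice-then-mean idea. The frame, the names and the extra "bonus" column are made up for illustration; only the salary_1 to salary_3 column names come from the answer above.

```python
import pandas as pd

# Hypothetical data: three salary columns surrounded by unrelated columns.
df = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "salary_1": [100, 200],
    "salary_2": [110, 220],
    "salary_3": [120, 240],
    "bonus": [5, 10],
})

# Label-based slice: both "salary_1" and "salary_3" are included.
cols = df.loc[:, "salary_1":"salary_3"]

# Row-wise mean of the selected columns only; "name" and "bonus" are ignored.
df["salary_mean"] = cols.mean(axis=1)
print(df)
```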
Calculate mean for selected rows for selected columns in pandas data frame
To select rows of your dataframe by position you can use iloc; you can then select the columns you want using square brackets.
For example:
import pandas as pd
df = pd.DataFrame(data=[[1,2,3]]*5, index=range(3, 8), columns = ['a','b','c'])
gives the following dataframe:
a b c
3 1 2 3
4 1 2 3
5 1 2 3
6 1 2 3
7 1 2 3
To select only the third and fifth rows you can do:
df.iloc[[2,4]]
which returns:
a b c
5 1 2 3
7 1 2 3
If you then want only columns b and c, use the following command:
df[['b', 'c']].iloc[[2,4]]
which yields:
b c
5 2 3
7 2 3
To get the mean of this subset of your dataframe, use the dataframe's mean method. Specify axis=0 for the mean of each column, or axis=1 for the mean of each row.
Thus:
df[['b', 'c']].iloc[[2,4]].mean(axis=0)
returns:
b    2.0
c    3.0
dtype: float64
As we should expect from the input dataframe.
For your code you can then do:
df[column_list].iloc[row_index_list].mean(axis=0)
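As a quick check with values that differ per row, here is a sketch using made-up data (the 1 to 15 values are an assumption; the index and column names follow the example above). It also shows that iloc works on positions, while loc with the labels 5 and 7 selects the same rows.

```python
import pandas as pd

# Hypothetical data with the same index (3..7) and columns as the example.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]],
    index=range(3, 8),
    columns=["a", "b", "c"],
)
column_list = ["b", "c"]
row_index_list = [2, 4]  # positions, i.e. the rows labelled 5 and 7

subset = df[column_list].iloc[row_index_list]
col_means = subset.mean(axis=0)  # one mean per column
row_means = subset.mean(axis=1)  # one mean per selected row

# The same selection by label instead of position.
same = df.loc[[5, 7], column_list]
print(col_means, row_means, sep="\n")
```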
EDIT after comment:
New question in comment:
I have to store these means in another df/matrix. I have L1, L2, L3, L4...LX lists which tells me the index whose mean I need for columns C[1, 2, 3]. For ex: L1 = [0, 2, 3] , means I need mean of rows 0,2,3 and store it in 1st row of a new df/matrix. Then L2 = [1,4] for which again I will calculate mean and store it in 2nd row of the new df/matrix. Similarly till LX, I want the new df to have X rows and len(C) columns. Columns for L1..LX will remain same. Could you help me with this?
Answer:
If I understand correctly, the following code should do the trick (same df as above; as columns I took 'a' and 'b'):
First you loop over all the lists of rows, collecting the means as pd.Series. Then you concatenate the resulting list of series along axis=1 and take the transpose to get it in the right format.
# L is the list of row-index lists, i.e. [L1, L2, ..., LX]
dfs = list()
for l in L:
    dfs.append(df[['a', 'b']].iloc[l].mean(axis=0))
mean_matrix = pd.concat(dfs, axis=1).T
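To see the shape of the result, here is a variant of the same idea as a sketch. The data and the row_groups dict are made up for illustration; L1 and L2 use the index lists quoted in the comment, and naming the rows after the lists is my own addition.

```python
import pandas as pd

# Hypothetical data with a default 0..4 index.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]],
    columns=["a", "b", "c"],
)
cols = ["a", "b"]
row_groups = {"L1": [0, 2, 3], "L2": [1, 4]}  # positions to average per group

# One Series of column means per group; the dict keys become columns,
# so transpose to get one row per group and one column per selected column.
mean_matrix = pd.DataFrame(
    {name: df[cols].iloc[rows].mean(axis=0) for name, rows in row_groups.items()}
).T
print(mean_matrix)
```

The result has X rows (one per list) and len(cols) columns, which is the layout asked for in the comment.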