How to Calculate the Mean of Those Columns in a Data Frame with the Same Column Name

Computing row average of columns with same name in pandas

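Try the level parameter:

df_mean = df.groupby(level=0, axis=1).mean()

Another way, which avoids axis=1 (deprecated for groupby since pandas 2.1), is to transpose, group the rows, and transpose back:

df_mean = df.T.groupby(df.columns).mean().T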

Output of df_mean:

     a    b    c
0  2.0  1.0  3.0
1  5.0  4.0  4.0
2  8.0  7.0  5.0
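A self-contained sketch of the transpose approach. The input below is illustrative (not the original poster's data): a frame with duplicate labels "a" and "b", chosen so that the result reproduces the output above. Grouping the transposed frame by level=0 is equivalent to grouping it by df.columns.

```python
import pandas as pd

# Illustrative input with duplicate column labels "a" and "b" (assumed data)
df = pd.DataFrame(
    [[1, 3, 2, 0, 3],
     [4, 6, 5, 3, 4],
     [7, 9, 8, 6, 5]],
    columns=["a", "a", "b", "b", "c"],
)

# Transpose so the duplicate labels become the row index, average the rows
# that share a label, then transpose back to the original orientation.
df_mean = df.T.groupby(level=0).mean().T
print(df_mean)
```

A column with a unique label ("c" here) forms a group of one, so its values pass through unchanged (as floats).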

How to average columns with the same name and ignore columns that are factors

For a base R solution that extends what you have:

df <- as.data.frame(
  matrix(c(1,3,3,2, 2,5,3,2, 3,6,3,2, 4,7,3,2, 5,4,5,2, 6,3,5,2),
         ncol = 6,
         dimnames = list(NULL, c("A.1", "B.1", "C.1", "B.2", "A.2", "C.2")))
)

char <- c("Apple", "banana", "cat", "rainbow")
df <- cbind(char, df)

## remove the ".<digit>" suffix from your group names ("A.1" -> "A")
names(df) <- sub("\\.\\d+$", "", names(df))

## average the columns of each group; drop = FALSE keeps one-column groups as data frames
res <- data.frame(
  factor = df$char,
  sapply(setdiff(unique(names(df)), "char"), function(col)
    rowMeans(df[, names(df) == col, drop = FALSE]))
)

> res
   factor   A B   C
1   Apple 3.0 3 4.5
2  banana 3.5 6 4.5
3     cat 4.0 3 4.0
4 rainbow 2.0 2 2.0
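If you are doing the same thing in pandas rather than R, a rough equivalent is sketched below, using the same numbers as the R example (written row by row, since R's matrix() fills column by column). It keeps the label column aside, strips the ".<digits>" suffix from the numeric column names, and averages columns that share a name.

```python
import pandas as pd

# Same values as the R example above
df = pd.DataFrame({
    "char": ["Apple", "banana", "cat", "rainbow"],
    "A.1": [1, 3, 3, 2], "B.1": [2, 5, 3, 2], "C.1": [3, 6, 3, 2],
    "B.2": [4, 7, 3, 2], "A.2": [5, 4, 5, 2], "C.2": [6, 3, 5, 2],
})

# Work only on the numeric columns; "char" is left out of the averaging
num = df.select_dtypes("number")

# "A.1" -> "A", "B.2" -> "B", ... gives the group of each column
groups = num.columns.str.replace(r"\.\d+$", "", regex=True)

# Average same-group columns: transpose, group rows, transpose back
means = num.T.groupby(groups).mean().T

# Put the label column back in front, renamed to match the R output
res = pd.concat([df[["char"]].rename(columns={"char": "factor"}), means], axis=1)
print(res)
```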

How to grep columns matching a pattern and calculate the row means of those columns and add the mean values as a new column to the data frame in r?

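One option is to strip the trailing digits (\\d+$) with sub, use the result to split the dataset's columns into a list of data frames with split.default, take the rowMeans of each, and assign them to new columns in the dataset. Because split.default orders the groups alphabetically, take the new column names from the result rather than from unique(nm1); otherwise the means can land under the wrong names when the groups are not already in alphabetical order. This assumes every column of df is numeric.

nm1 <- sub("\\d+$", "", names(df))
## one column of row means per group, named by group
means <- sapply(split.default(df, nm1), rowMeans)
df[paste0(colnames(means), "_mean")] <- means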
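For comparison, a pandas sketch of the same idea under assumed data: the column names x1, y1, x2, y2 are hypothetical. The trailing digits are stripped to get each column's group, and each group's row means are joined back as "<group>_mean", labelled by group name so the group order cannot mislabel them.

```python
import pandas as pd

# Hypothetical data: two "x" columns and two "y" columns
df = pd.DataFrame({"x1": [1, 2], "y1": [10, 20], "x2": [3, 4], "y2": [30, 40]})

# Strip trailing digits to get the group of each column ("x1" -> "x")
groups = df.columns.str.replace(r"\d+$", "", regex=True)

# Row means per group; sort=False keeps groups in order of first appearance
means = df.T.groupby(groups, sort=False).mean().T

# Add the means as new "<group>_mean" columns, aligned on the row index
df = df.join(means.add_suffix("_mean"))
print(df)
```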

Want to mutate columns that average columns together based on column names, but also excludes certain columns from the calculation?

In base R, find the columns whose names contain 'stat', then use lapply to leave each one out in turn and take the row-wise mean of the remaining stat columns.

cols <- grep('stat', names(df))
new_cols <- paste0('remove_', names(df)[cols])
## for each stat column x, average the other stat columns (NAs skipped)
df[new_cols] <- lapply(cols, function(x) rowMeans(df[setdiff(cols, x)], na.rm = TRUE))
df

#   Team stat1 stat2 stat3 stat4 remove_stat1 remove_stat2 remove_stat3 remove_stat4
# 1  ARI     3    NA     4     6          5.0     4.333333     4.500000          3.5
# 2  BAL    NA     2    NA     1          1.5     1.000000     1.500000          2.0
# 3  CAR     5     4     6     2          4.0     4.333333     3.666667          5.0
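The same leave-one-out row mean can be sketched in pandas, rebuilt here from the Team/stat values shown in the output above. For each stat column, the new column is the row mean of the other stat columns; pandas skips NaN by default, which matches na.rm = TRUE.

```python
import numpy as np
import pandas as pd

# Input values taken from the output above
df = pd.DataFrame({
    "Team": ["ARI", "BAL", "CAR"],
    "stat1": [3, np.nan, 5],
    "stat2": [np.nan, 2, 4],
    "stat3": [4, np.nan, 6],
    "stat4": [6, 1, 2],
})

# Collect the stat columns once, before any new columns are added
stat_cols = [c for c in df.columns if "stat" in c]

for col in stat_cols:
    others = [c for c in stat_cols if c != col]
    # Row-wise mean of the remaining stat columns, NaN skipped
    df[f"remove_{col}"] = df[others].mean(axis=1)

print(df)
```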

calculate mean of a column in a data frame when it initially is a character

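Convert the column to numeric first; mean() on a character vector returns NA with a warning:

mean(as.numeric(good$V1), na.rm = TRUE)

or, to get the mean of every numeric column at once:

colMeans(good[sapply(good, is.numeric)], na.rm = TRUE)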
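In pandas the analogous step is pd.to_numeric with errors="coerce", which turns unparseable strings into NaN so they are skipped by the mean. A minimal sketch, with a hypothetical frame standing in for good:

```python
import pandas as pd

# Hypothetical data: numbers stored as strings, plus one unparseable entry
good = pd.DataFrame({"V1": ["1.5", "2.5", "n/a", "5"], "V2": ["a", "b", "c", "d"]})

# Convert to numbers; "n/a" becomes NaN instead of raising an error
v1 = pd.to_numeric(good["V1"], errors="coerce")

# Series.mean skips NaN by default (like na.rm = TRUE)
print(v1.mean())
```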

Compute mean value of rows that has the same column value in Pandas

Group by the category column and take the mean of each group:

import pandas as pd

df = pd.read_excel('test.xlsx')
df1 = df.groupby('category').mean(numeric_only=True)
print(df)
print(df1)

Output:

    C   D category
0  71  44    cat_C
1   5  88    cat_C
2   8  78    cat_C
3  31  27    cat_C
4  42  48    cat_B
5  18  18    cat_B
6  84  23    cat_A
7  94  23    cat_A

              C      D
category
cat_A     89.00  23.00
cat_B     30.00  33.00
cat_C     28.75  59.25
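Since test.xlsx is not available here, the sketch below builds the same frame inline from the printed values. numeric_only=True matters in pandas 2.x, where GroupBy.mean no longer drops non-numeric columns silently but raises a TypeError instead.

```python
import pandas as pd

# The data printed above, built inline instead of read from test.xlsx
df = pd.DataFrame({
    "C": [71, 5, 8, 31, 42, 18, 84, 94],
    "D": [44, 88, 78, 27, 48, 18, 23, 23],
    "category": ["cat_C"] * 4 + ["cat_B"] * 2 + ["cat_A"] * 2,
})

# Mean of every numeric column within each category (groups sorted by key)
df1 = df.groupby("category").mean(numeric_only=True)
print(df1)
```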

Calculate new column as the mean of other columns in pandas

An easy way to do this is to slice the columns by label and take the row-wise mean:

col = df.loc[:, "salary_1":"salary_3"]

Here "salary_1" is the first and "salary_3" the last column of the range; label slices with .loc include both endpoints.

df['salary_mean'] = col.mean(axis=1)
df

This adds a salary_mean column to df holding, for each row, the mean of the columns from salary_1 through salary_3.
The approach is handy when there are many columns, or when you need the mean over only a contiguous subset of the columns rather than all of them.
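A runnable sketch with a hypothetical salary table. The "name" and "bonus" columns sit outside the slice and are therefore ignored, and the missing salary_2 value for the second row is skipped by the mean.

```python
import pandas as pd

# Hypothetical data; "name" and "bonus" fall outside the label slice
df = pd.DataFrame({
    "name": ["ann", "bob"],
    "salary_1": [100, 200],
    "salary_2": [110, None],
    "salary_3": [120, 220],
    "bonus": [5, 6],
})

# .loc label slice: salary_1, salary_2 and salary_3 (endpoint included)
col = df.loc[:, "salary_1":"salary_3"]

# Row-wise mean over the sliced columns; NaN is skipped
df["salary_mean"] = col.mean(axis=1)
print(df)
```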

Calculate mean for selected rows for selected columns in pandas data frame

To select rows of your dataframe by position you can use iloc; you can then select the columns you want with square brackets.

For example:

df = pd.DataFrame(data=[[1, 2, 3]] * 5, index=range(3, 8), columns=['a', 'b', 'c'])

gives the following dataframe:

   a  b  c
3  1  2  3
4  1  2  3
5  1  2  3
6  1  2  3
7  1  2  3

To select only the third and fifth rows (positions 2 and 4, since iloc counts from 0) you can do:

df.iloc[[2, 4]]

which returns:

   a  b  c
5  1  2  3
7  1  2  3

If you then want only columns b and c, use:

df[['b', 'c']].iloc[[2, 4]]

which yields:

   b  c
5  2  3
7  2  3

To get the mean of this subset, use the DataFrame.mean method: axis=0 gives the mean of each column, axis=1 the mean of each row.

Thus:

df[['b', 'c']].iloc[[2, 4]].mean(axis=0)

returns:

b    2.0
c    3.0
dtype: float64

as expected from the input dataframe.

For your code you can then do:

df[column_list].iloc[row_index_list].mean(axis=0)
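If you prefer a single indexing step, one option is to translate the column labels into positions with Index.get_indexer and pass both lists to iloc. A sketch using the example frame above; row_index_list and column_list are placeholder names for your own lists. It also shows both axis settings side by side.

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 2, 3]] * 5, index=range(3, 8), columns=["a", "b", "c"])

row_index_list = [2, 4]   # positions: the third and fifth rows
column_list = ["b", "c"]  # labels

# iloc takes integer positions on both axes, so convert the labels first
col_pos = df.columns.get_indexer(column_list)
subset = df.iloc[row_index_list, col_pos]

print(subset.mean(axis=0))  # one mean per column
print(subset.mean(axis=1))  # one mean per row
```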

EDIT after comment:
New question in comment:
I have to store these means in another df/matrix. I have L1, L2, L3, L4...LX lists which tells me the index whose mean I need for columns C[1, 2, 3]. For ex: L1 = [0, 2, 3] , means I need mean of rows 0,2,3 and store it in 1st row of a new df/matrix. Then L2 = [1,4] for which again I will calculate mean and store it in 2nd row of the new df/matrix. Similarly till LX, I want the new df to have X rows and len(C) columns. Columns for L1..LX will remain same. Could you help me with this?

Answer:

If I understand correctly, the following code should do the trick (same df as above; as columns I took 'a' and 'b').

First loop over all the lists of row positions, collecting each set of means as a pd.Series; then concatenate the resulting list of Series along axis=1 and take the transpose, so that each list becomes one row.

# L = [L1, L2, ..., LX]: the lists of row positions from the comment
dfs = list()
for l in L:
    dfs.append(df[['a', 'b']].iloc[l].mean(axis=0))

mean_matrix = pd.concat(dfs, axis=1).T
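An alternative sketch builds the result directly from a list of Series, where each Series becomes one row, so no concat or transpose is needed. The frame values and the lists in L are hypothetical (the example df above has identical rows, which would make every mean the same).

```python
import pandas as pd

# Hypothetical data with distinct rows so the means differ
df = pd.DataFrame({
    "a": [0, 3, 6, 9, 12],
    "b": [10, 20, 40, 70, 40],
    "c": [1, 2, 3, 4, 5],
})

# Hypothetical stand-ins for L1, L2, ... (row positions) and the columns C
L = [[0, 2, 3], [1, 4]]
C = ["a", "b"]

# One row of column means per position list: X rows by len(C) columns
mean_matrix = pd.DataFrame([df[C].iloc[rows].mean(axis=0) for rows in L])
print(mean_matrix)
```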

