Mean of a Column in a Data Frame, Given the Column's Name

Calculate the mean of a column in a data frame when it is initially a character

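If V1 is already numeric (a character column must first be converted, for example with as.numeric()), try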

mean(good$V1, na.rm=TRUE)

or

colMeans(good[sapply(good, is.numeric)], na.rm=TRUE)
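
For readers doing the same thing in pandas rather than R, a rough analogue might look like the sketch below. The frame `good`, its column `V1`, and the sample strings are made up for illustration; `pd.to_numeric(..., errors="coerce")` turns unparseable strings into NaN, which `.mean()` then skips by default.

```python
import pandas as pd

# Hypothetical data: a column that was read in as text, with one bad entry.
good = pd.DataFrame({"V1": ["1.5", "2.5", "n/a", "4.0"]})

# Convert text to numbers; anything that cannot be parsed becomes NaN.
v1 = pd.to_numeric(good["V1"], errors="coerce")

# Series.mean() ignores NaN by default (skipna=True).
mean_v1 = v1.mean()
print(mean_v1)
```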

Find the mean of columns with matching column names

One option is to group the columns by their level-0 labels, so that columns sharing a name are averaged together:

(df.set_index(['name','x','y'])
   .groupby(level=0, axis=1)
   .mean().reset_index()
)

Output:

    name  x  y  ghb_00hr  ghl_06hr
0  gene1  x  y  2.333333       2.0
1  gene2  x  y  6.000000       1.5
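
Note that passing axis=1 to groupby is deprecated as of pandas 2.1. A possible workaround is to transpose, group the rows, and transpose back. The sketch below assumes a frame with repeated column names; the input values are invented so that the means match the output shown above, since the original data is not given.

```python
import pandas as pd

# Hypothetical input: repeated column names, values chosen for illustration.
df = pd.DataFrame(
    [["gene1", "x", "y", 2, 3, 2, 1, 3],
     ["gene2", "x", "y", 5, 7, 6, 2, 1]],
    columns=["name", "x", "y",
             "ghb_00hr", "ghb_00hr", "ghb_00hr",
             "ghl_06hr", "ghl_06hr"],
)

indexed = df.set_index(["name", "x", "y"])

# Transpose so the duplicated column names become row labels,
# average rows that share a label, then transpose back.
result = indexed.T.groupby(level=0).mean().T.reset_index()
print(result)
```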

Update: for the modified question:

d = df.filter(like='gh')
# or d = df.iloc[:, 2:]
# depending on your columns of interest

names = d.columns.str.rsplit('_', n=1).str[0]

d.groupby(names, axis=1).mean()

Output:

   ghb_00hr  ghl_06hr
0  2.333333       2.0
1  6.000000       1.5
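
To make the suffix-stripping step concrete, here is a small self-contained sketch. The column names (`ghb_00hr_1` and so on) and the values are assumptions, not the asker's data. It groups on the transposed frame instead of using axis=1, which newer pandas versions deprecate.

```python
import pandas as pd

# Hypothetical frame: replicate columns distinguished by a trailing "_<n>".
df = pd.DataFrame({
    "name": ["gene1", "gene2"],
    "ghb_00hr_1": [2, 5],
    "ghb_00hr_2": [3, 7],
    "ghb_00hr_3": [2, 6],
    "ghl_06hr_1": [1, 2],
    "ghl_06hr_2": [3, 1],
})

d = df.filter(like="gh")  # keep only columns whose label contains "gh"

# Split once from the right on "_" and keep the left part,
# e.g. "ghb_00hr_1" becomes "ghb_00hr".
names = d.columns.str.rsplit("_", n=1).str[0]

# Group the transposed rows by the stripped names, then transpose back.
means = d.T.groupby(names).mean().T
print(means)
```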

pandas get column average/mean

If you only want the mean of the weight column, select the column (which is a Series) and call .mean():

In [479]: df
Out[479]:
         ID  birthyear    weight
0    619040       1962  0.123123
1    600161       1963  0.981742
2  25602033       1963  1.312312
3    624870       1987  0.942120

In [480]: df["weight"].mean()
Out[480]: 0.83982437500000007
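
The same idea extends to several columns at once. The sketch below retypes the frame from the display above (the printed weights are rounded, so the result can differ from Out[480] in the last digits) and shows three variants: a single column, a chosen subset, and every numeric column.

```python
import pandas as pd

# Frame retyped from the printed table; the weights are the rounded display values.
df = pd.DataFrame({
    "ID": [619040, 600161, 25602033, 624870],
    "birthyear": [1962, 1963, 1963, 1987],
    "weight": [0.123123, 0.981742, 1.312312, 0.942120],
})

weight_mean = df["weight"].mean()                    # one column -> a scalar
selected_means = df[["birthyear", "weight"]].mean()  # subset -> a Series
all_numeric_means = df.mean(numeric_only=True)       # every numeric column

print(weight_mean)
print(selected_means)
```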

Calculate mean of selected columns with multilevel header

Here is one possible solution:

import pandas as pd

iterables = [[1, 2, 3, 4], ["x", "y"]]
array = [
[1, 4, 3, 7, 2, 1, 5, 2],
[2, 2, 6, 1, 4, 5, 1, 7]
]
index = pd.MultiIndex.from_product(iterables)
df = pd.DataFrame(array, index=["A", "B"], columns=index)

df["mean"] = df.xs("x", level=1, axis=1).loc[:,1:3].mean(axis=1)

print(df)

   1     2     3     4    mean
   x  y  x  y  x  y  x  y
A  1  4  3  7  2  1  5  2  2.0
B  2  2  6  1  4  5  1  7  4.0

Steps:

  1. Select all the "x"-columns with df.xs("x", level=1, axis=1)
  2. Select only columns 1 to 3 with .loc[:,1:3] (label-based slices include both endpoints)
  3. Calculate the mean value with .mean(axis=1)
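
The steps above can be traced one at a time by keeping each intermediate result. This is only a sketch that rebuilds the same frame; the variable names `x_cols`, `subset`, and `row_means` are introduced here for illustration.

```python
import pandas as pd

columns = pd.MultiIndex.from_product([[1, 2, 3, 4], ["x", "y"]])
df = pd.DataFrame(
    [[1, 4, 3, 7, 2, 1, 5, 2],
     [2, 2, 6, 1, 4, 5, 1, 7]],
    index=["A", "B"],
    columns=columns,
)

# Step 1: keep only the "x" sub-columns; the second level is dropped,
# leaving plain column labels 1, 2, 3, 4.
x_cols = df.xs("x", level=1, axis=1)

# Step 2: label-based slice, inclusive on both ends -> columns 1, 2, 3.
subset = x_cols.loc[:, 1:3]

# Step 3: average across the selected columns for each row.
row_means = subset.mean(axis=1)
print(row_means)
```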

How to average columns with the same name and ignore columns that are factors

A base R solution that extends what you already have:

df <-
  as.data.frame(matrix(c(1,3,3,2,2,5,3,2,3,6,3,2,4,7,3,2,5,4,5,2,6,3,5,2),
                       ncol=6,
                       dimnames=list(NULL, c("A.1", "B.1", "C.1", "B.2", "A.2", "C.2"))))

char = c("Apple", "banana", "cat", "rainbow")
df <- cbind(char, df)

names(df) <- gsub('\\.\\d+$', '', names(df)) ## strips the numeric suffix (e.g. ".1") from each group name

res <-
  data.frame(
    factor = df$char,
    sapply(setdiff(unique(names(df)), 'char'), function(col)
      rowMeans(df[, names(df) == col]))
  )

> res
   factor   A B   C
1   Apple 3.0 3 4.5
2  banana 3.5 6 4.5
3     cat 4.0 3 4.0
4 rainbow 2.0 2 2.0

