Calculate the mean of a column in a data frame when it is initially a character
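If V1 is still stored as character, convert it to numeric before averaging (as.numeric() leaves an already numeric column unchanged):
mean(as.numeric(good$V1), na.rm=TRUE)
or, to average every numeric column at once:
colMeans(good[sapply(good, is.numeric)], na.rm=TRUE)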
Find the mean of columns with matching column names
One option is to group the columns with groupby(level=0, axis=1):
(df.set_index(['name','x','y'])
.groupby(level=0, axis=1)
.mean().reset_index()
)
Output:
    name  x  y  ghb_00hr  ghl_06hr
0  gene1  x  y  2.333333       2.0
1  gene2  x  y  6.000000       1.5
Update: for the modified question:
d = df.filter(like='gh')
# or d = df.iloc[:, 2:]
# depending on your columns of interest
names = d.columns.str.rsplit('_', n=1).str[0]
d.groupby(names, axis=1).mean()
Output:
   ghb_00hr  ghl_06hr
0  2.333333       2.0
1  6.000000       1.5
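Recent pandas releases deprecate the axis=1 argument of groupby, so the snippets above may emit a warning or fail depending on the installed version. A minimal sketch of the same idea that avoids axis=1 by transposing, grouping the rows, and transposing back. The data below is hypothetical, chosen only to reproduce the means shown above; the original question's frame is not shown here.

```python
import pandas as pd

# Hypothetical data (not from the original question), picked to give the same means
df = pd.DataFrame({
    "name": ["gene1", "gene2"],
    "x": ["x", "x"],
    "y": ["y", "y"],
    "ghb_00hr_1": [2, 5],
    "ghb_00hr_2": [2, 6],
    "ghb_00hr_3": [3, 7],
    "ghl_06hr_1": [1, 1],
    "ghl_06hr_2": [3, 2],
})

d = df.filter(like="gh")
# Strip the trailing "_<n>" so replicate columns share one label
names = d.columns.str.rsplit("_", n=1).str[0]

# Transpose so columns become rows, group those rows by label, transpose back
result = d.T.groupby(names).mean().T
print(result)
```

Transposing works cleanly here because all selected columns are numeric; with mixed dtypes the transposed frame would fall back to object dtype.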
pandas get column average/mean
If you only want the mean of the weight column, select the column (which is a Series) and call .mean():
In [479]: df
Out[479]:
         ID  birthyear    weight
0    619040       1962  0.123123
1    600161       1963  0.981742
2  25602033       1963  1.312312
3    624870       1987  0.942120
In [480]: df["weight"].mean()
Out[480]: 0.83982437500000007
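The call above assumes the weight column is already numeric. If it was read in as strings, one possible approach is to convert it with pd.to_numeric first. A minimal sketch with a hypothetical frame (the values and the "n/a" placeholder are made up for illustration):

```python
import pandas as pd

# Hypothetical frame: "weight" arrived as strings and one entry is not a number
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "weight": ["0.5", "1.5", "n/a", "2.5"],
})

# errors="coerce" turns unparseable entries into NaN instead of raising
weight = pd.to_numeric(df["weight"], errors="coerce")

# Series.mean() skips NaN by default (skipna=True)
print(weight.mean())  # 1.5
```

Whether silently dropping unparseable entries is acceptable depends on the data; counting them first with weight.isna().sum() is a cheap sanity check.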
Calculate mean of selected columns with multilevel header
Here is one possible solution:
import pandas as pd
iterables = [[1, 2, 3, 4], ["x", "y"]]
array = [
[1, 4, 3, 7, 2, 1, 5, 2],
[2, 2, 6, 1, 4, 5, 1, 7]
]
index = pd.MultiIndex.from_product(iterables)
df = pd.DataFrame(array, index=["A", "B"], columns=index)
df["mean"] = df.xs("x", level=1, axis=1).loc[:,1:3].mean(axis=1)
print(df)
   1     2     3     4    mean
   x  y  x  y  x  y  x  y
A  1  4  3  7  2  1  5  2  2.0
B  2  2  6  1  4  5  1  7  4.0
Steps:
- Select all the "x"-columns with df.xs("x", level=1, axis=1)
- Select only columns 1 to 3 with .loc[:,1:3] (a label slice, so both 1 and 3 are included)
- Calculate the mean value with .mean(axis=1)
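The steps above can be sketched with each intermediate result kept in its own variable, next to an alternative that does the selection in a single .loc call via pd.IndexSlice. The variable names are illustrative only, and the IndexSlice form relies on the column MultiIndex being sorted, which is the case for this from_product example.

```python
import pandas as pd

columns = pd.MultiIndex.from_product([[1, 2, 3, 4], ["x", "y"]])
df = pd.DataFrame(
    [[1, 4, 3, 7, 2, 1, 5, 2],
     [2, 2, 6, 1, 4, 5, 1, 7]],
    index=["A", "B"],
    columns=columns,
)

# Step by step, as in the list above
x_cols = df.xs("x", level=1, axis=1)  # keeps the "x" half: columns 1, 2, 3, 4
subset = x_cols.loc[:, 1:3]           # label slice: includes both 1 and 3
step_mean = subset.mean(axis=1)

# Same selection in one .loc call (needs sorted column labels)
idx = pd.IndexSlice
slice_mean = df.loc[:, idx[1:3, "x"]].mean(axis=1)

print(step_mean)
print(slice_mean)
```

Both variants compute the mean before any "mean" column is added to df; once that column exists, the first level mixes integers and strings and slicing on it becomes less predictable.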
How to average columns with the same name and ignore columns that are factors
Here is a base R solution that extends what you already have:
df <- as.data.frame(matrix(
  c(1,3,3,2,2,5,3,2,3,6,3,2,4,7,3,2,5,4,5,2,6,3,5,2),
  ncol = 6,
  dimnames = list(NULL, c("A.1", "B.1", "C.1", "B.2", "A.2", "C.2"))))
char <- c("Apple", "banana", "cat", "rainbow")
df <- cbind(char, df)
## strip the ".<digit>" suffix so that replicate columns share a group name
names(df) <- gsub('\\.\\d+$', '', names(df))
res <- data.frame(
  factor = df$char,
  ## average the columns of each group row-wise; drop = FALSE keeps single-column groups as data frames
  sapply(setdiff(unique(names(df)), 'char'), function(col)
    rowMeans(df[, names(df) == col, drop = FALSE]))
)
> res
   factor   A B   C
1   Apple 3.0 3 4.5
2  banana 3.5 6 4.5
3     cat 4.0 3 4.0
4 rainbow 2.0 2 2.0
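For readers working in pandas rather than R, a rough counterpart of the approach above is sketched below. It is an illustration using the same data, not part of the original answer: select_dtypes drops the text column, and the part of each column name before the "." is used as the group label.

```python
import pandas as pd

df = pd.DataFrame({
    "char": ["Apple", "banana", "cat", "rainbow"],
    "A.1": [1, 3, 3, 2],
    "B.1": [2, 5, 3, 2],
    "C.1": [3, 6, 3, 2],
    "B.2": [4, 7, 3, 2],
    "A.2": [5, 4, 5, 2],
    "C.2": [6, 3, 5, 2],
})

num = df.select_dtypes(include="number")     # ignores the non-numeric column
groups = num.columns.str.split(".").str[0]   # "A.1" -> "A"
means = num.T.groupby(groups).mean().T       # average columns sharing a prefix

# Put the text column back next to the group means
res = pd.concat([df[["char"]], means], axis=1)
print(res)
```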