Writing a Function to Calculate the Mean of Columns in a Dataframe in R

Calculate mean of a column in a data frame when it is initially a character

Try

mean(good$V1, na.rm=TRUE)

or

colMeans(good[sapply(good, is.numeric)], na.rm=TRUE)
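
Because the column in question starts out as character, calling mean() on it directly returns NA with a warning, so it has to be converted first. Here is a minimal sketch of that step, assuming a made-up data frame good whose column V1 holds numbers stored as text (the values and the "n/a" entry are invented for illustration):

```r
# Hypothetical data: V1 holds numbers stored as text, with one unparseable entry
good <- data.frame(V1 = c("1.5", "2.5", "n/a", "4"),
                   V2 = c(10, 20, 30, 40),
                   stringsAsFactors = FALSE)

# as.numeric() turns unparseable strings into NA (with a warning, silenced here)
good$V1 <- suppressWarnings(as.numeric(good$V1))

# Mean of the single converted column, ignoring the NA
v1_mean <- mean(good$V1, na.rm = TRUE)

# Means of every numeric column at once
col_means <- colMeans(good[sapply(good, is.numeric)], na.rm = TRUE)
```

If the column is a factor rather than character, convert with as.numeric(as.character(x)) instead, since as.numeric() on a factor returns the internal level codes.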

Using apply function to calculate the mean of a column

It is simpler to use sapply over the unique country names; there is no need to split anything.

sapply(unique(strikes.df$country), function(x) 
mean(strikes.df[strikes.df$country == x, "centralization"]))
#   Australia     Austria     Belgium      Canada     Denmark     Finland      France
# 0.374644022 0.997670495 0.749485177 0.002244134 0.499958552 0.750374065 0.002729909
#     Germany     Ireland       Italy       Japan Netherlands New.Zealand      Norway
# 0.249968231 0.499711882 0.250699502 0.124675342 0.749602699 0.375940378 0.875341821
#      Sweden Switzerland          UK         USA
# 0.875253817 0.499990005 0.375946785 0.002390639

If you do need to use split, you can do:

sapply(split(strikes.df$centralization, strikes.df$country), mean)
#   Australia     Austria     Belgium      Canada     Denmark     Finland      France
# 0.374644022 0.997670495 0.749485177 0.002244134 0.499958552 0.750374065 0.002729909
#     Germany     Ireland       Italy       Japan Netherlands New.Zealand      Norway
# 0.249968231 0.499711882 0.250699502 0.124675342 0.749602699 0.375940378 0.875341821
#      Sweden Switzerland          UK         USA
# 0.875253817 0.499990005 0.375946785 0.002390639

Or write it in two lines:

s <- split(strikes.df$centralization, strikes.df$country)
sapply(s, mean)

Edit

If splitting the whole data frame is required, do

s <- split(strikes.df, strikes.df$country)
sapply(s, function(x) mean(x[, "centralization"]))

or

foo <- function(x) mean(x[, "centralization"])
sapply(s, foo)
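
To check that these variants agree, here is a small sketch on a made-up stand-in for strikes.df (the rows and values are invented for illustration and are not the original data):

```r
# Made-up stand-in for strikes.df; values are illustrative only
strikes.df <- data.frame(
  country = c("Australia", "Australia", "Austria", "Austria", "Belgium"),
  centralization = c(0.25, 0.50, 1.00, 0.75, 0.50),
  stringsAsFactors = FALSE
)

# Approach 1: subset by each unique country name
by_unique <- sapply(unique(strikes.df$country), function(x)
  mean(strikes.df[strikes.df$country == x, "centralization"]))

# Approach 2: split only the column of interest
by_split <- sapply(split(strikes.df$centralization, strikes.df$country), mean)

# Approach 3: split the whole data frame, then pick the column per group
s <- split(strikes.df, strikes.df$country)
by_split_df <- sapply(s, function(x) mean(x[, "centralization"]))
```

All three return a named numeric vector with one mean per country. Note that split orders the groups by factor level (alphabetically for character input), while unique keeps first-appearance order, so on real data the results may come back in a different order.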

Calculate mean of data frame by row

When you say df[,1:3] you are choosing all rows of df and columns 1:3. When you apply min or max to that, it simply looks for the min/max among all numbers. It is not doing it by row.

So when you try to apply the same logic to mean, it again looks for the mean among all numbers in all three columns. Again, not by row.

You need to apply a function over a dimension of df. For this, use apply(df, 1, mean), as PKumar suggested. If you need the mean for each column, use apply(df, 2, mean). To learn more about apply, type ?apply in the R console.

rowMeans(df) and colMeans(df) are faster shortcuts for apply(df, 1, mean) and apply(df, 2, mean).
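
The difference between an overall summary and a per-dimension one can be seen in a short sketch, assuming a made-up 3 x 3 data frame df (names and values are invented for illustration):

```r
# Made-up 3 x 3 data frame for illustration
df <- data.frame(a = c(1, 4, 7), b = c(2, 5, 8), c = c(3, 6, 9))

overall_max <- max(df[, 1:3])     # a single number across all cells
row_means <- apply(df, 1, mean)   # one mean per row (dimension 1)
col_means <- apply(df, 2, mean)   # one mean per column (dimension 2)

# The dedicated shortcuts compute the same values
row_means_fast <- rowMeans(df)
col_means_fast <- colMeans(df)
```

The shortcuts may label the result differently from apply (for example, rowMeans names the elements by row name), but the numbers are the same.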

Calculating new column as mean of selected columns in R data frame

Since you want the row-wise mean, this will work:

dall$mJan15to19 = rowMeans(dall[,c("Jan.15","Jan.16","Jan.17","Jan.18","Jan.19")])
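
If some of the selected columns contain missing values, rowMeans returns NA for those rows unless na.rm = TRUE is passed. A small sketch, assuming a made-up stand-in for dall with only three of the January columns (the data and the column name mJan are invented for illustration):

```r
# Hypothetical stand-in for dall: three January columns plus an unrelated id column
dall <- data.frame(id = c("x", "y"),
                   Jan.15 = c(1, 3),
                   Jan.16 = c(3, NA),
                   Jan.17 = c(5, 9))

# Keep the column selection in one place so it is easy to change
jan_cols <- c("Jan.15", "Jan.16", "Jan.17")

# na.rm = TRUE averages over the non-missing values in each row
dall$mJan <- rowMeans(dall[, jan_cols], na.rm = TRUE)
```

Selecting the columns by name keeps the unrelated id column out of the calculation.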

