Calculate Cumulative Average (Mean)

Looking for a way of creating a cumulative average for individuals within a dataset

If the average should be computed per 'Cow_ID', group by that column. Before applying is.na, convert the "na" strings to NA and then convert the column type: 'Temp_Diff' is of character class because of the "na" elements, and cummean does not work on character columns.

library(dplyr)
df %>% 
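  # na_if() on a whole data frame no longer works in dplyr >= 1.1.0, so target the column
  mutate(Temp_Diff = na_if(Temp_Diff, "na")) %>%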
  type.convert(as.is = TRUE) %>% 
  group_by(Cow_ID) %>% 
  mutate(new = replace(Temp_Diff, !is.na(Temp_Diff), 
      cummean(Temp_Diff[!is.na(Temp_Diff)]))) %>%
  ungroup()

-output

# A tibble: 12 × 4
   Cow_ID   Day Temp_Diff    new
    <int> <int>     <dbl>  <dbl>
 1    192     1       0.2  0.2  
 2    192     2       0.1  0.15 
 3    192     3      NA   NA    
 4    192     4       0.8  0.367
 5    201     1       0.4  0.4  
 6    201     2       0.1  0.25 
 7    201     3       0.7  0.4  
 8    201     4      NA   NA    
 9    652     1       0.1  0.1  
10    652     2       0.3  0.2  
11    652     3      NA   NA    
12    652     4       0.8  0.4

Or, using data.table: convert the 'data.frame' to a 'data.table' (setDT) and convert 'Temp_Diff' to numeric with as.numeric, which coerces "na" to NA (with a warning). Then specify a logical condition in i to subset only the non-NA elements, compute the cumulative mean in j grouped by 'Cow_ID', and assign (:=) the result to a new column 'new'. Note that cummean comes from dplyr, not data.table, so it is called here as dplyr::cummean.

library(data.table)
setDT(df)[, Temp_Diff := as.numeric(Temp_Diff)][!is.na(Temp_Diff), 
      new := dplyr::cummean(Temp_Diff), by = Cow_ID]
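
For comparison, the same grouped cumulative mean can be sketched in pandas. This is only an illustration, not part of the R answers above: it assumes the data shown in the output table, with the missing values stored as the string "na", and it uses an expanding mean per group that is then masked back to NaN wherever 'Temp_Diff' is missing.

```python
import pandas as pd

# Data copied from the output table above; "na" marks the missing readings.
df = pd.DataFrame({
    "Cow_ID": [192] * 4 + [201] * 4 + [652] * 4,
    "Day": [1, 2, 3, 4] * 3,
    "Temp_Diff": ["0.2", "0.1", "na", "0.8",
                  "0.4", "0.1", "0.7", "na",
                  "0.1", "0.3", "na", "0.8"],
})

# Coerce the character column to numeric; "na" becomes NaN.
df["Temp_Diff"] = pd.to_numeric(df["Temp_Diff"], errors="coerce")

# The expanding mean skips NaN values, but it still reports a value on NaN rows,
# so those rows are masked afterwards to mirror the R result.
running = df.groupby("Cow_ID")["Temp_Diff"].transform(lambda s: s.expanding().mean())
df["new"] = running.where(df["Temp_Diff"].notna())

print(df)
```

The masking step is a design choice: without it, the row with a missing reading would carry the previous cumulative mean forward instead of staying missing.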

Calculate Cumulative Average using Pandas

First, it seems to me that there is an arithmetic error in your calculations, judging by your comments.
If in the first row you take 3.5 (what you call the current cumulative) plus 4 (the 2005 value in the first row) and get (3.5 + 4)/2 = 3.75, then the 2005 value in row two is wrong.
There you have (2.25 + 4)/2 = 3.125, but you wrote 3.175.

I believe there are much better ways to implement this, but the following produces what you need.

import pandas as pd

def cumulative_av(x):
    # Each result is the average of the previous result and the current entry;
    # zero entries are skipped.
    b = [x[0]]
    for i in range(1, len(x)):
        if x[i] != 0 and b[-1] != 0:
            b.append((x[i] + b[-1]) * 0.5)
        elif x[i] != 0 and b[-1] == 0:
            b.append(x[i])   # first non-zero value starts the average
        else:
            b.append(b[-1])  # zero entry: carry the previous result (0 if none yet)
    return b


apd2 = pd.DataFrame(columns=["2001", "2002", "2003", "2004", "2005", "2006"])
for i in range(3):
    apd2.loc[i] = cumulative_av(apd.loc[i].to_list())

Here "apd" is your initial pandas DataFrame, and cumulative_av is a function that computes what you have defined (a rather unusual function, in my opinion).

Here are the results of my code:

    2001    2002    2003    2004    2005    2006
0   2.0     3.5     3.50    3.50    3.750   4.875
1   1.0     2.5     2.25    2.25    3.125   3.125
2   0.0     5.0     5.00    5.00    5.000   3.500
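
As one possible alternative to the explicit loop, the same rule can be written as a two-argument step function and folded over each row with itertools.accumulate. This is a sketch under assumptions: the original input is not shown in the answer, so the values in "apd" below are hypothetical, chosen only so that the result matches the table above; "step" is an illustrative name.

```python
from itertools import accumulate

import pandas as pd


def step(prev, cur):
    """Combine the previous result with the current entry, skipping zeros."""
    if cur == 0:
        return prev          # zero entry: keep the previous result
    if prev == 0:
        return cur           # first non-zero value starts the average
    return (prev + cur) / 2  # otherwise average previous result and entry


years = ["2001", "2002", "2003", "2004", "2005", "2006"]

# Hypothetical input, reconstructed to be consistent with the results above.
apd = pd.DataFrame([[2, 5, 0, 0, 4, 6],
                    [1, 4, 2, 0, 4, 0],
                    [0, 5, 0, 0, 0, 2]], columns=years)

# Apply the fold row by row; each row comes back as a Series with the same labels.
apd2 = apd.apply(
    lambda row: pd.Series(list(accumulate(row.tolist(), step)), index=row.index),
    axis=1,
)
print(apd2)
```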

Calculating Cumulative Average every x successive rows in Excel( not to be confused with Average every x rows gap interval)

Use Series.rolling with mean and then Series.shift:

N = 3
df = pd.DataFrame({'Value': [6,9,15,3,27,33]})
df['Cum_sum'] = df['Value'].rolling(N).mean().shift(-N+1)

print (df)
   Value  Cum_sum
0      6     10.0
1      9      9.0
2     15     15.0
3      3     21.0
4     27      NaN
5     33      NaN
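
To see why the shift is needed, here is a small check, as a sketch using the same six values: rolling(N) averages each row with the N-1 rows before it, so shifting the result up by N-1 rows turns it into the mean of each row and the N-1 rows after it. The explicit list comprehension is only there for comparison.

```python
import math

import pandas as pd

N = 3
values = [6, 9, 15, 3, 27, 33]
s = pd.Series(values)

# Trailing window mean, then moved up N-1 rows so it becomes forward-looking.
via_rolling = s.rolling(N).mean().shift(-N + 1)

# The same thing written out: mean of values[i], ..., values[i + N - 1],
# or NaN when fewer than N values remain.
explicit = [
    sum(values[i:i + N]) / N if i + N <= len(values) else float("nan")
    for i in range(len(values))
]

for a, b in zip(via_rolling, explicit):
    # Both are NaN in the last N-1 rows, otherwise they agree.
    assert (math.isnan(a) and math.isnan(b)) or math.isclose(a, b)
```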

Calculating the Cumulative Mean in Python

What about replacing range with np.arange?

Try:

meancsmdr100 = cumsummdr100 / np.arange(1,51)
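
A self-contained sketch of the same idea, with made-up values since the original array is not shown: np.arange returns an array, so dividing the cumulative sum by it is element-wise, and using len(x) avoids hard-coding the length.

```python
import numpy as np

# Hypothetical data standing in for the original series.
x = np.array([2.0, 4.0, 6.0, 8.0])

# Cumulative sum divided by the running count 1, 2, ..., len(x).
cum_mean = np.cumsum(x) / np.arange(1, len(x) + 1)

print(cum_mean)
```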

How to calculate cumulative mean in R?

Try this

y <- cumsum(x) / seq_along(x)

Reference
https://stat.ethz.ch/pipermail/r-help/2008-May/162729.html

Calculating cumulative mean of recent observations

HAVB's approach is great, but depending on what you want, here is another. This approach is adapted from this answer to a different question, but changed for your circumstances:

library(dplyr)
library(zoo)
# set the seed for reproducibility
set.seed(123)
player_id<-c(rep(1,30),rep(2,30),rep(3,30),rep(4,30),rep(5,30))
week<-1:30
points<-round(runif(150,1,10),0) 
mydata<- data.frame(player_id=player_id,week=rep(week,5),points)

roll_mean <- function(x, k) {
    result <- rollapplyr(x, k, mean, partial=TRUE, na.rm=TRUE)
    result[is.nan(result)] <- NA
    return( result )
}

mydata<-mydata %>% 
    group_by(player_id) %>%
    arrange(week) %>%
    mutate(rolling_mean = roll_mean(x=lag(points), k=5) ) %>%
    arrange(player_id)

Then we can look at a subset to show it worked:

mydata[mydata$player_id %in% 1:2 & mydata$week %in% 1:6, ]
# A tibble: 12 x 4
# Groups:   player_id [2]
   player_id  week points rolling_mean
       <dbl> <int>  <dbl>        <dbl>
 1         1     1      4           NA
 2         1     2      8     4.000000
 3         1     3      5     6.000000
 4         1     4      9     5.666667
 5         1     5      9     6.500000
 6         1     6      1     7.000000
 7         2     1     10           NA
 8         2     2      9    10.000000
 9         2     3      7     9.500000
10         2     4      8     8.666667
11         2     5      1     8.500000
12         2     6      5     7.000000

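So we can see that at each time t, rolling_mean for player i is the mean of the points observations for player i at times {t - 1, ..., max(1, t - 5)}, that is, the up to five most recent weeks before t. It is NA at t = 1 because there are no earlier observations.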
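
The same "mean of the previous up to five observations" can be sketched in pandas as well. This is an illustration rather than a port of the zoo code: it uses only the twelve rows shown in the subset above, shifts each player's points by one week, and then takes a rolling mean that allows partial windows via min_periods=1.

```python
import pandas as pd

# Rows copied from the subset shown above (players 1 and 2, weeks 1 to 6).
df = pd.DataFrame({
    "player_id": [1] * 6 + [2] * 6,
    "week": list(range(1, 7)) * 2,
    "points": [4, 8, 5, 9, 9, 1, 10, 9, 7, 8, 1, 5],
})

df = df.sort_values(["player_id", "week"])

# shift(1) drops the current week from its own window; rolling(5, min_periods=1)
# then averages whatever earlier weeks are available, up to five of them.
df["rolling_mean"] = df.groupby("player_id")["points"].transform(
    lambda s: s.shift(1).rolling(5, min_periods=1).mean()
)

print(df)
```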

Cumulative mean line chart in ggplot

I forgot about one important and obvious step: group by id and calculate the cumsum!

This is what I wanted to achieve:

library(dplyr)
library(ggplot2)

dat %>%
  group_by(id) %>%
  mutate(cumul = cumsum(event)) %>%
  group_by(female, age) %>%
  summarise(mean(cumul)) %>%
  ggplot(aes(age, `mean(cumul)`, colour = as.factor(female))) +
  geom_line()
