Calculate Rolling/Moving Average in C++

You simply need a circular array (circular buffer) of 1000 elements, where each new element is added to the previously stored value and the result is stored, so that every slot holds a running sum.

The stored values form an increasing sum, so the sum between any two elements is the difference of their stored values; divide that by the number of elements between them to get the average.
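A minimal C++ sketch of this idea, under stated assumptions: the class name `PrefixSumWindow` and its members are hypothetical, the window length is passed in rather than fixed at 1000, and samples are `double`. Each slot of the circular buffer holds the running sum at the time it was written, so the window sum is the newest running sum minus the one about to be overwritten.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original answer): rolling average
// backed by a circular buffer of running sums. `window` must be > 0.
class PrefixSumWindow {
public:
    explicit PrefixSumWindow(std::size_t window) : sums_(window, 0.0) {}

    // Adds a sample and returns the average of the samples seen so far,
    // limited to the most recent `window` samples.
    double add(double sample) {
        // Running sum from `window` samples ago (0 while the buffer fills).
        const double oldest = sums_[head_];
        running_ += sample;
        sums_[head_] = running_;
        head_ = (head_ + 1) % sums_.size();
        if (count_ < sums_.size()) ++count_;
        return (running_ - oldest) / static_cast<double>(count_);
    }

private:
    std::vector<double> sums_;  // circular buffer of running sums
    std::size_t head_ = 0;      // next slot to overwrite
    std::size_t count_ = 0;     // samples currently in the window
    double running_ = 0.0;      // running sum of all samples so far
};
```

Because the running sum keeps growing, this sketch is probably only suitable when the total stays well within the precision of `double`.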

fast way to calculate moving average/rolling function which allows custom weights

As mentioned in the comments, a possible solution uses stats::filter.

Note the following arguments:

sides = 1, so that the filter uses past values only
coefficient order inverted, 1/(n:1), because the filter calculation starts with the most recent value

my.roll[,test2:=filter(val,prop.table(1/n:1),sides=1),by=.(type)]
# Conversion from ts to numeric
my.roll[,test2:=as.numeric(test2)][]

#           type    val     test    test2
#          <int>  <int>    <num>    <num>
#       1:     1 955625       NA       NA
#       2:     2 979596       NA       NA
#       3:     3 578778       NA       NA
#       4:     4 174631       NA       NA
#       5:     5 459947       NA       NA
# ---                               
#  999996:    96 191233 620505.8 620505.8
#  999997:    97 626522 398615.6 398615.6
#  999998:    98 527846 565061.2 565061.2
#  999999:    99 480277 537305.9 537305.9
# 1000000:   100 757433 395458.3 395458.3

all.equal(my.roll$test,my.roll$test2)
#[1] TRUE

Speed comparison:

microbenchmark::microbenchmark(
  my.roll=my.roll.1(x, n, name, 'type', 'val'),
  filter={my.roll[,test2:=filter(val,prop.table(1/n:1),sides=1),by=.(type)][];
          my.roll[,test2:=as.numeric(test2)][]
          }
  , times=10L
)

Unit: milliseconds
    expr       min        lq       mean     median        uq       max neval
 my.roll 2194.3200 2203.3726 2264.67423 2245.04510 2314.2377 2401.1156    10
  filter   73.1602   76.3098   78.18358   77.25665   80.4204   85.0567    10

How to calculate simple moving average faster in C#?

Your main problem is that you throw away too much information for each iteration.
If you want to run this fast, you need to keep a buffer of the same size as the frame length.

This code will run moving averages for your whole dataset:

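(Shown as a static method; put it in any class.)

// Simple moving average over `data` with a window of `period` samples.
static decimal[] MovingAverage(decimal[] data, int period)
{
    // Circular buffer of the last `period` samples, each pre-divided by `period`.
    decimal[] buffer = new decimal[period];
    decimal[] output = new decimal[data.Length];
    int currentIndex = 0;
    for (int i = 0; i < data.Length; i++)
    {
        // Overwrite the oldest sample with the newest one.
        buffer[currentIndex] = data[i] / period;
        // Re-sum the whole buffer so rounding errors cannot accumulate.
        decimal ma = 0.0m;
        for (int j = 0; j < period; j++)
        {
            ma += buffer[j];
        }
        // Until `period` samples have been seen, unfilled slots count as zero.
        output[i] = ma;
        currentIndex = (currentIndex + 1) % period;
    }
    return output;
}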

Please note that it may be tempting to keep a running cumsum instead of keeping the whole buffer and calculating the value for each iteration, but this does not work for very long data lengths as your cumulative sum will grow so big that adding small additional values will result in rounding errors.
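
The rounding effect can be illustrated with a small C++ sketch. It uses 32-bit `float` only because the effect shows up quickly there; C# `decimal` carries far more digits, so the same problem would appear much later. The function name `sample_is_absorbed` is hypothetical.

```cpp
// Hypothetical helper: returns true when adding `sample` to `running_sum`
// leaves the sum unchanged in float precision, i.e. the sample is lost.
inline bool sample_is_absorbed(float running_sum, float sample) {
    // volatile forces the result to be rounded to float before comparing.
    volatile float updated = running_sum + sample;
    return updated == running_sum;
}
```

For example, near 1.0e8 adjacent `float` values are 8 apart, so adding 1.0 to a running sum of that size has no effect, while adding 1.0 to 100.0 works as expected.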

How can I get information about moving average?

If the question is asking to explain the code then it is taking the moving average of length n of the vector x. For example, if there are no NA's and n=2 then the first few elements of the output are (x[1] + x[2])/2, (x[2] + x[3])/2, etc.

n <- 2
x <- c(1, 3, 4, 7, 9)
cx <- c(0, cumsum(ifelse(is.na(x), 0, x)))  # 0  1  4  8 15 24
cn <- c(0, cumsum(ifelse(is.na(x), 0, 1)))  # 0 1 2 3 4 5
rx <- cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]  #  4  7 11 16
rn <- cn[(n+1):length(cx)] - cn[1:(length(cx) - n)]  # 2 2 2 2
rsum <- rx / rn  # 2.0 3.5 5.5 8.0

cx is 0 followed by the cumulative sum of x, where NAs are replaced with 0 when calculating the cumulative sum.

cn is 0 followed by the cumulative number of non-NAs.

rx is the cumulative sum minus the cumulative sum n positions back, i.e. the sum of the non-NA values in each window of length n.

rn is the number of non-NAs minus the number of non-NAs n positions back, i.e. the count of non-NA values in each window.

rsum is the ratio of the last two.
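
The same cumulative-sum technique can be sketched in C++. This is an illustrative translation, not part of the original answer: the function name `rolling_mean_skip_nan` is hypothetical, NaN plays the role of NA, and a window containing no valid values yields NaN.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: moving average of length n that ignores NaN entries,
// computed from cumulative sums as in the R code (cx, cn, rx, rn).
std::vector<double> rolling_mean_skip_nan(const std::vector<double>& x,
                                          std::size_t n) {
    std::vector<double> out;
    if (n == 0 || x.size() < n) return out;

    std::vector<double> cx(x.size() + 1, 0.0);  // cumulative sum, NaN as 0
    std::vector<double> cn(x.size() + 1, 0.0);  // cumulative count of non-NaN
    for (std::size_t i = 0; i < x.size(); ++i) {
        const bool missing = std::isnan(x[i]);
        cx[i + 1] = cx[i] + (missing ? 0.0 : x[i]);
        cn[i + 1] = cn[i] + (missing ? 0.0 : 1.0);
    }

    out.reserve(x.size() - n + 1);
    for (std::size_t i = n; i < cx.size(); ++i) {
        const double rx = cx[i] - cx[i - n];  // window sum
        const double rn = cn[i] - cn[i - n];  // window count of valid values
        out.push_back(rn > 0.0 ? rx / rn : std::nan(""));
    }
    return out;
}
```

With the input 1, 3, 4, 7, 9 and n = 2 this reproduces the values from the R example: 2.0, 3.5, 5.5, 8.0.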

Calculating moving average in C++

The trick is the following: you get updates at random times via void update(int time, float value). However, you also need to track when an update falls off the time window, so you set an "alarm" which is called at time + N and removes the previous update from ever being considered again in the computation.

If this happens in real time, you can ask the operating system to call a method void drop_off_oldest_update(int time) at time + N.

If this is a simulation, you cannot get help from the operating system and need to do it manually. In a simulation you call the methods with the time supplied as an argument (which does not correlate with real time). A reasonable assumption, however, is that the calls arrive with increasing time arguments. In this case, maintain a sorted list of alarm times, and for each update and read call check whether the time argument is greater than the head of the alarm list. While it is greater, do the alarm-related processing (drop off the oldest update), remove the head, and check again until all alarms prior to the given time are processed. Then do the update call.

I have so far assumed it is obvious what you would do for the actual computation, but I will elaborate just in case. I assume you have a method float read(int time) that you use to read the values. The goal is to make this call as efficient as possible, so you do not compute the moving average from scratch every time read is called. Instead, you precompute the value as of the last update or the last alarm, and "tweak" this value by a couple of floating point operations to account for the passage of time since the last update (i.e., a constant number of operations, except perhaps for processing a list of piled-up alarms).

Hopefully this is clear -- this should be a quite simple algorithm and quite efficient.

Further optimization: one remaining problem arises when a large number of updates happen within the time window, then there is a long period with neither reads nor updates, and then a read or update comes along. In this case, the above algorithm is inefficient because it incrementally updates the value for each of the updates that fall off. This is not necessary: we only care about the most recent update before the time window, so it helps if there is a way to efficiently drop all older updates at once.

To do this, we can modify the algorithm to do a binary search over the updates to find the most recent update before the time window. If relatively few updates need to be dropped, one can incrementally update the value for each dropped update. But if many updates need to be dropped, one can recompute the value from scratch after dropping the old updates.
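
The binary search step might look like the following C++ sketch. It assumes the update timestamps are kept in a sorted `std::vector<int>`; the function name `droppable_prefix` is hypothetical. It returns how many leading updates can be discarded while keeping the most recent update at or before the window start, since that one still defines the value at the beginning of the window.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of leading updates that can be dropped.
// `times` must be sorted in increasing order.
std::size_t droppable_prefix(const std::vector<int>& times, int window_start) {
    // First update strictly inside the window (time > window_start).
    const auto first_inside =
        std::upper_bound(times.begin(), times.end(), window_start);
    const std::size_t at_or_before =
        static_cast<std::size_t>(first_inside - times.begin());
    // Keep the last update at or before the window start.
    return at_or_before == 0 ? 0 : at_or_before - 1;
}
```

The caller can then compare the returned count against some threshold to decide between incremental adjustment and recomputing from scratch; the threshold itself is application-specific.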

Appendix on incremental computation: I should clarify what I mean by incremental computation in the sentence above about "tweaking" this value by a couple of floating point operations to account for the passage of time since the last update. The initial, non-incremental computation is as follows:

Start with

sum = 0;
updates_in_window = /* set of all updates within window */;
prior_update' = /* most recent update prior to window, with its timestamp moved to the window beginning */;
relevant_updates = /* union of prior_update' and updates_in_window */;

then iterate over relevant_updates in order of increasing time:

for each update EXCEPT last {
    sum += update.value * time_to_next_update;
}

and finally

moving_average = (sum + last_update.value * time_since_last_update) / window_length;

Now if exactly one update falls off the window but no new updates arrive, adjust sum as:

sum -= prior_update'.value * time_to_next_update + first_update_in_last_window.value * time_from_first_update_to_new_window_beginning;

(Note that it is prior_update' whose timestamp was moved to the beginning of the previous window.) And if exactly one update enters the window but no updates fall off, adjust sum as:

sum += previously_most_recent_update.value * corresponding_time_to_next_update.

As should be obvious, this is a rough sketch, but hopefully it shows how you can maintain the average in O(1) operations per update on an amortized basis. Note, though, the further optimization described above. Also note the stability issues alluded to in an older answer: floating point errors may accumulate over a large number of such incremental operations, so that the result diverges from the full computation by an amount that is significant to the application.

calculating moving average for long period

You can smooth without taking an arithmetic average. For example, rather than removing the specific sample that drops out of your moving window, you can remove one average-sized share (the current average itself) on every iteration.

newAverage = (oldAverage * (windowSize - 1) + newSample) / windowSize;

It may or may not be good enough for your system, but it's worth a try.
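
A minimal C++ sketch of this update rule follows. The class name `ApproxRollingAverage` is hypothetical, and seeding the average with the first sample is an assumption of this sketch, since the formula itself does not say how to start. The rule is equivalent to an exponential moving average with a smoothing factor of 1 / windowSize.

```cpp
// Hypothetical helper implementing
// newAverage = (oldAverage * (windowSize - 1) + newSample) / windowSize.
class ApproxRollingAverage {
public:
    explicit ApproxRollingAverage(double window_size)
        : window_size_(window_size) {}

    // Feeds one sample and returns the updated approximate average.
    double add(double sample) {
        if (!seeded_) {
            // Assumption: start from the first sample instead of zero.
            average_ = sample;
            seeded_ = true;
        } else {
            average_ = (average_ * (window_size_ - 1.0) + sample) / window_size_;
        }
        return average_;
    }

private:
    double window_size_;
    double average_ = 0.0;
    bool seeded_ = false;
};
```

Unlike a true windowed average, every past sample keeps a small, exponentially shrinking influence, and no buffer of past samples is needed.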
