Moving Average or Running Mean

Moving average or running mean

For a short, fast solution that does the whole thing in one loop, without dependencies, the code below works great.

mylist = [1, 2, 3, 4, 5, 6, 7]
N = 3
cumsum, moving_aves = [0], []

for i, x in enumerate(mylist, 1):
    cumsum.append(cumsum[i-1] + x)
    if i>=N:
        moving_ave = (cumsum[i] - cumsum[i-N])/N
        #can do stuff with moving_ave here
        moving_aves.append(moving_ave)

How to calculate rolling / moving average using python + NumPy / SciPy?

A simple way to achieve this is by using np.convolve.
The idea behind this is to leverage the way the discrete convolution is computed and use it to return a rolling mean. This can be done by convolving with a sequence of np.ones of a length equal to the sliding window length we want.

In order to do so we could define the following function:

def moving_average(x, w):
    return np.convolve(x, np.ones(w), 'valid') / w

This function will be taking the convolution of the sequence x and a sequence of ones of length w. Note that the chosen mode is valid so that the convolution product is only given for points where the sequences overlap completely.

Some examples:

x = np.array([5,3,8,10,2,1,5,1,0,2])

For a moving average with a window of length 2 we would have:

moving_average(x, 2)
# array([4. , 5.5, 9. , 6. , 1.5, 3. , 3. , 0.5, 1. ])

And for a window of length 4:

moving_average(x, 4)
# array([6.5 , 5.75, 5.25, 4.5 , 2.25, 1.75, 2.  ])

How does `convolve` work?

Lets have a more in depth look at the way the discrete convolution is being computed.
The following function aims to replicate the way np.convolve is computing the output values:

def mov_avg(x, w):
    for m in range(len(x)-(w-1)):
        yield sum(np.ones(w) * x[m:m+w]) / w

Which, for the same example above would also yield:

list(mov_avg(x, 2))
# [4.0, 5.5, 9.0, 6.0, 1.5, 3.0, 3.0, 0.5, 1.0]

So what is being done at each step is to take the inner product between the array of ones and the current window. In this case the multiplication by np.ones(w) is superfluous given that we are directly taking the sum of the sequence.

Bellow is an example of how the first outputs are computed so that it is a little clearer. Lets suppose we want a window of w=4:

[1,1,1,1]
[5,3,8,10,2,1,5,1,0,2]
= (1*5 + 1*3 + 1*8 + 1*10) / w = 6.5

And the following output would be computed as:

  [1,1,1,1]
[5,3,8,10,2,1,5,1,0,2]
= (1*3 + 1*8 + 1*10 + 1*2) / w = 5.75

And so on, returning a moving average of the sequence once all overlaps have been performed.

Daily maximum 8-Hour running mean/moving average

You could just use the filter function, though I assume you already got your data in a proper format (1D-vector)

hours = 8;   % size of hour window defining the moving average
movAV = filter(ones(1,hours)/hours,1,O3_data);

For the daily maximum you need to split your "hour"-vector and movAV in 24h brackets. Assuming you have one value per hour you could just reshape your result into a 24 x N array:

%example
x = 1:240;   %d ata for 10 days
y = reshape(x,24,[])

then use the additional parameters of the max function to search the max columnwise:

% in this case the max is always the last value of every day
dailyMax = max(y,[],1) 

dailyMax =

    24    48    72    96   120   144   168   192   216   240

respectively:

dailyMax = max(reshape(movAV,24,[]),[],1)

Probably for your case the most convenient would be to use findpeaks which would directly output all local maxima (Signal Processing Toolbox required).

DataFrame: Moving average with rolling, mean and shift while ignoring NaN

For example you can adding min_periods, and NaN is gone

df=pd.DataFrame({'A':[1,2,3,np.nan,2,3,4,np.nan]})
df.A.rolling(window=2,min_periods=1).mean()

Out[7]: 
0    1.0
1    1.5
2    2.5
3    3.0
4    2.0
5    2.5
6    3.5
7    4.0
Name: A, dtype: float64

Efficient averaging (moving average)

Plenty of answers here.. Might as well add another one :)

This one might need some minor debugging for "off by one" etc - I didn't have a real dataset to work with so perhaps treat it as pseudocode

It's like yours: there's a buffer that is circular - give it enough capacity to hold N samples where N is enough to inspect your moving averages - 100 SPS and want to inspect 250ms I think you'll need at least 25, but we aren't short on space so you could make it more

struct Cirray
{
    long _head;
    TimedFloat[] _data;

    public Cirray(int capacity)
    {
        _head = 0;
        _data = new TimedFloat[capacity];
    }

    public void Add(float f)
    {
        _data[_head++%_data.Length] = new TimedFloat() { F = f };
    }

    public IEnumerable<float> GetAverages(int[] forDeltas)
    {
        double sum = 0;
        long start = _head - 1;
        long now = _data[start].T;
        int whichDelta = 0;

        for (long idx = start; idx >= 0 && whichDelta < forDeltas.Length; idx--)
        {
            if (_data[idx % _data.Length].T < now - forDeltas[whichDelta])
            {
                yield return (float)(sum / (start - idx));
                whichDelta++;
            }

            sum += _data[idx % _data.Length].F;
        }
    }
}

struct TimedFloat
{
    [DllImport("Kernel32.dll", CallingConvention = CallingConvention.Winapi)]
    private static extern void GetSystemTimePreciseAsFileTime(out long filetime);

    private float _f;
    public float F { get => _f;
        set {
            _f = value;
            GetSystemTimePreciseAsFileTime(out long x);
            T = DateTime.FromFileTimeUtc(x).Ticks;
        }
    }
    public long T;

}

The normal DateTime.UtcNow isn't very precise - about 16ms - so it's probably no good for timestamping data like this if youre saying that even one sample could throw it off. Instead we can make it so we get the ticks equivalent of the high resolution timer, if your system supports it (if not, you might have to change system, or abuse a StopWatch class into giving a higher resolution supplement) and we're timestamping every data item.

I thought about going to the complexity of maintaining N number of constantly moving pointers to various tail ends of the data and dec/incrementing N number of sums - it could still be done (and you clearly know how) but your question read like you'd probably call for the averages infrequently enough that an N sums/counts solution would spend more time maintaining the counts than it would to just run through 250 or 500 floats every now and then and just add them up. GetAverages as a result takes an array of ticks (10 thousand per ms) of the ranges you want the data over, e.g. new[] { 50 * 10000, 100 * 10000, 150 * 10000, 200 * 10000, 250 * 10000 } for 50ms to 250ms in steps of 50, and it starts at the current head and sums backwards until the point where it's going to break a time boundary (and this might be the off-by-one bit) whereupon it yields the average for that timespan, then resumes summing and counting (the count given by math of the start minus the current index) for the next time span.. I think I understood right that you want e.g. the "average over the last 50ms" and "average over the last 100ms", not "average for the recent 50ms" and "average for the 50ms before recent"

Edit:

Thought about it some more and did this:

struct Cirray
{
long _head;
TimedFloat[] _data;
RunningAverage[] _ravgs;

    public Cirray(int capacity)
    {
        _head = 0;
        _data = new TimedFloat[capacity];
    }

    public Cirray(int capacity, int[] deltas) : this(capacity)
    {
        _ravgs = new RunningAverage[deltas.Length];
        for (int i = 0; i < deltas.Length; i++)
            _ravgs[i] = new RunningAverage() { OverMilliseconds = deltas[i] };
    }

    public void Add(float f)
    {
        //in c# every assignment returns the assigned value; capture it for use later
        var addedTF = (_data[_head++ % _data.Length] = new TimedFloat() { F = f });

        if (_ravgs == null)
            return;

        foreach (var ra in _ravgs)
        {
            //add the new tf to each RA
            ra.Count++;
            ra.Total += addedTF.F;

            //move the end pointer in the RA circularly up the array, subtracting/uncounting as we go
            var boundary = addedTF.T - ra.OverMilliseconds; 
            while (_data[ra.EndPointer].T < boundary) //while the sample is timed before the boundary, move the
            {
                ra.Count--; 
                ra.Total -= _data[ra.EndPointer].F;
                ra.EndPointer = (ra.EndPointer + 1) % _data.Length; //circular indexing
            }
        }

    }

    public IEnumerable<float> GetAverages(int[] forDeltas)
    {
        double sum = 0;
        long start = _head - 1;
        long now = _data[start].T;
        int whichDelta = 0;

        for (long idx = start; idx >= 0 && whichDelta < forDeltas.Length; idx--)
        {
            if (_data[idx % _data.Length].T < now - forDeltas[whichDelta])
            {
                yield return (float)(sum / (start - idx));
                whichDelta++;
            }

            sum += _data[idx % _data.Length].F;
        }
    }

    public IEnumerable<float> GetAverages() //from the built ins
    {
        foreach (var ra in _ravgs)
        {
            if (ra.Count == 0)
                yield return 0;
            else
                yield return (float)(ra.Total / ra.Count);
        }
    }
}

Absolutely haven't tested it, but it embodies my thinking in the comments

How to calculate a moving average in R

This is a base R solution.

First get the dates as class date, then store the order of the months, define a range (span_month) and finally move over all windows.

Month_t <- as.Date( paste0(
             substr(Month,1,4),"-",substr(Month,5,length(Month)),"-1"),
             "%Y-%m-%d" )

Month_ord <- order(Month_t)

span_month=12

sapply( 1:((length(Month_ord) + 1) - span_month), 
        function(x) mean(emp[Month_ord[x]:Month_ord[x + (span_month - 1)]]) )
# [1] 13.00000 15.00000 17.00000 19.00000 21.00000 23.16667 25.16667 27.16667
# [9] 29.16667 31.16667 33.16667 35.16667 37.16667 39.00000 40.83333 42.66667
#[17] 44.33333 45.83333 47.50000

Moving Average or Running Mean