Moving average or running mean
For a short, fast solution that does the whole thing in one loop, without dependencies, the code below works great.
mylist = [1, 2, 3, 4, 5, 6, 7]
N = 3
cumsum, moving_aves = [0], []
for i, x in enumerate(mylist, 1):
cumsum.append(cumsum[i-1] + x)
if i>=N:
moving_ave = (cumsum[i] - cumsum[i-N])/N
#can do stuff with moving_ave here
moving_aves.append(moving_ave)
How to calculate rolling / moving average using python + NumPy / SciPy?
A simple way to achieve this is by using np.convolve
.
The idea behind this is to leverage the way the discrete convolution is computed and use it to return a rolling mean. This can be done by convolving with a sequence of np.ones
of a length equal to the sliding window length we want.
In order to do so we could define the following function:
def moving_average(x, w):
return np.convolve(x, np.ones(w), 'valid') / w
This function will be taking the convolution of the sequence x
and a sequence of ones of length w
. Note that the chosen mode
is valid
so that the convolution product is only given for points where the sequences overlap completely.
Some examples:
x = np.array([5,3,8,10,2,1,5,1,0,2])
For a moving average with a window of length 2
we would have:
moving_average(x, 2)
# array([4. , 5.5, 9. , 6. , 1.5, 3. , 3. , 0.5, 1. ])
And for a window of length 4
:
moving_average(x, 4)
# array([6.5 , 5.75, 5.25, 4.5 , 2.25, 1.75, 2. ])
How does convolve
work?
Lets have a more in depth look at the way the discrete convolution is being computed.
The following function aims to replicate the way np.convolve
is computing the output values:
def mov_avg(x, w):
for m in range(len(x)-(w-1)):
yield sum(np.ones(w) * x[m:m+w]) / w
Which, for the same example above would also yield:
list(mov_avg(x, 2))
# [4.0, 5.5, 9.0, 6.0, 1.5, 3.0, 3.0, 0.5, 1.0]
So what is being done at each step is to take the inner product between the array of ones and the current window. In this case the multiplication by np.ones(w)
is superfluous given that we are directly taking the sum
of the sequence.
Bellow is an example of how the first outputs are computed so that it is a little clearer. Lets suppose we want a window of w=4
:
[1,1,1,1]
[5,3,8,10,2,1,5,1,0,2]
= (1*5 + 1*3 + 1*8 + 1*10) / w = 6.5
And the following output would be computed as:
[1,1,1,1]
[5,3,8,10,2,1,5,1,0,2]
= (1*3 + 1*8 + 1*10 + 1*2) / w = 5.75
And so on, returning a moving average of the sequence once all overlaps have been performed.
Daily maximum 8-Hour running mean/moving average
You could just use the filter
function, though I assume you already got your data in a proper format (1D-vector)
hours = 8; % size of hour window defining the moving average
movAV = filter(ones(1,hours)/hours,1,O3_data);
For the daily maximum you need to split your "hour"-vector and movAV
in 24h brackets. Assuming you have one value per hour you could just reshape your result into a 24 x N
array:
%example
x = 1:240; %d ata for 10 days
y = reshape(x,24,[])
then use the additional parameters of the max function to search the max columnwise:
% in this case the max is always the last value of every day
dailyMax = max(y,[],1)
dailyMax =
24 48 72 96 120 144 168 192 216 240
respectively:
dailyMax = max(reshape(movAV,24,[]),[],1)
Probably for your case the most convenient would be to use findpeaks
which would directly output all local maxima (Signal Processing Toolbox required).
DataFrame: Moving average with rolling, mean and shift while ignoring NaN
For example you can adding min_periods
, and NaN
is gone
df=pd.DataFrame({'A':[1,2,3,np.nan,2,3,4,np.nan]})
df.A.rolling(window=2,min_periods=1).mean()
Out[7]:
0 1.0
1 1.5
2 2.5
3 3.0
4 2.0
5 2.5
6 3.5
7 4.0
Name: A, dtype: float64
Efficient averaging (moving average)
Plenty of answers here.. Might as well add another one :)
This one might need some minor debugging for "off by one" etc - I didn't have a real dataset to work with so perhaps treat it as pseudocode
It's like yours: there's a buffer that is circular - give it enough capacity to hold N samples where N is enough to inspect your moving averages - 100 SPS and want to inspect 250ms I think you'll need at least 25, but we aren't short on space so you could make it more
struct Cirray
{
long _head;
TimedFloat[] _data;
public Cirray(int capacity)
{
_head = 0;
_data = new TimedFloat[capacity];
}
public void Add(float f)
{
_data[_head++%_data.Length] = new TimedFloat() { F = f };
}
public IEnumerable<float> GetAverages(int[] forDeltas)
{
double sum = 0;
long start = _head - 1;
long now = _data[start].T;
int whichDelta = 0;
for (long idx = start; idx >= 0 && whichDelta < forDeltas.Length; idx--)
{
if (_data[idx % _data.Length].T < now - forDeltas[whichDelta])
{
yield return (float)(sum / (start - idx));
whichDelta++;
}
sum += _data[idx % _data.Length].F;
}
}
}
struct TimedFloat
{
[DllImport("Kernel32.dll", CallingConvention = CallingConvention.Winapi)]
private static extern void GetSystemTimePreciseAsFileTime(out long filetime);
private float _f;
public float F { get => _f;
set {
_f = value;
GetSystemTimePreciseAsFileTime(out long x);
T = DateTime.FromFileTimeUtc(x).Ticks;
}
}
public long T;
}
The normal DateTime.UtcNow isn't very precise - about 16ms - so it's probably no good for timestamping data like this if youre saying that even one sample could throw it off. Instead we can make it so we get the ticks equivalent of the high resolution timer, if your system supports it (if not, you might have to change system, or abuse a StopWatch class into giving a higher resolution supplement) and we're timestamping every data item.
I thought about going to the complexity of maintaining N number of constantly moving pointers to various tail ends of the data and dec/incrementing N number of sums - it could still be done (and you clearly know how) but your question read like you'd probably call for the averages infrequently enough that an N sums/counts solution would spend more time maintaining the counts than it would to just run through 250 or 500 floats every now and then and just add them up. GetAverages as a result takes an array of ticks (10 thousand per ms) of the ranges you want the data over, e.g. new[] { 50 * 10000, 100 * 10000, 150 * 10000, 200 * 10000, 250 * 10000 }
for 50ms to 250ms in steps of 50, and it starts at the current head and sums backwards until the point where it's going to break a time boundary (and this might be the off-by-one bit) whereupon it yields the average for that timespan, then resumes summing and counting (the count given by math of the start minus the current index) for the next time span.. I think I understood right that you want e.g. the "average over the last 50ms" and "average over the last 100ms", not "average for the recent 50ms" and "average for the 50ms before recent"
Edit:
Thought about it some more and did this:
struct Cirray
{
long _head;
TimedFloat[] _data;
RunningAverage[] _ravgs;
public Cirray(int capacity)
{
_head = 0;
_data = new TimedFloat[capacity];
}
public Cirray(int capacity, int[] deltas) : this(capacity)
{
_ravgs = new RunningAverage[deltas.Length];
for (int i = 0; i < deltas.Length; i++)
_ravgs[i] = new RunningAverage() { OverMilliseconds = deltas[i] };
}
public void Add(float f)
{
//in c# every assignment returns the assigned value; capture it for use later
var addedTF = (_data[_head++ % _data.Length] = new TimedFloat() { F = f });
if (_ravgs == null)
return;
foreach (var ra in _ravgs)
{
//add the new tf to each RA
ra.Count++;
ra.Total += addedTF.F;
//move the end pointer in the RA circularly up the array, subtracting/uncounting as we go
var boundary = addedTF.T - ra.OverMilliseconds;
while (_data[ra.EndPointer].T < boundary) //while the sample is timed before the boundary, move the
{
ra.Count--;
ra.Total -= _data[ra.EndPointer].F;
ra.EndPointer = (ra.EndPointer + 1) % _data.Length; //circular indexing
}
}
}
public IEnumerable<float> GetAverages(int[] forDeltas)
{
double sum = 0;
long start = _head - 1;
long now = _data[start].T;
int whichDelta = 0;
for (long idx = start; idx >= 0 && whichDelta < forDeltas.Length; idx--)
{
if (_data[idx % _data.Length].T < now - forDeltas[whichDelta])
{
yield return (float)(sum / (start - idx));
whichDelta++;
}
sum += _data[idx % _data.Length].F;
}
}
public IEnumerable<float> GetAverages() //from the built ins
{
foreach (var ra in _ravgs)
{
if (ra.Count == 0)
yield return 0;
else
yield return (float)(ra.Total / ra.Count);
}
}
}
Absolutely haven't tested it, but it embodies my thinking in the comments
How to calculate a moving average in R
This is a base R solution.
First get the dates as class
date, then store the order of the months, define a range (span_month) and finally move over all windows.
Month_t <- as.Date( paste0(
substr(Month,1,4),"-",substr(Month,5,length(Month)),"-1"),
"%Y-%m-%d" )
Month_ord <- order(Month_t)
span_month=12
sapply( 1:((length(Month_ord) + 1) - span_month),
function(x) mean(emp[Month_ord[x]:Month_ord[x + (span_month - 1)]]) )
# [1] 13.00000 15.00000 17.00000 19.00000 21.00000 23.16667 25.16667 27.16667
# [9] 29.16667 31.16667 33.16667 35.16667 37.16667 39.00000 40.83333 42.66667
#[17] 44.33333 45.83333 47.50000
Related Topics
How to Use Phantomjs in Python
How to Embed Matplotlib in Pyqt - for Dummies
Remove Duplicates by Columns A, Keeping the Row with the Highest Value in Column B
How to Split Text in a Column into Multiple Rows
Threading.Timer - Repeat Function Every 'N' Seconds
Sending "User-Agent" Using Requests Library in Python
How to Add File Extensions Based on File Type on Linux/Unix
Python Subprocess/Popen with a Modified Environment
Python Dictionary: Are Keys() and Values() Always the Same Order
Matplotlib: How to Create Axessubplot Objects, Then Add Them to a Figure Instance
How Does Swapping of Members in Tuples (A,B)=(B,A) Work Internally
Slice 2D Array into Smaller 2D Arrays
How to Ignore the First Line of Data When Processing CSV Data
How to Create a Trie in Python