How to Calculate an Exponential Moving Average on Postgres

How to calculate an exponential moving average on postgres?

You can define your own aggregate function and then use it with a window specification to get the aggregate output at each stage rather than a single value.

So an aggregate is a piece of state, a transition function that updates that state for each row, and optionally a final function that converts the state into an output value. For a simple case like this, just the transition function should be sufficient.

create function ema_func(numeric, numeric) returns numeric
  language plpgsql as $$
declare
  alpha numeric := 0.5;  -- smoothing factor, hard-coded in this version
begin
  -- uncomment the following line to see what the parameters mean
  -- raise info 'ema_func: % %', $1, $2;
  return case
         when $1 is null then $2                -- first row: seed with the value itself
         else alpha * $2 + (1 - alpha) * $1     -- EMA recurrence
         end;
end
$$;
create aggregate ema(basetype = numeric, sfunc = ema_func, stype = numeric);
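
With alpha fixed at 0.5 inside the function, the one-argument aggregate is used like this (a usage sketch; it assumes the same data table, with columns n and x, as in the sample below):

select x, ema(x) over (order by n asc) from data;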

Using the two-argument variant defined further down, which takes alpha as a parameter, this gives me:

steve@steve@[local] =# select x, ema(x, 0.1) over(w), ema(x, 0.2) over(w) from data window w as (order by n asc) limit 5;
     x     |      ema      |      ema
-----------+---------------+---------------
 44.988564 |     44.988564 |     44.988564
   39.5634 |    44.4460476 |    43.9035312
 38.605724 |   43.86201524 |   42.84396976
 38.209646 |  43.296778316 |  41.917105008
 44.541264 | 43.4212268844 | 42.4419368064

These numbers seem to match up to the spreadsheet you added to the question.

Also, you can define the function so that alpha is passed as a parameter from the statement:

create or replace function ema_func(state numeric, inval numeric, alpha numeric)
  returns numeric
  language plpgsql as $$
begin
  return case
         when state is null then inval
         else alpha * inval + (1 - alpha) * state
         end;
end
$$;

create aggregate ema(numeric, numeric) (sfunc = ema_func, stype = numeric);

select x, ema(x, 0.5 /* alpha */) over (order by n asc) from data;

Also, this function is actually so simple that it doesn't need to be in plpgsql at all; it can be a plain SQL function, although then you can't refer to the parameters by name (note: PostgreSQL 9.2 and later do allow named parameters in SQL functions):

create or replace function ema_func(state numeric, inval numeric, alpha numeric)
  returns numeric
  language sql as $$
  select case
         when $1 is null then $2
         else $3 * $2 + (1 - $3) * $1
         end
$$;

Quickest way to return snapshots of Exponential Moving Average in postgresql

First, a few notes about minor efficiency issues with your function:

  • You do not have to quote_literal() anything other than strings. It is impossible for Bobby Tables to be injected into your SQL statement through a double precision or timestamp parameter.
  • In a dynamic SQL statement you only have to splice in table and column names manually; parameter values can be injected with the USING clause (see the sketch after the code below). This saves a lot of parsing time.
  • Move as many calculations as possible outside of your loop. For instance:
DECLARE
  ...
  dur_int    interval    := duration * interval '1 second';
  localStart timestamptz := timestart;
  localEnd   timestamptz := localStart + dur_int;
BEGIN
  ...
  FOR i IN 1 .. groups LOOP
    ...
    localStart := localStart + dur_int;
    localEnd   := localEnd + dur_int;
  END LOOP;
  ...
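
To illustrate the first two points, here is a minimal sketch of a dynamic statement that splices in only the table identifier via format('%I') and passes the value through USING (the function and table names here are hypothetical):

CREATE OR REPLACE FUNCTION count_since(ptable text, pstart timestamptz)
  RETURNS bigint
  LANGUAGE plpgsql AS $$
DECLARE
  result bigint;
BEGIN
  -- Only the identifier is spliced into the statement text; the timestamp
  -- travels as a bound parameter, so it needs no quoting at all.
  EXECUTE format('SELECT count(*) FROM %I WHERE datetime >= $1', ptable)
  INTO result
  USING pstart;
  RETURN result;
END
$$;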

But this is really all moot...

In your code you first populate a temp table with 7733 rows of data, from which you then extract one record at a time with a dynamic query, in a loop that runs 390 times. All very, very wasteful. You can replace the entire function body with a single statement:

RETURN QUERY EXECUTE format('SELECT datetime, ema '
                            'FROM ('
                            '  SELECT datetime, ema, '
                            '         rank() OVER (PARTITION BY bucket ORDER BY datetime DESC) AS rank '
                            '  FROM ('
                            '    SELECT datetime, ema(value, $1) OVER (ORDER BY datetime ASC) AS ema, '
                            '           ceil(extract(epoch from (datetime - $2)) / $3) AS bucket '
                            '    FROM %I '
                            '    WHERE symbol = $4 '
                            '    AND datetime BETWEEN $2 AND $5) x) y '
                            'WHERE rank = 1 '
                            'ORDER BY 1', ptable) USING alpha, timestart, duration, psymbol, timeend;

The principle here is that the innermost query calculates the "bucket" each processed row falls into. The next level ranks the rows within each bucket by datetime, and the outer query then picks the most recent row from each bucket, i.e. the one where rank = 1.
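
For readability, here is the same query written out statically, without the dynamic-SQL wrapper (a sketch with a hypothetical table name, prices, and literal values standing in for the $1..$5 parameters):

SELECT datetime, ema
FROM (
  SELECT datetime, ema,
         rank() OVER (PARTITION BY bucket ORDER BY datetime DESC) AS rank
  FROM (
    SELECT datetime,
           ema(value, 0.5) OVER (ORDER BY datetime ASC) AS ema,
           ceil(extract(epoch from (datetime - timestamptz '2018-01-01')) / 900) AS bucket
    FROM prices
    WHERE symbol = 'S1'
    AND datetime BETWEEN '2018-01-01' AND '2018-01-02'
  ) x
) y
WHERE rank = 1
ORDER BY 1;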

On speed: you should really profile with EXPLAIN ANALYZE on the server, rather than measuring on the client, which includes network transfer time.
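
For example (against the same hypothetical prices table as above):

EXPLAIN (ANALYZE, BUFFERS)
SELECT datetime, ema(value, 0.5) OVER (ORDER BY datetime ASC)
FROM prices
WHERE symbol = 'S1';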

How to Calc Exponential Moving Average using SQL Server 2012 Window Functions

The formula for the EMA is:

EMA(x_1) = x_1
EMA(x_n) = α * x_n + (1 - α) * EMA(x_{n-1})

With β := 1 - α, unrolling the recurrence shows that this is equivalent to

EMA(x_n) = β^(n-1) * x_1 + α * β^(n-2) * x_2 + α * β^(n-3) * x_3 + ... + α * x_n

In that form it is easy to implement with LAG. For a 4-row EMA it would look like this:

SELECT LAG(x,3) OVER(ORDER BY ?) * POWER(@beta,3)          +
       LAG(x,2) OVER(ORDER BY ?) * POWER(@beta,2) * @alpha +
       LAG(x,1) OVER(ORDER BY ?) * POWER(@beta,1) * @alpha +
       x * @alpha
FROM ...

OK, as you seem to be after the EWMA_Chart value, I created a SQL Fiddle showing how to get there. However, be aware that it uses a recursive CTE that requires one recursion per returned row, so on a big dataset you will most likely get disastrous performance. The recursion is necessary because each row depends on all rows that came before it. While you can get all preceding rows with LAG(), you cannot also reference preceding calculations, as LAG() cannot reference its own output.

Also, the formula in the spreadsheet you attached does not make sense. It seems to be trying to calculate the EWMA_Chart value, but it fails at that. In the above SQL Fiddle I included a column [Wrong] that calculates the same value the spreadsheet is calculating.

Either way, if you need to use this on a big dataset, you are probably better off writing a cursor (see the sketch after the CTE code below).

This is the code that does the calculation in the above SQL Fiddle. It references the vSMA view, which calculates the 10-row simple moving average.

WITH smooth AS (
    SELECT CAST(0.1818 AS NUMERIC(20,5)) AS alpha
),
numbered AS (
    SELECT Date, Price, SMA, ROW_NUMBER() OVER(ORDER BY Date) AS Rn
    FROM vSMA
    WHERE SMA IS NOT NULL
),
EWMA AS (
    -- Anchor: seed the EWMA with the first SMA value.
    SELECT Date, Price, SMA, CAST(SMA AS NUMERIC(20,5)) AS EWMA, Rn,
           CAST(SMA AS NUMERIC(20,5)) AS Wrong
    FROM numbered
    WHERE Rn = 1
    UNION ALL
    -- Recursion: one step per row, each depending on the previous EWMA.
    SELECT numbered.Date, numbered.Price, numbered.SMA,
           CAST(EWMA.EWMA * smooth.alpha
                + CAST(numbered.SMA AS NUMERIC(20,5)) * (1 - smooth.alpha) AS NUMERIC(20,5)),
           numbered.Rn,
           CAST((numbered.Price - EWMA.EWMA) * smooth.alpha + EWMA.EWMA AS NUMERIC(20,5))
    FROM EWMA
    JOIN numbered ON EWMA.Rn + 1 = numbered.Rn
    CROSS JOIN smooth
)
SELECT Date, Price, SMA, EWMA, Wrong
FROM EWMA
ORDER BY Date;
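
And the cursor alternative, as a minimal sketch (same vSMA view and alpha as above; it assumes Date is a date column; the running EWMA is carried in a variable instead of through recursion):

DECLARE @alpha NUMERIC(20,5) = 0.1818;
DECLARE @ema   NUMERIC(20,5) = NULL;
DECLARE @date  DATE, @sma NUMERIC(20,5);

CREATE TABLE #result (Date DATE, EWMA NUMERIC(20,5));

DECLARE c CURSOR FAST_FORWARD FOR
    SELECT Date, SMA FROM vSMA WHERE SMA IS NOT NULL ORDER BY Date;

OPEN c;
FETCH NEXT FROM c INTO @date, @sma;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Seed with the first SMA, then apply the same recurrence as the CTE.
    SET @ema = CASE WHEN @ema IS NULL THEN @sma
                    ELSE @ema * @alpha + @sma * (1 - @alpha) END;
    INSERT INTO #result (Date, EWMA) VALUES (@date, @ema);
    FETCH NEXT FROM c INTO @date, @sma;
END;
CLOSE c;
DEALLOCATE c;

SELECT Date, EWMA FROM #result ORDER BY Date;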

Moving Average based on Timestamps in PostgreSQL

Assuming you want to restart the rolling average after each 15-minute interval:

select id,
       temp,
       avg(temp) over (partition by group_nr order by time_read) as rolling_avg
from (
    select id,
           temp,
           time_read,
           interval_group,
           id - row_number() over (partition by interval_group order by time_read) as group_nr
    from (
        select id,
               time_read,
               'epoch'::timestamp + '900 seconds'::interval * (extract(epoch from time_read)::int4 / 900) as interval_group,
               temp
        from readings
    ) t1
) t2
order by time_read;

It is based on Depesz's solution for grouping by "time ranges".

Here is an SQLFiddle example: http://sqlfiddle.com/#!1/0f3f0/2

Moving average in postgresql

SQL Fiddle

select "date",
       shop_id,
       amount,
       extract(dow from date),
       case when row_number() over (order by date) > 3
            then avg(amount) over (
                     order by date desc
                     rows between 1 following and 3 following
                 )
            else null
       end
from (
    select *
    from ro
    where extract(dow from date) = 4
) s

What is wrong with the OP's query is the frame specification:

ROWS BETWEEN 0 PRECEDING AND 2 FOLLOWING

It includes the current row in the average and reaches only two rows beyond it, instead of covering exactly the three preceding Thursdays.
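
Equivalently, with an ascending sort, the frame for "the three preceding Thursdays" would presumably be written like this (a sketch):

avg(amount) over (
    order by date asc
    rows between 3 preceding and 1 preceding
)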

Other than that, my query avoids unneeded computation by filtering the Thursdays before applying the expensive window functions.

If it is necessary to partition by shop_id, then obviously add partition by shop_id to both window functions, avg and row_number, as in the sketch below.
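
A minimal sketch of that partitioned variant (same ro filter as above; rolling_avg is just a label for the result):

select "date",
       shop_id,
       amount,
       case when row_number() over (partition by shop_id order by date) > 3
            then avg(amount) over (
                     partition by shop_id
                     order by date desc
                     rows between 1 following and 3 following
                 )
            else null
       end as rolling_avg
from (
    select *
    from ro
    where extract(dow from date) = 4
) s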

Moving averages with MongoDB's aggregation framework?

The aggregation framework now has $map, $reduce and $range built in, so array processing is much more straightforward. Below is an example of calculating a moving average on a set of data where you wish to filter by some predicate. The basic setup is that each doc contains filterable criteria and a value, e.g.

{sym: "A", d: ISODate("2018-01-01"), val: 10}
{sym: "A", d: ISODate("2018-01-02"), val: 30}

Here it is:

// This controls the number of observations in the moving average:
days = 4;

c=db.foo.aggregate([

// Filter down to what you want. This can be anything or nothing at all.
{$match: {"sym": "S1"}}

// Ensure dates are going earliest to latest:
,{$sort: {d:1}}

// Turn docs into a single doc with a big vector of observations, e.g.
// {sym: "A", d: d1, val: 10}
// {sym: "A", d: d2, val: 11}
// {sym: "A", d: d3, val: 13}
// becomes
// {_id: "A", prx: [ {v:10,d:d1}, {v:11,d:d2}, {v:13,d:d3} ] }
//
// This will set us up to take advantage of array processing functions!
,{$group: {_id: "$sym", prx: {$push: {v:"$val",d:"$d"}} }}

// Nice additional info. Note use of dot notation on array to get
// just scalar date at elem 0, not the object {v:val,d:date}:
,{$addFields: {numDays: days, startDate: {$arrayElemAt: [ "$prx.d", 0 ]}} }

// The Juice! Assume we have a variable "days" which is the desired number
// of days of moving average.
// The complex expression below does this in python pseudocode:
//
//   for z in range(0, len(vector) - (days - 1)):
//       seg = vector[z:z+days]
//       values = seg.v
//       dates = seg.d
//       tot = 0
//       for v in values:
//           tot += v
//       avg = tot / len(values)
//
// Note that it is possible to overrun the segment at the end of the "walk"
// along the vector, i.e. not enough date-values. So we only run the
// vector up to len(vector) - (days-1).
// Also, for extra info, we also add the number of days *actually* used in the
// calculation AND the as-of date which is the tail date of the segment!
//
// Again we take advantage of dot notation to turn the vector of
// object {v:val, d:date} into two vectors of simple scalars [v1,v2,...]
// and [d1,d2,...] with $prx.v and $prx.d
//
,{$addFields: {"prx": {$map: {
input: {$range:[0,{$subtract:[{$size:"$prx"}, (days-1)]}]} ,
as: "z",
in: {
avg: {$avg: {$slice: [ "$prx.v", "$$z", days ] } },
d: {$arrayElemAt: [ "$prx.d", {$add: ["$$z", (days-1)] } ]}
}
}}
}}

]);

This might produce the following output:

{
  "_id" : "S1",
  "prx" : [
    { "avg" : 11.738793632512115, "d" : ISODate("2018-09-05T16:10:30.259Z") },
    { "avg" : 12.420766702631376, "d" : ISODate("2018-09-06T16:10:30.259Z") },
    ...
  ],
  "numDays" : 4,
  "startDate" : ISODate("2018-09-02T16:10:30.259Z")
}

