How to Use a Ring Data Structure in Window Functions

How to use a ring data structure in window functions

  • Use COALESCE like @Justin provided.
  • With first_value() / last_value() you need to add an ORDER BY clause to the window definition or the order is undefined. You just got lucky in the example, because the rows happen to be in order right after creating the dummy table.

    Once you add ORDER BY, the default window frame ends at the current row, and you need to special case the last_value() call - or revert the sort order in the window frame like demonstrated in my first example.

  • When reusing a window definition multiple times, an explicit WINDOW clause simplifies syntax a lot:

SELECT ring, part, ARRAY[
coalesce(
lag(part) OVER w
,first_value(part) OVER (PARTITION BY ring ORDER BY part DESC))
,part
,coalesce(
lead(part) OVER w
,first_value(part) OVER w)
] AS neighbours
FROM rp
WINDOW w AS (PARTITION BY ring ORDER BY part);

Better yet, reuse the same window definition, so Postgres can calculate all values in a single scan. For this to work we need to define a custom window frame:

SELECT ring, part, ARRAY[
coalesce(
lag(part) OVER w
,last_value(part) OVER w)
,part
,coalesce(
lead(part) OVER w
,first_value(part) OVER w)
] AS neighbours
FROM rp
WINDOW w AS (PARTITION BY ring
ORDER BY part
RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
ORDER BY 1,2;

You can even adapt the frame definition for each window function call:

SELECT ring, part, ARRAY[
coalesce(
lag(part) OVER w
,last_value(part) OVER (w RANGE BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING))
,part
,coalesce(
lead(part) OVER w
,first_value(part) OVER w)
] AS neighbours
FROM rp
WINDOW w AS (PARTITION BY ring ORDER BY part)
ORDER BY 1,2;

Might be faster for rings with many parts. You'll have to test.

SQL Fiddle demonstrating all three with an improved test case. Consider query plans.

More about window frame definitions:

  • In the manual.
  • PostgreSQL window function: partition by comparison
  • PostgreSQL query with max and min date plus associated id per row

PostgreSQL window function: partition by comparison

Using several different window functions and two subqueries, this should work decently fast:

WITH events(id, event, ts) AS (
VALUES
(1, 12, '2014-03-19 08:00:00'::timestamp)
,(2, 12, '2014-03-19 08:30:00')
,(3, 13, '2014-03-19 09:00:00')
,(4, 13, '2014-03-19 09:30:00')
,(5, 12, '2014-03-19 10:00:00')
)
SELECT first_value(pre_id) OVER (PARTITION BY grp ORDER BY ts) AS pre_id
, id, ts
, first_value(post_id) OVER (PARTITION BY grp ORDER BY ts DESC) AS post_id
FROM (
SELECT *, count(step) OVER w AS grp
FROM (
SELECT id, ts
, NULLIF(lag(event) OVER w, event) AS step
, lag(id) OVER w AS pre_id
, lead(id) OVER w AS post_id
FROM events
WINDOW w AS (ORDER BY ts)
) sub1
WINDOW w AS (ORDER BY ts)
) sub2
ORDER BY ts;

Using ts as name for the timestamp column.

Assuming ts to be unique - and indexed (a unique constraint does that automatically).

In a test with a real life table with 50k rows it only needed a single index scan. So, should be decently fast even with big tables. In comparison, your query with join / distinct did not finish after a minute (as expected).

Even an optimized version, dealing with one cross join at a time (the left join with hardly a limiting condition is effectively a limited cross join) did not finish after a minute.

For best performance with a big table, tune your memory settings, in particular for work_mem (for big sort operations). Consider setting it (much) higher for your session temporarily if you can spare the RAM. Read more here and here.

How?

  1. In subquery sub1 look at the event from the previous row and only keep that if it has changed, thus marking the first element of a new group. At the same time, get the id of the previous and the next row (pre_id, post_id).

  2. In subquery sub2, count() only counts non-null values. The resulting grp marks peers in blocks of consecutive same events.

  3. In the final SELECT, take the first pre_id and the last post_id per group for each row to arrive at the desired result.

    Actually, this should be even faster in the outer SELECT:

     last_value(post_id) OVER (PARTITION BY grp ORDER BY ts
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND UNBOUNDED FOLLOWING) AS post_id

    ... since the sort order of the window agrees with the window for pre_id, so only a single sort is needed. A quick test seems to confirm it. More about this frame definition.

SQL Fiddle.

Reusing the same moving window for multiple window functions

You can use a WINDOW clause to shorten your code:

SELECT id
, MAX(val) OVER w -
MIN(val) OVER w AS val_range
FROM test_table
WINDOW w AS (ORDER BY id ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING);

But that hardly affects performance at all. Postgres will reuse the window frame if you spell it out repeatedly as well. The manual:

When multiple window functions are used, all the window functions
having syntactically equivalent PARTITION BY and ORDER BY clauses
in their window definitions are guaranteed to be evaluated in a single
pass over the data.

Related:

  • How to use a ring data structure in window functions
  • First and last value of window function in one row in PostgreSQL

Send message through Ring (Circular) Buffer between Threads (in C)

Found a solution

int main(int argc, char * argv[])
{
ringBuffer ring;
ringInitialization(&ring, RING_SIZE);
void * packetBuffer = malloc(BUFFER_SIZE * RING_SIZE);
Descriptor temp = { 0 };
uint8_t * currentBuffer = getPointer(packetBuffer, 0);
uint8_t * str = "Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you'd expect to be involved in anything strange or mysterious, because they just didn't hold with such nonsense. Mr.Dursley was the director of a firm called Grunnings, which made drills.He was a big, beefy man with hardly any neck, although he did have a very large mustache.Mrs.Dursley was thin and blonde and had nearly twice the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbors.The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere.";

strcpy(currentBuffer, str);
temp.dataLen = strlen(str);
temp.data = currentBuffer;

pushBack(&ring, &temp);
ThreadParams params = { &ring, packetBuffer };
HANDLE MessageThread = 0;
MessageThread = CreateThread(NULL, 0, HandleSendThread, ¶ms, 0, NULL);
if (MessageThread == NULL)
{
ExitProcess(MessageThread);
}
WaitForSingleObject(MessageThread, INFINITE);
CloseHandle(MessageThread);
system("pause");
return 0;
}

DWORD WINAPI HandleSendThread(LPVOID params)
{
ringBuffer * ring = ((ThreadParams*)params)->ptr1;
void * buffer = ((ThreadParams*)params)->ptr2;
Descriptor * temp = &ring->bufferData[ring->head];

for (int i = 0; i < temp->dataLen; i++)
{
printf("%c", ((char*)temp->data)[i]);
}
printf("\n");
return 0;
}

Given time/interval to calculate open/high/low/close value in each grouped data

You could use window functions combined with DISTINCT ON for that:

SELECT DISTINCT ON (1)
date_trunc('#{interval}', ticktime) AS ticktime_stamp
, max(bid_price) OVER w AS high
, min(bid_price) OVER w AS low
, sum(bid_volume) OVER w AS volume
, max(product_type) OVER w AS product_type
, min(product_type) OVER w AS product_type
, first_value(bid_price) OVER w AS open
, last_value(bid_price) OVER w AS close
FROM czces
WHERE ticktime >= '#{begin_time}'::timestamp
AND ticktime < '#{end_time}'::timestamp
AND product_type ='#{product_type}'
WINDOW w AS (PARTITION BY date_trunc('#{interval}', ticktime) ORDER BY ticktime
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
ORDER BY 1;

Explanation for the custom window frame:

  • How to use a ring data structure in window functions
  • PostgreSQL window function: partition by comparison

Explanation for DISTINCT ON:

  • Select first row in each GROUP BY group?

TSQL Window Function

That's four columns ;)

You can use the old aggregate-as-crosstab chestnut:

select   Survey,
SurveyAnswer1 = max(iif(SurveyQuestion = 1, SurveyAnswer, null)),
SurveyAnswer2 = max(iif(SurveyQuestion = 2, SurveyAnswer, null))
from YourTable
where SurveyQuestion != 3
group by Survey

Circular buffer in JavaScript

Strange co-incidence, I just wrote one earlier today! I don't know what exactly your requirements are but this might be of use.

It presents an interface like an Array of unlimited length, but ‘forgets’ old items:

// Circular buffer storage. Externally-apparent 'length' increases indefinitely
// while any items with indexes below length-n will be forgotten (undefined
// will be returned if you try to get them, trying to set is an exception).
// n represents the initial length of the array, not a maximum
function CircularBuffer(n) {
this._array= new Array(n);
this.length= 0;
}
CircularBuffer.prototype.toString= function() {
return '[object CircularBuffer('+this._array.length+') length '+this.length+']';
};
CircularBuffer.prototype.get= function(i) {
if (i<0 || i<this.length-this._array.length)
return undefined;
return this._array[i%this._array.length];
};
CircularBuffer.prototype.set= function(i, v) {
if (i<0 || i<this.length-this._array.length)
throw CircularBuffer.IndexError;
while (i>this.length) {
this._array[this.length%this._array.length]= undefined;
this.length++;
}
this._array[i%this._array.length]= v;
if (i==this.length)
this.length++;
};
CircularBuffer.IndexError= {};

find mode in a rolling window of a long sequence of data with duplicates

Space O(M):

  • One ring buffer to hold the M values.
  • One BST holding M {value, PQ pointer} pairs.
  • One Priority Queue holding M Counts.

Update in time O(lg M):

  • Find the departing value in the ring buffer O(1),
  • Find the same value in the BST O(lg M),
  • Adjust the count in the linked PQ node.
    • Do an adjust Priority on that node O(lg M)
  • Replace the old ring buffer entry with the new one O(1)
  • Find the new value in the BST O(lg M),
  • Adjust the count in the linked PQ node.
    • Do an adjust Priority on that node O(lg M)
  • GetFirst on PQ to find mode O(1)

You could get rid of the ring buffer by adding a nextItem pointer to the BST structure, and keeping an external pointer to the oldest item. This speeds it up by one BST lookup, and may be a space win if the value size is larger than the pointer size. But the algorithm becomes more complicated to code.

How do you implement a circular buffer in C?

Can you enumerate the types needed at the time you code up the buffer, or do you need to be able to add types at run time via dynamic calls? If the former, then I would create the buffer as a heap-allocated array of n structs, where each struct consists of two elements: an enum tag identifying the data type, and a union of all the data types. What you lose in terms of extra storage for small elements, you make up in terms of not having to deal with allocation/deallocation and the resulting memory fragmentation. Then you just need to keep track of the start and end indices that define the head and tail elements of the buffer, and make sure to compute mod n when incrementing/decrementing the indices.

How do I implement a circular list (ring buffer) in C?

A very simple implementation, expressed in C. Implements a circular buffer style FIFO queue. Could be made more generic by creating a structure containing the queue size, queue data, and queue indexes (in and out), which would be passed in with the data to add or remove from the queue. These same routines could then handle several queues. Also note that this allows queues of any size, although speedups can be used if you use powers of 2 and customize the code further.

/* Very simple queue
* These are FIFO queues which discard the new data when full.
*
* Queue is empty when in == out.
* If in != out, then
* - items are placed into in before incrementing in
* - items are removed from out before incrementing out
* Queue is full when in == (out-1 + QUEUE_SIZE) % QUEUE_SIZE;
*
* The queue will hold QUEUE_ELEMENTS number of items before the
* calls to QueuePut fail.
*/

/* Queue structure */
#define QUEUE_ELEMENTS 100
#define QUEUE_SIZE (QUEUE_ELEMENTS + 1)
int Queue[QUEUE_SIZE];
int QueueIn, QueueOut;

void QueueInit(void)
{
QueueIn = QueueOut = 0;
}

int QueuePut(int new)
{
if(QueueIn == (( QueueOut - 1 + QUEUE_SIZE) % QUEUE_SIZE))
{
return -1; /* Queue Full*/
}

Queue[QueueIn] = new;

QueueIn = (QueueIn + 1) % QUEUE_SIZE;

return 0; // No errors
}

int QueueGet(int *old)
{
if(QueueIn == QueueOut)
{
return -1; /* Queue Empty - nothing to get*/
}

*old = Queue[QueueOut];

QueueOut = (QueueOut + 1) % QUEUE_SIZE;

return 0; // No errors
}


Related Topics



Leave a reply



Submit