Calculate Number of Concurrent Events in SQL

Calculate number of concurrent events in SQL

1.) Your query did not catch all overlaps - this was fixed by the other answers, already.

2.) The data type of your columns starttime and endtime is timestamp. So your WHERE clause is slightly wrong, too:

BETWEEN '2011-11-02' AND '2011-11-03'

This would include '2011-11-03 00:00'. The upper border has to be excluded.

3.) Removed the mixed case syntax without double-quotes. Unquoted identifiers are folded to lower case automatically. Put simply: it is best not to use mixed case identifiers at all in PostgreSQL.

4.) Transformed the query to use explicit JOIN which is always preferable. Actually, I made it a LEFT [OUTER] JOIN, because I want to count calls that overlap with no other calls, too.

5.) Simplified the syntax a bit to arrive at this base query:

SELECT t1.sid, count(*) AS ct
FROM calls_nov t1
LEFT JOIN calls_nov t2 ON t1.starttime <= t2.endtime
AND t1.endtime >= t2.starttime
WHERE t1.starttime >= '2011-11-02 0:0'::timestamp
AND t1.starttime < '2011-11-03 0:0'::timestamp
GROUP BY 1
ORDER BY 2 DESC;

This query is extremely slow for a big table, because every row starting on '2011-11-02' has to be compared to every row in the whole table, which leads to (almost) O(n²) cost.
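To make the cost argument concrete, here is a minimal Python sketch of what the self-join does. The sample rows and the helper names overlaps and count_overlaps are made up for illustration, and the date filter is left out for brevity. Every row is compared to every other row with the same predicate as the join condition, which is where the quadratic cost comes from.

```python
from datetime import datetime

# Hypothetical sample calls: (sid, starttime, endtime)
calls = [
    ("a", datetime(2011, 11, 2, 10, 0), datetime(2011, 11, 2, 10, 30)),
    ("b", datetime(2011, 11, 2, 10, 15), datetime(2011, 11, 2, 10, 45)),
    ("c", datetime(2011, 11, 2, 11, 0), datetime(2011, 11, 2, 11, 5)),
]

def overlaps(s1, e1, s2, e2):
    # Same predicate as the join condition:
    # t1.starttime <= t2.endtime AND t1.endtime >= t2.starttime
    return s1 <= e2 and e1 >= s2

def count_overlaps(rows):
    counts = {}
    for sid, s1, e1 in rows:  # outer loop: rows of t1
        # inner loop: every row of t2, hence roughly n * n comparisons
        counts[sid] = sum(overlaps(s1, e1, s2, e2) for _, s2, e2 in rows)
    return counts

print(count_overlaps(calls))  # {'a': 2, 'b': 2, 'c': 1}
```

Note that every call also matches itself, just like in the SQL self-join, so a call with no other overlaps still gets a count of 1.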



Faster

We can drastically cut down the cost by pre-selecting possible candidates. Only select columns and rows you need. I do this with two CTEs.

  1. Select calls starting on the day in question. -> CTE x
  2. Calculate the latest end of those calls. (subquery in CTE y)
  3. Select only calls that overlap with the total range of CTE x. -> CTE y
  4. The final query is much faster than querying the huge underlying table.

WITH x AS (
SELECT sid, starttime, endtime
FROM calls_nov
WHERE starttime >= '2011-11-02 0:0'
AND starttime < '2011-11-03 0:0'
), y AS (
SELECT starttime, endtime
FROM calls_nov
WHERE endtime >= '2011-11-02 0:0'
AND starttime <= (SELECT max(endtime) As max_endtime FROM x)
)
SELECT x.sid, count(*) AS count_overlaps
FROM x
LEFT JOIN y ON x.starttime <= y.endtime
AND x.endtime >= y.starttime
GROUP BY 1
ORDER BY 2 DESC;
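The pre-selection steps can also be sketched in Python. This is only an illustration under assumptions: times are plain integers, the sample rows are invented, and the lists x and y stand in for the two CTEs.

```python
# Hypothetical rows: (sid, start, end), times as plain integers
calls = [
    ("old", 0, 5),         # ends before the day starts
    ("carry", 90, 110),    # started earlier, still running on the day
    ("a", 100, 120),
    ("b", 150, 260),       # starts on the day, ends after it
    ("late", 250, 300),    # starts after the day but overlaps "b"
    ("future", 400, 410),  # cannot overlap anything in x
]
day_from, day_to = 100, 200

# Step 1: calls starting on the day in question (CTE x)
x = [c for c in calls if day_from <= c[1] < day_to]
# Step 2: latest end among those calls
max_end = max(c[2] for c in x)
# Step 3: only rows that can overlap the total range of x (CTE y)
y = [c for c in calls if c[2] >= day_from and c[1] <= max_end]
# Step 4: count overlaps against the reduced candidate set only
counts = {
    sid: sum(1 for _, s2, e2 in y if s <= e2 and e >= s2)
    for sid, s, e in x
}
print(counts)  # {'a': 2, 'b': 2}
```

The rows "old" and "future" never reach the final comparison, which is the whole point of the pre-selection.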


Faster yet

I have a real life table of 350,000 rows with overlapping start / end timestamps similar to yours. I used that for a quick benchmark. PostgreSQL 8.4, scarce resources because it is a test DB. Indexes on start and end. (Index on ID column is irrelevant here.) Tested with EXPLAIN ANALYZE, best of 5.

Base query:

Total runtime: 476994.774 ms

CTE variant:

Total runtime: 4199.788 ms -- that's > factor 100.

After adding a multicolumn index of the form:

CREATE INDEX start_end_index on calls_nov (starttime, endtime);

Total runtime: 4159.367 ms



Ultimate Speed

If that is not enough, there is a way to speed it up yet another order of magnitude. Instead of the CTEs above, materialize the temp tables and - this is the crucial point - create an index on the second one. Could look like this:

Execute as one transaction:

CREATE TEMP TABLE x ON COMMIT DROP AS   
SELECT sid, starttime, endtime
FROM calls_nov
WHERE starttime >= '2011-11-02 0:0'
AND starttime < '2011-11-03 0:0';

CREATE TEMP TABLE y ON COMMIT DROP AS
SELECT starttime, endtime
FROM calls_nov
WHERE endtime >= '2011-11-02 0:0'
AND starttime <= (SELECT max(endtime) FROM x);

CREATE INDEX y_idx ON y (starttime, endtime); -- this is where the magic happens

SELECT x.sid, count(*) AS ct
FROM x
LEFT JOIN y ON x.starttime <= y.endtime
AND x.endtime >= y.starttime
GROUP BY 1
ORDER BY 2 DESC;

Read about temporary tables in the manual.



Ultimate solution

  • Create a plpgsql function that encapsulates the magic.

  • Diagnose the typical size of your temp tables. Create them standalone and measure:

      SELECT pg_size_pretty(pg_total_relation_size('tmp_tbl'));
  • If they are bigger than your setting for temp_buffers then temporarily set them high enough in your function to hold both your temporary tables in RAM. It is a major speedup if you don't have to swap to disc. (Must be first use of temp tables in session to have effect.)

CREATE OR REPLACE FUNCTION f_call_overlaps(date)
RETURNS TABLE (sid varchar, ct integer) AS
$BODY$
DECLARE
_from timestamp := $1::timestamp;
_to timestamp := ($1 +1)::timestamp;
BEGIN

SET temp_buffers = '64MB'; -- example value; more RAM for temp tables;

CREATE TEMP TABLE x ON COMMIT DROP AS
SELECT c.sid, starttime, endtime -- avoid naming conflict with OUT param
FROM calls_nov c
WHERE starttime >= _from
AND starttime < _to;

CREATE TEMP TABLE y ON COMMIT DROP AS
SELECT starttime, endtime
FROM calls_nov
WHERE endtime >= _from
AND starttime <= (SELECT max(endtime) FROM x);

CREATE INDEX y_idx ON y (starttime, endtime);

RETURN QUERY
SELECT x.sid, count(*)::int -- AS ct
FROM x
LEFT JOIN y ON x.starttime <= y.endtime AND x.endtime >= y.starttime
GROUP BY 1
ORDER BY 2 DESC;

END;
$BODY$ LANGUAGE plpgsql;

Call:

SELECT * FROM f_call_overlaps('2011-11-02') -- just name your date

Total runtime: 138.169 ms -- that's factor 3000



What else can you do to speed it up?

General performance optimization.

CLUSTER calls_nov USING starttime_index; -- this also vacuums the table fully

ANALYZE calls_nov;

How to get maximum number of concurrent events in postgresql?

Here is the idea: count the number of starts and subtract the number of stops. That gives the net amount at each time. The rest is just aggregation:

with e as (
select start_datetime as dte, 1 as inc
from events
union all
select end_datetime as dte, -1 as inc
from events
)
select max(concurrent)
from (select dte, sum(sum(inc)) over (order by dte) as concurrent
from e
group by dte
) e;

The subquery shows the number of overlapping events at each time.
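The same idea, sketched in Python with made-up integer timestamps (the names net, times and running are only illustrative): add +1 at each start, -1 at each end, net them per timestamp as the GROUP BY does, then take a running sum.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical events: (start, end) as plain integers
events = [(1, 5), (2, 6), (4, 8), (7, 9)]

# +1 at each start, -1 at each end, summed per timestamp (the GROUP BY dte)
net = Counter()
for start, end in events:
    net[start] += 1
    net[end] -= 1

times = sorted(net)
# Running total over time (the window sum ... over (order by dte))
running = list(accumulate(net[t] for t in times))
concurrent = dict(zip(times, running))

print(concurrent)    # {1: 1, 2: 2, 4: 3, 5: 2, 6: 1, 7: 2, 8: 1, 9: 0}
print(max(running))  # 3
```

Because starts and ends at the same timestamp are netted before the running sum, an event that ends exactly when another begins does not count as overlapping it.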

You can get the time frame as:

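with e as (
select start_datetime as dte, 1 as inc
from events
union all
select end_datetime as dte, -1 as inc
from events
)
select dte, next_dte, concurrent
from (select dte, sum(sum(inc)) over (order by dte) as concurrent,
lead(dte) over (order by dte) as next_dte
from e
group by dte
) e
order by concurrent desc
fetch first 1 row only;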

Oracle SQL - Calculating Number of Concurrent Events

You can calculate the number of concurrent events by using a relatively simple technique: cumulative aggregation. The idea is to count the number of starts and stops. Then the cumulative number is the number of concurrent values.

select tm, sum(isstart) as numstarts, sum(isstop) as numstops,
(sum(sum(isstart)) over (order by tm nulls last) -
sum(sum(isstop)) over (order by tm nulls last)
) as NumConcurrent
from ((select start_tm as tm, 1 as isstart, 0 as isstop from events
) union all
(select stop_tm, 0 as isstart, 1 as isstop from events
)
) e
group by tm;

This gives you the number of concurrent events for each time in the data (either a start or end time). You can then extract the maximum value for a day or hour using a where clause and order by/fetch first or aggregation.

How can I check for average concurrent events in a SQL table based on the date, time and duration of the events?

I think MarkusQ has the answer, but let me develop an alternative that you may find easier to use. I'll use my customary method of developing this as a series of simple transformations in views, an analogue of functional decomposition in a procedural language.

First, let's put everything in common units. Recall that record's column s is seconds since the epoch, midnight 1 January 1970. We can find the number of seconds since midnight of the day on which the call occurred by taking s modulo the number of seconds in a day: s % (60 * 60 * 24).

select *,
s % (60 * 60 * 24) as start_secs_from_midnight,
s % (60 * 60 * 24) + dur - 1 as end_secs_from_midnight
from record
;

We subtract one from s + dur because a one second call that starts at 12:00:00 also ends on 12:00:00.

We can find minutes since midnight by dividing those results by 60, or just by floor( s / 60 ) % (60 * 24) :

create view record_mins_from_midnight as
select *,
floor( s / 60 ) % (60 * 24) as start_mins_fm,
floor( ( s + dur - 1) / 60 ) % (60 * 24) as end_mins_fm
from record
;
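As a quick check of the arithmetic, here is a small Python sketch. The function name mins_from_midnight and the sample values are assumptions for illustration; s and dur play the same role as the columns in record.

```python
SECONDS_PER_DAY = 60 * 60 * 24
MINUTES_PER_DAY = 60 * 24

def mins_from_midnight(s, dur):
    # Mirrors the view: integer division by 60, then wrap at one day
    start_min = (s // 60) % MINUTES_PER_DAY
    end_min = ((s + dur - 1) // 60) % MINUTES_PER_DAY
    return start_min, end_min

# 12:00:59 on 1 January 1970, expressed as seconds since the epoch
s = 12 * 3600 + 59

print(s % SECONDS_PER_DAY)            # 43259
print(mins_from_midnight(s, 2))       # (720, 721)
print(mins_from_midnight(s - 1, 2))   # (720, 720)
```

A two second call starting at 12:00:59 touches minutes 720 and 721, while one starting at 12:00:58 stays inside minute 720.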

Now we create a table of minutes. We need 1440 of them, numbered from 0 to 1439. In databases that don't support arbitrary sequences, I create an artificial range or sequence like this:

  create table artificial_range ( 
id int not null primary key auto_increment, idz int) ;
insert into artificial_range(idz) values (0);
-- repeat next line to double rows
insert into artificial_range(idz) select idz from artificial_range;
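How often does the doubling line have to be repeated? A tiny Python calculation (just arithmetic, not part of the schema) answers that for the 1440 rows needed later:

```python
rows = 1       # after the initial single-row insert
doublings = 0
while rows < 1440:
    rows *= 2  # each "insert ... select" doubles the row count
    doublings += 1

print(doublings, rows)  # 11 2048
```

So eleven repetitions give 2048 rows, which is more than enough.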

So to create a minute table:

  create view minutes as 
select id - 1 as active_minute
from artificial_range
where id <= 1440
;

Now we just join minute to our record view

create view record_active_minutes as
select * from minutes a
join record_mins_from_midnight b
on (a.active_minute >= b.start_mins_fm
and a.active_minute <= b.end_mins_fm)
;

This just cross products/multiplies record rows, so we have one record row for each whole minute over which the call was active.

Note that I'm doing this by defining active as "(part of) the call occurred during a minute". That is, a two second call that starts at 12:00:59 and ends at 12:01:01 by this definition occurs during two different minutes, but a two second call that starts at 12:00:58 and ends at 12:00:59 occurs during one minute.

I did that because you specified "So, I need a way to check for a count of active calls for 7:00-7:01, 7:01-7:02". If you prefer to consider only calls lasting more than sixty seconds to occur in more than one minute, you'll need to adjust the join.

Now if we want to find the number of active records for any granularity equal to or larger than minute granularity, we just group on that last view. To find average calls per hour we divide by 60 to turn minutes to hours:

 select floor( active_minute / 60 ) as hour, 
count(*) / 60 as avg_concurrent_calls_per_minute_for_hour
from record_active_minutes
group by floor( active_minute / 60 ) ;

Note that that is the average per hour for all calls, over all days; if we want to limit it to a particular day or range of days, we'd add a where clause.


But wait, there's more!

If we create a version of record_active_minutes that does a left outer join, we can get a report that shows the average over all hours in the day:

 create view record_active_minutes_all as
select *
from
minutes a
left outer join record_mins_from_midnight b
on (a.active_minute >= b.start_mins_fm
and a.active_minute <= b.end_mins_fm)
;

Then we again do our select, but against the new view:

 select floor( active_minute / 60 ) as hour, 
count(*) / 60 as avg_concurrent_calls_per_min
from record_active_minutes_all
group by floor( active_minute / 60 ) ;

+------+------------------------------+
| hour | avg_concurrent_calls_per_min |
+------+------------------------------+
|    0 |                       0.0000 |
|    1 |                       0.0000 |
|    2 |                       0.0000 |
|    3 |                       0.0000 |
etc....

We can also index into this with a where. Unfortunately, the join means we'll have null values for the underlying record table where no calls exist for a particular hour, e.g.,

 select floor( active_minute / 60 ) as hour, 
count(*) / 60 as avg_concurrent_calls_per_min
from record_active_minutes_all
where month(date) = 1 and year(date) = 2008
group by floor( active_minute / 60 ) ;

will bring back no rows for hours in which no calls occurred. If we still want our "report-like" view that shows all hours, we make sure we also include those hours with no records:

 select floor( active_minute / 60 ) as hour, 
count(*) / 60 as avg_concurrent_calls_per_minute_for_hour
from record_active_minutes_all
where (month(date) = 1 and year(date) = 2008)
or date is null
group by floor( active_minute / 60 ) ;

Note that in the last two examples, I'm using a SQL date (to which the functions month and year can be applied), not the char(4) date in your record table.

Which brings up another point: both the date and time in your record table are superfluous and denormalized, as each can be derived from your column s. Leaving them in the table allows the possibility of inconsistent rows, in which date(s) <> date or time(s) <> time. I'd prefer to do it like this:

   create table record ( id int not null primary key, s int not null, duration int not null) ; -- column types assumed

create view record_date as
select *, dateadd( ss, s, '1970-01-01') as call_date
from record
;

In the dateadd function, the ss is an enumerated type that tells the function to add seconds; s is the column in record.

MySQL - Get the total amount of concurrent events for a given date range

The tip from "Strawberry" helped me find a solution. What I ended up with looks like this:

SET @d1 = '2014-12-01';
SET @d2 = '2014-12-07';

SELECT x.selected_date, COUNT(*) total
FROM (
select * from
(select adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) selected_date from
(select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
(select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) v
where selected_date between @d1 and @d2
) x
JOIN dataset y
ON x.selected_date BETWEEN y.valid_from AND y.valid_to
GROUP
BY selected_date
ORDER
BY total DESC
LIMIT 1;

It's basically his solution but without the need for an additional calendar table. To get a list of dates between two given dates I used this solution: https://stackoverflow.com/a/13814885/3342150

Thanks

Oracle SQL - Efficiently calculate number of concurrent phone calls

You can use an UNPIVOT (using a similar technique to my answer here):

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE table_name ( END, LINE, CALLDURATION ) AS
SELECT CAST( TIMESTAMP '2012-01-25 14:05:10' AS DATE ), 6, 65 FROM DUAL UNION ALL
SELECT CAST( TIMESTAMP '2012-01-25 14:08:51' AS DATE ), 7, 1142 FROM DUAL UNION ALL
SELECT CAST( TIMESTAMP '2012-01-25 14:20:36' AS DATE ), 5, 860 FROM DUAL;

Query 1:

SELECT p.*,
SUM( status ) OVER ( ORDER BY dt, status DESC ) AS currentlyusedlines
FROM (
SELECT end - callduration / 86400 As dt,
t.*
FROM table_name t
)
UNPIVOT( dt FOR status IN ( dt As 1, end AS -1 ) ) p

Results:

| LINE | CALLDURATION | STATUS |                   DT | CURRENTLYUSEDLINES |
|------|--------------|--------|----------------------|--------------------|
| 7 | 1142 | 1 | 2012-01-25T13:49:49Z | 1 |
| 6 | 65 | 1 | 2012-01-25T14:04:05Z | 2 |
| 6 | 65 | -1 | 2012-01-25T14:05:10Z | 1 |
| 5 | 860 | 1 | 2012-01-25T14:06:16Z | 2 |
| 7 | 1142 | -1 | 2012-01-25T14:08:51Z | 1 |
| 5 | 860 | -1 | 2012-01-25T14:20:36Z | 0 |
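To double-check the result table, the same logic can be replayed in Python. This is only a sketch of the technique, not the Oracle implementation: each call becomes a +1 event at its start and a -1 event at its end, sorted like ORDER BY dt, status DESC.

```python
from datetime import datetime, timedelta

# (end, line, callduration in seconds), taken from the schema setup
calls = [
    (datetime(2012, 1, 25, 14, 5, 10), 6, 65),
    (datetime(2012, 1, 25, 14, 8, 51), 7, 1142),
    (datetime(2012, 1, 25, 14, 20, 36), 5, 860),
]

events = []
for end, line, dur in calls:
    events.append((end - timedelta(seconds=dur), 1, line))  # call start
    events.append((end, -1, line))                          # call end

# ORDER BY dt, status DESC: at equal times, starts sort before ends
events.sort(key=lambda e: (e[0], -e[1]))

used = 0
rows = []
for dt, status, line in events:
    used += status  # running SUM(status)
    rows.append((line, status, dt.strftime("%H:%M:%S"), used))

for row in rows:
    print(row)
```

The last element of each printed tuple matches the CURRENTLYUSEDLINES column: 1, 2, 1, 2, 1, 0.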

Oracle SQL - Calculate number of concurrent phone calls

One solution would be this one:

WITH t AS
(SELECT TIMESTAMP '2012-01-25 14:00:00' + LEVEL * INTERVAL '5' MINUTE AS TS
FROM dual
CONNECT BY TIMESTAMP '2012-01-25 14:00:00' + LEVEL * INTERVAL '5' MINUTE <= TIMESTAMP '2012-01-25 14:15:00'),
calls AS
(SELECT TIMESTAMP '2012-01-25 14:05:10' AS END_TIME, 6 AS LINE, 65 AS duration FROM dual
UNION ALL SELECT TIMESTAMP '2012-01-25 14:08:51', 7, 1142 FROM dual
UNION ALL SELECT TIMESTAMP '2012-01-25 14:20:36', 5, 860 FROM dual)
SELECT TS, count(distinct line)
FROM t
LEFT OUTER JOIN calls ON ts BETWEEN END_TIME - duration * INTERVAL '1' SECOND AND END_TIME
GROUP BY ts
HAVING count(distinct line) > 0
ORDER BY ts;

TS COUNT(DISTINCTLINE)
-------------------- -------------------
25.01.2012 14:05:00 2
25.01.2012 14:10:00 1
25.01.2012 14:15:00 1

3 rows selected.
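For comparison, here is the sampling approach as a short Python sketch. The helper lines_in_use is a made-up name; the data is the same as in the calls subquery.

```python
from datetime import datetime, timedelta

# (end_time, line, duration in seconds), as in the calls subquery
calls = [
    (datetime(2012, 1, 25, 14, 5, 10), 6, 65),
    (datetime(2012, 1, 25, 14, 8, 51), 7, 1142),
    (datetime(2012, 1, 25, 14, 20, 36), 5, 860),
]

# Sample points 14:05, 14:10, 14:15 (LEVEL = 1..3 in the CONNECT BY)
base = datetime(2012, 1, 25, 14, 0)
samples = [base + timedelta(minutes=5 * level) for level in range(1, 4)]

def lines_in_use(ts):
    # BETWEEN start AND end_time, both bounds inclusive
    return {line for end, line, dur in calls
            if end - timedelta(seconds=dur) <= ts <= end}

result = {ts.strftime("%H:%M"): len(lines_in_use(ts)) for ts in samples}
print(result)  # {'14:05': 2, '14:10': 1, '14:15': 1}
```

Keep in mind that sampling only sees what happens exactly at the sample points. In this data two lines are also busy at 14:07, which none of the five-minute samples after 14:05 would show.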

Start and end times - how many concurrent events per hour/day/week etc

I think the easiest way would be to use a database or a standalone sort program or subroutine to pick out the start and stop events in ascending order of time and then process them with a simple program or routine.

As you read in events keep a running counter of open events - add one when you see a start and subtract one when you see a stop.

Then every time you pick up a new event you can work out how long it was since the last event, and you know how many events were open during this time. So if this was e.g. 5 minutes with two events open then you have seen 10 event-minutes of concurrent events.

If you total these event-minutes up and divide by the length of time from the first event to the last event you will have a measure of the average number of concurrent events - if you pick a random instant between the first and last event then the average number of concurrent events going on during that random instant will be this measure of average number of concurrent events.
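A minimal Python sketch of that routine, assuming the events are already available as (start, stop) pairs in minutes (the sample data is invented):

```python
# Hypothetical events as (start_minute, stop_minute)
events = [(0, 10), (5, 15), (20, 30)]

# One +1 point per start and one -1 point per stop, in time order
points = []
for start, stop in events:
    points.append((start, 1))
    points.append((stop, -1))
points.sort()

open_events = 0
event_minutes = 0
last_time = points[0][0]
for time, delta in points:
    # Time since the previous point, weighted by how many events were open
    event_minutes += open_events * (time - last_time)
    open_events += delta
    last_time = time

span = points[-1][0] - points[0][0]  # first event to last event
average = event_minutes / span

print(event_minutes, average)  # 30 1.0
```

Here the three ten-minute events add up to 30 event-minutes over a 30 minute span, so on average one event is open at any instant.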


