How to Group Timestamps into Islands (Based on Arbitrary Gap)

How to group timestamps into islands (based on arbitrary gap)?

This would do it:

SELECT done, count(*) FILTER (WHERE step) OVER (ORDER BY done) AS grp
FROM  (
   SELECT done
       , (lag(done) OVER (ORDER BY done) <= done - interval '2 min') AS step
   FROM   tbl
   ) sub
ORDER  BY done;

The subquery sub records step as true if the previous row is at least 2 min away - sorted by the timestamp column done itself in this case.

The outer query adds a rolling count of steps, effectively the group number (grp) - combining the aggregate FILTER clause with another window function.

db<>fiddle here

Query to find all timestamps more than a certain interval apart
How to label groups in postgresql when group belonging depends on the preceding line?
Select longest continuous sequence
Grouping or Window

About the aggregate FILTER clause:

How can I simplify this game statistics query?
Conditional lead/lag function PostgreSQL?

How to form groups of consecutive dates allowing for a given maximum gap?

Counting the gaps (greater than given tolerance) in a second window function forms the group numbers you are after:

SELECT person_id, contact_day
     , count(*) FILTER (WHERE gap > 3) OVER (PARTITION BY person_id ORDER BY contact_day) AS dash_group
FROM  (
   SELECT person_id, contact_day
        , contact_day - lag(contact_day) OVER (PARTITION BY person_id ORDER BY contact_day) AS gap
   FROM   mydata
   ) sub
ORDER  BY person_id, contact_day;  -- optional

db<>fiddle here

About the aggregate FILTER clause:

Aggregate columns with additional (distinct) filters

It's short and intuitive, and typically fastest. See:

For absolute performance, is SUM faster or COUNT?

It's the classic topic of "gaps and islands". Once you know to look for the tag gaps-and-islands, you'll find plenty of related or nigh identical questions and answers like:

Select longest continuous sequence
How to group timestamps into islands (based on arbitrary gap)?
How to label groups in postgresql when group belonging depends on the preceding line?

Etc.

I tagged accordingly now.

How to merge intervals based on a minimum gap of dates?

This work in Postgres 14. Not tested for Redshift.

SELECT ptid, visit_start_date, visit_end_date, admin_startdate, admin_enddate
     , min(admin_startdate) OVER (PARTITION BY visit_id, admin) AS episode_startdate
     , max(admin_enddate)   OVER (PARTITION BY visit_id, admin) AS episode_enddate
FROM (
   SELECT *, count(*) FILTER (WHERE gap) OVER (PARTITION BY visit_id ORDER BY admin_startdate) AS admin
   FROM (
      SELECT *, admin_startdate - lag(admin_enddate) OVER (PARTITION BY visit_id ORDER BY admin_startdate) > 2 AS gap
      FROM  (
         SELECT *, dense_rank() OVER (ORDER BY ptid, visit_start_date, visit_end_date) AS visit_id  -- optional, to simplify
         FROM   tbl
         ) sub1
      ) sub2
   ) sub3

db<>fiddle here

The innermost subquery sub1 is only to compute a unique visit_id - which should really be in your table instead of repeating (ptid, visit_start_date, visit_end_date ) over and over. Consider normalizing your design at least that much.

The next subquery sub2 checks for a gap that's greater than two days to the previous row in the same partition.

Subquery sub3 then counts those gaps to identify distinct administration periods (admin)

In the outer SELECT, min(admin_startdate) and max(admin_enddate) per administration period produce the desired episode dates.

See (with assorted links to more):

How to group timestamps into islands (based on arbitrary gap)?

Query to find all timestamps more than a certain interval apart

To start a new session after every gap >= 1 hour:

SELECT user_id, count(*) AS distinct_sessions
FROM (
   SELECT user_id
        ,(lag(request_time, 1, '-infinity') OVER (PARTITION BY user_id
                                                  ORDER BY request_time)
           <= request_time - '1h'::interval) AS step -- start new session
   FROM   tbl
   ) sub
WHERE  step
GROUP  BY user_id
ORDER  BY user_id;

Assuming request_time NOT NULL.

Explain:

In subquery sub, check for every row if a new session begins. Using the third parameter of lag() to provide the default -infinity, which is lower than any timestamp and therefore always starts a new session for the first row.
In the outer query count how many times new sessions started. Eliminate step = FALSE and count per user.

Alternative interpretation

If you really wanted to count hours where at least one request happened (I don't think you do, but another answer assumes as much), you would:

SELECT user_id
     , count(DISTINCT date_trunc('hour', request_time)) AS hours_with_req
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Forming groups of spatio-temporally near trajectories in R or PostgreSQL

UPDATE: [after understanding the real question ;-] Finding the equivalence groups of bikes (set, bike_set) is in fact a relational-division problem. Finding the begin and end of the segments (clust) within a set of bikes is basically the same as in the first attempt.

The clusters are stored in arrays: (I trust on the clusters not becoming too large)
The array is built by a recursive query: every pair of bikes that has one member in common with the current cluster is merged into it.
At the end, the arrays contain all bike_id's that happened to be within reach at the particular time.
(plus some intermediate rows which need to be suppressed later by the uniq CTE)
The rest is more-or-less standard gap-detection in time-series.

NOTE: the code trusts on (bike2 > bike1). This is needed to keep the array sorted and thus canonical. The actual content is not guaranteed to be canonical because the order of addition in the recursive query cannot be guaranteed. This may need some extra work.

CREATE TABLE nearness
        ( bike1 INTEGER NOT NULL
        , bike2 INTEGER NOT NULL
        , stamp timestamp NOT NULL
        , near boolean
        , PRIMARY KEY(bike1,bike2,stamp)
        );
INSERT INTO nearness( bike1,bike2,stamp,near) VALUES
 (1,2, '2016-05-28 11:00:00', TRUE)
,(1,2, '2016-05-28 11:00:05', TRUE)
,(1,2, '2016-05-28 11:00:10', TRUE)
,(1,2, '2016-05-28 11:00:20', TRUE) -- <<-- gap here
,(1,2, '2016-05-28 11:00:25', TRUE)
,(1,2, '2016-05-28 11:00:30', FALSE)    -- <<-- these False-records serve no pupose
,(4,5, '2016-05-28 11:00:00', FALSE)    -- <<-- result would be the same without them
,(4,5, '2016-05-28 11:00:05', FALSE)
,(4,5, '2016-05-28 11:00:10', TRUE)
,(4,5, '2016-05-28 11:00:15', TRUE)
,(4,5, '2016-05-28 11:00:20', TRUE)
,(2,3, '2016-05-28 11:00:05', TRUE) -- <<-- bike 1, 2, 3 are in one grp @ 11:00:05
,(2,3, '2016-05-28 11:00:10', TRUE) -- <<-- no group here
,(6,7, '2016-05-28 11:00:00', FALSE)
,(6,7, '2016-05-28 11:00:05', FALSE)
        ;

        -- Recursive union-find to glue together sets of bike_ids
        -- ,occuring at the same moment.
        -- Sets are represented as {ordered,unique} arrays here
WITH RECURSIVE wood AS (
        WITH omg AS (
                SELECT bike1 ,bike2,stamp
                , row_number() OVER(ORDER BY bike1,bike2,stamp) AS seq
                , ARRAY[bike1,bike2]::integer[] AS arr
                FROM nearness n WHERE near = True
                )
        -- Find all existing combinations of bikes
        SELECT o1.stamp, o1.seq
                , ARRAY[o1.bike1,o1.bike2]::integer[] AS arr
        FROM omg o1
        UNION ALL
        SELECT o2.stamp, o2.seq -- avoid duplicates inside the array
                , CASE when o2.bike1 = ANY(w.arr) THEN w.arr || o2.bike2
                ELSE  w.arr || o2.bike1 END AS arr
        FROM omg o2
        JOIN wood w
                ON o2.stamp = w.stamp AND o2.seq > w.seq
                AND (o2.bike1 = ANY(w.arr) OR o2.bike2 = ANY(w.arr))
                AND NOT (o2.bike1 = ANY(w.arr) AND o2.bike2 = ANY(w.arr))
        )
, uniq  AS (    -- suppress partial sets caused by the recursive union-find buildup
        SELECT * FROM wood w
        WHERE NOT EXISTS (SELECT * FROM wood nx
                WHERE nx.stamp = w.stamp
                AND nx.arr @> w.arr AND nx.arr <> w.arr -- contains but not equal 
                )
        )
, xsets AS (    -- make unique sets of bikes
        SELECT DISTINCT arr
        -- , MIN(seq) AS grp
        FROM uniq
        GROUP BY arr
        )
, sets AS (     -- enumerate the sets of bikes
        SELECT arr
        , row_number() OVER () AS setnum
        FROM xsets
        )
, drag AS (             -- Detect beginning and end of segments of consecutive observations
        SELECT u.*      -- within a constant set of bike_ids
        -- Edge-detection begin of group
        , NOT EXISTS (SELECT * FROM uniq nx
                WHERE nx.arr = u.arr
                AND nx.stamp < u.stamp
                AND nx.stamp >= u.stamp - '5 sec'::interval
                ) AS is_first
        -- Edge-detection end of group
        , NOT EXISTS (SELECT * FROM uniq nx
                WHERE nx.arr = u.arr
                AND nx.stamp > u.stamp
                AND nx.stamp <= u.stamp + '5 sec'::interval
                ) AS is_last
        , row_number() OVER(ORDER BY arr,stamp) AS nseq
        FROM uniq u
        )
, top AS ( -- id and groupnum for the start of a group
        SELECT nseq
        , row_number() OVER () AS clust
        FROM drag
        WHERE is_first
        )
, bot AS ( -- id and groupnum for the end of a group
        SELECT nseq
        , row_number() OVER () AS clust
        FROM drag
        WHERE is_last
        )
SELECT w.seq as orgseq  -- results, please ...
        , w.stamp
        , g0.clust AS clust
        , row_number() OVER(www) AS rn
        , s.setnum, s.arr AS bike_set
        FROM drag w
        JOIN sets s ON s.arr = w.arr
        JOIN top g0 ON g0.nseq <= w.seq
        JOIN bot g1 ON g1.nseq >= w.seq AND g1.clust = g0.clust
        WINDOW www AS (PARTITION BY g1.clust ORDER BY w.stamp)
        ORDER BY g1.clust, w.stamp
        ;

Result:

 orgseq |        stamp        | clust | rn | setnum | bike_set 
--------+---------------------+-------+----+--------+----------
      1 | 2016-05-28 11:00:00 |     1 |  1 |      1 | {1,2}
      4 | 2016-05-28 11:00:20 |     3 |  1 |      1 | {1,2}
      5 | 2016-05-28 11:00:25 |     3 |  2 |      1 | {1,2}
      6 | 2016-05-28 11:00:05 |     4 |  1 |      3 | {1,2,3}
      7 | 2016-05-28 11:00:10 |     4 |  2 |      3 | {1,2,3}
      8 | 2016-05-28 11:00:10 |     4 |  3 |      2 | {4,5}
(6 rows)

Correct way to eliminate duplicates from a window query

You need more sophistication if version downgrades (or NULL values?) are possible:

SELECT min(version) AS version, min(date), max(date)
FROM  (
   SELECT version, date
        , count(*) FILTER (WHERE step IS NOT FALSE) OVER (ORDER BY date) AS grp
   FROM  (
      SELECT version, date
           , lag(version) OVER (ORDER BY date) <> version AS step
      FROM   cluster_info
      WHERE  cluster_id = '0f4ce21e-0d08-11ec-b209-0242c0a8c004'
      ORDER  BY date
      ) sub1
   ) sub2
GROUP  BY grp;

db<>fiddle here (sample data extended with version downgrade and unknown version)

See (with detailed explanation and links to more):

Select longest continuous sequence
How to group timestamps into islands (based on arbitrary gap)?

How to Group Timestamps into Islands (Based on Arbitrary Gap)