Generate_Series in Postgres from Start and End Date in a Table

Generate_series in Postgres from start and end date in a table

You don't need a CTE for this; it would be more expensive than necessary.

And you don't need to cast to timestamp: the result is already of type timestamp when you feed timestamp values to generate_series(). Details here:

  • Generating time series between two dates in PostgreSQL
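To verify the return type for yourself, pg_typeof() shows what generate_series() produces when fed timestamps (a quick sketch with arbitrary dates):

```sql
SELECT pg_typeof(ts)
FROM   generate_series(timestamp '2024-01-01'
                     , timestamp '2024-01-02'
                     , interval '6 hours') ts
LIMIT  1;
-- pg_typeof: timestamp without time zone
```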

In Postgres 9.3 or later you can use a LATERAL join:

SELECT to_char(ts, 'YYYY-MM-DD HH24') AS formatted_ts
FROM  (
   SELECT min(start_timestamp) AS first_date
        , max(start_timestamp) AS last_date
   FROM   header_table
   ) h
 , generate_series(h.first_date, h.last_date, interval '1 hour') g(ts);

Optionally with to_char() to get the result as text in the format you mentioned.

This works in any Postgres version:

SELECT generate_series(min(start_timestamp)
                     , max(start_timestamp)
                     , interval '1 hour') AS ts
FROM   header_table;

Typically a bit faster.

Calling set-returning functions in the SELECT list is a non-standard-SQL feature and frowned upon by some. Also, there were behavioral oddities (though not for this simple case) that were eventually fixed in Postgres 10. See:

  • What is the expected behaviour for multiple set-returning functions in SELECT clause?
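For illustration (assuming Postgres 10 or later): two set-returning functions of different lengths now run in lockstep, with the shorter one padded with NULLs, whereas before Postgres 10 the row count was the least common multiple of the two lengths:

```sql
SELECT generate_series(1, 2) AS a
     , generate_series(1, 3) AS b;
-- Postgres 10+:  3 rows: (1,1), (2,2), (NULL,3)
-- Postgres 9.x:  6 rows (least common multiple of 2 and 3)
```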

Note a subtle difference in NULL handling:

The equivalent of

max(start_timestamp)

is obtained with

ORDER BY start_timestamp DESC NULLS LAST
LIMIT 1

Without NULLS LAST, NULL values come first in descending order (if there can be NULL values in start_timestamp). You would get NULL for last_date, and your query would come up empty.

Details:

  • Why do NULL values come first when ordering DESC in a PostgreSQL query?
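To illustrate the equivalence (a sketch against the same header_table as above): both queries return the same greatest non-NULL value, because max() ignores NULLs and NULLS LAST pushes them out of the way:

```sql
SELECT max(start_timestamp) FROM header_table;

SELECT start_timestamp
FROM   header_table
ORDER  BY start_timestamp DESC NULLS LAST
LIMIT  1;
```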

start date & end date from table in generate_series()

Your example code is a bit confusing, so considering just business days and your table, this will give the expected output:

CREATE TABLE myTable (start_date date, end_date date);

INSERT INTO myTable VALUES('2017-06-01', '2017-06-01'),('2017-05-29', '2017-06-02');

SELECT start_date, end_date,
       SUM(CASE WHEN EXTRACT(dow FROM days) BETWEEN 1 AND 5 THEN 1 ELSE 0 END)
FROM myTable
CROSS JOIN LATERAL generate_series(start_date, end_date, interval '1 day') AS days
GROUP BY start_date, end_date;

It will generate rows, each having start_date, end_date and one day in between those. Then it will aggregate by start_date, end_date and SUM() the days that have a day of week between 1 and 5 (Monday to Friday).

Without aggregation it looks like this:

 start_date |  end_date  |          days          | dow
------------+------------+------------------------+-----
 2017-06-01 | 2017-06-01 | 2017-06-01 00:00:00+02 |   4
 2017-05-29 | 2017-06-02 | 2017-05-29 00:00:00+02 |   1
 2017-05-29 | 2017-06-02 | 2017-05-30 00:00:00+02 |   2
 2017-05-29 | 2017-06-02 | 2017-05-31 00:00:00+02 |   3
 2017-05-29 | 2017-06-02 | 2017-06-01 00:00:00+02 |   4
 2017-05-29 | 2017-06-02 | 2017-06-02 00:00:00+02 |   5

And group by:

 start_date |  end_date  | sum
------------+------------+-----
 2017-06-01 | 2017-06-01 |   1
 2017-05-29 | 2017-06-02 |   5

How to generate series using start and end date and quarters on postgres

You can do this by using the GENERATE_SERIES function and passing in the start and end date for each unique (by ID) row and setting the interval to 3 months. Then join the result back with your original table on both ID and quarter.

Here's an example (note original_data is what I've called your first table):

WITH quarters_table AS (
    SELECT
        t.ID,
        (EXTRACT('month' FROM t.quarter_date) - 1)::INT / 3 + 1 AS quarter,
        t.quarter_date::DATE AS start_date,
        COALESCE(
            LEAD(t.quarter_date) OVER (),
            DATE_TRUNC('quarter', t.original_end_date) + INTERVAL '3 months'
        )::DATE AS end_date
    FROM (
        SELECT
            original_record.ID,
            original_record.end_date AS original_end_date,
            GENERATE_SERIES(
                DATE_TRUNC('quarter', original_record.start_date),
                DATE_TRUNC('quarter', original_record.end_date),
                INTERVAL '3 months'
            ) AS quarter_date
        FROM (
            SELECT DISTINCT ON (original_data.ID)
                original_data.ID,
                original_data.start_date,
                original_data.end_date
            FROM original_data
            ORDER BY original_data.ID
        ) AS original_record
    ) AS t
)

SELECT
    quarters_table.ID,
    quarters_table.quarter,
    original_data.value::DOUBLE PRECISION / 3 AS value,
    quarters_table.start_date,
    quarters_table.end_date
FROM quarters_table
INNER JOIN original_data
    ON  quarters_table.ID = original_data.ID
    AND quarters_table.quarter = original_data.quarter;

Sample output:

 id | quarter |      value       | start_date |  end_date
----+---------+------------------+------------+------------
  1 |       1 | 261.666666666667 | 2020-01-01 | 2020-04-01
  1 |       2 | 50.6666666666667 | 2020-04-01 | 2020-07-01
  1 |       1 | 261.666666666667 | 2021-01-01 | 2021-04-01
  1 |       2 | 50.6666666666667 | 2021-04-01 | 2021-07-01

For completeness, here's the original_data table I've used in testing:

WITH original_data AS (
    SELECT
        1 AS ID,
        2 AS quarter,
        152 AS value,
        '2019-11-07'::DATE AS start_date,
        '2050-12-30'::DATE AS end_date

    UNION ALL

    SELECT
        1 AS ID,
        1 AS quarter,
        785 AS value,
        '2019-11-07'::DATE AS start_date,
        '2050-12-30'::DATE AS end_date

    UNION ALL

    SELECT
        2 AS ID,
        2 AS quarter,
        152 AS value,
        '2019-03-05'::DATE AS start_date,
        '2050-12-30'::DATE AS end_date

    -- ...
)

Generating time series between two dates in PostgreSQL

This can be done without converting to/from int (using timestamps instead):

SELECT date_trunc('day', dd)::date
FROM   generate_series
        ( '2007-02-01'::timestamp
        , '2008-04-01'::timestamp
        , '1 day'::interval) dd;
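The date_trunc() call is actually redundant when you only want dates; a simplified variant of the same idea (Postgres will implicitly promote the date arguments to timestamp):

```sql
SELECT dd::date
FROM   generate_series('2007-02-01'::date
                     , '2008-04-01'::date
                     , interval '1 day') dd;
```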

How can I generate a series of date ranges that fall between a start date and end date?

I would do it like this:

with input as (
    select '2020-11-01 16:30:00'::timestamptz as start_date,
           '2020-11-03 18:30:00'::timestamptz as end_date
)
select row_number() over (order by ddate) as row,
       case
         when start_date::date = ddate
          and start_date > ddate + interval '16 hours'
           then start_date
         else ddate + interval '16 hours'
       end as start_date,
       case
         when end_date::date = ddate
          and end_date < ddate + interval '21 hours'
           then end_date
         else ddate + interval '21 hours'
       end as end_date
from input
cross join lateral
     generate_series(
       case
         when start_date::time > '21:00' then start_date::date + interval '1 day'
         else start_date::date
       end,
       case
         when end_date::time < '16:00' then end_date::date - interval '1 day'
         else end_date::date
       end,
       interval '1 day') as gs(ddate);
┌─────┬────────────────────────┬────────────────────────┐
│ row │       start_date       │        end_date        │
├─────┼────────────────────────┼────────────────────────┤
│   1 │ 2020-11-01 16:30:00-05 │ 2020-11-01 21:00:00-05 │
│   2 │ 2020-11-02 16:00:00-05 │ 2020-11-02 21:00:00-05 │
│   3 │ 2020-11-03 16:00:00-05 │ 2020-11-03 18:30:00-05 │
└─────┴────────────────────────┴────────────────────────┘
(3 rows)

Generate custom start and end date series in postgresql and sum qty for a given billing cycle

The calendar is messy, with months of varying length and even variation within the same month (February in leap years). Fortunately, Postgres is aware of these irregularities (almost always), so work with it and let Postgres take care of them. With intervals of 1 month and 1 day you can make the irregularity of February and the 30/31-day months simply disappear.

The first start_date is the subscription date, and its end_date is the end of the start_date's month (or the end of the following month). From there, everything follows a specific pattern: each start_date is the prior end_date + 1 day, and each end_date is the last day of that month. The only issue is determining the last day of the month, but this is simply done: truncate the start_date to the month, add 1 month, and subtract 1 day. Note this totally ignores the actual month and the number of days within it. You then wrap all that in a recursive CTE which steps from one row to the next (iteratively). As follows:

with recursive billing(account_id, subscription_start, start_date, end_date) as
   ( select s.account_id
          , s.subscription_start
          , s.subscription_start::date
          , case when extract(day from s.subscription_start) > 28
                 then (date_trunc('month', s.subscription_start + interval '2 month') - interval '1 day')::date
                 else (date_trunc('month', s.subscription_start + interval '1 month') - interval '1 day')::date
            end
     from subscriptions s
     union all
     select b.account_id
          , b.subscription_start
          , (b.end_date + interval '1 day')::date
          , ((b.end_date + interval '1 day')::date + interval '1 month' - interval '1 day')::date
     from billing b
     where date_trunc('month', end_date) <= date_trunc('month', now())
   )
select * from billing
order by account_id desc, start_date;

See the demo here. Check the difference between Feb 2020 and Feb 2021.
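The last-day-of-month trick in isolation, using an arbitrary example date:

```sql
SELECT (date_trunc('month', date '2021-02-10')
        + interval '1 month' - interval '1 day')::date AS last_day;
-- last_day: 2021-02-28
```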

postgres sql - generate_series for date - how to do a WHERE condition on the date (has timezone)

Double quotes stand for identifiers, while what you want is a literal date. You can phrase this using a standard literal date:

where d::date = date '2020-11-21'

Or with a Postgres cast:

where d::date = '2020-11-21'::date

If d has no time component, casting it is not necessary:

where d = date '2020-11-21'

And even if it does, you can still avoid the cast with a half-open filter:

where d >= date '2020-11-21' and d < date '2020-11-21' + interval '1 day'

Postgresql generate date series (performance)

Why your function is slow: you use variables and (more importantly) a loop. Loops are slow, and variables add extra reads and writes on top.

CREATE OR REPLACE FUNCTION generate_date_series_2(starts_on DATE, ends_on DATE, frequency INTERVAL)
RETURNS SETOF DATE AS
$BODY$
SELECT (starts_on + (frequency * g))::DATE
FROM generate_series(0, (SELECT COUNT(*)::INTEGER - 1
                         FROM generate_series(starts_on, ends_on, frequency))) g;
$BODY$
LANGUAGE SQL IMMUTABLE;

The concept is basically the same as your plpgsql function but via a single query instead of a loop. The only problem is deciding how many iterations are needed (i.e. the second parameter to generate_series). Sadly I couldn't think of a better way to get the number of intervals required other than calling generate_series for the dates and using the count of that. Of course if you know your intervals will only ever be certain values then it may be possible to optimise; however this version handles any interval values.

On my system it's about 50% slower than a pure generate_series, and about 400% faster than your plpgsql version.
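Usage is the same as a plain generate_series() call; for example, with the function above:

```sql
SELECT * FROM generate_date_series_2(date '2020-01-01'
                                   , date '2020-01-10'
                                   , interval '3 days');
-- 2020-01-01, 2020-01-04, 2020-01-07, 2020-01-10
```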


