SQL Group by Date Range

SQL Server: group dates by ranges

You need to do something like this:

select t.range as [score range], count(*) as [number of occurrences]
from (
    select case
               when score between 0 and 9 then ' 0-9 '
               when score between 10 and 19 then '10-19'
               when score between 20 and 29 then '20-29'
               ...
               else '90-99'
           end as range
    from scores
) t
group by t.range

See also: In SQL, how can you "group by" in ranges?
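When every bucket has the same width, the CASE ladder can be replaced by integer arithmetic. Here is a minimal sketch of that idea. It runs on SQLite through Python's sqlite3 module only so that it works without a server; the sample scores are made up, and it assumes score is an integer so that score / 10 truncates.

```python
import sqlite3

# Hypothetical sample data: a scores table with one integer column.
bucket_conn = sqlite3.connect(":memory:")
bucket_conn.execute("CREATE TABLE scores (score INTEGER)")
bucket_conn.executemany(
    "INSERT INTO scores VALUES (?)",
    [(3,), (7,), (12,), (25,), (29,), (95,)],
)

# Integer division maps each score to the lower bound of its 10-wide bucket.
bucket_rows = bucket_conn.execute(
    """
    SELECT (score / 10) * 10     AS range_start,
           (score / 10) * 10 + 9 AS range_end,
           COUNT(*)              AS occurrences
    FROM scores
    GROUP BY score / 10
    ORDER BY range_start
    """
).fetchall()

for range_start, range_end, occurrences in bucket_rows:
    print(f"{range_start}-{range_end}: {occurrences}")
```

Buckets without any rows do not appear in the result; the CASE version has the same limitation.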

How to group data based on continuous date range?

The date range reported for price 23.9 is wrong because the price is not the same on every day in that range.

The same price occurs in two separate date ranges, so a plain aggregate grouped by price collapses them into a single row.

This is a gaps-and-islands problem. We can use the ROW_NUMBER window function to compute a key that identifies each run of consecutive rows with the same price, and then group by that key.

SELECT Product_Code,
       MIN(Pricing_Date) AS Min_Date,
       MAX(Pricing_Date) AS Max_Date,
       price
FROM (
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY PRICING_DATE)
         - ROW_NUMBER() OVER (PARTITION BY PRODUCT_CODE, PRICE ORDER BY PRICING_DATE) AS grp
    FROM PRICE_DATA
) t1
GROUP BY grp, Product_Code, price
ORDER BY MIN(Pricing_Date)


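Explanation

The key property of a gaps-and-islands problem is this: within a run of consecutive rows, (a continuous sequence over all rows) minus (a sequence numbered separately for each group of matching values) yields the same result for every row of the run.

So we can use

  • ROW_NUMBER() OVER(ORDER BY PRICING_DATE) to produce the continuous sequence over all rows.
  • ROW_NUMBER() OVER(PARTITION BY PRODUCT_CODE,PRICE ORDER BY PRICING_DATE) to number the rows within each product and price.

The difference between the two is constant within each run of consecutive rows with the same product and price, and it changes whenever the price changes. That difference is the grouping column grp. If the table holds more than one product, the first ROW_NUMBER() also needs PARTITION BY PRODUCT_CODE, otherwise rows of other products break up the runs.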
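To see the row-number difference at work, here is a small sketch on made-up data. It runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 module; the table contents are hypothetical, and the first ROW_NUMBER is partitioned by product_code, which is my own adjustment so that several products would not interfere.

```python
import sqlite3

# Hypothetical data: one product whose price returns to 23.9 after a change.
island_conn = sqlite3.connect(":memory:")
island_conn.execute(
    "CREATE TABLE price_data (product_code TEXT, pricing_date TEXT, price REAL)"
)
island_conn.executemany(
    "INSERT INTO price_data VALUES (?, ?, ?)",
    [
        ("A", "2021-01-01", 23.9),
        ("A", "2021-01-02", 23.9),
        ("A", "2021-01-03", 25.5),
        ("A", "2021-01-04", 23.9),
        ("A", "2021-01-05", 23.9),
    ],
)

# grp is 0, 0, 2, 1, 1 for the five rows: constant inside each run.
island_rows = island_conn.execute(
    """
    SELECT product_code,
           MIN(pricing_date) AS min_date,
           MAX(pricing_date) AS max_date,
           price
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY product_code ORDER BY pricing_date)
             - ROW_NUMBER() OVER (PARTITION BY product_code, price ORDER BY pricing_date) AS grp
        FROM price_data
    ) t1
    GROUP BY product_code, price, grp
    ORDER BY MIN(pricing_date)
    """
).fetchall()

for row in island_rows:
    print(row)
```

The two 23.9 runs come back as separate rows because their grp values differ (0 and 1), even though product and price are identical.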

SQL select data and grouping data by date range

If you cannot use a CTE, the following query works:

SELECT w1.price, w1.date AS start_date, w2.date AS end_date, w1.type
FROM (
    -- w1: rows with no matching row on the previous day (period starts)
    SELECT * FROM mytable t1
    WHERE NOT EXISTS (
        SELECT 1 FROM mytable t2
        WHERE t1.price = t2.price
          AND t1.type = t2.type
          AND DATEDIFF(t2.date, t1.date) = -1
    )
) w1
INNER JOIN (
    -- w2: rows with no matching row on the next day (period ends)
    SELECT * FROM mytable t1
    WHERE NOT EXISTS (
        SELECT 1 FROM mytable t2
        WHERE t1.price = t2.price
          AND t1.type = t2.type
          AND DATEDIFF(t2.date, t1.date) = +1
    )
) w2
ON  w1.price = w2.price
AND w1.type = w2.type
AND w1.date <= w2.date
AND NOT EXISTS (
    -- no other period end between this start and this end
    SELECT * FROM mytable t1
    WHERE NOT EXISTS (
        SELECT 1 FROM mytable t2
        WHERE t1.price = t2.price
          AND t1.type = t2.type
          AND DATEDIFF(t2.date, t1.date) = +1
    )
    AND w1.price = t1.price
    AND w1.type = t1.type
    AND w1.date <= t1.date AND t1.date < w2.date
)
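
The query works in three steps:

  1. Find the first date (w1) and the last date (w2) of each period.
  2. Join the two derived tables on price and type.
  3. Keep only the pairs with no other period end between the start date and the end date.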
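The steps above can be tried out on a few made-up rows. The sketch below is a variant rather than the query itself: it finds period starts with the same NOT EXISTS test, but pairs each start with the earliest period end on or after it through a correlated MIN subquery. It runs on SQLite through Python's sqlite3 module, so date(x, '-1 day') stands in for MySQL's DATEDIFF; the table contents are hypothetical.

```python
import sqlite3

# Hypothetical data: two runs of consecutive days for the same price and type.
period_conn = sqlite3.connect(":memory:")
period_conn.execute("CREATE TABLE mytable (price INTEGER, type TEXT, date TEXT)")
period_conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (10, "A", "2021-01-01"),
        (10, "A", "2021-01-02"),
        (10, "A", "2021-01-03"),
        (10, "A", "2021-01-06"),
        (10, "A", "2021-01-07"),
    ],
)

period_rows = period_conn.execute(
    """
    SELECT s.price, s.type, s.date AS start_date,
           (
               -- earliest row on or after the start that has no next-day row
               SELECT MIN(e.date) FROM mytable e
               WHERE e.price = s.price AND e.type = s.type AND e.date >= s.date
                 AND NOT EXISTS (
                     SELECT 1 FROM mytable n
                     WHERE n.price = e.price AND n.type = e.type
                       AND n.date = date(e.date, '+1 day')
                 )
           ) AS end_date
    FROM mytable s
    -- a period start is a row with no matching row on the previous day
    WHERE NOT EXISTS (
        SELECT 1 FROM mytable p
        WHERE p.price = s.price AND p.type = s.type
          AND p.date = date(s.date, '-1 day')
    )
    ORDER BY s.price, s.type, s.date
    """
).fetchall()

for row in period_rows:
    print(row)
```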


Sql group query results by user id and date ranges dynamically

With all weeks starting on Monday, this would do it (efficiently):

SELECT id AS user_id, u."onboardedAt", u."closedAt"
     , week_start, COALESCE(t.tx_count, 0) AS tx_count, a.last_user_action
FROM "Users" u
CROSS JOIN generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
LEFT JOIN (
    SELECT "userId" AS id, date_trunc('week', t."createdAt") AS week_start, count(*) AS tx_count
    FROM "Transactions" t
    GROUP BY 1, 2
) t USING (id, week_start)
LEFT JOIN (
    SELECT DISTINCT ON (1, 2)
           "userId" AS id, date_trunc('week', a."createdAt") AS week_start, action AS last_user_action
    FROM "UserActions" a
    ORDER BY 1, 2, "createdAt" DESC
) a USING (id, week_start)
ORDER BY id, week_start;


Working with standard weeks makes everything much simpler. We can aggregate in the "many" tables before joining, which is simpler and cheaper. Otherwise, multiple joins can go wrong quickly. See:

  • Two SQL LEFT JOINS produce incorrect result

Standard weeks make it easier to compare data, too. (Note that the first and last week per user can be truncated, that is, span fewer days. But that applies to the last week per user in any case.)

The LATERAL keyword is assumed automatically in a join to a set-returning function:

CROSS JOIN generate_series(...)

See:

  • What is the difference between LATERAL JOIN and a subquery in PostgreSQL?

DISTINCT ON picks the last_user_action per user and week. See:

  • Select first row in each GROUP BY group?

I advise using legal, lower-case identifiers, so that double-quoting is not required. It makes your life with Postgres easier.

Use last non-null action

Added in a comment:

if action is null in a current week, I want to grab most recent from previous weeks

SELECT user_id, "onboardedAt", "closedAt", week_start, tx_count
     , last_user_action AS last_user_action_with_null
     , COALESCE(last_user_action, max(last_user_action) OVER (PARTITION BY user_id, null_grp)) AS last_user_action
FROM (
    SELECT id AS user_id, u."onboardedAt", u."closedAt"
         , week_start, COALESCE(t.tx_count, 0) AS tx_count, a.last_user_action
         , count(a.last_user_action) OVER (PARTITION BY id ORDER BY week_start) AS null_grp
    FROM "Users" u
    CROSS JOIN generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
    LEFT JOIN (
        SELECT "userId" AS id, date_trunc('week', t."createdAt") AS week_start, count(*) AS tx_count
        FROM "Transactions" t
        GROUP BY 1, 2
    ) t USING (id, week_start)
    LEFT JOIN (
        SELECT DISTINCT ON (1, 2)
               "userId" AS id, date_trunc('week', a."createdAt") AS week_start, action AS last_user_action
        FROM "UserActions" a
        ORDER BY 1, 2, "createdAt" DESC
    ) a USING (id, week_start)
) sub
ORDER BY user_id, week_start;


Explanation:

  • Retrieve last known value for each column of a row
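The carry-forward trick relies on count() ignoring NULL: a running count of non-null actions stays flat across NULL weeks, so each non-null action and the NULL weeks after it share one null_grp. Here is a minimal sketch of just that part, on a made-up, already aggregated table. It runs on SQLite (3.25 or later) through Python's sqlite3 module rather than Postgres, and the table and column names are hypothetical.

```python
import sqlite3

# Hypothetical input: one row per user and week, action may be NULL.
fill_conn = sqlite3.connect(":memory:")
fill_conn.execute("CREATE TABLE weekly (user_id INTEGER, week INTEGER, action TEXT)")
fill_conn.executemany(
    "INSERT INTO weekly VALUES (?, ?, ?)",
    [
        (1, 1, "login"),
        (1, 2, None),
        (1, 3, None),
        (1, 4, "purchase"),
        (1, 5, None),
        (2, 1, None),      # no earlier action exists, so this stays NULL
        (2, 2, "signup"),
    ],
)

fill_rows = fill_conn.execute(
    """
    SELECT user_id, week,
           -- each group holds at most one non-null action, so MAX returns it
           MAX(action) OVER (PARTITION BY user_id, null_grp) AS filled_action
    FROM (
        SELECT user_id, week, action,
               -- running count of non-null actions per user
               COUNT(action) OVER (PARTITION BY user_id ORDER BY week) AS null_grp
        FROM weekly
    ) sub
    ORDER BY user_id, week
    """
).fetchall()

for row in fill_rows:
    print(row)
```

Weeks before a user's first action have null_grp 0 and nothing to carry forward, so they remain NULL.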

Group by predefined date range

You are close. To define your tables' relationships in your FROM clause you don't want a WHERE clause, you want an ON clause:

SELECT t1.*, SUM(t2.boolean) as count
FROM table1 t1
LEFT JOIN table2 t2
    -- half-open range: BETWEEN would also match the first day of the next period
    ON t2.Date >= t1.period AND t2.Date < DATEADD(month, 1, t1.period)

Furthermore, because you are aggregating with SUM() in your SELECT, you need a GROUP BY to tell the database which columns to group on (every column that isn't being aggregated with a function like SUM()):

SELECT t1.*, SUM(t2.boolean) as count
FROM table1 t1
LEFT JOIN table2 t2
    -- half-open range: BETWEEN would also match the first day of the next period
    ON t2.Date >= t1.period AND t2.Date < DATEADD(month, 1, t1.period)
GROUP BY t1.period, t1.value1, t1.value2
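One detail worth checking in a range join like this is the upper bound. BETWEEN includes both endpoints, so a row dated exactly on the first day of the next period would be counted in two periods; a half-open range (>= start and < next start) avoids that. Here is a small sketch of the half-open form on made-up data. It runs on SQLite through Python's sqlite3 module, so date(period, '+1 month') stands in for DATEADD; the table and column names are hypothetical.

```python
import sqlite3

# Hypothetical data: monthly periods and flagged events, one on a boundary day.
range_conn = sqlite3.connect(":memory:")
range_conn.execute("CREATE TABLE periods (period TEXT)")
range_conn.execute("CREATE TABLE events (event_date TEXT, flag INTEGER)")
range_conn.executemany(
    "INSERT INTO periods VALUES (?)",
    [("2021-01-01",), ("2021-02-01",), ("2021-03-01",)],
)
range_conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2021-01-15", 1),
        ("2021-02-01", 1),  # boundary day: belongs to February only
        ("2021-02-20", 0),
        ("2021-02-28", 1),
    ],
)

range_rows = range_conn.execute(
    """
    SELECT p.period,
           COALESCE(SUM(e.flag), 0) AS flag_count  -- 0 for periods with no events
    FROM periods p
    LEFT JOIN events e
        ON  e.event_date >= p.period
        AND e.event_date <  date(p.period, '+1 month')
    GROUP BY p.period
    ORDER BY p.period
    """
).fetchall()

for row in range_rows:
    print(row)
```

The COALESCE is there because SUM over a period with no matching rows returns NULL, not 0.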

