PostgreSQL Query to Count/Group by Day and Display Days with No Data

You just need a left outer join instead of an inner join:

SELECT d.date, count(se.id)             -- count(se.id) is 0 for days with no match
FROM (
   SELECT current_date - offs AS date   -- date minus integer yields a date
   FROM generate_series(0, 365) AS offs
) d
LEFT OUTER JOIN sharer_emailshare se
   ON  se.created >= d.date             -- every row created on that day,
   AND se.created <  d.date + 1         -- compared without to_char() on se.created
GROUP BY d.date
ORDER BY d.date;
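To make the LEFT JOIN logic concrete, here is a small Python sketch of what the query computes: aggregate the rows per calendar day, then walk a calendar of all days and report 0 where nothing matched. The function name `daily_counts` and the sample timestamps are hypothetical, chosen only for illustration; this is not a replacement for the SQL.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(created, today, days=366):
    """Return (day, count) pairs for the `days` days ending at `today`,
    oldest first, with 0 for days that have no rows."""
    # Aggregate the rows by calendar day, like date_trunc('day', se.created).
    per_day = Counter(ts.date() for ts in created)
    # The calendar side of the LEFT JOIN: current_date - offs for each offset.
    calendar = [today - timedelta(days=offs) for offs in reversed(range(days))]
    # A Counter returns 0 for a missing key, just as count(se.id) is 0
    # for a calendar day that matched no row.
    return [(day, per_day[day]) for day in calendar]


# Hypothetical sample timestamps, only for illustration.
sample = [datetime(2021, 3, 1, 9, 30), datetime(2021, 3, 1, 17, 0),
          datetime(2021, 3, 3, 8, 0)]
print(daily_counts(sample, today=date(2021, 3, 3), days=4))
```

The four-day call reports 0, 2, 0 and 1 for 28 February through 3 March 2021: the two empty days appear only because the calendar, not the data, drives the output.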

Group by days, display days with no data and complex query in left join

Guessing at the missing details, this query might be what you are looking for:

WITH p AS (
SELECT '2015-08-15'::date AS a, '2015-08-22'::date AS z -- enter bounds once
, id, name, page_variant_id, variant_name
FROM pages
WHERE page_variant_id = '25' -- enter ID once
)
SELECT p.id, p.name, p.page_variant_id, p.variant_name
, day, v.views_count, pv.page_visits_count, c.conversions_count
FROM p
, LATERAL (SELECT day::date FROM generate_series(p.a, p.z, interval '1 day') day) d
LEFT JOIN (
SELECT upv.updated_at::date AS day, count(*) AS views_count
FROM p
JOIN page_views pv ON pv.page_id = p.id
JOIN unique_page_visits upv ON upv.id = pv.unique_page_visit_id
WHERE upv.updated_at >= p.a AND upv.updated_at < p.z + 1  -- include all of day z
GROUP BY 1
) v USING (day)
LEFT JOIN (
SELECT upv.updated_at::date AS day, count(*) AS page_visits_count
FROM p
JOIN unique_page_visits upv ON upv.page_id = p.id
WHERE upv.updated_at >= p.a AND upv.updated_at < p.z + 1  -- include all of day z
GROUP BY 1
) pv USING (day)
LEFT JOIN (
SELECT upv.updated_at::date AS day, count(*) AS conversions_count
FROM p
JOIN conversion_goals cg ON cg.page_id = p.id
JOIN conversions c ON c.conversion_goal_id = cg.id
JOIN unique_page_visits upv ON upv.id = c.unique_page_visit_id
WHERE cg.name = 'popup'
AND c.updated_at >= p.a AND c.updated_at < p.z + 1  -- include all of day z
GROUP BY 1
) c USING (day)
ORDER BY day;
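Two ideas carry this query: each metric is aggregated per day on its own and then LEFT JOINed to the day series, and the bounds `a`/`z` are dates while the `updated_at` columns are presumably timestamps. The Python sketch below (helper names are hypothetical) shows why `BETWEEN p.a AND p.z` would drop everything after 00:00 on day `z`, while the half-open form `>= p.a AND < p.z + 1` keeps it, and how separately aggregated metrics line up on the calendar with `None` standing in for SQL NULL.

```python
from datetime import date, datetime, time, timedelta


def midnight(d):
    # When a date is compared with a timestamp, PostgreSQL treats the
    # date as 00:00 of that day.
    return datetime.combine(d, time.min)


def in_between(ts, a, z):
    # ts BETWEEN a AND z: inclusive at both ends, but z means 00:00 of day z.
    return midnight(a) <= ts <= midnight(z)


def in_half_open(ts, a, z):
    # ts >= a AND ts < z + 1: covers every moment of day z.
    return midnight(a) <= ts < midnight(z + timedelta(days=1))


def join_on_day(days, **metrics):
    """Each metric is a {day: count} dict aggregated on its own (like the
    v, pv and c subqueries); a day with no rows gets None, as LEFT JOIN
    ... USING (day) yields NULL there."""
    return [{"day": d, **{name: m.get(d) for name, m in metrics.items()}}
            for d in days]


a, z = date(2015, 8, 15), date(2015, 8, 22)
late_visit = datetime(2015, 8, 22, 14, 30)   # hypothetical timestamp
print(in_between(late_visit, a, z), in_half_open(late_visit, a, z))
```

If you want 0 instead of NULL for empty days, wrap each count in `COALESCE(..., 0)` in the outer SELECT.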

PostgreSQL combining a count, a last 7 days count and a last 30 days count Grouped by days

If your data may have gaps in days, you need a range frame specification rather than a rows frame. Happily, Postgres supports this (RANGE with an offset requires PostgreSQL 11 or later), so you can do:

select
datetimea::date date,
count(*) "day",
sum(count(*)) over(
order by datetimea::date
range between '7 day' preceding and current row
) l7,
sum(count(*)) over(
order by datetimea::date
range between '30 day' preceding and current row
) l30
from mytable
group by datetimea::date
order by datetimea::date

Demo on DB Fiddle:


date | day | l7 | l30
:--------- | --: | -: | --:
2020-03-16 | 1 | 1 | 1
2020-03-17 | 1 | 2 | 2
2020-03-18 | 1 | 3 | 3
2020-03-19 | 1 | 4 | 4
2020-03-20 | 1 | 5 | 5
2020-03-21 | 1 | 6 | 6
2020-03-22 | 2 | 8 | 8
2020-03-28 | 1 | 4 | 9
2020-03-29 | 1 | 4 | 10
2020-03-30 | 2 | 4 | 12
2020-03-31 | 1 | 5 | 13
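The difference between the two frame types can be checked against the demo output. The Python sketch below uses the daily counts from the `day` column above; the helper names are hypothetical. Note that `'7 day' PRECEDING AND CURRENT ROW` spans 8 calendar dates (day minus 7 through day), which is what the `l7` column reflects; use `'6 day'` if you want exactly 7 dates including the current one.

```python
from datetime import date, timedelta

# Daily counts copied from the "day" column of the demo output above.
daily = {
    date(2020, 3, 16): 1, date(2020, 3, 17): 1, date(2020, 3, 18): 1,
    date(2020, 3, 19): 1, date(2020, 3, 20): 1, date(2020, 3, 21): 1,
    date(2020, 3, 22): 2, date(2020, 3, 28): 1, date(2020, 3, 29): 1,
    date(2020, 3, 30): 2, date(2020, 3, 31): 1,
}


def range_sum(counts, day, days_back):
    """RANGE BETWEEN 'n day' PRECEDING AND CURRENT ROW: sum the rows whose
    date lies in [day - n days, day], however many rows that is."""
    start = day - timedelta(days=days_back)
    return sum(c for d, c in counts.items() if start <= d <= day)


def rows_sum(counts, day, rows_back):
    """ROWS BETWEEN n PRECEDING AND CURRENT ROW: sum the current row and the
    n rows before it in date order, regardless of gaps between dates."""
    ordered = sorted(counts)
    i = ordered.index(day)
    return sum(counts[d] for d in ordered[max(0, i - rows_back):i + 1])


print(range_sum(daily, date(2020, 3, 28), 7), rows_sum(daily, date(2020, 3, 28), 7))
```

For 2020-03-28 the range frame gives 4 (matching `l7`), while a rows frame of 7 preceding rows gives 9, because it reaches back across the gap to 2020-03-16.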

Grouping by date, with 0 when count() yields no lines

Given that you don't have the dates in the table, you need a way to generate them. You can use the generate_series function:

SELECT * FROM generate_series('2012-01-01'::timestamp, '2012-01-07 23:00', '1 hour') AS ts;

This will produce results like this:

         ts          
---------------------
2012-01-01 00:00:00
2012-01-01 01:00:00
2012-01-01 02:00:00
2012-01-01 03:00:00
...
2012-01-07 21:00:00
2012-01-07 22:00:00
2012-01-07 23:00:00
(168 rows)
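For readers less familiar with `generate_series`, this hypothetical Python stand-in shows its semantics for timestamps: start at the first value, add the step, and stop once the value passes the upper bound, which is itself included. Seven days of hourly steps give the 168 rows shown above.

```python
from datetime import datetime, timedelta


def generate_series(start, stop, step):
    """Yield start, start + step, ... while the value is <= stop, as
    generate_series(timestamp, timestamp, interval) does in PostgreSQL."""
    current = start
    while current <= stop:
        yield current
        current += step


hours = list(generate_series(datetime(2012, 1, 1), datetime(2012, 1, 7, 23),
                             timedelta(hours=1)))
print(len(hours), hours[0], hours[-1])
```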

The remaining task is to join the two selects using an outer join, like this:

SELECT extract(day FROM ts) AS day, extract(hour FROM ts) AS hour, coalesce(cnt.count, 0) AS count
FROM (
   SELECT extract(day FROM date) AS day, extract(hour FROM date) AS hr, count(*)
   FROM sr
   -- half-open range: all of 2012-01-01 up to the end of 2012-01-07
   WHERE date >= '2012-01-01' AND date < '2012-01-08'
   GROUP BY extract(day FROM date), extract(hour FROM date)
) AS cnt
RIGHT OUTER JOIN (
   SELECT * FROM generate_series('2012-01-01'::timestamp, '2012-01-07 23:00', '1 hour') AS ts
) AS dtetable
   ON extract(hour FROM ts) = cnt.hr AND extract(day FROM ts) = cnt.day
ORDER BY day, hour;
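The same gap filling can be sketched in Python: count events per truncated hour inside a half-open range, then emit every hour of the range with 0 where nothing happened. The function name and sample events are hypothetical. Keying on the truncated hour (as `date_trunc('hour', date)` would) rather than on separate day and hour parts is a design choice worth considering: `extract(day ...)` repeats every month, so a range spanning two months would make, say, 5 January and 5 February collide.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(events, start, end):
    """Count events per hour in [start, end) and return one (hour, count)
    pair for every hour of that range, using 0 where nothing happened.
    Assumes start is already aligned to a full hour."""
    # Truncate each in-range event to its hour, like date_trunc('hour', date).
    per_hour = Counter(
        ts.replace(minute=0, second=0, microsecond=0)
        for ts in events
        if start <= ts < end
    )
    result = []
    hour = start
    while hour < end:
        result.append((hour, per_hour[hour]))  # Counter gives 0 when missing
        hour += timedelta(hours=1)
    return result


# Hypothetical sample events, only for illustration.
events = [datetime(2012, 1, 7, 22, 15), datetime(2012, 1, 7, 22, 40),
          datetime(2012, 1, 1, 0, 5)]
out = hourly_counts(events, datetime(2012, 1, 1), datetime(2012, 1, 8))
print(len(out), out[0], out[-2], out[-1])
```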

Grouping timestamps by day, not by time

SELECT date_trunc('day', user_logs.timestamp) "day", count(*) views
FROM user_logs
WHERE user_logs.timestamp >= %(timestamp_1)s
AND user_logs.timestamp <= %(timestamp_2)s
GROUP BY 1
ORDER BY 1

How to return rows with 0 count for missing data?

You can generate the first day of every month in the last year (for example) with:

select distinct date_trunc('month', (current_date - offs)) as date 
from generate_series(0,365,28) as offs;

date
------------------------
2007-12-01 00:00:00+01
2008-01-01 00:00:00+01
2008-02-01 00:00:00+01
2008-03-01 00:00:00+01
2008-04-01 00:00:00+02
2008-05-01 00:00:00+02
2008-06-01 00:00:00+02
2008-07-01 00:00:00+02
2008-08-01 00:00:00+02
2008-09-01 00:00:00+02
2008-10-01 00:00:00+02
2008-11-01 00:00:00+01
2008-12-01 00:00:00+01

Then LEFT JOIN your table to that series and count a column of your table, so months without data show a count of 0.
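Why does a step of 28 days work? Every month has at least 28 days, and any 28 consecutive days contain exactly one value of a series with step 28, so no month between the two ends is skipped; `DISTINCT` then removes the months that are hit twice. The Python sketch below mirrors that query for a fixed, hypothetical "today" of 2008-12-15 (the function name is also hypothetical) and yields the same 13 months as the output above.

```python
from datetime import date, timedelta


def month_starts(today, span_days=365, step=28):
    """Distinct first-of-month dates for today - offs, offs = 0, step, ...,
    mirroring SELECT DISTINCT date_trunc('month', current_date - offs)."""
    firsts = {
        (today - timedelta(days=offs)).replace(day=1)
        for offs in range(0, span_days + 1, step)
    }
    return sorted(firsts)


months = month_starts(date(2008, 12, 15))
print(len(months), months[0], months[-1])
```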

Query for displaying count per type for each date in range

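This single query works for the example range (21 to 28 December 2021). The day numbers are hard-coded, so adjust the CASE conditions and the number of dayN columns for other dates. Because extract(day ...) ignores the month, restrict order_date to a single month (or to the exact range) before pivoting.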

SQL:

SELECT lunch_name,
Sum(day1) "day1",
Sum(day2) "day2",
Sum(day3) "day3",
Sum(day4) "day4",
Sum(day5) "day5",
Sum(day6) "day6",
Sum(day7) "day7",
Sum(day8) "day8"
FROM (SELECT lunch_name,
CASE
WHEN Extract(day FROM order_date) = 21 THEN cnt
END "day1",
CASE
WHEN Extract(day FROM order_date) = 22 THEN cnt
END "day2",
CASE
WHEN Extract(day FROM order_date) = 23 THEN cnt
END "day3",
CASE
WHEN Extract(day FROM order_date) = 24 THEN cnt
END "day4",
CASE
WHEN Extract(day FROM order_date) = 25 THEN cnt
END "day5",
CASE
WHEN Extract(day FROM order_date) = 26 THEN cnt
END "day6",
CASE
WHEN Extract(day FROM order_date) = 27 THEN cnt
END "day7",
CASE
WHEN Extract(day FROM order_date) = 28 THEN cnt
END "day8"
FROM (SELECT DISTINCT lunch_name,
order_date,
Count(lunch_name)
OVER(
partition BY lunch_name, order_date
ORDER BY order_date) cnt
FROM lunch_order lo
INNER JOIN lunch_list ll
ON ll.lunch_id = lo.lunch_type) inline_view)
final_inline_view
GROUP BY lunch_name;

Output:

 lunch_name | day1 | day2 | day3 | day4 | day5 | day6 | day7 | day8
------------+------+------+------+------+------+------+------+------
ccc | | | 1 | 1 | | | 1 | 1
bbb | 2 | 1 | | | | | | 1
(2 rows)

Setup:

create table lunch_order(lunch_type int , order_date date);
create table lunch_list(lunch_id int,lunch_name varchar(10));
insert into lunch_order values
(20,'2021-12-21'),(20,'2021-12-22'),(22,'2021-12-23'),(22,'2021-12-24'),(22,'2021-12-27'),(22,'2021-12-28'),(20,'2021-12-21'),(20,'2021-12-28');
insert into lunch_list values(1,'aaa'),(20,'bbb'),(22,'ccc'),(23,'ddd'),(24,'eee');

postgres=# select * from lunch_order;
lunch_type | order_date
------------+------------
20 | 2021-12-21
20 | 2021-12-22
22 | 2021-12-23
22 | 2021-12-24
22 | 2021-12-27
22 | 2021-12-28
20 | 2021-12-21
20 | 2021-12-28
(8 rows)

postgres=# select * from lunch_list;
lunch_id | lunch_name
----------+------------
1 | aaa
20 | bbb
22 | ccc
23 | ddd
24 | eee
(5 rows)
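The pivot can be cross-checked with a short Python sketch over the same rows as the setup above: join order types to names, count per (name, date), and spread the counts into one column per day, with `None` where the SQL output shows an empty cell. The function name `pivot_by_day` is hypothetical. Unlike the query, it keys on the full date, so orders from different months cannot land in the same column.

```python
from collections import Counter
from datetime import date, timedelta

# Rows copied from the lunch_order and lunch_list output above.
lunch_order = [(20, date(2021, 12, 21)), (20, date(2021, 12, 22)),
               (22, date(2021, 12, 23)), (22, date(2021, 12, 24)),
               (22, date(2021, 12, 27)), (22, date(2021, 12, 28)),
               (20, date(2021, 12, 21)), (20, date(2021, 12, 28))]
lunch_list = {1: "aaa", 20: "bbb", 22: "ccc", 23: "ddd", 24: "eee"}


def pivot_by_day(orders, names, first_day, n_days):
    """One row per lunch_name that has orders; column i counts orders on
    first_day + i, with None (like SQL NULL) where there were none."""
    counts = Counter((names[t], d) for t, d in orders)  # inner join + count
    days = [first_day + timedelta(days=i) for i in range(n_days)]
    table = {}
    for name in sorted({n for n, _ in counts}):
        # dict.get returns None for a missing key (Counter's 0 default
        # only applies to [] lookups).
        table[name] = [counts.get((name, d)) for d in days]
    return table


for name, cells in pivot_by_day(lunch_order, lunch_list, date(2021, 12, 21), 8).items():
    print(name, cells)
```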

Get count of created entries for each day

SELECT day, COALESCE(ct, 0) AS ct
FROM (SELECT now()::date - d AS day FROM generate_series (0, 6) d) d -- 6, not 7
LEFT JOIN (
SELECT created_at::date AS day, count(*) AS ct
FROM entries
WHERE created_at >= date_trunc('day', now()) - interval '6d'
GROUP BY 1
) e USING (day);

Use a sargable expression for your WHERE condition, so Postgres can use a plain index on created_at. This matters far more for performance than everything else.

To cover a week (including today), subtract 6 days from the start of "today", not 7.
Alternatively, shift the week by one day to end "yesterday", since "today" is obviously not complete yet.
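The off-by-one reasoning is easy to check in a Python sketch (function names are hypothetical): `generate_series(0, 6)` has 7 offsets, so the window runs from today minus 6 days through today, and the matching WHERE bound is midnight at the start of that oldest day. The shifted variant moves the whole week back by one day.

```python
from datetime import date, datetime, time, timedelta


def week_days(today, include_today=True):
    """The 7 report days, oldest first.
    include_today=True:  today - 6 ... today (generate_series(0, 6))
    include_today=False: today - 7 ... yesterday (the week shifted by one)"""
    end = today if include_today else today - timedelta(days=1)
    return [end - timedelta(days=d) for d in range(6, -1, -1)]


def lower_bound(now):
    # date_trunc('day', now()) - interval '6d': midnight at the start of the
    # oldest day, a plain value that an index on created_at can be searched with.
    return datetime.combine(now.date(), time.min) - timedelta(days=6)


print(week_days(date(2024, 1, 10)))
print(lower_bound(datetime(2024, 1, 10, 15, 45)))
```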

Assuming that id is defined NOT NULL, count(*) is identical to count(id), but slightly faster. See:

  • Why is count(x.*) slower than count(*)?

A CTE is not needed for the simple case. It would be slower and more verbose.

Aggregate first, join later. That's faster.

now() is Postgres' short syntax for the standard SQL CURRENT_TIMESTAMP (which you can use as well). See:

  • Difference between now() and current_timestamp

This should be the shortest and fastest query. Test with EXPLAIN ANALYZE.

Related:

  • Selecting sum and running balance for last 18 months with generate_series
  • PostgreSQL: running count of rows for a query 'by minute'

