Join a Count Query on Generate_Series() and Retrieve Null Values as '0'

Join a count query on generate_series() and retrieve Null values as '0'

Untangled, simplified and fixed, it might look like this:

SELECT to_char(s.tag,'yyyy-mm') AS monat
, count(t.id) AS eintraege
FROM (
SELECT generate_series(min(date_from)::date
, max(date_from)::date
, interval '1 day'
)::date AS tag
FROM mytable t
) s
LEFT JOIN mytable t ON t.date_from::date = s.tag AND t.version = 1
GROUP BY 1
ORDER BY 1;

db<>fiddle here

Among all the noise, misleading identifiers and unconventional format the actual problem was hidden here:

WHERE version = 1

You made correct use of RIGHT [OUTER] JOIN. But adding a WHERE clause that requires an existing row from mytable converts the RIGHT [OUTER] JOIN to an [INNER] JOIN effectively.

Move that filter into the JOIN condition to make it work.

I simplified some other things while being at it.

Better, yet

SELECT to_char(mon, 'yyyy-mm') AS monat
, COALESCE(t.ct, 0) AS eintraege
FROM (
SELECT date_trunc('month', date_from)::date AS mon
, count(*) AS ct
FROM mytable
WHERE version = 1
GROUP BY 1
) t
RIGHT JOIN (
SELECT generate_series(date_trunc('month', min(date_from))
, max(date_from)
, interval '1 mon')::date
FROM mytable
) m(mon) USING (mon)
ORDER BY mon;

db<>fiddle here

It's much cheaper to aggregate first and join later - joining one row per month instead of one row per day.

It's cheaper to base GROUP BY and ORDER BY on the date value instead of the rendered text.

count(*) is a bit faster than count(id), while equivalent in this query.

generate_series() is a bit faster and safer when based on timestamp instead of date. See:

  • Generating time series between two dates in PostgreSQL

Filtering a query that uses generate_series while keeping null rows

Replace WHERE t.status = ANY(ARRAY['new', 'cancelled']) with AND t.status = ANY(ARRAY['new', 'cancelled']), and it works correctly!

Get apps with the highest review count since a dynamic series of days

I think this is what you are looking for:

Postgres 13 or newer

WITH cte AS (  -- MATERIALIZED
SELECT app_id, min(review_date) AS earliest_review, count(*)::int AS total_ct
FROM reviews
GROUP BY 1
)
SELECT *
FROM (
SELECT generate_series(min(review_date)
, max(review_date)
, '1 day')::date
FROM reviews
) d(review_window_start)
LEFT JOIN LATERAL (
SELECT total_ct, array_agg(app_id) AS apps
FROM (
SELECT app_id, total_ct
FROM cte c
WHERE c.earliest_review >= d.review_window_start
ORDER BY total_ct DESC
FETCH FIRST 1 ROWS WITH TIES -- new & hot
) sub
GROUP BY 1
) a ON true;

WITH TIES makes it a bit cheaper. Added in Postgres 13 (currently beta). See:

  • Get top row(s) with highest value, with ties

Postgres 12 or older

WITH cte AS (  -- MATERIALIZED
SELECT app_id, min(review_date) AS earliest_review, count(*)::int AS total_ct
FROM reviews
GROUP BY 1
)
SELECT *
FROM (
SELECT generate_series(min(review_date)
, max(review_date)
, '1 day')::date
FROM reviews
) d(review_window_start)
LEFT JOIN LATERAL (
SELECT total_ct, array_agg(app_id) AS apps
FROM (
SELECT total_ct, app_id
, rank() OVER (ORDER BY total_ct DESC) AS rnk
FROM cte c
WHERE c.earliest_review >= d.review_window_start
) sub
WHERE rnk = 1
GROUP BY 1
) a ON true;

db<>fiddle here

Same as above, but without WITH TIES.

We don't need to involve the table apps at all. The table reviews has all information we need.

The CTE cte computes earliest review & current total count per app. The CTE avoids repeated computation. Should help quite a bit.

It is always materialized before Postgres 12, and should be materialized automatically in Postgres 12 since it is used many times in the main query. Else you could add the keyword MATERIALIZED in Postgres 12 or later to force it. See:

  • How to force evaluation of subquery before joining / pushing down to foreign server

The optimized generate_series() call produces the series of days from earliest to latest review. See:

  • Generating time series between two dates in PostgreSQL
  • Join a count query on generate_series() and retrieve Null values as '0'

Finally, the LEFT JOIN LATERAL you already discovered. But since multiple apps can tie for the most reviews, retrieve all winners, which can be 0 - n apps. The query aggregates all daily winners into an array, so we get a single result row per review_window_start. Alternatively, define tiebreaker(s) to get at most one winner. See:

  • What is the difference between LATERAL JOIN and a subquery in PostgreSQL?

How to right join these series so that this query will return results where count = 0 in Postgresql SQL?

I recommend avoiding RIGHT JOINs and using LEFT JOINs. They are just simpler for following the logic -- keep everything in the first table and matching rows in the subsequent ones.

Your issue is the placement of the filter -- it filters out the outer joined rows. So that needs to go into the ON clause. I would recommend:

SELECT NULL AS fromdate, gs.eventlevel,
COUNT(e.id) as count
FROM generate_series(0, 3) gs(eventlevel) LEFT JOIN
event e
ON e.event_level = gs.eventlevel AND e.project_id = :projectId
GROUP BY gs.eventlevel
ORDER BY gs.eventlevel DESC;

Note the other simplifications:

  • No subquery is needed for generate_series.
  • You can use COUNT() instead of your case logic.

Return 0 or Null value if there is no data in a table for a particular date?

I got My solution.First i generate date series and cross join with meterno which i am passing as a varchar.I use regexp_split_to_table function to display passing meterno as row wise,Then i cross join with date.

Please check my solution.

select m.sdate , COALESCE (m.meterno,'')meter_number,
CASE WHEN n.count IS NULL THEN '0' ELSE n.count END count
from
(select date(d) as sdate, mn.meterno
from generate_series (CAST('2019-07-01' as TIMESTAMP), CAST('2019-07-06' as TIMESTAMP), interval '1 day') d
cross join ( select regexp_split_to_table( '92590714_92590130_92589492_92590381', '_') as meterno) mn)m
LEFT JOIN
(SELECT meter_number,date(read_time) as rdate,COUNT(*) FROM meter_data.load_survey
WHERE date(read_time) BETWEEN '2019-07-01' and '2019-07-06' and meter_number in ('92590714','92590130','92589492','92590381')
GROUP BY meter_number,date(read_time) )n
ON m.sdate=n.rdate and m.meterno=n.meter_number
GROUP BY m.sdate,m.meterno,n.count

Use Ecto to generate_series in postgres and also retrieve Null-values as “0”

You can use right_join since you're joining clicks with the generated dates in the Ecto Query instead of generated dates with clicks like you did in the raw SQL.

Test:

alias MyApp.{Click, Repo}
import Ecto.Query

from(c in Click,
order_by: c.inserted_at,
select: fragment("date(?) as day", c.inserted_at)
)
|> Repo.all
|> IO.inspect

from(c in Click,
right_join: day in fragment("select generate_series(current_date - interval '10 day', current_date, '1 day')::date AS d "),
on: fragment("date(?) = d", field(c, :inserted_at)) and c.link_id == 15,
group_by: fragment("day"),
order_by: fragment("day"),
select: %{
day: fragment("date(d) as day"),
count: count(c.id)
}
)
|> Repo.all
|> IO.inspect

Output:

[debug] QUERY OK source="clicks" db=1.9ms
SELECT date(c0."inserted_at") as day FROM "clicks" AS c0 ORDER BY c0."inserted_at" []
[{2016, 9, 12}, {2016, 9, 13}, {2016, 9, 13}, {2016, 9, 13}, {2016, 9, 13},
{2016, 9, 15}, {2016, 9, 17}, {2016, 9, 17}]
[debug] QUERY OK source="clicks" db=1.8ms
SELECT date(d) as day, count(c0."id") FROM "clicks" AS c0 RIGHT OUTER JOIN (select generate_series(current_date - interval '10 day', current_date, '1 day')::date AS d ) AS f1 ON date(c0."inserted_at") = d GROUP BY day ORDER BY day []
[%{count: 0, day: {2016, 9, 8}}, %{count: 0, day: {2016, 9, 9}},
%{count: 0, day: {2016, 9, 10}}, %{count: 0, day: {2016, 9, 11}},
%{count: 1, day: {2016, 9, 12}}, %{count: 4, day: {2016, 9, 13}},
%{count: 0, day: {2016, 9, 14}}, %{count: 1, day: {2016, 9, 15}},
%{count: 0, day: {2016, 9, 16}}, %{count: 2, day: {2016, 9, 17}},
%{count: 0, day: {2016, 9, 18}}]

Edit: You might also want to make more use of Ecto's powerful querying ability and rely less on as and fragment. The following query generates the same output:

from(c in Click,
right_join: day in fragment("select generate_series(current_date - interval '10 day', current_date, '1 day')::date AS d"),
on: day.d == fragment("date(?)", c.inserted_at) and c.link_id == 15,
group_by: day.d,
order_by: day.d,
select: %{
day: fragment("date(?)", day.d),
count: count(c.id)
}
)

Misleading count of 1 on JOIN in Postgres 11.7

You figured out the problem correctly: count() behaves differently depending on the argument is is given.

count(*) counts how many rows belong to the group. This just cannot be 0 since there is always at least one row in a group (otherwise, there would be no group).

On the other hand, when given a column name or expression as argument, count() takes in account any non-null value, and ignores null values. For your query, this lets you distinguish groups that have no match in the left joined table from groups where there are matches.

Note that this behavior is not Postgres specific, but belongs to the standard
ANSI SQL specification (all databases that I know conform to it).

Bottom line:

  • in general cases, uses count(*); this is more efficient, since the database does not need to check for nulls (and makes it clear to the reader of the query that you just want to know how many rows belong to the group)

  • in specific cases such as yours, put the relevant expression in the count()

PostgreSQL - left join generate_series() and table

You can put generate_series() in the FROM. So, I think you want something like this:

select gs.x, cast(p.polutionmm2/100 as char(8)) as metric
from generate_series(0,200,1) gs(x) left join
p
on gs.x = (p.polutionmm2/100);

I imagine there is also more to your query, because this doesn't do much that is useful.

SQL select from generate_series, filtering by user_id removes series?

instead of taking all the data from WORKOUTS table you can put this condition over there also as -

SELECT 
DISTINCT date_trunc('day', series.date)::date as date,
sum(COALESCE(reps, 0)) OVER WIN,
array_agg(workout_id) OVER WIN as ids
FROM (
select generate_series(-22, 0) + current_date as date
) series
LEFT JOIN (
exercises INNER JOIN (select * from workouts where user_id = 5) workout
ON exercises.workout_id = workouts.id
)
ON series.date = exercises.created_at::date
WINDOW
WIN AS (PARTITION BY date_trunc('day', series.date)::date)
ORDER BY date ASC;

I think this should give you the output what you are looking for.



Related Topics



Leave a reply



Submit