Join a count query on generate_series() and retrieve Null values as '0'
Untangled, simplified and fixed, it might look like this:
SELECT to_char(s.tag,'yyyy-mm') AS monat
, count(t.id) AS eintraege
FROM (
SELECT generate_series(min(date_from)::date
, max(date_from)::date
, interval '1 day'
)::date AS tag
FROM mytable t
) s
LEFT JOIN mytable t ON t.date_from::date = s.tag AND t.version = 1
GROUP BY 1
ORDER BY 1;
db<>fiddle here
Among all the noise, misleading identifiers and unconventional format the actual problem was hidden here:
WHERE version = 1
You made correct use of RIGHT [OUTER] JOIN
. But adding a WHERE
clause that requires an existing row from mytable
converts the RIGHT [OUTER] JOIN
to an [INNER] JOIN
effectively.
Move that filter into the JOIN
condition to make it work.
I simplified some other things while being at it.
Better, yet
SELECT to_char(mon, 'yyyy-mm') AS monat
, COALESCE(t.ct, 0) AS eintraege
FROM (
SELECT date_trunc('month', date_from)::date AS mon
, count(*) AS ct
FROM mytable
WHERE version = 1
GROUP BY 1
) t
RIGHT JOIN (
SELECT generate_series(date_trunc('month', min(date_from))
, max(date_from)
, interval '1 mon')::date
FROM mytable
) m(mon) USING (mon)
ORDER BY mon;
db<>fiddle here
It's much cheaper to aggregate first and join later - joining one row per month instead of one row per day.
It's cheaper to base GROUP BY
and ORDER BY
on the date
value instead of the rendered text
.
count(*)
is a bit faster than count(id)
, while equivalent in this query.
generate_series()
is a bit faster and safer when based on timestamp
instead of date
. See:
- Generating time series between two dates in PostgreSQL
Filtering a query that uses generate_series while keeping null rows
Replace WHERE t.status = ANY(ARRAY['new', 'cancelled'])
with AND t.status = ANY(ARRAY['new', 'cancelled'])
, and it works correctly!
Get apps with the highest review count since a dynamic series of days
I think this is what you are looking for:
Postgres 13 or newer
WITH cte AS ( -- MATERIALIZED
SELECT app_id, min(review_date) AS earliest_review, count(*)::int AS total_ct
FROM reviews
GROUP BY 1
)
SELECT *
FROM (
SELECT generate_series(min(review_date)
, max(review_date)
, '1 day')::date
FROM reviews
) d(review_window_start)
LEFT JOIN LATERAL (
SELECT total_ct, array_agg(app_id) AS apps
FROM (
SELECT app_id, total_ct
FROM cte c
WHERE c.earliest_review >= d.review_window_start
ORDER BY total_ct DESC
FETCH FIRST 1 ROWS WITH TIES -- new & hot
) sub
GROUP BY 1
) a ON true;
WITH TIES
makes it a bit cheaper. Added in Postgres 13 (currently beta). See:
- Get top row(s) with highest value, with ties
Postgres 12 or older
WITH cte AS ( -- MATERIALIZED
SELECT app_id, min(review_date) AS earliest_review, count(*)::int AS total_ct
FROM reviews
GROUP BY 1
)
SELECT *
FROM (
SELECT generate_series(min(review_date)
, max(review_date)
, '1 day')::date
FROM reviews
) d(review_window_start)
LEFT JOIN LATERAL (
SELECT total_ct, array_agg(app_id) AS apps
FROM (
SELECT total_ct, app_id
, rank() OVER (ORDER BY total_ct DESC) AS rnk
FROM cte c
WHERE c.earliest_review >= d.review_window_start
) sub
WHERE rnk = 1
GROUP BY 1
) a ON true;
db<>fiddle here
Same as above, but without WITH TIES
.
We don't need to involve the table apps
at all. The table reviews
has all information we need.
The CTE cte
computes earliest review & current total count per app. The CTE avoids repeated computation. Should help quite a bit.
It is always materialized before Postgres 12, and should be materialized automatically in Postgres 12 since it is used many times in the main query. Else you could add the keyword MATERIALIZED
in Postgres 12 or later to force it. See:
- How to force evaluation of subquery before joining / pushing down to foreign server
The optimized generate_series()
call produces the series of days from earliest to latest review. See:
- Generating time series between two dates in PostgreSQL
- Join a count query on generate_series() and retrieve Null values as '0'
Finally, the LEFT JOIN LATERAL
you already discovered. But since multiple apps can tie for the most reviews, retrieve all winners, which can be 0 - n apps. The query aggregates all daily winners into an array, so we get a single result row per review_window_start
. Alternatively, define tiebreaker(s) to get at most one winner. See:
- What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
How to right join these series so that this query will return results where count = 0 in Postgresql SQL?
I recommend avoiding RIGHT JOIN
s and using LEFT JOIN
s. They are just simpler for following the logic -- keep everything in the first table and matching rows in the subsequent ones.
Your issue is the placement of the filter -- it filters out the outer joined rows. So that needs to go into the ON
clause. I would recommend:
SELECT NULL AS fromdate, gs.eventlevel,
COUNT(e.id) as count
FROM generate_series(0, 3) gs(eventlevel) LEFT JOIN
event e
ON e.event_level = gs.eventlevel AND e.project_id = :projectId
GROUP BY gs.eventlevel
ORDER BY gs.eventlevel DESC;
Note the other simplifications:
- No subquery is needed for
generate_series
. - You can use
COUNT()
instead of your case logic.
Return 0 or Null value if there is no data in a table for a particular date?
I got My solution.First i generate date series and cross join with meterno which i am passing as a varchar.I use regexp_split_to_table function to display passing meterno as row wise,Then i cross join with date.
Please check my solution.
select m.sdate , COALESCE (m.meterno,'')meter_number,
CASE WHEN n.count IS NULL THEN '0' ELSE n.count END count
from
(select date(d) as sdate, mn.meterno
from generate_series (CAST('2019-07-01' as TIMESTAMP), CAST('2019-07-06' as TIMESTAMP), interval '1 day') d
cross join ( select regexp_split_to_table( '92590714_92590130_92589492_92590381', '_') as meterno) mn)m
LEFT JOIN
(SELECT meter_number,date(read_time) as rdate,COUNT(*) FROM meter_data.load_survey
WHERE date(read_time) BETWEEN '2019-07-01' and '2019-07-06' and meter_number in ('92590714','92590130','92589492','92590381')
GROUP BY meter_number,date(read_time) )n
ON m.sdate=n.rdate and m.meterno=n.meter_number
GROUP BY m.sdate,m.meterno,n.count
Use Ecto to generate_series in postgres and also retrieve Null-values as “0”
You can use right_join
since you're joining clicks
with the generated dates in the Ecto Query instead of generated dates with clicks
like you did in the raw SQL.
Test:
alias MyApp.{Click, Repo}
import Ecto.Query
from(c in Click,
order_by: c.inserted_at,
select: fragment("date(?) as day", c.inserted_at)
)
|> Repo.all
|> IO.inspect
from(c in Click,
right_join: day in fragment("select generate_series(current_date - interval '10 day', current_date, '1 day')::date AS d "),
on: fragment("date(?) = d", field(c, :inserted_at)) and c.link_id == 15,
group_by: fragment("day"),
order_by: fragment("day"),
select: %{
day: fragment("date(d) as day"),
count: count(c.id)
}
)
|> Repo.all
|> IO.inspect
Output:
[debug] QUERY OK source="clicks" db=1.9ms
SELECT date(c0."inserted_at") as day FROM "clicks" AS c0 ORDER BY c0."inserted_at" []
[{2016, 9, 12}, {2016, 9, 13}, {2016, 9, 13}, {2016, 9, 13}, {2016, 9, 13},
{2016, 9, 15}, {2016, 9, 17}, {2016, 9, 17}]
[debug] QUERY OK source="clicks" db=1.8ms
SELECT date(d) as day, count(c0."id") FROM "clicks" AS c0 RIGHT OUTER JOIN (select generate_series(current_date - interval '10 day', current_date, '1 day')::date AS d ) AS f1 ON date(c0."inserted_at") = d GROUP BY day ORDER BY day []
[%{count: 0, day: {2016, 9, 8}}, %{count: 0, day: {2016, 9, 9}},
%{count: 0, day: {2016, 9, 10}}, %{count: 0, day: {2016, 9, 11}},
%{count: 1, day: {2016, 9, 12}}, %{count: 4, day: {2016, 9, 13}},
%{count: 0, day: {2016, 9, 14}}, %{count: 1, day: {2016, 9, 15}},
%{count: 0, day: {2016, 9, 16}}, %{count: 2, day: {2016, 9, 17}},
%{count: 0, day: {2016, 9, 18}}]
Edit: You might also want to make more use of Ecto's powerful querying ability and rely less on as
and fragment
. The following query generates the same output:
from(c in Click,
right_join: day in fragment("select generate_series(current_date - interval '10 day', current_date, '1 day')::date AS d"),
on: day.d == fragment("date(?)", c.inserted_at) and c.link_id == 15,
group_by: day.d,
order_by: day.d,
select: %{
day: fragment("date(?)", day.d),
count: count(c.id)
}
)
Misleading count of 1 on JOIN in Postgres 11.7
You figured out the problem correctly: count()
behaves differently depending on the argument is is given.
count(*)
counts how many rows belong to the group. This just cannot be 0
since there is always at least one row in a group (otherwise, there would be no group).
On the other hand, when given a column name or expression as argument, count()
takes in account any non-null
value, and ignores null
values. For your query, this lets you distinguish groups that have no match in the left join
ed table from groups where there are matches.
Note that this behavior is not Postgres specific, but belongs to the standard
ANSI SQL specification (all databases that I know conform to it).
Bottom line:
in general cases, uses
count(*)
; this is more efficient, since the database does not need to check fornull
s (and makes it clear to the reader of the query that you just want to know how many rows belong to the group)in specific cases such as yours, put the relevant expression in the
count()
PostgreSQL - left join generate_series() and table
You can put generate_series()
in the FROM
. So, I think you want something like this:
select gs.x, cast(p.polutionmm2/100 as char(8)) as metric
from generate_series(0,200,1) gs(x) left join
p
on gs.x = (p.polutionmm2/100);
I imagine there is also more to your query, because this doesn't do much that is useful.
SQL select from generate_series, filtering by user_id removes series?
instead of taking all the data from WORKOUTS table you can put this condition over there also as -
SELECT
DISTINCT date_trunc('day', series.date)::date as date,
sum(COALESCE(reps, 0)) OVER WIN,
array_agg(workout_id) OVER WIN as ids
FROM (
select generate_series(-22, 0) + current_date as date
) series
LEFT JOIN (
exercises INNER JOIN (select * from workouts where user_id = 5) workout
ON exercises.workout_id = workouts.id
)
ON series.date = exercises.created_at::date
WINDOW
WIN AS (PARTITION BY date_trunc('day', series.date)::date)
ORDER BY date ASC;
I think this should give you the output what you are looking for.
Related Topics
How to Execute a Stored Procedure Inside a Select Query
Rails .Where() Query Not Working
Where Does the Practice "Exists (Select 1 from ...)" Come From
Insert Data and Set Foreign Keys with Postgres
What Is the Problem with Foreign Key Cascade Multiple Paths and Cycles
Adding Constraints That Check a Separate (Linked) Table for a Value
Icalendar "Field" List (For Database Schema Based on Icalendar Standard)
Why Is There a Huge Performance Difference Between Temp Table and Subselect
SQL Recursive Query That Gets All Ancestors of an Item
How to Return Default Value from SQL Query
How to Connect to SQL Server 2005 Database Through Ruby
How to Compare Software Versions Using SQL Server
Linq to SQL: How to Stop the Auto Generated Object Name from Being Renamed
Is There an Oracle Equivalent to SQL Server's Output Inserted.*
Returning Result Even for Elements in In List That Don't Exist in Table