PostgreSQL Group Month Wise with Missing Values


You can use the generate_series() function like this:

select
    g.month,
    count(m)
from generate_series(1, 12) as g(month)
left outer join my_table as m on
    m.id_object = 1 and
    m.status = 1 and
    extract(year from m.time) = 2014 and
    extract(month from m.time) = g.month
group by g.month
order by g.month;
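To try the pattern without an existing table, it can be run against inline sample rows. This is only a sketch: the my_table CTE and its three rows are invented for illustration. It also shows why the count has to target the joined table (or one of its columns) rather than count(*): for a month with no rows, count(*) still sees the one NULL-extended row produced by the LEFT JOIN.

```sql
-- Hypothetical sample data standing in for my_table.
WITH my_table(id_object, status, "time") AS (
    VALUES (1, 1, timestamp '2014-01-15 10:00')
         , (1, 1, timestamp '2014-01-20 12:00')
         , (1, 1, timestamp '2014-03-02 09:30')
)
SELECT g.month
     , count(m.id_object) AS matches    -- counts only real matches
     , count(*)           AS row_count  -- also counts the NULL-extended row
FROM   generate_series(1, 12) AS g(month)
LEFT   JOIN my_table AS m
       ON  m.id_object = 1
       AND m.status = 1
       AND extract(year  FROM m."time") = 2014
       AND extract(month FROM m."time") = g.month
GROUP  BY g.month
ORDER  BY g.month;
```

With this sample data, February comes back with matches = 0 but row_count = 1.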


How to include missing data for multiple groupings within the time span?

Based on some assumptions (ambiguities in the question) I suggest:

SELECT upper(trim(t.full_name)) AS teacher
     , m.study_month
     , r.room_code AS room
     , count(s.room_id) AS study_count
FROM   teachers t
CROSS  JOIN generate_series(date_trunc('month', now() - interval '12 month') -- 12!
                          , date_trunc('month', now())
                          , interval '1 month') m(study_month)
CROSS  JOIN rooms r
LEFT   JOIN ( -- parentheses!
          studies s
          JOIN teacher_contacts tc ON tc.id = s.teacher_contact_id -- INNER JOIN!
          ) ON tc.teacher_id = t.id
           AND s.study_dt >= m.study_month
           AND s.study_dt < m.study_month + interval '1 month' -- sargable!
           AND s.room_id = r.id
GROUP  BY t.id, m.study_month, r.id -- id is PK of respective tables
ORDER  BY t.id, m.study_month, r.id;

Major points

  • Build a grid of all desired combinations with CROSS JOIN, then LEFT JOIN to existing rows. Related:

    • array_agg group by and null
    • Get created as well as deleted entries of last week
  • In your case, it's a join of several tables, so I use parentheses in the FROM list to LEFT JOIN to the result of INNER JOIN within the parentheses.
    It would be incorrect to LEFT JOIN to each table separately, because you would include hits on partial matches and get potentially incorrect counts.

  • Assuming referential integrity and working with PK columns directly, we don't need to include rooms and teachers on the left side a second time. But we still have a join of two tables (studies and teacher_contacts). The role of teacher_contacts is unclear to me. Normally, I would expect a relationship between studies and teachers directly. Might be further simplified ...

  • We need to count a non-null column from the joined (nullable) side to get the desired counts, for example count(s.room_id). count(*) would count 1 for every combination without a match.

  • To keep this fast for big tables, make sure your predicates are sargable. And add matching indexes.

  • The column teacher is hardly (reliably) unique. Operate with a unique ID, preferably the PK (faster and simpler, too). I am still using teacher for the output to match your desired result. It might be wise to include a unique ID, since names can be duplicates.

  • You want:

    the past 12 months (including current month).

    So start with date_trunc('month', now() - interval '12 month') (not 13). That already rounds the start down to the first of the month and does what you want, more accurately than your original query.
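To check the month grid used above, the series can be run on its own. A sketch only, reusing the study_month alias from the query. Note that generate_series() includes both endpoints, so starting 12 months back and ending at the current month produces 13 month starts (the current month plus the 12 before it). If exactly 12 rows are wanted, interval '11 month' would be the offset to use.

```sql
SELECT count(*)         AS months_in_grid
     , min(study_month) AS first_month
     , max(study_month) AS last_month
FROM   generate_series(date_trunc('month', now() - interval '12 month')
                     , date_trunc('month', now())
                     , interval '1 month') AS m(study_month);
```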


Since you mentioned slow performance, depending on actual table definitions and data distribution, it's probably faster to aggregate first and join later, like in this related answer:

  • Postgres - how to return rows with 0 count for missing data?

SELECT upper(trim(t.full_name)) AS teacher
     , m.mon AS study_month
     , r.room_code AS room
     , COALESCE(s.ct, 0) AS study_count
FROM   teachers t
CROSS  JOIN generate_series(date_trunc('month', now() - interval '12 month') -- 12!
                          , date_trunc('month', now())
                          , interval '1 month') m(mon) -- table alias m, column alias mon
CROSS  JOIN rooms r
LEFT   JOIN ( -- aggregate before joining
   SELECT tc.teacher_id, date_trunc('month', s.study_dt) AS mon, s.room_id, count(*) AS ct
   FROM   studies s
   JOIN   teacher_contacts tc ON s.teacher_contact_id = tc.id
   WHERE  s.study_dt >= date_trunc('month', now() - interval '12 month') -- sargable
   GROUP  BY 1, 2, 3
   ) s ON s.teacher_id = t.id
      AND s.mon = m.mon
      AND s.room_id = r.id
ORDER  BY 1, 2, 3;

About your closing remark:

the dataset would be fed to a pivot library ... (could not do this in PG directly)

Chances are you can use the two-parameter form of crosstab() to produce your desired result directly and with excellent performance and the above query is not needed to begin with. Consider:

  • PostgreSQL Crosstab Query

Add missing rows for a unique group set by date in PostgreSQL

You must CROSS join the distinct game_id and date combinations of the table to the distinct category of the table and then LEFT join to the table:

SELECT d.game_id, c.category, d.date, COALESCE(a.amount, 0) amount
FROM (SELECT DISTINCT game_id, date FROM activity) d
CROSS JOIN (SELECT DISTINCT category FROM activity) c
LEFT JOIN activity a
    ON a.game_id = d.game_id AND a.date = d.date AND a.category = c.category
ORDER BY d.game_id, d.date;

If you want to insert the missing rows in the table:

INSERT INTO activity (game_id, category, date, amount)
SELECT d.game_id, c.category, d.date, 0
FROM (SELECT DISTINCT game_id, date FROM activity) d
CROSS JOIN (SELECT DISTINCT category FROM activity) c
LEFT JOIN activity a
    ON a.game_id = d.game_id AND a.date = d.date AND a.category = c.category
WHERE a.game_id IS NULL;


How to include null values into grouping by date?

I think one of the easiest ways would be to create two auxiliary tables, one with the years you want to report on (2000 to the current year) and another with the months (1-12). You can then outer join them with your actual table to get the number of emails created per year and month.

Say the years table is called Years_Table with a year_value column, and the months table is Months_Table with a month_value column. Then you could do:

SELECT TOP 200 month_value, year_value, COUNT(Email) AS Amount
FROM Contacts
RIGHT OUTER JOIN (
    SELECT year_value, month_value
    FROM Years_Table CROSS JOIN Months_Table
) AS AUX_TABLE
    ON AUX_TABLE.year_value = YEAR(Added_Date)
    AND AUX_TABLE.month_value = MONTH(Added_Date)
GROUP BY month_value, year_value
ORDER BY year_value, month_value

Note: I omitted your CAST since YEAR(Added_Date) already returns a number, assuming Added_Date is a datetime column. If it is not, the join condition has to be written differently.
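The query above is written in SQL Server syntax (TOP, YEAR(), MONTH()). In PostgreSQL the two auxiliary tables could be replaced by generate_series() calls. The following is a sketch under that assumption. The names contacts, email and added_date are hypothetical lower-case stand-ins for the names above, and the inline rows are invented for the demo.

```sql
WITH contacts(email, added_date) AS (   -- hypothetical sample rows
    VALUES ('a@example.com', date '2000-01-10')
         , ('b@example.com', date '2000-01-25')
         , ('c@example.com', date '2001-07-04')
)
SELECT y.year_value
     , mo.month_value
     , count(c.email) AS amount         -- 0 for year-months without contacts
FROM   generate_series(2000, extract(year FROM current_date)::int) AS y(year_value)
CROSS  JOIN generate_series(1, 12) AS mo(month_value)
LEFT   JOIN contacts c
       ON  extract(year  FROM c.added_date) = y.year_value
       AND extract(month FROM c.added_date) = mo.month_value
GROUP  BY y.year_value, mo.month_value
ORDER  BY y.year_value, mo.month_value;
```

Every year from 2000 to the current one contributes exactly 12 rows, whether or not any contacts exist for it.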

Postgresql getting all months of year and null values

You can use generate_series() to generate the months for a year. Then you can join in the values from the aggregation using left join. Something like this:

select mon.mon, coalesce(s.sales, 0) as sales
from generate_series('2015-01-01'::timestamp,
                     '2015-12-01'::timestamp,
                     interval '1 month'
                    ) as mon(mon) left join
     (select date_trunc('Month', dateofmonth) as mon,
             sum(sales) as SALES
      from SALES_TABLE
      group by mon
     ) s
     on mon.mon = s.mon;

Include missing months in Group By query

This solution doesn't require you to hard-code the list of months you might want, all you need to do is provide any start date and any end date, and it will calculate the month boundaries for you. It includes year in the output so that it will support more than 12 months and so that your start and end dates can cross a year boundary and still order correctly and show the correct month and year.

DECLARE @StartDate SMALLDATETIME, @EndDate SMALLDATETIME;

SELECT @StartDate = '20120101', @EndDate = '20120630';

;WITH d(d) AS
(
    SELECT DATEADD(MONTH, n, DATEADD(MONTH, DATEDIFF(MONTH, 0, @StartDate), 0))
    FROM ( SELECT TOP (DATEDIFF(MONTH, @StartDate, @EndDate) + 1)
               n = ROW_NUMBER() OVER (ORDER BY [object_id]) - 1
           FROM sys.all_objects ORDER BY [object_id] ) AS n
)
SELECT
    [Month] = DATENAME(MONTH, d.d),
    [Year] = YEAR(d.d),
    OrderCount = COUNT(o.OrderNumber)
FROM d LEFT OUTER JOIN dbo.OrderTable AS o
    ON o.OrderDate >= d.d
    AND o.OrderDate < DATEADD(MONTH, 1, d.d)
GROUP BY d.d
ORDER BY d.d;
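The script above targets SQL Server. A rough PostgreSQL counterpart can let generate_series() compute the month boundaries from arbitrary start and end dates. This is a sketch under assumptions: order_table, order_number and order_date are hypothetical names, and both the parameters and the sample rows are supplied inline.

```sql
WITH params(start_date, end_date) AS (
    VALUES (date '2012-01-01', date '2012-06-30')
)
, order_table(order_number, order_date) AS (   -- hypothetical sample rows
    VALUES (1, timestamp '2012-01-15 08:00')
         , (2, timestamp '2012-03-31 23:59')
         , (3, timestamp '2012-03-01 00:00')
)
SELECT to_char(d.d, 'FMMonth')     AS month_name
     , extract(year FROM d.d)::int AS yr
     , count(o.order_number)       AS order_count
FROM   params p
CROSS  JOIN LATERAL generate_series(date_trunc('month', p.start_date::timestamp)
                                  , date_trunc('month', p.end_date::timestamp)
                                  , interval '1 month') AS d(d)
LEFT   JOIN order_table o
       ON  o.order_date >= d.d
       AND o.order_date <  d.d + interval '1 month'   -- half-open month range
GROUP  BY d.d
ORDER  BY d.d;
```

Grouping and ordering by the month start d.d rather than by the month name keeps the output chronological, including across year boundaries.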

Group query results by month and year in postgresql

select to_char(date,'Mon') as mon,
extract(year from date) as yyyy,
sum("Sales") as "Sales"
from yourtable
group by 1,2

At the request of Radu, I will explain that query:

to_char(date,'Mon') as mon, : converts the "date" attribute into the defined format of the short form of month.

extract(year from date) as yyyy : Postgresql's "extract" function is used to extract the YYYY year from the "date" attribute.

sum("Sales") as "Sales" : The SUM() function adds up all the "Sales" values, and supplies a case-sensitive alias, with the case sensitivity maintained by using double-quotes.

group by 1,2 : The GROUP BY clause must contain all columns from the SELECT list that are not part of an aggregate (that is, all columns not inside SUM/AVG/MIN/MAX etc.). This tells the query that SUM() should be applied for each unique combination of those columns, which in this case are the month and year columns. The "1,2" part is shorthand for the first and second SELECT list columns instead of using the column aliases, though it is probably best to use the full "to_char(...)" and "extract(...)" expressions for readability.
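Written with the full expressions, the query could look like the sketch below. The yourtable CTE and its rows are invented sample data. The ORDER BY is an addition not present in the original query: without it the row order is unspecified, and ordering by the 'Mon' text alone would sort alphabetically rather than chronologically.

```sql
WITH yourtable(date, "Sales") AS (    -- hypothetical sample rows
    VALUES (date '2015-01-05', 100)
         , (date '2015-01-20', 50)
         , (date '2015-02-03', 70)
)
SELECT to_char(date, 'Mon')    AS mon
     , extract(year FROM date) AS yyyy
     , sum("Sales")            AS "Sales"
FROM   yourtable
GROUP  BY to_char(date, 'Mon'), extract(year FROM date)
ORDER  BY yyyy, min(date);            -- chronological within each year
```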

How to return rows with 0 count for missing data?

You can create the list of the first day of every month over the last year (say) with

select distinct date_trunc('month', (current_date - offs)) as date 
from generate_series(0,365,28) as offs;
date
------------------------
2007-12-01 00:00:00+01
2008-01-01 00:00:00+01
2008-02-01 00:00:00+01
2008-03-01 00:00:00+01
2008-04-01 00:00:00+02
2008-05-01 00:00:00+02
2008-06-01 00:00:00+02
2008-07-01 00:00:00+02
2008-08-01 00:00:00+02
2008-09-01 00:00:00+02
2008-10-01 00:00:00+02
2008-11-01 00:00:00+01
2008-12-01 00:00:00+01

Then you can join with that series.
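For illustration, the join could look like the sketch below. The events CTE and its created_at column are hypothetical placeholders for the real table. It swaps the 28-day offsets for the timestamp form of generate_series() with a one-month step, which produces each month start exactly once and so needs no DISTINCT.

```sql
WITH events(created_at) AS (          -- hypothetical sample rows
    VALUES (now()), (now()), (now() - interval '2 month')
)
SELECT mon.month_start
     , count(e.created_at) AS ct      -- 0 for months without events
FROM   generate_series(date_trunc('month', now() - interval '12 month')
                     , date_trunc('month', now())
                     , interval '1 month') AS mon(month_start)
LEFT   JOIN events e
       ON  e.created_at >= mon.month_start
       AND e.created_at <  mon.month_start + interval '1 month'
GROUP  BY mon.month_start
ORDER  BY mon.month_start;
```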

PostgreSQL generating missing records and group them with source table

You can use LEFT JOIN and COALESCE

SELECT
    d."Date",
    coalesce(s."PowerOn", bigint '0') AS "PowerOn",
    coalesce(s."Idle", bigint '0') AS "Idle",
    coalesce(s."Run", bigint '0') AS "Run",
    CONCAT_WS('%', ROUND(NULLIF(coalesce(s."Run", bigint '0')::numeric, 0) / NULLIF(coalesce(s."PowerOn", bigint '0')::numeric, 0) * 100, 2), '') AS "Effectivity"
FROM (
    SELECT generate_series(timestamp '2021-08-01 00:00:00'
                         , NOW()
                         , interval '1 day')::timestamp AS "Date" -- alias required for d."Date"
) d
LEFT JOIN "Absolute_OEE" s ON d."Date" = s."Date"
    AND s."Machine" = 'Machine01'
    AND s."Date" >= '2021-08-01 00:00:00' -- >= so that data for the first day is kept
GROUP BY d."Date",
    coalesce(s."PowerOn", bigint '0'),
    coalesce(s."Idle", bigint '0'),
    coalesce(s."Run", bigint '0')
ORDER BY d."Date";

