Postgresql group month wise with missing values
You can use the generate_series() function like this:
select
g.month,
count(m)
from generate_series(1, 12) as g(month)
left outer join my_table as m on
m.id_object = 1 and
m.status = 1 and
extract(year from m.time) = 2014 and
extract(month from m.time) = g.month
group by g.month
order by g.month
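To try the pattern without a PostgreSQL server, here is a minimal sketch that runs the same join on an in-memory SQLite database from Python. It rests on assumptions that are not in the answer: SQLite has no built-in generate_series() in every build, so a recursive CTE stands in for it; strftime() replaces extract(); and the sample rows and the monthly_counts helper are made up for illustration.

```python
import sqlite3

def monthly_counts(rows, year):
    """Return [(month, count)] for all 12 months of `year`, zero-filled."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (id_object INTEGER, status INTEGER, time TEXT)")
    con.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        WITH RECURSIVE g(month) AS (      -- stand-in for generate_series(1, 12)
            SELECT 1
            UNION ALL
            SELECT month + 1 FROM g WHERE month < 12
        )
        SELECT g.month, count(m.id_object)   -- counts matched rows only
        FROM g
        LEFT JOIN my_table AS m
               ON m.id_object = 1
              AND m.status = 1
              AND CAST(strftime('%Y', m.time) AS INTEGER) = ?
              AND CAST(strftime('%m', m.time) AS INTEGER) = g.month
        GROUP BY g.month
        ORDER BY g.month
        """,
        (year,),
    ).fetchall()
    con.close()
    return result

# Hypothetical sample data: only three rows satisfy every join condition.
sample = [
    (1, 1, "2014-01-15 10:00:00"),
    (1, 1, "2014-01-20 11:00:00"),
    (1, 1, "2014-03-02 09:30:00"),
    (1, 0, "2014-03-05 12:00:00"),   # wrong status, must not count
    (2, 1, "2014-04-01 08:00:00"),   # other object, must not count
    (1, 1, "2013-01-01 00:00:00"),   # other year, must not count
]
counts = monthly_counts(sample, 2014)
```

The filters have to stay in the ON clause: moving them to WHERE would discard the months that have no matching row, which is exactly what the left join is meant to keep.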
How to include missing data for multiple groupings within the time span?
Based on some assumptions (ambiguities in the question) I suggest:
SELECT upper(trim(t.full_name)) AS teacher
, m.study_month
, r.room_code AS room
, count(s.room_id) AS study_count
FROM teachers t
CROSS JOIN generate_series(date_trunc('month', now() - interval '12 month') -- 12!
, date_trunc('month', now())
, interval '1 month') m(study_month)
CROSS JOIN rooms r
LEFT JOIN ( -- parentheses!
studies s
JOIN teacher_contacts tc ON tc.id = s.teacher_contact_id -- INNER JOIN!
) ON tc.teacher_id = t.id
AND s.study_dt >= m.study_month
AND s.study_dt < m.study_month + interval '1 month' -- sargable!
AND s.room_id = r.id
GROUP BY t.id, m.study_month, r.id -- id is PK of respective tables
ORDER BY t.id, m.study_month, r.id;
Major points
- Build a grid of all desired combinations with CROSS JOIN. And then LEFT JOIN to existing rows. Related:
  - array_agg group by and null
  - Get created as well as deleted entries of last week
- In your case, it's a join of several tables, so I use parentheses in the FROM list to LEFT JOIN to the result of the INNER JOIN within the parentheses. It would be incorrect to LEFT JOIN to each table separately, because you would include hits on partial matches and get potentially incorrect counts.
- Assuming referential integrity and working with PK columns directly, we don't need to include rooms and teachers inside the parentheses a second time. But we still have a join of two tables (studies and teacher_contacts). The role of teacher_contacts is unclear to me. Normally, I would expect a relationship between studies and teachers directly. Might be further simplified ...
- We need to count a non-null column from the LEFT JOINed (nullable) side to get the desired counts. Like count(s.room_id).
- To keep this fast for big tables, make sure your predicates are sargable. And add matching indexes.
- The column teacher is hardly (reliably) unique. Operate with a unique ID, preferably the PK (faster and simpler, too). I am still using teacher for the output to match your desired result. It might be wise to include a unique ID, since names can be duplicates.
- You want:
    the past 12 months (including current month).
  So start with date_trunc('month', now() - interval '12 month') (not 13). That's rounding down the start already and does what you want - more accurately than your original query.
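The grid-then-LEFT-JOIN idea from the points above can be tried on a small scale. The following is only a sketch under simplifying assumptions, not the query from the answer: it runs on SQLite from Python, replaces generate_series() with a hypothetical months table, links studies to teachers directly (skipping teacher_contacts), and uses made-up sample rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE teachers (id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE rooms    (id INTEGER PRIMARY KEY, room_code TEXT);
    CREATE TABLE months   (study_month TEXT PRIMARY KEY);  -- first day of each month
    CREATE TABLE studies  (id INTEGER PRIMARY KEY, teacher_id INTEGER,
                           room_id INTEGER, study_dt TEXT);
    INSERT INTO teachers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO rooms    VALUES (1, 'R1'), (2, 'R2');
    INSERT INTO months   VALUES ('2024-01-01'), ('2024-02-01');
    INSERT INTO studies (teacher_id, room_id, study_dt) VALUES
        (1, 1, '2024-01-10'), (1, 1, '2024-01-31'), (1, 2, '2024-02-01'),
        (2, 2, '2024-02-29');
""")

grid = con.execute("""
    SELECT t.full_name, m.study_month, r.room_code, count(s.room_id)
    FROM teachers t
    CROSS JOIN months m               -- every teacher x month x room combination
    CROSS JOIN rooms r
    LEFT JOIN studies s
           ON s.teacher_id = t.id
          AND s.room_id    = r.id
          AND s.study_dt >= m.study_month                    -- half-open range on
          AND s.study_dt <  date(m.study_month, '+1 month')  -- the bare column
    GROUP BY t.id, m.study_month, r.id
    ORDER BY t.id, m.study_month, r.id
""").fetchall()
con.close()
```

Every one of the 2 x 2 x 2 combinations appears in grid, and count(s.room_id) is 0 wherever no study matched, because the counted column comes from the nullable side of the join.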
Since you mentioned slow performance, depending on actual table definitions and data distribution, it's probably faster to aggregate first and join later, like in this related answer:
- Postgres - how to return rows with 0 count for missing data?
SELECT upper(trim(t.full_name)) AS teacher
, m.mon AS study_month
, r.room_code AS room
, COALESCE(s.ct, 0) AS study_count
FROM teachers t
CROSS JOIN generate_series(date_trunc('month', now() - interval '12 month') -- 12!
, date_trunc('month', now())
, interval '1 month') m(mon)
CROSS JOIN rooms r
LEFT JOIN ( -- parentheses!
SELECT tc.teacher_id, date_trunc('month', s.study_dt) AS mon, s.room_id, count(*) AS ct
FROM studies s
JOIN teacher_contacts tc ON s.teacher_contact_id = tc.id
WHERE s.study_dt >= date_trunc('month', now() - interval '12 month') -- sargable
GROUP BY 1, 2, 3
) s ON s.teacher_id = t.id
AND s.mon = m.mon
AND s.room_id = r.id
ORDER BY 1, 2, 3;
About your closing remark:
the dataset would be fed to a pivot library ... (could not do this in PG directly)
Chances are you can use the two-parameter form of crosstab() to produce your desired result directly and with excellent performance, so the above query is not needed to begin with. Consider:
- PostgreSQL Crosstab Query
Add missing rows for a unique group set by date in PostgreSQL
You must CROSS join the distinct game_id and date combinations of the table to the distinct category values of the table and then LEFT join to the table:
SELECT d.game_id, c.category, d.date, COALESCE(a.amount, 0) amount
FROM (SELECT DISTINCT game_id, date FROM activity) d
CROSS JOIN (SELECT DISTINCT category FROM activity) c
LEFT JOIN activity a
ON a.game_id = d.game_id AND a.date = d.date AND a.category = c.category
ORDER BY d.game_id, d.date
If you want to insert the missing rows in the table:
INSERT INTO activity (game_id, category, date, amount)
SELECT d.game_id, c.category, d.date, 0
FROM (SELECT DISTINCT game_id, date FROM activity) d
CROSS JOIN (SELECT DISTINCT category FROM activity) c
LEFT JOIN activity a
ON a.game_id = d.game_id AND a.date = d.date AND a.category = c.category
WHERE a.game_id IS NULL
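As a quick check of the INSERT variant, here is a sketch that runs the same statement on an in-memory SQLite database from Python. The column types and the four sample rows are assumptions made for illustration; only the table and column names come from the answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE activity (game_id INTEGER, category TEXT, date TEXT, amount INTEGER);
    INSERT INTO activity VALUES
        (1, 'a', '2021-01-01', 5),
        (1, 'b', '2021-01-01', 7),
        (1, 'a', '2021-01-02', 3),
        (2, 'b', '2021-01-01', 9);
""")

# Same anti-join as above: keep only grid cells with no matching row.
FILL_MISSING = """
    INSERT INTO activity (game_id, category, date, amount)
    SELECT d.game_id, c.category, d.date, 0
    FROM (SELECT DISTINCT game_id, date FROM activity) d
    CROSS JOIN (SELECT DISTINCT category FROM activity) c
    LEFT JOIN activity a
      ON a.game_id = d.game_id AND a.date = d.date AND a.category = c.category
    WHERE a.game_id IS NULL
"""

def row_count():
    return con.execute("SELECT count(*) FROM activity").fetchone()[0]

before = row_count()
con.execute(FILL_MISSING)
after = row_count()
con.execute(FILL_MISSING)          # second run finds nothing left to add
after_rerun = row_count()

rows = con.execute(
    "SELECT game_id, category, date, amount FROM activity "
    "ORDER BY game_id, date, category"
).fetchall()
con.close()
```

With this sample data the grid has 3 (game_id, date) pairs times 2 categories, so two zero rows are added. Running the statement again adds nothing, since WHERE a.game_id IS NULL only matches cells that are still missing.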
How to include null values into grouping by date?
I think one of the easiest ways would be to create two auxiliary tables, one with the years you want to get info from (2000 to the current year) and another with the months (1-12), so you can perform an outer join with your actual table and get the number of mails created per year and month.
Let's say the years table is called Years_Table with a year_value column, and the months table is called Months_Table with a month_value column; then you could do
SELECT TOP 200 month_value, year_value, COUNT(Email) AS Amount
FROM Contacts
RIGHT OUTER JOIN (SELECT year_value, month_value FROM Years_Table CROSS JOIN Months_Table) AS AUX_TABLE ON AUX_TABLE.year_value = YEAR(Added_Date) AND AUX_TABLE.month_value = MONTH(Added_Date)
GROUP BY month_value, year_value
ORDER BY year_value, month_value
Note: I omitted your CAST instruction since YEAR(Added_Date) should be numeric, assuming your field Added_Date is a datetime field; otherwise you should perform a different join.
Postgresql getting all months of year and null values
You can use generate_series() to generate the months for a year. Then you can join in the values from the aggregation using left join. Something like this:
select mon.mon, coalesce(s.sales, 0) as sales
from generate_series('2015-01-01'::timestamp, '2015-12-01'::timestamp, interval '1 month'
) as mon(mon) left join
(select date_trunc('Month', dateofmonth) as mon,
sum(sales) as SALES
from SALES_TABLE
group by mon
) s
on mon.mon = s.mon;
Include missing months in Group By query
This solution doesn't require you to hard-code the list of months you might want, all you need to do is provide any start date and any end date, and it will calculate the month boundaries for you. It includes year in the output so that it will support more than 12 months and so that your start and end dates can cross a year boundary and still order correctly and show the correct month and year.
DECLARE @StartDate SMALLDATETIME, @EndDate SMALLDATETIME;
SELECT @StartDate = '20120101', @EndDate = '20120630';
;WITH d(d) AS
(
SELECT DATEADD(MONTH, n, DATEADD(MONTH, DATEDIFF(MONTH, 0, @StartDate), 0))
FROM ( SELECT TOP (DATEDIFF(MONTH, @StartDate, @EndDate) + 1)
n = ROW_NUMBER() OVER (ORDER BY [object_id]) - 1
FROM sys.all_objects ORDER BY [object_id] ) AS n
)
SELECT
[Month] = DATENAME(MONTH, d.d),
[Year] = YEAR(d.d),
OrderCount = COUNT(o.OrderNumber)
FROM d LEFT OUTER JOIN dbo.OrderTable AS o
ON o.OrderDate >= d.d
AND o.OrderDate < DATEADD(MONTH, 1, d.d)
GROUP BY d.d
ORDER BY d.d;
Group query results by month and year in postgresql
select to_char(date,'Mon') as mon,
extract(year from date) as yyyy,
sum("Sales") as "Sales"
from yourtable
group by 1,2
At the request of Radu, I will explain that query:
- to_char(date,'Mon') as mon: converts the "date" attribute into the short (three-letter) form of the month name.
- extract(year from date) as yyyy: PostgreSQL's "extract" function is used to extract the YYYY year from the "date" attribute.
- sum("Sales") as "Sales": the SUM() function adds up all the "Sales" values and supplies a case-sensitive alias, with the case sensitivity maintained by using double quotes.
- group by 1,2: the GROUP BY clause must contain all columns from the SELECT list that are not part of an aggregate (that is, all columns not inside SUM/AVG/MIN/MAX etc.). This tells the query that the SUM() should be applied for each unique combination of those columns, which in this case are the month and year columns. The "1,2" part is shorthand that refers to the first and second SELECT columns by position, though it is probably best to use the full "to_char(...)" and "extract(...)" expressions for readability.
How to return rows with 0 count for missing data?
You can create the list of the first days of all months in the last year (say) with
select distinct date_trunc('month', (current_date - offs)) as date
from generate_series(0,365,28) as offs;
date
------------------------
2007-12-01 00:00:00+01
2008-01-01 00:00:00+01
2008-02-01 00:00:00+01
2008-03-01 00:00:00+01
2008-04-01 00:00:00+02
2008-05-01 00:00:00+02
2008-06-01 00:00:00+02
2008-07-01 00:00:00+02
2008-08-01 00:00:00+02
2008-09-01 00:00:00+02
2008-10-01 00:00:00+02
2008-11-01 00:00:00+01
2008-12-01 00:00:00+01
Then you can join with that series.
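Why a step of 28 days is enough can be checked outside the database. The sketch below mirrors the query in plain Python; the month_starts helper and the reference date 2008-12-15 are assumptions chosen to match the sample output, and the time zone part of the timestamps is ignored.

```python
from datetime import date, timedelta

def month_starts(today):
    """Mimic: select distinct date_trunc('month', current_date - offs)
    from generate_series(0, 365, 28) as offs."""
    offsets = range(0, 366, 28)   # 0, 28, ..., 364, like generate_series(0, 365, 28)
    # Truncate each shifted date to the first of its month; the set removes
    # duplicates the same way DISTINCT does.
    starts = {(today - timedelta(days=o)).replace(day=1) for o in offsets}
    return sorted(starts)

starts = month_starts(date(2008, 12, 15))
```

No month has fewer than 28 days, so consecutive offsets can never jump over a whole month. Some months are hit twice, which is why the query needs distinct.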
PostgreSQL generating missing records and group them with source table
You can use LEFT JOIN and COALESCE
SELECT
d."Date",
coalesce(s."PowerOn", bigint '0') AS "PowerOn",
coalesce(s."Idle", bigint '0') AS "Idle",
coalesce(s."Run", bigint '0') AS "Run",
CONCAT_WS('%', ROUND(NULLIF(coalesce(s."Run", bigint '0')::numeric, 0) / NULLIF(coalesce(s."PowerOn", bigint '0')::numeric, 0) * 100, 2), '') As "Effectivity"
FROM (
SELECT generate_series(timestamp '2021-08-01 00:00:00'
, NOW()
, interval '1 day')::timestamp AS "Date"
) d
LEFT JOIN "Absolute_OEE" s ON d."Date"= s."Date"
AND s."Machine" = 'Machine01'
AND s."Date" > '2021-08-01 00:00:00'
GROUP BY d."Date",
coalesce(s."PowerOn", bigint '0'),
coalesce(s."Idle", bigint '0'),
coalesce(s."Run", bigint '0')
ORDER BY d."Date"