Get the Distinct Sum of a Joined Table Column

Get the distinct sum of a joined table column

To get the result without a subquery, you have to resort to advanced window function trickery:

SELECT sum(count(*))       OVER () AS tickets_count
     , sum(min(a.revenue)) OVER () AS attendees_revenue
FROM tickets t
JOIN attendees a ON a.id = t.attendee_id
GROUP BY t.attendee_id
LIMIT 1;


How does it work?

The key to understanding this is the sequence of events in the query:

aggregate functions -> window functions -> DISTINCT -> LIMIT

More details:

  • Best way to get result count before LIMIT was applied

Step by step:

  1. I GROUP BY t.attendee_id - which you would normally do in a subquery.

  2. Then I sum over the counts to get the total count of tickets. Not very efficient, but forced by your requirement. The aggregate function count(*) is wrapped in the window function sum( ... ) OVER () to arrive at the not-so-common expression: sum(count(*)) OVER ().

    Likewise, I sum the minimum revenue per attendee to get the revenue total without duplicates.

    You could also use max() or avg() instead of min() to the same effect as revenue is guaranteed to be the same for every row per attendee.

    This could be simpler if DISTINCT was allowed in window functions, but PostgreSQL has not (yet) implemented this feature. Per documentation:

    Aggregate window functions, unlike normal aggregate functions, do not
    allow DISTINCT or ORDER BY to be used within the function argument list.

  3. The final step is to get a single row. This could be done with DISTINCT (SQL standard) since all rows are the same, but LIMIT 1 will be faster. The SQL-standard alternative to LIMIT 1 is FETCH FIRST 1 ROWS ONLY.
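
To see the steps above on concrete data, here is a minimal sketch that runs the same query shape in SQLite through Python's built-in sqlite3 module. That choice is an assumption made so the example needs no database server: the answer itself targets PostgreSQL, and SQLite needs version 3.25 or later for window functions. The sample rows are invented for illustration. The sketch compares a naive join, the subquery form, and the window-function form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attendees (id INTEGER PRIMARY KEY, revenue INTEGER);
    CREATE TABLE tickets   (id INTEGER PRIMARY KEY, attendee_id INTEGER);
    -- Hypothetical sample data: attendee 1 holds 3 tickets, attendee 2 holds 2.
    INSERT INTO attendees VALUES (1, 100), (2, 50);
    INSERT INTO tickets (attendee_id) VALUES (1), (1), (1), (2), (2);
""")

# Naive join: revenue is repeated once per ticket, so the sum is inflated.
naive = conn.execute("""
    SELECT count(*), sum(a.revenue)
    FROM tickets t
    JOIN attendees a ON a.id = t.attendee_id
""").fetchone()

# Subquery form: aggregate per attendee first, then sum the per-attendee rows.
subquery = conn.execute("""
    SELECT sum(ticket_count), sum(revenue)
    FROM (
        SELECT count(*) AS ticket_count, min(a.revenue) AS revenue
        FROM tickets t
        JOIN attendees a ON a.id = t.attendee_id
        GROUP BY t.attendee_id
    ) AS per_attendee
""").fetchone()

# Window form from the answer: aggregates are computed per group first,
# then the window functions sum over all groups, then LIMIT keeps one row.
window = conn.execute("""
    SELECT sum(count(*))       OVER () AS tickets_count,
           sum(min(a.revenue)) OVER () AS attendees_revenue
    FROM tickets t
    JOIN attendees a ON a.id = t.attendee_id
    GROUP BY t.attendee_id
    LIMIT 1
""").fetchone()

print("naive:   ", naive)
print("subquery:", subquery)
print("window:  ", window)
```

With this data the naive join reports a revenue of 400 because each attendee's revenue is counted once per ticket, while the subquery and window forms both report 5 tickets and a revenue of 150.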

SQL SUM values by DISTINCT column after JOIN

If a city has no donation rows at all, its sum will be NULL. To show 0 instead, use coalesce(SUM(d.amount),0).

Schema and insert statements:

create table members(id int,   name varchar(50),    city varchar(50));
insert into members values(1, 'John' ,'Boston');
insert into members values(2, 'Maria' ,'Boston');
insert into members values(3, 'Steve' ,'London');
insert into members values(4, 'Oscar' ,'London');
insert into members values(5, 'Ben' ,'Singapore');

create table donations(member_id int, amount int);
insert into donations values(1, 100);
insert into donations values(1, 150);
insert into donations values(2, 300 );
insert into donations values(3, 50);
insert into donations values(3, 100);
insert into donations values(3, 50);
insert into donations values(4, 75);
insert into donations values(5, 200);

Query:

SELECT m.city, SUM(d.amount) as total_donations
FROM members m LEFT JOIN
donations d
ON d.member_id = m.id
GROUP BY m.city
ORDER BY city;

Output:

city       total_donations
Boston     550
London     275
Singapore  200
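
The sample data above has at least one donation for every city, so the NULL case never shows up in the output. As a rough sketch of what coalesce changes, the following runs a reduced copy of the schema in SQLite via Python's sqlite3 module (an assumption for easy reproduction) and adds a hypothetical member in Paris who has not donated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (id INT, name VARCHAR(50), city VARCHAR(50));
    CREATE TABLE donations (member_id INT, amount INT);
    INSERT INTO members VALUES (1, 'John', 'Boston'), (2, 'Maria', 'Boston'),
                               (6, 'Anna', 'Paris');  -- hypothetical: no donations
    INSERT INTO donations VALUES (1, 100), (1, 150), (2, 300);
""")

rows = conn.execute("""
    SELECT m.city,
           SUM(d.amount)              AS raw_total,        -- NULL when nothing matches
           COALESCE(SUM(d.amount), 0) AS total_donations   -- NULL replaced by 0
    FROM members m
    LEFT JOIN donations d ON d.member_id = m.id
    GROUP BY m.city
    ORDER BY m.city
""").fetchall()

for row in rows:
    print(row)
```

Paris comes back with None (NULL) in the raw column and 0 in the coalesced one; Boston is unchanged at 550.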

Join tables then sum on distinct values


use AdventureWorks2012
select
datename(dw,orderdate ) as "Day",
SumLineTotal=SUM(LineTotal),
SumOrderQty=SUM(OrderQty)
from
sales.SalesOrderDetail
INNER join sales.SalesOrderHeader on (SalesOrderDetail.SalesOrderID=SalesOrderHeader.SalesOrderID)
group by
datename(dw,orderdate)

Getting sum of a column that needs a distinct value from other column

I guess this is a job for a subquery. So let's take your problem step by step.

I'm trying to find all the rows in the balance column that are the same and have the same date,

This subquery gets you that, I believe. It gives the same result as SELECT DISTINCT, but it also counts the duplicated rows.

SELECT COUNT(*) num_same_rows, balance, date
FROM `table`
WHERE date BETWEEN '2021-01-01' AND '2021-09-01'
GROUP BY date, balance

and then find the sum of the balance column.

Nest the subquery like this.

SELECT SUM(balance) summed_balance, date
FROM (
    SELECT COUNT(*) num_same_rows, balance, date
    FROM `table`
    WHERE date BETWEEN '2021-01-01' AND '2021-09-01'
    GROUP BY date, balance
) subquery
GROUP BY date

If you only want to consider rows that actually have duplicates, change your subquery to

SELECT COUNT(*) num_same_rows, balance, date
FROM `table`
WHERE date BETWEEN '2021-01-01' AND '2021-09-01'
GROUP BY date, balance
HAVING COUNT(*) > 1

Be careful here, though. You didn't tell us what you want to do, only how you want to do it. The way you described your problem calls for discarding duplicated data before doing the sums. Is that right? Do you want to discard data?
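
To make the effect of discarding duplicates visible, here is a small sketch of the nested query run against SQLite from Python. The table name balances and all rows are hypothetical, since the original post does not show its schema, and the sketch assumes the date filter applies to the same date column that is selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and rows; the original post does not show its schema.
    CREATE TABLE balances (date TEXT, balance INTEGER);
    INSERT INTO balances VALUES
        ('2021-03-01', 100), ('2021-03-01', 100),  -- duplicated row
        ('2021-03-01', 50),
        ('2021-04-01', 70),
        ('2021-12-24', 999);                        -- outside the date range
""")

rows = conn.execute("""
    SELECT SUM(balance) AS summed_balance, date
    FROM (
        -- inner query: one row per distinct (date, balance) pair
        SELECT COUNT(*) AS num_same_rows, balance, date
        FROM balances
        WHERE date BETWEEN '2021-01-01' AND '2021-09-01'
        GROUP BY date, balance
    ) AS subquery
    GROUP BY date
    ORDER BY date
""").fetchall()

for row in rows:
    print(row)
```

The duplicated 100 on 2021-03-01 is counted once, so that date sums to 150 rather than 250. Whether that is the desired result is exactly the question raised above.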

SQL INNER JOIN of sum distinct values

First of all, your sample data was wrong, so let's create the right structure first.

Please keep in mind for the future: if you share SQL scripts that create your sample data, you will get more answers.

-- declare tables
DECLARE @singers TABLE (singerID INT, name NVARCHAR(255), country NVARCHAR(255));
DECLARE @musics TABLE (musicID INT, singerID INT, songName NVARCHAR(255));
DECLARE @playlistInfo TABLE (singerID INT, musicID INT, listened INT);

-- Load sample data
INSERT INTO @singers VALUES (1, 'Adele', 'England'), (2, 'Mozart', 'Austria'), (3, 'DuaLipa', 'England')
INSERT INTO @musics VALUES (1, 1, 'Rolling in the Deep'), (2, 2, 'Symphony No 40'), (3, 3, 'One Kiss')
INSERT INTO @playlistInfo VALUES (1, 1, 25), (2, 2, 15), (3, 3, 20)

And then query our tables for the top 10 singers from England.

SELECT TOP 10
s.name as Singer, ISNULL(SUM(pl.listened), 0) as TotalListened
FROM
@singers s
LEFT JOIN @musics m ON m.singerID = s.singerID
LEFT JOIN @playlistInfo pl ON pl.musicID = m.musicID AND pl.singerID = m.singerID
-- I did left join to show anyone with 0 listen too, you can convert it to `JOIN`
WHERE
s.country = 'England'
GROUP BY
s.name
ORDER BY
SUM(pl.listened) DESC
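
The comment about LEFT JOIN versus JOIN can be illustrated with a short sketch. It is written for SQLite through Python's sqlite3 module rather than SQL Server, so it uses regular tables, COALESCE instead of ISNULL, and LIMIT instead of TOP. The singer 'Sting' is a hypothetical extra row added so that one English singer has no songs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singers (singerID INT, name TEXT, country TEXT);
    CREATE TABLE musics (musicID INT, singerID INT, songName TEXT);
    CREATE TABLE playlistInfo (singerID INT, musicID INT, listened INT);
    INSERT INTO singers VALUES (1, 'Adele', 'England'), (2, 'Mozart', 'Austria'),
                               (3, 'DuaLipa', 'England'),
                               (4, 'Sting', 'England');  -- hypothetical: no songs listed
    INSERT INTO musics VALUES (1, 1, 'Rolling in the Deep'), (2, 2, 'Symphony No 40'),
                              (3, 3, 'One Kiss');
    INSERT INTO playlistInfo VALUES (1, 1, 25), (2, 2, 15), (3, 3, 20);
""")

# The same query is run twice; only the join type changes.
QUERY = """
    SELECT s.name AS Singer, COALESCE(SUM(pl.listened), 0) AS TotalListened
    FROM singers s
    {join} musics m ON m.singerID = s.singerID
    {join} playlistInfo pl ON pl.musicID = m.musicID AND pl.singerID = m.singerID
    WHERE s.country = 'England'
    GROUP BY s.name
    ORDER BY TotalListened DESC
    LIMIT 10
"""

with_left_join = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()
with_inner_join = conn.execute(QUERY.format(join="JOIN")).fetchall()

print(with_left_join)
print(with_inner_join)
```

With LEFT JOIN the singer without songs is listed with a total of 0; with a plain JOIN that row disappears from the result.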

A little extra, in case you want the most-listened songs:

-- get the most listened song within country
SELECT TOP 10
s.name as Singer, m.songName, ISNULL(SUM(pl.listened), 0) as TotalListened
FROM
@singers s
LEFT JOIN @musics m ON m.singerID = s.singerID
LEFT JOIN @playlistInfo pl ON pl.musicID = m.musicID AND pl.singerID = m.singerID
WHERE
s.country = 'England'
GROUP BY
s.name,
m.songName
ORDER BY
SUM(pl.listened) DESC

Get distinct values and sum their respective quantities

You can use group by:

select ItemNo, sum(Qty) as QtyTotal
from QueryOutput q
group by ItemNo;

You can replace QueryOutput with a query that produces your example table.
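
As a sketch of that substitution, the example below puts a derived table where QueryOutput stood. It runs in SQLite via Python's sqlite3 module; the order_lines table, its OrderNo column, and the rows are hypothetical, as the answer only names ItemNo and Qty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source table; only ItemNo and Qty appear in the original answer.
    CREATE TABLE order_lines (OrderNo INT, ItemNo TEXT, Qty INT);
    INSERT INTO order_lines VALUES
        (1, 'A100', 2), (1, 'B200', 1),
        (2, 'A100', 5), (3, 'B200', 4), (3, 'C300', 7);
""")

rows = conn.execute("""
    SELECT ItemNo, SUM(Qty) AS QtyTotal
    FROM (
        -- stands in for QueryOutput: any query returning ItemNo and Qty
        SELECT ItemNo, Qty
        FROM order_lines
        WHERE OrderNo <= 2
    ) AS q
    GROUP BY ItemNo
    ORDER BY ItemNo
""").fetchall()

for row in rows:
    print(row)
```

Each distinct ItemNo produced by the inner query appears once, with its quantities added up.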


MySql Join Tables With Sum Of A Column

If I'm understanding correctly, I believe the following should work.

The key is to use a LEFT JOIN so that even a server with no matching record in the server_hit table still shows up in the final output, with a sum of 0.

SELECT s.server_id, s.server_name, s.server_url, s.category_id, c.category_name, IFNULL(SUM(sh.hit_count), 0)
FROM server s
INNER JOIN category c ON s.category_id = c.category_id
LEFT JOIN server_hit sh ON s.server_id = sh.server_id
GROUP BY s.server_id, s.server_name, s.server_url, s.category_id, c.category_name

Alternatively, wrap the sum in IF(EXISTS(...)) to handle the NULL case:

SELECT DISTINCT s.server_id, s.server_name, s.server_url, s.category_id, c.category_name,
       IF(EXISTS(SELECT 1 FROM server_hit h WHERE h.server_id = s.server_id), SUM(sh.hit_count), 0) AS total_hit_count
FROM server s
INNER JOIN category c ON s.category_id = c.category_id
LEFT JOIN server_hit sh ON s.server_id = sh.server_id
GROUP BY s.server_id, s.server_name, s.server_url, s.category_id, c.category_name


