Strange Duplicate Behavior from Group_Concat of Two Left Joins of Group_Bys

Your second query is of the form:

q1 -- PK user_id
LEFT JOIN (...
GROUP BY user_id, t.tag
) AS q2
ON q2.user_id = q1.user_id
LEFT JOIN (...
GROUP BY user_id, c.category
) AS q3
ON q3.user_id = q1.user_id
GROUP BY -- group_concats

The inner GROUP BYs result in (user_id, t.tag) & (user_id, c.category) being keys/UNIQUEs. Other than that I won't address those GROUP BYs.

TL;DR When you join (q1 JOIN q2) to q3, the join is not on a key/UNIQUE of either side, so for each user_id you get a row for every possible combination of tag & category. So the final GROUP BY receives duplicates per (user_id, tag) & per (user_id, category), and GROUP_CONCAT inappropriately concatenates the duplicate tags & categories per user_id. Correct would be (q1 JOIN q2 GROUP BY) JOIN (q1 JOIN q3 GROUP BY), in which all joins are on the common key/UNIQUE (user_id) & there is no spurious aggregation. Sometimes, though, you can undo such spurious aggregation.
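To see the row multiplication concretely, here is a minimal sketch using Python's built-in sqlite3 module. The tables q1, q2 and q3 are hypothetical stand-ins for your derived tables (already unique per (user_id, tag) and per (user_id, category), as your inner GROUP BYs guarantee), and the data is made up. SQLite's group_concat without ORDER BY has no guaranteed order, so the lists are sorted before inspection.

```python
import sqlite3

# Hypothetical stand-ins for the derived tables q1, q2, q3 of the question.
SCHEMA = """
CREATE TABLE q1 (user_id INTEGER PRIMARY KEY);
CREATE TABLE q2 (user_id INTEGER, tag TEXT, PRIMARY KEY (user_id, tag));
CREATE TABLE q3 (user_id INTEGER, category TEXT, PRIMARY KEY (user_id, category));
INSERT INTO q1 VALUES (1);
INSERT INTO q2 VALUES (1, 'sql'), (1, 'mysql');
INSERT INTO q3 VALUES (1, 'db'), (1, 'web'), (1, 'ops');
"""

def fan_out_demo():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    # Both joins are only on user_id, so each tag pairs with each category.
    rows = con.execute("""
        SELECT q1.user_id, q2.tag, q3.category
        FROM q1
        LEFT JOIN q2 ON q2.user_id = q1.user_id
        LEFT JOIN q3 ON q3.user_id = q1.user_id
    """).fetchall()
    # Concatenating after that join repeats every tag and every category.
    tags, cats = con.execute("""
        SELECT group_concat(q2.tag), group_concat(q3.category)
        FROM q1
        LEFT JOIN q2 ON q2.user_id = q1.user_id
        LEFT JOIN q3 ON q3.user_id = q1.user_id
        GROUP BY q1.user_id
    """).fetchone()
    con.close()
    return rows, sorted(tags.split(",")), sorted(cats.split(","))

rows, tags, cats = fan_out_demo()
print(len(rows))  # 6 (2 tags x 3 categories)
print(tags)       # ['mysql', 'mysql', 'mysql', 'sql', 'sql', 'sql']
print(cats)       # ['db', 'db', 'ops', 'ops', 'web', 'web']
```

Each tag shows up once per category and each category once per tag, which is exactly the duplication in your GROUP_CONCAT output.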

A correct symmetrical INNER JOIN approach: LEFT JOIN q1 & q2 (1:many), then GROUP BY & GROUP_CONCAT (which is what your first query did); then, separately, LEFT JOIN q1 & q3 (1:many), then GROUP BY & GROUP_CONCAT; then INNER JOIN the two results ON user_id (1:1).
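A minimal sketch of this symmetrical approach, again with Python's sqlite3 and the same hypothetical q1/q2/q3 stand-ins (user 2 is made up and has no tags or categories). Each side is aggregated to one row per user_id before the INNER JOIN, so that final join is 1:1.

```python
import sqlite3

# Hypothetical stand-ins for q1, q2, q3; user 2 has no tags or categories.
SCHEMA = """
CREATE TABLE q1 (user_id INTEGER PRIMARY KEY);
CREATE TABLE q2 (user_id INTEGER, tag TEXT, PRIMARY KEY (user_id, tag));
CREATE TABLE q3 (user_id INTEGER, category TEXT, PRIMARY KEY (user_id, category));
INSERT INTO q1 VALUES (1), (2);
INSERT INTO q2 VALUES (1, 'sql'), (1, 'mysql');
INSERT INTO q3 VALUES (1, 'db'), (1, 'web'), (1, 'ops');
"""

QUERY = """
SELECT a.user_id, a.tags, b.categories
FROM (SELECT q1.user_id, group_concat(q2.tag) AS tags             -- 1:many, then aggregate
      FROM q1 LEFT JOIN q2 ON q2.user_id = q1.user_id
      GROUP BY q1.user_id) AS a
INNER JOIN
     (SELECT q1.user_id, group_concat(q3.category) AS categories  -- 1:many, then aggregate
      FROM q1 LEFT JOIN q3 ON q3.user_id = q1.user_id
      GROUP BY q1.user_id) AS b
  ON b.user_id = a.user_id                                         -- 1:1 on user_id
ORDER BY a.user_id
"""

def symmetric_inner_join():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    result = []
    for user_id, tags, cats in con.execute(QUERY):
        # group_concat yields NULL (None) when a user has no matching rows.
        result.append((user_id,
                       sorted(tags.split(",")) if tags else [],
                       sorted(cats.split(",")) if cats else []))
    con.close()
    return result

print(symmetric_inner_join())
# [(1, ['mysql', 'sql'], ['db', 'ops', 'web']), (2, [], [])]
```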

A correct symmetrical scalar subquery approach: SELECT each GROUP_CONCAT from q1 as a scalar subquery correlated on user_id, each with its own GROUP BY.
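A sketch of the scalar subquery version under the same assumptions (Python's sqlite3, hypothetical q1/q2/q3 tables and data). Each subquery sees only one user's rows from one table, so nothing gets multiplied; a user with no matches gets NULL.

```python
import sqlite3

# Hypothetical stand-ins for q1, q2, q3; user 2 has no tags or categories.
SCHEMA = """
CREATE TABLE q1 (user_id INTEGER PRIMARY KEY);
CREATE TABLE q2 (user_id INTEGER, tag TEXT, PRIMARY KEY (user_id, tag));
CREATE TABLE q3 (user_id INTEGER, category TEXT, PRIMARY KEY (user_id, category));
INSERT INTO q1 VALUES (1), (2);
INSERT INTO q2 VALUES (1, 'sql'), (1, 'mysql');
INSERT INTO q3 VALUES (1, 'db'), (1, 'web'), (1, 'ops');
"""

QUERY = """
SELECT q1.user_id,
       (SELECT group_concat(q2.tag) FROM q2                 -- correlated on user_id
         WHERE q2.user_id = q1.user_id GROUP BY q2.user_id) AS tags,
       (SELECT group_concat(q3.category) FROM q3            -- correlated on user_id
         WHERE q3.user_id = q1.user_id GROUP BY q3.user_id) AS categories
FROM q1
ORDER BY q1.user_id
"""

def scalar_subqueries():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    result = [(user_id,
               sorted(tags.split(",")) if tags else [],
               sorted(cats.split(",")) if cats else [])
              for user_id, tags, cats in con.execute(QUERY)]
    con.close()
    return result

print(scalar_subqueries())
# [(1, ['mysql', 'sql'], ['db', 'ops', 'web']), (2, [], [])]
```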

A correct cumulative LEFT JOIN approach: LEFT JOIN q1 & q2 (1:many), then GROUP BY & GROUP_CONCAT; then LEFT JOIN that result & q3 (1:many), then GROUP BY & GROUP_CONCAT.

A correct approach like your 2nd query: You first LEFT JOIN q1 & q2 (1:many). Then you LEFT JOIN that & q3 (many:1:many). That gives a row for every combination of a tag & a category that appear with the same user_id. Then, after you GROUP BY, GROUP_CONCAT runs over duplicate (user_id, tag) pairs and duplicate (user_id, category) pairs. That is why you have duplicate list elements. Adding DISTINCT to each GROUP_CONCAT gives a correct result, though. (Per wchiquito's comment.)
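A sketch of the DISTINCT fix with the same hypothetical q1/q2/q3 tables in Python's sqlite3. DISTINCT is safe here because (user_id, tag) and (user_id, category) are already unique in q2 and q3, so the only repeats it removes are the ones the many:1:many join created. The sketch relies on SQLite's default ',' separator rather than MySQL's SEPARATOR/ORDER BY clauses.

```python
import sqlite3

# Hypothetical stand-ins for q1, q2, q3; user 2 has no tags or categories.
SCHEMA = """
CREATE TABLE q1 (user_id INTEGER PRIMARY KEY);
CREATE TABLE q2 (user_id INTEGER, tag TEXT, PRIMARY KEY (user_id, tag));
CREATE TABLE q3 (user_id INTEGER, category TEXT, PRIMARY KEY (user_id, category));
INSERT INTO q1 VALUES (1), (2);
INSERT INTO q2 VALUES (1, 'sql'), (1, 'mysql');
INSERT INTO q3 VALUES (1, 'db'), (1, 'web'), (1, 'ops');
"""

QUERY = """
SELECT q1.user_id,
       group_concat(DISTINCT q2.tag)      AS tags,
       group_concat(DISTINCT q3.category) AS categories
FROM q1
LEFT JOIN q2 ON q2.user_id = q1.user_id   -- 1:many
LEFT JOIN q3 ON q3.user_id = q1.user_id   -- many:1:many, duplicates appear here
GROUP BY q1.user_id
ORDER BY q1.user_id
"""

def distinct_concat():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    result = [(user_id,
               sorted(tags.split(",")) if tags else [],
               sorted(cats.split(",")) if cats else [])
              for user_id, tags, cats in con.execute(QUERY)]
    con.close()
    return result

print(distinct_concat())
# [(1, ['mysql', 'sql'], ['db', 'ops', 'web']), (2, [], [])]
```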

Which you prefer is, as usual, an engineering tradeoff to be informed by query plans & timings of the actual queries, per actual data/usage/statistics (including the expected amount of duplication). One issue is whether the extra rows of the many:1:many JOIN approach offset its saving of a GROUP BY.

-- cumulative LEFT JOIN approach
SELECT
q1.user_id, q1.user_name, q1.score, q1.reputation,
top_two_tags,
substring_index(group_concat(q3.category ORDER BY q3.category_reputation DESC SEPARATOR ','), ',', 2) AS category
FROM
-- your 1st query (less ORDER BY) AS q1
(SELECT
q1.user_id, q1.user_name, q1.score, q1.reputation,
substring_index(group_concat(q2.tag ORDER BY q2.tag_reputation DESC SEPARATOR ','), ',', 2) AS top_two_tags
FROM
(SELECT
u.id AS user_id,
u.user_name,
coalesce(sum(r.score), 0) as score,
coalesce(sum(r.reputation), 0) as reputation
FROM
users u
LEFT JOIN reputations r
ON r.user_id = u.id
AND r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
u.id, u.user_name
) AS q1
LEFT JOIN
(
SELECT
r.user_id AS user_id, t.tag, sum(r.reputation) AS tag_reputation
FROM
reputations r
JOIN post_tag pt ON pt.post_id = r.post_id
JOIN tags t ON t.id = pt.tag_id
WHERE
r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
user_id, t.tag
) AS q2
ON q2.user_id = q1.user_id
GROUP BY
q1.user_id, q1.user_name, q1.score, q1.reputation
) AS q1
-- finish like your 2nd query
LEFT JOIN
(
SELECT
r.user_id AS user_id, c.category, sum(r.reputation) AS category_reputation
FROM
reputations r
JOIN post_category ct ON ct.post_id = r.post_id
JOIN categories c ON c.id = ct.category_id
WHERE
r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
user_id, c.category
) AS q3
ON q3.user_id = q1.user_id
GROUP BY
q1.user_id, q1.user_name, q1.score, q1.reputation, q1.top_two_tags -- one value per user_id; listing it satisfies ONLY_FULL_GROUP_BY
ORDER BY
q1.reputation DESC, q1.score DESC ;

GROUP_CONCAT being weird

The simplest solution is to use DISTINCT in each GROUP_CONCAT:

SELECT u.Name AS Name, u.ID AS ID,
GROUP_CONCAT(DISTINCT e.Address SEPARATOR ',') AS Address,
GROUP_CONCAT(DISTINCT e.Username SEPARATOR ',') AS eUsername,
GROUP_CONCAT(DISTINCT t.Number SEPARATOR ',') AS Number,
GROUP_CONCAT(DISTINCT t.Username SEPARATOR ',') AS tUsername
FROM mitglied u INNER JOIN
email e
ON u.ID = e.ID INNER JOIN
telephone t
ON u.ID = t.ID
WHERE u.Name REGEXP '$searchterm'
GROUP BY u.ID, u.Name;

Notes:

  • I assume that mitglied is users.
  • Table aliases make the query easier to write and to read.
  • Qualified column names are recommended whenever you have multiple table references.
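One thing worth checking before relying on DISTINCT: it removes every repeated value, not only the repeats the two joins create. The sketch below uses Python's sqlite3 with made-up rows and drops the REGEXP filter (SQLite's REGEXP operator needs a user-defined regexp() function). One member has two telephone rows sharing the same Username; DISTINCT lists that Username once, whereas aggregating telephone on its own before joining keeps one entry per row.

```python
import sqlite3

# Hypothetical sample data: member 1 has two emails and two phones,
# and both phone rows carry the same Username.
SCHEMA = """
CREATE TABLE mitglied (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE email (ID INTEGER, Address TEXT, Username TEXT);
CREATE TABLE telephone (ID INTEGER, Number TEXT, Username TEXT);
INSERT INTO mitglied VALUES (1, 'Anna');
INSERT INTO email VALUES (1, 'a@example.org', 'anna'), (1, 'b@example.org', 'anna.b');
INSERT INTO telephone VALUES (1, '111', 'anna'), (1, '222', 'anna');
"""

def compare_distinct_vs_preaggregated():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    # DISTINCT after the double join (2 x 2 = 4 rows) collapses equal values.
    distinct_users = con.execute("""
        SELECT group_concat(DISTINCT t.Username)
        FROM mitglied u
        JOIN email e ON u.ID = e.ID
        JOIN telephone t ON u.ID = t.ID
        GROUP BY u.ID
    """).fetchone()[0]
    # Aggregating telephone alone first keeps one entry per telephone row.
    preaggregated_users = con.execute("""
        SELECT tt.tUsername
        FROM mitglied u
        JOIN (SELECT ID, group_concat(Username) AS tUsername
              FROM telephone GROUP BY ID) AS tt ON tt.ID = u.ID
    """).fetchone()[0]
    con.close()
    return distinct_users, preaggregated_users

print(compare_distinct_vs_preaggregated())  # ('anna', 'anna,anna')
```

If such repeated values matter for your output, aggregate each child table per ID before joining instead of using DISTINCT.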

SQL duplicate values of records with multiple joins

You want to join invoice totals with estimated totals. Both are aggregations, so aggregate first, then join. Then join the job and account tables to the job/account pairs thus found.

select
j.jobnumber,
a.name as "Account",
inv.total as "Total Invoiced",
est.total as "Total Estimated"
from
(
select e.jobid, ea.accountid, sum(ea.amount) as total
from estimate_accounts ea
join estimates e on e.id = ea.estimateid
group by e.jobid, ea.accountid
) est
full outer join
(
select ji.jobid, ia.accountid, sum(ia.amount) as total
from invoice_accounts ia
join job_invoices ji on ji.id = ia.invoiceid
group by ji.jobid, ia.accountid
) inv using (jobid, accountid)
join jobs j on j.id = jobid
join accounts a on a.id = accountid
order by j.jobnumber, a.name;
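Why aggregating first matters can be checked with a small sketch in Python's sqlite3. The tables est_lines and inv_lines are hypothetical flattenings of estimate_accounts/estimates and invoice_accounts/job_invoices into (jobid, accountid, amount) rows, with made-up amounts. Older SQLite builds lack FULL OUTER JOIN, so the sketch uses an inner join, which is enough here because the pair exists on both sides.

```python
import sqlite3

SCHEMA = """
CREATE TABLE est_lines (jobid INTEGER, accountid INTEGER, amount INTEGER);
CREATE TABLE inv_lines (jobid INTEGER, accountid INTEGER, amount INTEGER);
INSERT INTO est_lines VALUES (1, 7, 10), (1, 7, 20);           -- estimated 30
INSERT INTO inv_lines VALUES (1, 7, 5), (1, 7, 5), (1, 7, 5);  -- invoiced 15
"""

def totals():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    # Join first, then sum: each estimate line meets each invoice line (2 x 3 rows).
    naive = con.execute("""
        SELECT sum(e.amount), sum(i.amount)
        FROM est_lines e JOIN inv_lines i USING (jobid, accountid)
    """).fetchone()
    # Sum first, then join: one row per (jobid, accountid) on each side.
    correct = con.execute("""
        SELECT est.total, inv.total
        FROM (SELECT jobid, accountid, sum(amount) AS total
              FROM est_lines GROUP BY jobid, accountid) AS est
        JOIN (SELECT jobid, accountid, sum(amount) AS total
              FROM inv_lines GROUP BY jobid, accountid) AS inv
          USING (jobid, accountid)
    """).fetchone()
    con.close()
    return naive, correct

print(totals())  # ((90, 30), (30, 15))
```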

If your DBMS doesn't support the USING clause, you must use ON instead:

select
[...]
) inv on inv.jobid = est.jobid and inv.accountid = est.accountid
join jobs j on j.id in (est.jobid, inv.jobid)
join accounts a on a.id in (est.accountid, inv.accountid)
order by j.jobnumber, a.name;

How can I add multiple tables via LEFT JOIN and count the rows of each table?

Left joins are not the right tool for this: joining both tables at once produces one row per post/task combination, so the counts get multiplied. Use subselects instead:

SELECT p.*,
(SELECT COUNT(*) FROM posts po WHERE p.name = po.name) AS posts,
(SELECT COUNT(*) FROM tasks ta WHERE p.name = ta.name) AS tasks
FROM projects p
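A sketch comparing both approaches in Python's sqlite3, with hypothetical rows: project 'alpha' has 2 posts and 3 tasks, 'beta' has none. With two LEFT JOINs, 'alpha' produces 2 x 3 = 6 rows, so both counts come out as 6.

```python
import sqlite3

SCHEMA = """
CREATE TABLE projects (name TEXT PRIMARY KEY);
CREATE TABLE posts (name TEXT);
CREATE TABLE tasks (name TEXT);
INSERT INTO projects VALUES ('alpha'), ('beta');
INSERT INTO posts VALUES ('alpha'), ('alpha');
INSERT INTO tasks VALUES ('alpha'), ('alpha'), ('alpha');
"""

def count_both_ways():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    # Two LEFT JOINs: every post row pairs with every task row.
    joined = con.execute("""
        SELECT p.name, COUNT(po.name), COUNT(ta.name)
        FROM projects p
        LEFT JOIN posts po ON po.name = p.name
        LEFT JOIN tasks ta ON ta.name = p.name
        GROUP BY p.name ORDER BY p.name
    """).fetchall()
    # Correlated subselects count each table independently.
    subselects = con.execute("""
        SELECT p.name,
               (SELECT COUNT(*) FROM posts po WHERE p.name = po.name) AS posts,
               (SELECT COUNT(*) FROM tasks ta WHERE p.name = ta.name) AS tasks
        FROM projects p ORDER BY p.name
    """).fetchall()
    con.close()
    return joined, subselects

joined, subselects = count_both_ways()
print(joined)      # [('alpha', 6, 6), ('beta', 0, 0)]
print(subselects)  # [('alpha', 2, 3), ('beta', 0, 0)]
```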

Return Data from multiple tables

Calculate the two sums separately, then join them:

select 
ci.name, ci.address, ci.area,
ci.invoice_amount, cp.invoice_payments_amount
from (
select c.id, c.name, c.address, c.area, sum(i.amount) invoice_amount
from customers c inner join invoices i on c.id = i.customer_id
group by c.id, c.name, c.address, c.area
) ci inner join (
select c.id, sum(p.amount) invoice_payments_amount
from customers c
inner join invoices i on c.id = i.customer_id
inner join invoice_payments p on i.id = p.invoice_id
group by c.id
) cp on cp.id = ci.id
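The reason for the separate sums shows up with one invoice paid in two installments. In this sketch (Python's sqlite3, made-up data), joining payments before summing repeats the invoice amount once per payment, while the query above keeps both sums correct.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT, area TEXT);
CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
CREATE TABLE invoice_payments (id INTEGER PRIMARY KEY, invoice_id INTEGER, amount INTEGER);
INSERT INTO customers VALUES (1, 'ACME', 'Main St 1', 'North');
INSERT INTO invoices VALUES (10, 1, 100);
INSERT INTO invoice_payments VALUES (100, 10, 50), (101, 10, 50);
"""

def invoice_sums():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    # One join chain, then sum: the 100 invoice is counted once per payment.
    naive = con.execute("""
        SELECT sum(i.amount), sum(p.amount)
        FROM customers c
        JOIN invoices i ON c.id = i.customer_id
        JOIN invoice_payments p ON i.id = p.invoice_id
        GROUP BY c.id
    """).fetchone()
    # The two sums computed separately, then joined on the customer id.
    correct = con.execute("""
        SELECT ci.name, ci.address, ci.area,
               ci.invoice_amount, cp.invoice_payments_amount
        FROM (SELECT c.id, c.name, c.address, c.area, sum(i.amount) invoice_amount
              FROM customers c INNER JOIN invoices i ON c.id = i.customer_id
              GROUP BY c.id, c.name, c.address, c.area) ci
        INNER JOIN (SELECT c.id, sum(p.amount) invoice_payments_amount
                    FROM customers c
                    INNER JOIN invoices i ON c.id = i.customer_id
                    INNER JOIN invoice_payments p ON i.id = p.invoice_id
                    GROUP BY c.id) cp ON cp.id = ci.id
    """).fetchone()
    con.close()
    return naive, correct

print(invoice_sums())  # ((200, 100), ('ACME', 'Main St 1', 'North', 100, 100))
```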

Join 3 Tables in a MySql Query

Here's a solution using correlated subqueries:

SELECT  u.Id,
u.Name,
(SELECT COUNT(Name) FROM cnt WHERE Name = u.name AND type = 'Web' AND Date >= '2017-11-09' AND Date < '2017-11-10') AS cnt_web,
(SELECT COUNT(Name) FROM `lead` WHERE Name = u.name AND type = 'Activation' AND Date >= '2017-11-09' AND Date < '2017-11-10') AS cnt_lead
FROM users u
WHERE u.Status = 'Aktiv' AND u.Role = 'Op'

SQL Query with multiple Inner Joins returns wrong Count of Values

Aggregate before joining. You want to join the review count to each listing and then the aggregated listing information to the neighbourhood.

select
lr.mean_price,
n.city,
n.neighbourhood,
lr.amount_listings,
lr.min_price,
lr.max_price,
lr.amount_reviews
from neighborhood n
join
(
select
l.neighbourhood,
min(l.price) as min_price,
max(l.price) as max_price,
avg(l.price) as mean_price,
count(*) as amount_listings,
coalesce(sum(r.reviews_for_listing), 0) as amount_reviews
from listings l
left join
(
select
listings_id,
count(*) as reviews_for_listing
from reviews
group by listings_id
) r on r.listings_id = l.listings_id
where l.host_id in
(
select host_id
from listings
group by host_id
having count(*) < 3
)
group by l.neighbourhood
) lr on lr.neighbourhood = n.neighbourhood
order by n.city, n.neighbourhood;
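A sketch of why the reviews are counted per listing before joining, using Python's sqlite3 with made-up data and without the neighbourhood table and host filter: listing 1 (price 100) has 3 reviews, listing 2 (price 200) has none. Joining raw reviews turns 2 listings into 4 rows, inflating the listing count and skewing the average price.

```python
import sqlite3

SCHEMA = """
CREATE TABLE listings (listings_id INTEGER PRIMARY KEY, neighbourhood TEXT, price REAL);
CREATE TABLE reviews (id INTEGER PRIMARY KEY, listings_id INTEGER);
INSERT INTO listings VALUES (1, 'Mitte', 100), (2, 'Mitte', 200);
INSERT INTO reviews VALUES (1, 1), (2, 1), (3, 1);
"""

def listing_stats():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    # Raw reviews joined: listing 1 appears 3 times, listing 2 once.
    naive = con.execute("""
        SELECT count(*), avg(l.price), count(r.id)
        FROM listings l LEFT JOIN reviews r ON r.listings_id = l.listings_id
        GROUP BY l.neighbourhood
    """).fetchone()
    # Reviews aggregated per listing first: still one row per listing.
    correct = con.execute("""
        SELECT count(*), avg(l.price), coalesce(sum(r.reviews_for_listing), 0)
        FROM listings l
        LEFT JOIN (SELECT listings_id, count(*) AS reviews_for_listing
                   FROM reviews GROUP BY listings_id) r
          ON r.listings_id = l.listings_id
        GROUP BY l.neighbourhood
    """).fetchone()
    con.close()
    return naive, correct

print(listing_stats())  # ((4, 125.0, 3), (2, 150.0, 3))
```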

