mysql GROUP_CONCAT duplicates
You need to use the DISTINCT
option:
GROUP_CONCAT(DISTINCT animal)
How to suppress duplicates in GROUP_CONCAT MySQL
Your goal is a bit different than that of your linked questions. You only want to remove the duplicate surname, if two persons with the same address have got the same surname.
You can get this with a two step process:
- First we concat the titles and both fornames by groups of address and surname
- In the outer query we concat the first part and the surnames by groups of addresses only
- Addition: we replace the double blanks for persons without a second forename too.
You can use this query:
SELECT
REPLACE(GROUP_CONCAT(
CONCAT_WS(' ', t.name, t.surname)
SEPARATOR ' & '
), ' ', ' ') AS Salutation,
addresses.the_address
FROM (
SELECT
GROUP_CONCAT(
CONCAT_WS(' ', title, forname_1, forename_2)
SEPARATOR ' & '
) name,
surname,
address_id
FROM
people
GROUP BY
address_id, surname
) t
INNER JOIN
addresses
ON
addresses.address_id = t.address_id
GROUP BY
t.address_id;
See it working in this demo
This solution takes into account that a third person with a different surname can live at the same address, i.e. a mother-in-law.
Explanation
The inner query
SELECT
GROUP_CONCAT(
CONCAT_WS(' ', title, forname_1, forename_2)
SEPARATOR ' & '
) name,
surname,
address_id
FROM
people
GROUP BY
address_id, surname
concats only the title and the given names of the persons living at the same address and having the same surname (see second query in the demo).
Then we do the same group_concat as you have done before and use in the last step the REPLACE
function to eliminate double blanks.
Group_Concat with duplicates
Just join the table with itself, using invoice_number as key.
SELECT
invoices.item_id,
invoices.invoice_number,
GROUP_CONCAT(related.item_id) as shared_item_ids,
ROUND(AVG(amount)/100,2) as invoice_amount,
currency
FROM invoices
JOIN invoices related ON related.invoice_number = invoices.invoice_number
GROUP BY invoices.item_id
MySQL - GROUP_CONCAT returns duplicate data, can't use DISTINCT
You can resolve this by extracting the tag
grouping to its own subquery:
SELECT
recipe.*,
GROUP_CONCAT(recipe_detail.ingredient_id) AS iid,
GROUP_CONCAT(ingredient.name) AS iname,
GROUP_CONCAT(ingredient_mfr.abbr) AS mabbr,
(
SELECT GROUP_CONCAT(recipe_tag.name)
FROM recipe_tag
INNER JOIN recipe_tagmap
ON recipe_tagmap.tag_id = recipe_tag.id
WHERE recipe_tagmap.recipe_id = recipe.id
) AS tag
FROM recipe
LEFT JOIN recipe_detail
ON recipe.id = recipe_detail.recipe_id
LEFT JOIN ingredient
ON recipe_detail.ingredient_id = ingredient.id
LEFT JOIN ingredient_mfr
ON ingredient.mfr_id = ingredient_mfr.id
WHERE recipe.user_id = 1
GROUP BY recipe.id
(example fiddle)
MySQL - GROUP_CONCAT eliminate duplicates when joining multiple tables
Instead of joining three tables a good solution is to use scalar subqueries. For example:
select
*,
(select group_concat(p.id order by p.price)
from prices p where p.category_id = c.id) as PriceId,
(select group_concat(p.price order by p.price)
from prices p where p.category_id = c.id) as PricePrice,
(select group_concat(v.id order by v.id)
from videos v where v.category_id = c.id) as VideoId,
(select group_concat(v.uuid order by v.id)
from videos v where v.category_id = c.id) as VideoUUID
from categories c
group by id
Result:
id token PriceId PricePrice VideoId VideoUUID
--- -------------------- ----------- -------------------- ------------ -----------------------------------
1 Wyatt Reinger (ZW) 2,1,3 2.51,2.61,4.45 1,2,3,4 3a817d01,3222679e,63cdc038,e8d8edf4
2 Donna Cronin (BL) 4 4.76 5 93f8a404
3 Ally Kertzmann (GY) 5,6 1.83,1.84 6,7,8 6f2459a7,463127ab,4bf357ba
4 Talia Torp (AF) 7,8 2.61,3.32 9,10,11,12 0cedbd0a,8b21afd7,ea616692,ed2b10d7
5 Delphine Lakin (TL) 11,12,9,10 1.65,3.27,3.27,3.36 13,14,15,16 6217a488,7f52a97a,de11ba64,b49b6ddc
See running example at SQL Fiddle.
The problem of joining three tables is that it produces many duplicate values that complicate the aggregation.
MySQL Trigger Nested Group_Concat Duplicates
Change the second Select to
-- Table1
CREATE TABLE Table1 (id INT,selection1 varchar(50),selection2 varchar(50),results varchar(1000));
INSERT INTO Table1 (id,selection1,selection2,results)
VALUES
(1,null,null,null),
(2,null,null,null),
(3,null,null,null);
select * from Table1
✓
✓
id | selection1 | selection2 | results
-: | :--------- | :--------- | :------
1 | null | null | null
2 | null | null | null
3 | null | null | null
-- Table2
CREATE TABLE Table2 (stocks1 VARCHAR(50),stocks2 VARCHAR(50));
INSERT INTO Table2 (stocks1,stocks2)
VALUES
-- ('BRK','BRK'),
-- ('BRK','BRK');
('BRK','GOOG'),
('INTC','NKE'),
('TSLA','APPL'),
('APPL','NKE'),
('TSLA','FB'),
('NKE','BRK');
select * from Table2
✓
✓
stocks1 | stocks2
:------ | :------
BRK | GOOG
INTC | NKE
TSLA | APPL
APPL | NKE
TSLA | FB
NKE | BRK
CREATE TRIGGER trigger1
BEFORE UPDATE
ON Table1
FOR EACH ROW
BEGIN
set NEW.results =
(SELECT CONCAT('<br>',GROUP_CONCAT( val
SEPARATOR '<br>'))
FROM (SELECT stocks1 as val FROM
Table2
WHERE
NEW.selection2 = 'x'
UNION
SELECT stocks2 FROM
Table2
WHERE
NEW.selection2 = 'x') t1
);
end;
update Table1
set selection1 = 'x', selection2 = 'x'
where id = 1
select * from Table1
id | selection1 | selection2 | results
-: | :--------- | :--------- | :---------------------------------------------------
1 | x | x | <br>BRK<br>INTC<br>TSLA<br>APPL<br>NKE<br>GOOG<br>FB
2 | null | null | null
3 | null | null | null
db<>fiddle here
Strange duplicate behavior from GROUP_CONCAT of two LEFT JOINs of GROUP_BYs
Your second query is of the form:
q1 -- PK user_id
LEFT JOIN (...
GROUP BY user_id, t.tag
) AS q2
ON q2.user_id = q1.user_id
LEFT JOIN (...
GROUP BY user_id, c.category
) AS q3
ON q3.user_id = q1.user_id
GROUP BY -- group_concats
The inner GROUP BYs result in (user_id, t.tag)
& (user_id, c.category)
being keys/UNIQUEs. Other than that I won't address those GROUP BYs.
TL;DR When you join (q1 JOIN q2) to q3 it is not on a key/UNIQUE of one of them so for each user_id you get a row for every possible combination of tag & category. So the final GROUP BY inputs duplicates per (user_id, tag) & per (user_id, category) and inappropriately GROUP_CONCATs duplicate tags & categories per user_id. Correct would be (q1 JOIN q2 GROUP BY) JOIN (q1 JOIN q3 GROUP BY) in which all joins are on common key/UNIQUE (user_id)
& there is no spurious aggregation. Although sometimes you can undo such spurious aggregation.
A correct symmetrical INNER JOIN approach: LEFT JOIN q1 & q2--1:many--then GROUP BY & GROUP_CONCAT (which is what your first query did); then separately similarly LEFT JOIN q1 & q3--1:many--then GROUP BY & GROUP_CONCAT; then INNER JOIN the two results ON user_id--1:1.
A correct symmetrical scalar subquery approach: SELECT the GROUP_CONCATs from q1 as scalar subqueries each with a GROUP BY.
A correct cumulative LEFT JOIN approach: LEFT JOIN q1 & q2--1:many--then GROUP BY & GROUP_CONCAT; then LEFT JOIN that & q3--1:many--then GROUP BY & GROUP_CONCAT.
A correct approach like your 2nd query: You first LEFT JOIN q1 & q2--1:many. Then you LEFT JOIN that & q3--many:1:many. It gives a row for every possible combination of a tag & a category that appear with a user_id. Then after you GROUP BY you GROUP_CONCAT--over duplicate (user_id, tag) pairs and duplicate (user_id, category) pairs. That is why you have duplicate list elements. But adding DISTINCT to GROUP_CONCAT gives a correct result. (Per wchiquito's comment.)
Which you prefer is as usual an engineering tradeoff to be informed by query plans & timings, per actual data/usage/statistics. input & stats for expected amount of duplication), timing of actual queries, etc. One issue is whether the extra rows of the many:1:many JOIN approach offset its saving of a GROUP BY.
-- cumulative LEFT JOIN approach
SELECT
q1.user_id, q1.user_name, q1.score, q1.reputation,
top_two_tags,
substring_index(group_concat(q3.category ORDER BY q3.category_reputation DESC SEPARATOR ','), ',', 2) AS category
FROM
-- your 1st query (less ORDER BY) AS q1
(SELECT
q1.user_id, q1.user_name, q1.score, q1.reputation,
substring_index(group_concat(q2.tag ORDER BY q2.tag_reputation DESC SEPARATOR ','), ',', 2) AS top_two_tags
FROM
(SELECT
u.id AS user_Id,
u.user_name,
coalesce(sum(r.score), 0) as score,
coalesce(sum(r.reputation), 0) as reputation
FROM
users u
LEFT JOIN reputations r
ON r.user_id = u.id
AND r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
u.id, u.user_name
) AS q1
LEFT JOIN
(
SELECT
r.user_id AS user_id, t.tag, sum(r.reputation) AS tag_reputation
FROM
reputations r
JOIN post_tag pt ON pt.post_id = r.post_id
JOIN tags t ON t.id = pt.tag_id
WHERE
r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
user_id, t.tag
) AS q2
ON q2.user_id = q1.user_id
GROUP BY
q1.user_id, q1.user_name, q1.score, q1.reputation
) AS q1
-- finish like your 2nd query
LEFT JOIN
(
SELECT
r.user_id AS user_id, c.category, sum(r.reputation) AS category_reputation
FROM
reputations r
JOIN post_category ct ON ct.post_id = r.post_id
JOIN categories c ON c.id = ct.category_id
WHERE
r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
user_id, c.category
) AS q3
ON q3.user_id = q1.user_id
GROUP BY
q1.user_id, q1.user_name, q1.score, q1.reputation
ORDER BY
q1.reputation DESC, q1.score DESC ;
Eliminate duplicate rows inside GROUP_CONCAT
Since you're only getting the information for a single user, I would go with Salman A's answer; but if you were going for multiple (or all) users, and users tend to have lots of emails and numbers, this version could be faster.
SELECT emp.id, first_name, second_name, last_name, department, positions, ee.mails, en.numbers
FROM geography.employee AS emp
INNER JOIN (SELECT email_FK, GROUP_CONCAT(email, " ", email_type SEPARATOR " || ") AS mails
FROM geography.employee_email
GROUP BY email_FK
) AS ee ON emp.id = ee.email_FK
LEFT JOIN (SELECT number_FK, GROUP_CONCAT(number, " ", number_type SEPARATOR " || ") AS numbers
FROM geography.employee_number
GROUP BY number_FK
) AS en ON emp.id = en.number_FK
;
Related Topics
Why Are Batch Inserts/Updates Faster? How Do Batch Updates Work
Using Bind Variables with Dynamic Select into Clause in Pl/Sql
Determine What User Created Objects in SQL Server
Hql: How to Perform an Inner Join on a Subquery
Rowset Does Not Support Scrolling Backward
Get Multiple Values in SQL Server Cursor
How to Execute a Ms SQL Server Stored Procedure in Java/Jsp, Returning Table Data
How to Implement a Keyword Search in MySQL
Performance of Like '%Query%' VS Full Text Search Contains Query
Getting Warning: Null Value Is Eliminated by an Aggregate or Other Set Operation
How to Measure the Execution Time of a Query on Spark
Multiple Column Foreign Key Contraints