Removing Duplicates from a SQL Query (Not Just "Use Distinct")

This arbitrarily keeps the minimum PIC_ID for each user. Also, prefer explicit JOIN syntax over the implicit (comma-separated) join syntax.

SELECT U.NAME, MIN(P.PIC_ID)
FROM USERS U
INNER JOIN POSTINGS P1
ON U.EMAIL_ID = P1.EMAIL_ID
INNER JOIN PICTURES P
ON P1.PIC_ID = P.PIC_ID
WHERE P.CAPTION LIKE '%car%'
GROUP BY U.NAME;

How to select records without duplicate on just one field in SQL?

Try this:

SELECT MIN(id) AS id, title
FROM tbl_countries
GROUP BY title
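
To see what this does on a small data set, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration; only the table and column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_countries (id INTEGER PRIMARY KEY, title TEXT)")
# Made-up sample data: 'Canada' appears three times, 'Denmark' twice.
conn.executemany(
    "INSERT INTO tbl_countries (id, title) VALUES (?, ?)",
    [(1, "Canada"), (2, "Canada"), (3, "Denmark"), (4, "Canada"), (5, "Denmark")],
)

# One row per title, keeping the smallest id seen for that title.
dedup_rows = conn.execute(
    "SELECT MIN(id) AS id, title FROM tbl_countries GROUP BY title ORDER BY title"
).fetchall()
print(dedup_rows)  # [(1, 'Canada'), (3, 'Denmark')]
```

Swapping MIN for MAX would keep the most recently inserted id instead, assuming ids increase over time.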

SQL to remove duplicate records without a distinct

Here's one way using a window function to get the count of matching records:

SELECT
columnA,
columnB,
columnC
FROM
(
SELECT
columnA,
columnB,
columnC,
COUNT(*) OVER (PARTITION BY columnA, columnB) as rcount
FROM table
) sub
WHERE
(sub.rcount = 2 AND columnC = 'John')
OR sub.rcount = 1;
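
A minimal sketch of the same query run against toy data with Python's built-in sqlite3 module. The table name people and the sample rows are assumptions for illustration, and it assumes the bundled SQLite is version 3.25 or newer, since that is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'people' is a hypothetical stand-in for the table in the question.
conn.execute("CREATE TABLE people (columnA INTEGER, columnB TEXT, columnC TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "x", "John"), (1, "x", "Mary"), (2, "y", "Anne")],
)

# rcount is the number of rows sharing the same (columnA, columnB) pair.
# Pairs that occur twice keep only the 'John' row; unique pairs are kept as-is.
kept_rows = conn.execute(
    """
    SELECT columnA, columnB, columnC
    FROM (
        SELECT columnA, columnB, columnC,
               COUNT(*) OVER (PARTITION BY columnA, columnB) AS rcount
        FROM people
    ) sub
    WHERE (sub.rcount = 2 AND columnC = 'John')
       OR sub.rcount = 1
    ORDER BY columnA
    """
).fetchall()
print(kept_rows)  # [(1, 'x', 'John'), (2, 'y', 'Anne')]
```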

ORACLE SQL select distinct not removing duplicates

You misunderstand what DISTINCT is. It is not a function. It is a modifier on SELECT, and it applies to all of the selected columns together: two rows count as duplicates only when every selected value matches. So, it is behaving exactly as it should.

If you want aggregations by zip code and week, then those are the only two columns that should be in the group by:

SELECT vo.ZIP_CODE, TO_CHAR(ca.CALENDAR_WEEK),
-- vo.REGION_ID
COUNT(vo.ORDER_ID),
SUM(vo.AMOUNT)
FROM VENDOR_ORDERS vo JOIN
CALENDAR ca
ON TRUNC(vo.ORDER_CREATION_DATETIME) = ca.CALENDAR_DATE
WHERE vo.REGION_ID = 1
GROUP BY vo.ZIP_CODE, TO_CHAR(ca.CALENDAR_WEEK)

You could probably include region_id as well, assuming that each zip code is in one region.
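
The difference between DISTINCT and GROUP BY can be checked on a toy table. This is a minimal sketch using Python's built-in sqlite3 module; the simplified vendor_orders table (with a plain week column instead of the calendar join) and its rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical version of the orders table from the question.
conn.execute(
    "CREATE TABLE vendor_orders (zip_code TEXT, week INTEGER, order_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO vendor_orders VALUES (?, ?, ?, ?)",
    [("10001", 1, 100, 5.0), ("10001", 1, 101, 7.5), ("10002", 1, 102, 3.0)],
)

# DISTINCT compares whole rows: order_id differs, so nothing collapses.
distinct_rows = conn.execute(
    "SELECT DISTINCT zip_code, week, order_id FROM vendor_orders"
).fetchall()

# Grouping by only zip_code and week gives one row per pair.
grouped_rows = conn.execute(
    "SELECT zip_code, week, COUNT(order_id), SUM(amount) "
    "FROM vendor_orders GROUP BY zip_code, week ORDER BY zip_code"
).fetchall()

print(len(distinct_rows))  # 3
print(grouped_rows)        # [('10001', 1, 2, 12.5), ('10002', 1, 1, 3.0)]
```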

DISTINCT does not remove duplicates

First of all, this query works correctly: seeria_nr and paigalduse_aeg differ between the rows, as you can see, so DISTINCT cannot filter them out.

You can use GROUP BY to get what you want:

GROUP BY
b.kasutaja_nimi
,b.eesnimi
,b.perenimi
,a.r_nimetus

This will give you the result that you expect, but remember that seeria_nr and paigalduse_aeg will then show arbitrary values from each group.
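
If arbitrary values are not acceptable, one option is to wrap the non-grouped columns in aggregates so the result is deterministic. The sketch below uses Python's built-in sqlite3 module with a hypothetical installs table; the table, its columns, and the rows are assumptions for illustration, not the schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per device installation for a user.
conn.execute("CREATE TABLE installs (user_name TEXT, serial_no TEXT, installed_at TEXT)")
conn.executemany(
    "INSERT INTO installs VALUES (?, ?, ?)",
    [
        ("mari", "S1", "2020-01-01"),
        ("mari", "S2", "2021-06-01"),
        ("jaan", "S3", "2019-03-05"),
    ],
)

# Every selected column is either grouped or aggregated, so each group
# yields exactly one well-defined row.
per_user = conn.execute(
    "SELECT user_name, MIN(serial_no), MAX(installed_at) "
    "FROM installs GROUP BY user_name ORDER BY user_name"
).fetchall()
print(per_user)  # [('jaan', 'S3', '2019-03-05'), ('mari', 'S1', '2021-06-01')]
```

Note that MIN and MAX are computed independently, so the two values for a user may come from different rows, as they do for mari here.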

How do I delete all the duplicate records in a MySQL table without temp tables

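Add a unique index to your table. With IGNORE, MySQL drops the rows that violate the new index instead of failing. Note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this only works on older versions:

ALTER IGNORE TABLE `TableA`
ADD UNIQUE INDEX (`member_id`, `quiz_num`, `question_num`, `answer_num`);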

Another way to do this would be:

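Add a primary key (id below) to your table. You can then delete every duplicate except the row with the lowest id in each group. The extra derived table (AS A) is there because MySQL does not let a DELETE select directly from the table it is deleting from:

DELETE FROM member
WHERE id NOT IN (SELECT *
FROM (SELECT MIN(id) FROM member
GROUP BY member_id, quiz_num, question_num, answer_num
) AS A
);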
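
As a rough check of the keep-one-row-per-group idea, here is a minimal sketch using Python's built-in sqlite3 module with made-up rows. It keeps the lowest id in each group of identical answers. The extra derived table from the MySQL query is omitted because SQLite accepts the subquery directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member (id INTEGER PRIMARY KEY, member_id INTEGER, "
    "quiz_num INTEGER, question_num INTEGER, answer_num INTEGER)"
)
# Made-up sample data: ids 1, 2 and 3 are duplicates of each other.
conn.executemany(
    "INSERT INTO member VALUES (?, ?, ?, ?, ?)",
    [(1, 10, 1, 1, 2), (2, 10, 1, 1, 2), (3, 10, 1, 1, 2), (4, 10, 1, 2, 3), (5, 11, 1, 1, 1)],
)

# Delete every row whose id is not the smallest id of its duplicate group.
conn.execute(
    "DELETE FROM member WHERE id NOT IN ("
    "  SELECT MIN(id) FROM member"
    "  GROUP BY member_id, quiz_num, question_num, answer_num"
    ")"
)
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM member ORDER BY id")]
print(remaining_ids)  # [1, 4, 5]
```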

Delete duplicate rows from a BigQuery table

You can remove duplicates by running a query that rewrites your table (you can use the same table as the destination, or you can create a new table, verify that it has what you want, and then copy it over the old table).

A query that should work is here:

SELECT *
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Fixed_Accident_Index) AS row_number
FROM Accidents.CleanedFilledCombined
)
WHERE row_number = 1

How to remove duplicates in query for google big query by a subset of returned rows, and keep first?

As @Jaytiger has mentioned in the comments, we have to use the ROW_NUMBER() function coupled with PARTITION BY and ORDER BY clauses.

Consider the query below. I have tested the query on sample data and have compared the results with that of a pandas snippet.

SELECT * from
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY column1, column6 ORDER BY columnX) row_num
FROM
`<project-id>.test_dataset.keep_first_in_duplicate`
)
where row_num=1

Whether you need the ORDER BY clause depends on whether the order of the input data must be preserved. Unlike a pandas DataFrame, BigQuery does not preserve the order of the input data. If we wish to preserve it, we need a new column of indices that can be used to sort the data after ingesting it into BigQuery. In summary, if your data source follows a certain order and that order is not stored in a column, the deduplication output from BigQuery can differ from that of the pandas DataFrame.
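
The role of the index column can be sketched outside BigQuery. The example below uses Python's built-in sqlite3 module (assuming SQLite 3.25 or newer for window functions). The events table, the idx column, and the rows are hypothetical; idx stands in for the column of indices that records the original input order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# idx is the hypothetical index column recording the original row order.
conn.execute("CREATE TABLE events (idx INTEGER, column1 TEXT, column6 TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [(0, "a", "x", "first"), (1, "a", "x", "second"), (2, "b", "y", "only"), (3, "a", "x", "third")],
)

# Number the rows inside each (column1, column6) group by original position,
# then keep only the first row of every group.
first_rows = conn.execute(
    """
    SELECT idx, column1, column6, payload
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY column1, column6 ORDER BY idx) AS row_num
        FROM events
    )
    WHERE row_num = 1
    ORDER BY idx
    """
).fetchall()
print(first_rows)  # [(0, 'a', 'x', 'first'), (2, 'b', 'y', 'only')]
```

Without the ORDER BY idx inside the window, which row gets row_num = 1 is not guaranteed.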


