Eliminating Duplicate Values Based on Only One Column of the Table

Eliminating duplicate values based on only one column of the table

This is where the window function row_number() comes in handy:

SELECT s.siteName, s.siteIP, h.date
FROM sites s INNER JOIN
(select h.*, row_number() over (partition by siteName order by date desc) as seqnum
from history h
) h
ON s.siteName = h.siteName and seqnum = 1
ORDER BY s.siteName, h.date

Remove duplicates in Select query based on one column

You can also use ROW_NUMBER():

SELECT id, name
FROM (
SELECT id, name, ROW_NUMBER() OVER(PARTITION BY id ORDER BY name) rn
FROM mytable
) x
WHERE rn = 1

This will retain the record that has the smallest name (so '5d' will come before '5e'). With this technique, you can also use a sort criteria on another column that the one where duplicates exists (which an aggregate query with MIN() cannot do). Also, queries using window functions usually perform better than the equivalent aggregate query.

Remove duplicate values based on only one column of the table

Step#1 use table aliasses:

SELECT DISTINCT ta.ArtCode
, ta.ArtOms
, ldet.AldArtCodeBijLeverancier
, ldet.AldInkoopPrijs
, vrd.MagVoorraad
, vrd.LocNaam
, mut.VrdMutDatum
, naw.NawFilNummer
FROM KingSystem.tabArtikel ta
, KingSystem.tabArtikelLeverancier tal
, KingSystem.tabArtikelLeverancierDetail ldet
, KingSystem.tabNawFile naw
, KingSystem.tabVoorraadMutatie mut
, KingSystem.vwKMBVoorraad vrd
WHERE vrd.ArtGid = ta.ArtGid
AND tal.ArtLevArtGid = ta.ArtGid
AND ldet.AldArtLevGid = tal.ArtLevGid
AND mut.VrdMutArtGid = ta.ArtGid
AND naw.NawFilNawGid = tal.ArtLevNawGid
AND vrd.MagVoorraad>0 AND (ta.ArtCode Not Like 'V%'
-- ORDER BY mut.VrdMutDatum DESC -- nonsens!
;

Step#2 : use JOIN-syntax (and remove the distinct):


SELECT -- DISTINCT
ta.ArtCode
, ta.ArtOms
, ldet.AldArtCodeBijLeverancier
, ldet.AldInkoopPrijs
, vrd.MagVoorraad
, vrd.LocNaam
, mut.VrdMutDatum
, naw.NawFilNummer
FROM KingSystem.tabArtikel ta
JOIN KingSystem.vwKMBVoorraad vrd ON vrd.ArtGid = ta.ArtGid
JOIN KingSystem.tabArtikelLeverancier tal ON tal.ArtLevArtGid = ta.ArtGid
JOIN KingSystem.tabArtikelLeverancierDetail ldet ON ldet.AldArtLevGid = tal.ArtLevGid
JOIN KingSystem.tabNawFile naw ON naw.NawFilNawGid = tal.ArtLevNawGid
JOIN KingSystem.tabVoorraadMutatie mut ON mut.VrdMutArtGid = ta.ArtGid
WHERE vrd.MagVoorraad>0 AND ta.ArtCode Not Like 'V%'
-- ORDER BY mut.VrdMutDatum DESC -- nonsens!
;

step#3 : add condition to suppress the old mutations:


SELECT -- DISTINCT
ta.ArtCode
, ta.ArtOms
, ldet.AldArtCodeBijLeverancier
, ldet.AldInkoopPrijs
, vrd.MagVoorraad
, vrd.LocNaam
, mut.VrdMutDatum
, naw.NawFilNummer
FROM KingSystem.tabArtikel ta
JOIN KingSystem.vwKMBVoorraad vrd
ON vrd.ArtGid = ta.ArtGid
JOIN KingSystem.tabArtikelLeverancier tal
ON tal.ArtLevArtGid = ta.ArtGid
JOIN KingSystem.tabArtikelLeverancierDetail ldet
ON ldet.AldArtLevGid = tal.ArtLevGid
JOIN KingSystem.tabNawFile naw
ON naw.NawFilNawGid = tal.ArtLevNawGid
JOIN KingSystem.tabVoorraadMutatie mut
ON mut.VrdMutArtGid = ta.ArtGid
AND NOT EXISTS( -- suppress older mutations; keeping only the most recent
SELECT * FROM KingSystem.tabVoorraadMutatie mx
WHERE mx.VrdMutArtGid = mut.VrdMutArtGid
AND mx.VrdMutDatum > mut.VrdMutDatum
)
WHERE vrd.MagVoorraad > 0 AND ta.ArtCode Not Like 'V%'
-- ORDER BY mut.VrdMutDatum DESC -- nonsens!
;

Voila! (that'll be approx 50 euro)


Step#4: reorder the columns, and add an order by clause


UPDATE: I removed the DISTINCT It servers no purpose, and can only hide possible errors (and introduce other,more serious errors)

remove duplicate values of only one column value from all the available columns in sql query

This will return 1 row for each value of beam_current:

;WITH CTE AS
(
SELECT
row_number() over (partition by beam_current order by (select 1)) rn,
beam_current, logtime, beam_energy
FROM INDUS2_BDS.dbo.DCCT
WHERE
logtime between '2014-08-09 01:13:03' and '2014-08-09 02:16:53'
and (beam_current like '%9.96' or beam_current like '%9.97'
or beam_current like '%9.98' or beam_current like '%9.99'
or beam_current like '%0' or beam_current like '%_0.01'
or beam_current like '%_0.02' or beam_current like '%_0.03'
or beam_current like '%_0.04' or beam_current like '%_0.05'
or beam_current like '%_0.06')
and beam_energy between 550 and 552
)
SELECT beam_current, logtime, beam_energy
FROM CTE
WHERE rn = 1

Remove duplicate rows based on values from one column

You can create a temporary table. In the below example this is called #newtable. The hashtag is important as this is actually what makes it a 'temporary' table (not everyone explains this).

The below might prove useful to others as it includes WHERE conditions which most examples do not have online:

-- First create your temp table 
SELECT CONVERT(DATE,a.ins_timestamp) AS 'Date',
a.Prod_code,
a.Curr_boxes,
a.Label_barcode,
b.From_ord_no,
NULL AS To_ord_no,
CASE
WHEN a.From_batch >= a.To_batch THEN a.From_batch
WHEN a.To_batch >= a.From_batch THEN a.To_batch
ELSE a.From_batch
END AS 'Batch',
a.Weight,
'IN' AS 'Direction'

INTO #newtable

FROM a

JOIN b ON a.Label_barcode = b.Label_barcode

WHERE (a.ins_timestamp Between ? And ?) AND (a.To_batch = ?) AND (a.From_batch = 0) AND (a.Type='Consumption') AND (a.To_status<>'STOCK') AND (b.From_status = 'PORDER')

-- Now we insert the second query into the already created table
INSERT INTO #newtable

SELECT CONVERT(DATE,b.ins_timestamp) AS 'Date',
b.Prod_code,
b.Curr_boxes,
b.Label_barcode,
NULL AS From_ord_no,
NULL AS To_ord_no,
CASE
WHEN b.From_batch >= b.To_batch THEN b.From_batch
WHEN b.To_batch >= b.From_batch THEN b.To_batch
ELSE b.From_batch
END AS 'Batch',
b.Weight,
'IN' AS 'Direction'

FROM b

WHERE (b.From_batch = 0) AND (b.Type='Consumption') AND (b.ins_timestamp Between ? And ?) AND (b.To_batch = ?) AND (b.To_status<>'STOCK')

-- Now we can select whatever we want from our temp table
SELECT Date,
Prod_code,
Curr_boxes,
Label_barcode,
max(From_ord_no) From_ord_no,
To_ord_no,
Batch,
Weight,
Direction

FROM #newtable

GROUP BY Date,
Prod_code,
Curr_boxes,
Label_barcode,
To_ord_no,
Batch,
Weight,
Direction

remove duplicate rows based on one column value

This is similar to Gordon Linoff's query, but without the subquery:

DELETE t1 FROM table t1
JOIN table t2
ON t2.refID = t1.refID
AND t2.ID < t1.ID

This uses an inner join to only delete rows where there is another row with the same refID but lower ID.

The benefit of avoiding a subquery is being able to utilize an index for the search. This query should perform well with a multi-column index on refID + ID.

How to delete rows that have duplicate column combination

You can try to use delete JOIN

DELETE t1
FROM [Table] t1
INNER JOIN (
SELECT Column1,
Column2,
max(RefDate) as MaxDate
FROM [Table]
GROUP BY Column1, Column2
) t2
ON t1.Column1 = t2.Column1
AND t1.Column2 = t2.Column2
AND t1.RefDate <> t2.MaxDate

or use EXISTS subquery.

DELETE t1
FROM [Table] t1
WHERE EXISTS (
SELECT 1
FROM [Table] t2
WHERE t1.Column1 = t2.Column1
AND t1.Column2 = t2.Column2
HAVING max(t2.RefDate) <> t1.RefDate
)

sqlfiddle



Related Topics



Leave a reply



Submit