Find Duplicate Rows with PostgreSQL

How to find duplicate records in PostgreSQL

The basic idea is to use a nested query with a count aggregate:

select *
from yourTable ou
where (select count(*)
       from yourTable inr
       where inr.sid = ou.sid) > 1;

You can adjust the where clause in the inner query to narrow the search.
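
For example, here is a hedged sketch that narrows the inner query with a hypothetical status column (not part of the original answer), so only active rows are counted toward the duplicate check:

-- Sketch only: "status" and the value 'active' are hypothetical; the inner
-- count now considers only rows matching the extra condition.
select *
from yourTable ou
where (select count(*)
       from yourTable inr
       where inr.sid = ou.sid
         and inr.status = 'active') > 1;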


There is another good solution mentioned in the comments (but not everyone reads them):

select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
having count(*) > 1;

Or shorter:

SELECT (yourTable.*)::text, count(*)
FROM yourTable
GROUP BY yourTable.*
HAVING count(*) > 1

Find duplicate rows with PostgreSQL

Here is my take on it.

select * from (
  SELECT id,
         ROW_NUMBER() OVER (PARTITION BY merchant_Id, url ORDER BY id ASC) AS Row
  FROM Photos
) dups
where dups.Row > 1;

Feel free to adjust the ORDER BY to control which row in each group is kept and which rows get flagged for deletion (see the delete sketch below).

SQL Fiddle => http://sqlfiddle.com/#!15/d6941/1/0


SQL Fiddle no longer supports Postgres 9.2, so the fiddle has been updated to Postgres 9.3.
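
If the goal is to actually delete the flagged rows, here is a hedged sketch, assuming the same Photos(id, merchant_id, url) table as above; it keeps the first row per (merchant_id, url) group and deletes the rest:

-- Sketch: reuse the ROW_NUMBER() ranking to delete every row except the
-- first one per (merchant_id, url) group.
DELETE FROM Photos
WHERE id IN (
  SELECT id
  FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY merchant_id, url ORDER BY id ASC) AS rn
    FROM Photos
  ) dups
  WHERE dups.rn > 1
);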

How to find duplicate records and update using postgresql?

You can use the RANK() OVER window function as below, which ranks the entries within each userid group by last_modified date.

Then you can write an UPDATE query that sets isactive to false wherever device_rank != 1 (a sketch follows the query below).

select id, userid, deviceid, isactive, last_modified,
       RANK() OVER (
         PARTITION BY userid
         ORDER BY last_modified DESC
       ) AS device_rank
from deviceTable;
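
A hedged sketch of that UPDATE, assuming the deviceTable columns shown above; it recomputes the rank in a derived table and deactivates everything except the most recently modified device per userid:

-- Sketch: set isactive = false for every device that is not ranked first
-- (i.e. not the most recently modified) within its userid group.
UPDATE deviceTable d
SET isactive = false
FROM (
  SELECT id,
         RANK() OVER (PARTITION BY userid ORDER BY last_modified DESC) AS device_rank
  FROM deviceTable
) ranked
WHERE d.id = ranked.id
  AND ranked.device_rank <> 1;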

duplicate records from postgresql sql query

If you don't want duplicates, the simplest solution is to use select distinct:

SELECT DISTINCT p.id, pp.parameter_id
FROM products p JOIN
     products_parameters pp
     ON pp.product_id = p.id
WHERE p.category_id = 14 AND pp.parameter_id = 22
ORDER BY p.id;

Your question doesn't contain enough information to say exactly why you are getting duplicates, but presumably it is because you are only selecting one column from each table while other columns differ across the joined rows.

Note the other changes to the query:

  • The table aliases are meaningful rather than arbitrary letters.
  • The WHERE clause turns the LEFT JOIN into an inner join anyway, so this version properly expresses the JOIN.
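
To illustrate the second point, here is a hedged sketch of what the LEFT JOIN version would look like; the WHERE condition on pp discards the NULL-extended rows, so it behaves exactly like the inner join above:

-- Sketch: the filter on pp.parameter_id removes rows where the LEFT JOIN
-- found no match, so this is effectively the same inner join as above.
SELECT DISTINCT p.id, pp.parameter_id
FROM products p LEFT JOIN
     products_parameters pp
     ON pp.product_id = p.id
WHERE p.category_id = 14 AND pp.parameter_id = 22
ORDER BY p.id;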

postgresql find duplicates in column with ID

Use the COUNT() OVER() window aggregate function:

select *
from (
    select Id, Firstname,
           count(1) over (partition by Firstname) as Cnt
    from yourtable
) a
where Cnt > 1;

Find rows with all columns duplicated and no unique field in PostgreSQL

Assuming all columns NOT NULL, this will do:

SELECT *
FROM   tbl t1
WHERE  EXISTS (
   SELECT FROM tbl t2
   WHERE  (t1.*) = (t2.*)
   AND    t1.ctid <> t2.ctid
   );

ctid is a system column, the "tuple identifier" / "item pointer" that can serve as poor-man's PK in the absence of an actual PK (which you obviously don't have), and only within the scope of a single query. Related:

  • Delete duplicate rows from small table
  • How do I decompose ctid into page and row numbers?
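
In the spirit of the first related link, here is a hedged sketch of the cleanup itself: keep the row with the smallest ctid in each group of fully identical rows and delete the rest (same NOT NULL assumption as above):

-- Sketch: delete a row if an identical row with a smaller ctid exists,
-- so exactly one row per duplicate group survives.
DELETE FROM tbl t1
WHERE EXISTS (
   SELECT FROM tbl t2
   WHERE  (t1.*) = (t2.*)
   AND    t1.ctid > t2.ctid
   );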

If columns can be NULL, (more expensively) operate with IS NOT DISTINCT FROM instead of =. See:

  • How do I (or can I) SELECT DISTINCT on multiple columns?
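
A sketch of that NULL-safe variant, otherwise identical to the query above:

-- Sketch: NULL-safe comparison; rows with NULL in the same columns now
-- count as duplicates of each other.
SELECT *
FROM   tbl t1
WHERE  EXISTS (
   SELECT FROM tbl t2
   WHERE  (t1.*) IS NOT DISTINCT FROM (t2.*)
   AND    t1.ctid <> t2.ctid
   );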

(t1.*) = (t2.*) compares ROW values. The shorter syntax t1 = t2 is equivalent, unless a column with the same name as the table alias exists in the underlying table, in which case the shorter form fails while the first one doesn't. See:

  • SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'

Index?

If any of the columns has particularly high cardinality (many distinct values, few duplicates), let's call it hi_cardi_column for the purpose of this answer, a plain btree index on just that column can be efficient for your task. A multicolumn index on a combination of a few small columns can work, too. The point is to have a small, fast index, or the overhead won't pay.

SELECT *
FROM   tbl t1
WHERE  EXISTS (
   SELECT FROM tbl t2
   WHERE  t1.hi_cardi_column = t2.hi_cardi_column  -- logically redundant
   AND    (t1.*) = (t2.*)
   AND    t1.ctid <> t2.ctid
   );

The added condition t1.hi_cardi_column = t2.hi_cardi_column is logically redundant, but helps to utilize said index.
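
For instance, the supporting index could be as simple as this sketch (hi_cardi_column being the placeholder name used above):

-- Sketch: a plain btree index on the high-cardinality column referenced above.
CREATE INDEX tbl_hi_cardi_column_idx ON tbl (hi_cardi_column);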

Other than that, I don't see much potential for index support, as all rows of the table have to be visited anyway and all columns have to be checked.

Find SQL duplicate with specific condition

Another way:

select tn.id, tn.address_id, tn.state
from tableName tn
inner join (
    select max(id) as id, count(address_id) as nr_count
    from tableName
    where state = 'A'
    group by address_id
) as t1 on tn.id = t1.id
where t1.nr_count > 1;


You could use a window function:

select max(id) as id, address_id, state
from (
    select id, address_id, state,
           count(*) over (partition by address_id) as cnt
    from tableName
    where state = 'A'
) as t1
where cnt > 1
group by address_id, state;


select all duplicates from a table in postgres

Something like this might be what you're looking for.

SELECT columns_that_define_duplicates  -- SELECT item_id, name, slug perhaps?
     , count(*)
FROM eve_online_market_groups
GROUP BY columns_that_define_duplicates
HAVING count(*) > 1;

