Postgres: Select All Row with Count of a Field Greater Than 1

Postgres: select all row with count of a field greater than 1

SELECT * 
FROM product_price_info
WHERE name IN (SELECT name
FROM product_price_info
GROUP BY name HAVING COUNT(*) > 1)

Postgres - select rows except where count is more than one and another condition

As it was suggested in the comment, you can use not exists:

select * 
from my_table mt
where mt.appuser_id = 6
and mt.role_id = 'BUSINESS_MANAGER'
and mt.business_id not in (
select imt.business_id
from my_table imt
where exists (
select *
from my_table iimt
where imt.appuser_id <> iimt.appuser_id
and iimt.appuser_id = mt.appuser_id
and iimt.role_id = 'CLIENT_ADMIN'
)
)

Select where count of one field is greater than one

Use the HAVING, not WHERE clause, for aggregate result comparison.

Taking the query at face value:

SELECT * 
FROM db.table
HAVING COUNT(someField) > 1

Ideally, there should be a GROUP BY defined for proper valuation in the HAVING clause, but MySQL does allow hidden columns from the GROUP BY...

Is this in preparation for a unique constraint on someField? Looks like it should be...

How select row where some column has more than 1 distinct value?

I think you could do this;

SELECT Name, COUNT(DISTINCT Family)
FROM [table]
GROUP BY Name
HAVING COUNT(DISTINCT Family) > 1

How to select all rows where a multiple column value occurs more than once in postgresql?

Use window function - count as follows

select * from
(select t.*,
count(1) over (partition by product_id, entry_Date) as cnt
from t) t
where cnt > 1

You can also use exists as follows:

select * from t
where exists
(select 1 from t tt
where t.product_id = tt.product_id
and t.entry_Date = tt.entry_date
and t.id <> tt.id)

How do you find the row count for all your tables in Postgres

There's three ways to get this sort of count, each with their own tradeoffs.

If you want a true count, you have to execute the SELECT statement like the one you used against each table. This is because PostgreSQL keeps row visibility information in the row itself, not anywhere else, so any accurate count can only be relative to some transaction. You're getting a count of what that transaction sees at the point in time when it executes. You could automate this to run against every table in the database, but you probably don't need that level of accuracy or want to wait that long.

WITH tbl AS
(SELECT table_schema,
TABLE_NAME
FROM information_schema.tables
WHERE TABLE_NAME not like 'pg_%'
AND table_schema in ('public'))
SELECT table_schema,
TABLE_NAME,
(xpath('/row/c/text()', query_to_xml(format('select count(*) as c from %I.%I', table_schema, TABLE_NAME), FALSE, TRUE, '')))[1]::text::int AS rows_n
FROM tbl
ORDER BY rows_n DESC;

The second approach notes that the statistics collector tracks roughly how many rows are "live" (not deleted or obsoleted by later updates) at any time. This value can be off by a bit under heavy activity, but is generally a good estimate:

SELECT schemaname,relname,n_live_tup 
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;

That can also show you how many rows are dead, which is itself an interesting number to monitor.

The third way is to note that the system ANALYZE command, which is executed by the autovacuum process regularly as of PostgreSQL 8.3 to update table statistics, also computes a row estimate. You can grab that one like this:

SELECT 
nspname AS schemaname,relname,reltuples
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE
nspname NOT IN ('pg_catalog', 'information_schema') AND
relkind='r'
ORDER BY reltuples DESC;

Which of these queries is better to use is hard to say. Normally I make that decision based on whether there's more useful information I also want to use inside of pg_class or inside of pg_stat_user_tables. For basic counting purposes just to see how big things are in general, either should be accurate enough.

Is COUNT(1) or COUNT(*) better for PostgreSQL

Benchmarking the difference

The last time I've benchmarked the difference between COUNT(*) and COUNT(1) for PostgreSQL 11.3, I've found that COUNT(*) was about 10% faster. The explanation by Vik Fearing at the time has been that the constant expression 1 (or at least its nullability) is being evaluated for the entire count loop. I haven't checked whether this has been fixed in PostgreSQL 14.

Don't worry about this in real world queries

However, you shouldn't worry about such a performance difference. The difference of 10% was measurable in a benchmark, but I doubt you can consistently measure such a difference in an ordinary query. Also, ideally, all SQL vendors optimise the two things in the same way, given that 1 is a constant expression, and thus can be eliminated. As mentioned in the above article, I couldn't find any difference in any other RDBMS that I've tested (MySQL, Oracle, SQL Server), and I wouldn't expect there to be any difference.

PostgreSQL - select count(*) for rows where a condition holds

If I understand correctly, you want pairs of attributes whose ratings are always "1".

This should give you the attributes:

select least(attr1_id, attr2_id) as a1, greatest(attr1_id, attr2_id) as a2,
min(rating_id) as minri, max(rating_id) as maxri
from compatibility c
group by least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
having min(rating_id) = 1 and max(rating_id) = 1;

To get the count, just use this as a subquery:

select count(*)
from (select least(attr1_id, attr2_id) as a1, greatest(attr1_id, attr2_id) as a2,
min(rating_id) as minri, max(rating_id) as maxri
from compatibility c
group by least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
having min(rating_id) = 1 and max(rating_id) = 1
) c


Related Topics



Leave a reply



Submit