Count Distinct Records (All Columns) Not Working

select count(*)
from
(
   select distinct * from your_table
) x
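As a quick check of the query above, here is a minimal sketch using Python's built-in sqlite3 module. The table contents and the column names a and b are made up for illustration; only the name your_table comes from the answer.

```python
import sqlite3

# In-memory database with a hypothetical two-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, "x"), (1, "x"), (2, "y"), (2, None), (2, None)],
)

# Plain row count, duplicates included.
total_rows = conn.execute("SELECT COUNT(*) FROM your_table").fetchone()[0]

# Count of distinct rows across all columns, via a derived table.
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM your_table) x"
).fetchone()[0]

print(total_rows, distinct_rows)
```

In SQLite, the two (2, NULL) rows collapse into one under DISTINCT, so the derived table holds three rows. Other engines may differ in details, so treat this as a sketch rather than a guarantee for your database.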

Counting DISTINCT over multiple columns

If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.

Once it is persisted, and provided the column is deterministic and you are using "sane" database settings, it can be indexed and/or have statistics created on it.

I believe a distinct count of the computed column would be equivalent to your query.

Bigquery SELECT * WHEN COUNT(DISTINCT value) does not work

I want to SELECT * where session is unique ...

Use the query below instead. Note the = in HAVING COUNT(*) = 1, which keeps only sessions that appear exactly once.

SELECT *
FROM `table.id`
WHERE session IN (
    SELECT session
    FROM `table.id`
    GROUP BY session
    HAVING COUNT(*) = 1
)
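To see what the HAVING COUNT(*) = 1 filter keeps, here is a small sketch run against SQLite rather than BigQuery. The events table and its rows are hypothetical stand-ins for `table.id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (session TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("s1", "home"), ("s1", "cart"),   # s1 appears twice
        ("s2", "home"),                   # s2 appears once
        ("s3", "home"), ("s3", "home"),   # s3 appears twice (identical rows)
    ],
)

# Keep every column, but only for sessions that occur exactly once.
unique_session_rows = conn.execute(
    """
    SELECT *
    FROM events
    WHERE session IN (
        SELECT session
        FROM events
        GROUP BY session
        HAVING COUNT(*) = 1
    )
    """
).fetchall()

print(unique_session_rows)
```

Only the s2 row survives. Note that s3 is excluded even though its two rows are identical, because COUNT(*) counts rows, not distinct values.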

SELECT COUNT(DISTINCT... ) error on multiple columns?

COUNT() in SQL Server accepts the following syntax:

COUNT(*)
COUNT(colName)
COUNT(DISTINCT colName)

Instead, you can use a subquery that returns the unique combinations of make and model, and then count its rows.

SELECT  COUNT(*)
FROM
        (
            SELECT  DISTINCT make, model
            FROM    VehicleModelYear
        ) a

The "a" at the end is not a typo. It is an alias for the derived table; without it, MySQL, for example, raises ERROR 1248 (42000): Every derived table must have its own alias.

Why is Distinct * not working but count(Distinct *) is working?

The reason for this is that COUNT(*) is treated differently to COUNT(expr) in SQL. From the MySQL manual:

COUNT(*) is somewhat different in that it returns a count of the
number of rows retrieved, whether or not they contain NULL values.

while COUNT(DISTINCT expr)

Returns a count of the number of rows with different non-NULL expr values.

So if you have rows with NULL values, COUNT(*) will return all rows. COUNT(*) FROM (SELECT DISTINCT * ...) will also still count rows containing NULLs, since SELECT DISTINCT * treats rows with NULL values as different from those with non-NULL values. COUNT(DISTINCT expr), however, will count only rows with non-NULL values, hence giving a lower result.

The Hive manual shows that it behaves the same way.

See this demo on dbfiddle to see this in operation with a table with some rows with NULL values.

Note that COUNT(DISTINCT *) is not legal syntax in any version of MySQL (at least from 5.5 onwards). It may be a Hive extension.
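The NULL handling described above can be reproduced with a minimal sketch in SQLite (not MySQL or Hive, though the three counts follow the same standard rules). The table t and its values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (1,), (None,), (None,), (2,)])

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Counts every row, NULL or not.
count_star = scalar("SELECT COUNT(*) FROM t")

# Counts distinct non-NULL values only.
count_distinct_expr = scalar("SELECT COUNT(DISTINCT v) FROM t")

# SELECT DISTINCT keeps one NULL row, so it is counted here.
count_over_distinct = scalar("SELECT COUNT(*) FROM (SELECT DISTINCT v FROM t) x")

print(count_star, count_distinct_expr, count_over_distinct)
```

The gap between the last two numbers is exactly the NULL group: DISTINCT keeps it as a row, while COUNT(DISTINCT v) ignores it.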

Sum / Count Distinct is not returning correctly

You're counting distinct values for ub.Id, but you're also grouping by the same column.

You need to remove ub.Id from the column list and the GROUP BY to get your aggregates correct. You also don't need DISTINCT when using GROUP BY.

SELECT 
    o.Organization_Name,
    ui.DisplayName, 
    ui.NumLogins, 
    COUNT(uw.Title) Assigned,
    SUM(CASE WHEN uw.Status IS NULL THEN 1 ELSE 0 END) NotStarted,
    SUM(CASE WHEN uw.Status = 'incomplete' THEN 1 ELSE 0 END) InProgress,
    SUM(CASE WHEN uw.grade >=80 THEN 1 ELSE 0 END) Completed,
    COUNT(DISTINCT ub.Id) ShouldBe84a,
    SUM(CASE WHEN ub.Id > 0 THEN 1 ELSE 0 END) ShouldBe84b
FROM @Organizations o
INNER JOIN @UserInfo ui on ui.OrgId = o.OrgId
LEFT JOIN @UserWorkshop uw on uw.UserId = ui.UserId
LEFT JOIN @UserBehavior ub on ub.UserId = ui.UserId
WHERE username = 'user@email.com'
AND ub.Id IS NOT NULL
GROUP BY o.Organization_Name,
        ui.DisplayName,
        ui.NumLogins;
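The effect of grouping by the same column you count can be shown with a much smaller sketch. The users and behaviors tables below are hypothetical simplifications of @UserInfo and @UserBehavior, run in SQLite rather than SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER, name TEXT);
    CREATE TABLE behaviors (id INTEGER, user_id INTEGER);
    INSERT INTO users VALUES (1, 'Ann');
    INSERT INTO behaviors VALUES (10, 1), (11, 1), (12, 1);
    """
)

# Grouping by b.id as well: every group holds a single id, so the count is always 1.
grouped_by_id = conn.execute(
    """
    SELECT u.name, b.id, COUNT(DISTINCT b.id)
    FROM users u
    LEFT JOIN behaviors b ON b.user_id = u.user_id
    GROUP BY u.name, b.id
    ORDER BY b.id
    """
).fetchall()

# Grouping by the user only: the aggregate now sees all of the user's ids.
grouped_by_user = conn.execute(
    """
    SELECT u.name, COUNT(DISTINCT b.id)
    FROM users u
    LEFT JOIN behaviors b ON b.user_id = u.user_id
    GROUP BY u.name
    """
).fetchall()

print(grouped_by_id)
print(grouped_by_user)
```

The first query returns one row per behavior with a count of 1 each; the second returns a single row per user with the total.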

How to do count(distinct) for multiple columns

[TL;DR] Just use a sub-query.

If you are trying to use concatenation, then you need to delimit the terms with a string that will never appear in the values; otherwise, non-distinct terms will be grouped together.

For example: if you have two numeric columns, then COUNT(DISTINCT col1 || col2) will treat 1||23 and 12||3 as the same value (both concatenate to 123) and count them as one group.

You could use COUNT(DISTINCT col1 || '-' || col2), but if the columns are string values and you have 'ab-'||'-'||'c' and 'ab'||'-'||'-c' then, once again, they would be identical once concatenated (both give 'ab--c').

The simplest method is to use a sub-query.

If you can't do that, then you can combine columns via string concatenation, but you need to analyse the contents of the columns and pick a delimiter that does not appear in your strings; otherwise your results might be erroneous. Even better is to use check constraints to ensure that the delimiter character can never appear in the column values.

ALTER TABLE mytable ADD CONSTRAINT mytable__col1__chk CHECK (col1 NOT LIKE '%¬%');
ALTER TABLE mytable ADD CONSTRAINT mytable__col2__chk CHECK (col2 NOT LIKE '%¬%');

Then:

SELECT COUNT(DISTINCT col1 || '¬' || col2)
FROM   mytable;
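The collisions described in this answer are easy to reproduce. The sketch below uses SQLite with a hypothetical table pairs that holds exactly the four example rows mentioned above, stored as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [("1", "23"), ("12", "3"), ("ab-", "c"), ("ab", "-c")],
)

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# No delimiter: '123' twice and 'ab-c' twice.
no_delimiter = scalar("SELECT COUNT(DISTINCT col1 || col2) FROM pairs")

# '-' delimiter: the numeric pair is separated, but 'ab--c' still appears twice.
dash_delimiter = scalar("SELECT COUNT(DISTINCT col1 || '-' || col2) FROM pairs")

# Sub-query: compares the columns themselves, so all four pairs stay distinct.
sub_query = scalar("SELECT COUNT(*) FROM (SELECT DISTINCT col1, col2 FROM pairs) x")

print(no_delimiter, dash_delimiter, sub_query)
```

Only the sub-query reports the true number of distinct pairs without any assumption about which characters the data contains.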

Count distinct values in each column for all columns in a table in a single query

I assume the most difficult case - when you can't rely on a substring of the table name.

In that case, create your schema/table list, and use it in a query that generates one big script, which you can then run once it has been generated:

CREATE LOCAL TEMPORARY TABLE srch(table_schema,table_name) 
ON COMMIT PRESERVE ROWS AS
          SELECT 'public','gen_sample'
UNION ALL SELECT 'public','d_product'
UNION ALL SELECT 'dbadmin','d_cust_scd'
UNION ALL SELECT 'dbadmin','currencies'
;

SELECT
    CASE ROW_NUMBER() OVER(ORDER BY c.table_schema,c.table_name,ordinal_position)
    WHEN 1 THEN ''
    ELSE 'UNION ALL '
    END
  ||'SELECT '''||c.table_schema||'.'||c.table_name||'.'||column_name||''','
  ||'COUNT(DISTINCT '||column_name||') FROM '||c.table_schema||'.'||c.table_name
FROM columns c JOIN srch USING(table_schema,table_name);

-- out  SELECT 'dbadmin.currencies.id',COUNT(DISTINCT id) FROM dbadmin.currencies
-- out  UNION ALL SELECT 'dbadmin.currencies.nm',COUNT(DISTINCT nm) FROM dbadmin.currencies
-- out  UNION ALL SELECT 'dbadmin.currencies.sgn',COUNT(DISTINCT sgn) FROM dbadmin.currencies
-- out  UNION ALL SELECT 'dbadmin.d_cust_scd.cust_key',COUNT(DISTINCT cust_key) FROM dbadmin.d_cust_scd
-- out  UNION ALL SELECT 'dbadmin.d_cust_scd.cust_id',COUNT(DISTINCT cust_id) FROM dbadmin.d_cust_scd
-- out  UNION ALL SELECT 'dbadmin.d_cust_scd.cust_from_dt',COUNT(DISTINCT cust_from_dt) FROM dbadmin.d_cust_scd
-- out  UNION ALL SELECT 'dbadmin.d_cust_scd.cust_to_dt',COUNT(DISTINCT cust_to_dt) FROM dbadmin.d_cust_scd
-- out  UNION ALL SELECT 'dbadmin.d_cust_scd.cust_is_curr',COUNT(DISTINCT cust_is_curr) FROM dbadmin.d_cust_scd
-- out  UNION ALL SELECT 'dbadmin.d_cust_scd.cust_cre_ts',COUNT(DISTINCT cust_cre_ts) FROM dbadmin.d_cust_scd
-- out  UNION ALL SELECT 'dbadmin.d_cust_scd.cust_udt_ts',COUNT(DISTINCT cust_udt_ts) FROM dbadmin.d_cust_scd
-- out  UNION ALL SELECT 'dbadmin.d_cust_scd.cust_fname',COUNT(DISTINCT cust_fname) FROM dbadmin.d_cust_scd
-- out  UNION ALL SELECT 'dbadmin.d_cust_scd.cust_lname',COUNT(DISTINCT cust_lname) FROM dbadmin.d_cust_scd
-- out  UNION ALL SELECT 'dbadmin.d_cust_scd.cust_phoneno',COUNT(DISTINCT cust_phoneno) FROM dbadmin.d_cust_scd
-- out  UNION ALL SELECT 'dbadmin.d_cust_scd.cust_loy_lvl',COUNT(DISTINCT cust_loy_lvl) FROM dbadmin.d_cust_scd
-- out  UNION ALL SELECT 'dbadmin.d_cust_scd.cust_org_id',COUNT(DISTINCT cust_org_id) FROM dbadmin.d_cust_scd
-- out  UNION ALL SELECT 'public.d_product.prdkey',COUNT(DISTINCT prdkey) FROM public.d_product
-- out  UNION ALL SELECT 'public.d_product.prdid',COUNT(DISTINCT prdid) FROM public.d_product
-- out  UNION ALL SELECT 'public.d_product.start_date',COUNT(DISTINCT start_date) FROM public.d_product
-- out  UNION ALL SELECT 'public.d_product.end_date',COUNT(DISTINCT end_date) FROM public.d_product
-- out  UNION ALL SELECT 'public.d_product.price',COUNT(DISTINCT price) FROM public.d_product
-- out  UNION ALL SELECT 'public.gen_sample.srr_key',COUNT(DISTINCT srr_key) FROM public.gen_sample
-- out  UNION ALL SELECT 'public.gen_sample.seq',COUNT(DISTINCT seq) FROM public.gen_sample
-- out  UNION ALL SELECT 'public.gen_sample.nucleotide',COUNT(DISTINCT nucleotide) FROM public.gen_sample
-- out  UNION ALL SELECT 'public.gen_sample.quality',COUNT(DISTINCT quality) FROM public.gen_sample
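The same generate-then-run idea can be sketched outside the database. The example below uses Python with SQLite and PRAGMA table_info in place of the columns system table used above; the table contents are invented, and the table names are assumed to be trusted identifiers, since they are spliced directly into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE currencies (id INTEGER, nm TEXT, sgn TEXT);
    INSERT INTO currencies VALUES (1, 'Euro', 'E'), (2, 'Dollar', '$'), (3, 'Peso', '$');
    CREATE TABLE d_product (prdkey INTEGER, price REAL);
    INSERT INTO d_product VALUES (1, 9.5), (2, 9.5);
    """
)

# Hypothetical list of tables to profile (the role played by srch above).
tables = ["currencies", "d_product"]

# Step 1: generate one SELECT per column.
parts = []
for table in tables:
    for row in conn.execute(f"PRAGMA table_info({table})"):
        column = row[1]  # second field of table_info is the column name
        parts.append(
            f"SELECT '{table}.{column}', COUNT(DISTINCT {column}) FROM {table}"
        )

# Step 2: glue the pieces into one big script and run it.
script = "\nUNION ALL ".join(parts)
counts = dict(conn.execute(script).fetchall())

print(script)
print(counts)
```

Joining with UNION ALL replaces the ROW_NUMBER() trick for omitting the keyword on the first line; the generated text has the same shape as the output listed above.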
