Hive Select Count(*) Non Null Returns Higher Value Than Select Count(*)

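Most probably your query without a WHERE clause is answered from table statistics instead of a scan of the data, and those statistics are out of date. This happens when the following parameter is set: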

set hive.compute.query.using.stats=true;

Try setting it to false and running the query again.
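
As a quick check, the comparison could look like the sketch below. The names my_table and some_col are placeholders, not names from the question.

```sql
-- Hypothetical names: my_table, some_col. Replace with your own.
-- Make Hive read the data instead of answering from metastore statistics.
set hive.compute.query.using.stats=false;

-- With the setting off, both counts come from the data itself,
-- so the unfiltered count should no longer be the smaller one.
select count(*) from my_table;
select count(*) from my_table where some_col is not null;
```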

Alternatively, you can recompute the statistics on the table so that they match the data again.
See the ANALYZE TABLE syntax in the Hive documentation.
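
A minimal sketch of refreshing the statistics, assuming a table called my_table and a partitioned table called my_partitioned_table with a partition column dt (all three names are made up for illustration):

```sql
-- Hypothetical table name: my_table.
-- Recompute the basic table-level statistics, including the row count.
ANALYZE TABLE my_table COMPUTE STATISTICS;

-- For a partitioned table, listing the partition column without a value
-- refreshes the statistics of every partition.
-- my_partitioned_table and dt are placeholder names.
ANALYZE TABLE my_partitioned_table PARTITION (dt) COMPUTE STATISTICS;
```

Once the statistics are current, count(*) can be answered from them again with hive.compute.query.using.stats=true.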

It is also possible to gather statistics automatically during INSERT OVERWRITE:

set hive.stats.autogather=true;

Why does count( distinct ) with NULL columns return 0 in Hive SQL?

This is how count is defined in Hive:

count(*) counts all rows

count(col1) counts all rows where col1 is not null

count(distinct col1, col2, ...) counts the distinct combinations of the specified columns, skipping every row in which any of those columns is null
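
The three forms can be compared side by side on a small inline data set. This is only a sketch: the column names col1 and col2 and the three sample rows are invented, and it assumes a Hive version that accepts SELECT without a FROM clause.

```sql
-- Three sample rows: ('A', NULL), ('B', 'x'), ('B', 'x').
-- The commented values are what the rules above imply for this data.
select
  count(*)                   as all_rows,       -- 3: every row is counted
  count(col2)                as col2_not_null,  -- 2: the NULL in col2 is skipped
  count(distinct col1, col2) as distinct_pairs  -- 1: only ('B', 'x') qualifies
from (
  select 'A' as col1, cast(NULL as string) as col2
  union all
  select 'B' as col1, 'x' as col2
  union all
  select 'B' as col1, 'x' as col2
) t;
```

If every row has a NULL in one of the listed columns, the distinct form has nothing left to count, which is where the 0 in the question comes from.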

As a solution to your specific problem, you can try to have a nested query with the logic and use count(*) in the outer query:

select count(*) from (select distinct 'A', NULL) a;
This returns 1.
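
Applied to a real table, the same pattern might look like the sketch below. The names my_table, col1 and col2 are placeholders.

```sql
-- Hypothetical names: my_table, col1, col2.
-- The inner DISTINCT keeps rows that contain NULLs and treats NULLs as
-- equal to each other, so ('A', NULL) appears once in the subquery.
-- The outer count(*) then counts every remaining row.
select count(*)
from (
  select distinct col1, col2
  from my_table
) t;
```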

Count rows with non-NULL in two columns

AND is a boolean operator, and cookie appears to be a string, not a boolean. Try replacing count(l.cookie and c.cookie) with this:

count(case when l.cookie is not null and c.cookie is not null then 1 else NULL end) as common
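
In context, the expression could sit in a query like the sketch below. Only the aliases l and c and the cookie column come from the question; the table names logs and clicks and the join key user_id are assumptions made for illustration.

```sql
-- Hypothetical tables and join key: logs, clicks, user_id.
select
  -- The CASE yields 1 only when both cookies are present; otherwise it
  -- yields NULL, which count() ignores.
  count(case when l.cookie is not null and c.cookie is not null
             then 1 else NULL end) as common
from logs l
full outer join clicks c
  on l.user_id = c.user_id;
```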


