Count Null Values from Multiple Columns with SQL

Count NULL Values from multiple columns with SQL

SELECT COUNT(*)-COUNT(A) As A, COUNT(*)-COUNT(B) As B, COUNT(*)-COUNT(C) As C
FROM YourTable;

Count non-null values from multiple columns at once without manual entry in SQL

Consider below approach (no knowledge of column names is required at all - with exception of user)

select column, countif(value != 'null') nulls_count
from your_table t,
unnest(array(
select as struct trim(arr[offset(0)], '"') column, trim(arr[offset(1)], '"') value
from unnest(split(trim(to_json_string(t), '{}'))) kv,
unnest([struct(split(kv, ':') as arr)])
where trim(arr[offset(0)], '"') != 'user'
)) rec
group by column

if applied to sample data in your question - output is

Sample Image

Snowflake count nulls in all columns

But is there really not a way to count nulls in a table with say, over 30 columns? Like I don't want to specify them all by name?

yes exactly that. I don't understand why it's so difficult - it's like 1 line in pandas?

Keypoint here is if something is not provided as "batteries included" then you need to write your own version. It is not so hard as it may look.

Let's say the input table is as follow:

CREATE OR REPLACE TABLE t AS SELECT $1 AS col1, $2 AS col2, $3 AS col3, $4 AS col4 
FROM VALUES (1,2,3,10),(NULL,2,3,10),(NULL,NULL,4,10),(NULL,NULL,NULL,10);

SELECT * FROM t;
/*
+------+------+------+------+
| COL1 | COL2 | COL3 | COL4 |
+------+------+------+------+
| 1 | 2 | 3 | 10 |
| NULL | 2 | 3 | 10 |
| NULL | NULL | 4 | 10 |
| NULL | NULL | NULL | 10 |
+------+------+------+------+
*/

You probably know how to write the query that gives the desired output, but as it was not provided in the question I will use my own version:

WITH cte AS (
SELECT
COUNT(*) AS total_rows
,total_rows - COUNT(col1) AS col1
,total_rows - COUNT(col2) AS col2
,total_rows - COUNT(col3) AS col3
,total_rows - COUNT(col4) AS col4
FROM t
)
SELECT COLUMN_NAME, NULLS_COLUMN_COUNT,SUM(NULLS_COLUMN_COUNT) OVER() AS NULLS_TOTAL_COUNT
FROM cte
UNPIVOT (NULLS_COLUMN_COUNT FOR COLUMN_NAME IN (col1,col2,col3, col4))
ORDER BY COLUMN_NAME;
/*
+-------------+--------------------+-------------------+
| COLUMN_NAME | NULLS_COLUMN_COUNT | NULLS_TOTAL_COUNT |
+-------------+--------------------+-------------------+
| COL1 | 3 | 6 |
| COL2 | 2 | 6 |
| COL3 | 1 | 6 |
| COL4 | 0 | 6 |
+-------------+--------------------+-------------------+
*/

Here we could see that the query is "static" in nature with few moving parts(column_count_list/table_name/column_list):

WITH cte AS (
SELECT
COUNT(*) AS total_rows
<column_count_list>
FROM <table_name>
)
SELECT COLUMN_NAME, NULLS_COLUMN_COUNT,SUM(NULLS_COLUMN_COUNT) OVER() AS NULLS_TOTAL_COUNT
FROM cte
UNPIVOT (NULLS_COLUMN_COUNT FOR COLUMN_NAME IN (<column_list>))
ORDER BY COLUMN_NAME;

Now using the metadata and variables:

-- input
SET sch_name = 'my_schema';
SET tab_name = 't';

SELECT
LISTAGG(c.COLUMN_NAME, ', ') WITHIN GROUP(ORDER BY c.COLUMN_NAME) AS column_list
,ANY_VALUE(c.TABLE_SCHEMA || '.' || c.TABLE_NAME) AS full_table_name
,LISTAGG(REPLACE(SPACE(6) || ',total_rows - COUNT(<col_name>) AS <col_name>'
|| CHAR(13)
, '<col_name>', c.COLUMN_NAME), '')
WITHIN GROUP(ORDER BY COLUMN_NAME) AS column_count_list

,REPLACE(REPLACE(REPLACE(
'WITH cte AS (
SELECT
COUNT(*) AS total_rows
<column_count_list>
FROM <table_name>
)
SELECT COLUMN_NAME, NULLS_COLUMN_COUNT,SUM(NULLS_COLUMN_COUNT) OVER() AS NULLS_TOTAL_COUNT
FROM cte
UNPIVOT (NULLS_COLUMN_COUNT FOR COLUMN_NAME IN (<column_list>))
ORDER BY COLUMN_NAME;'
,'<column_count_list>', column_count_list)
,'<table_name>', full_table_name)
,'<column_list>', column_list) AS query_to_run

FROM INFORMATION_SCHEMA.COLUMNS c
WHERE TABLE_SCHEMA = UPPER($sch_name)
AND TABLE_NAME = UPPER($tab_name);

Running the code will generate the query to be run:

Sample Image

Copying the output and running it will give the output. This template could be further refined and wrapped with stored procedure if needed(but I will left it as an exercise).

Count number of NULL values in each column in SQL

As Paolo said, but here is an example:

DECLARE @TableName VARCHAR(512) = 'invoiceTbl';
DECLARE @SQL VARCHAR(1024);
WITH SQLText AS (
SELECT
ROW_NUMBER() OVER (ORDER BY c.Name) AS RowNum,
'SELECT ''' + c.name + ''', SUM(CASE WHEN ' + c.Name + ' IS NULL THEN 1 ELSE 0 END) AS NullValues FROM ' + @TableName AS SQLRow
FROM
sys.tables t
INNER JOIN sys.columns c ON c.object_id = t.object_id
WHERE
t.name = @TableName),
Recur AS (
SELECT
RowNum,
CONVERT(VARCHAR(MAX), SQLRow) AS SQLRow
FROM
SQLText
WHERE
RowNum = 1
UNION ALL
SELECT
t.RowNum,
CONVERT(VARCHAR(MAX), r.SQLRow + ' UNION ALL ' + t.SQLRow)
FROM
SQLText t
INNER JOIN Recur r ON t.RowNum = r.RowNum + 1
)
SELECT @SQL = SQLRow FROM Recur WHERE RowNum = (SELECT MAX(RowNum) FROM Recur);
EXEC(@SQL);

How to get count of columns that are having null values for a given row in sql?

One method is to use case and +:

select t.*,
( (case when col1 is not null then 1 else 0 end) +
(case when col2 is not null then 1 else 0 end) +
(case when col3 is not null then 1 else 0 end) +
(case when col4 is not null then 1 else 0 end) +
(case when col5 is not null then 1 else 0 end) +
(case when col6 is not null then 1 else 0 end) +
(case when col7 is not null then 1 else 0 end)
) as cnt_not_nulls_in_row
from t;

In MySQL, this can be simplified to:

select t.*,
( (col1 is not null ) +
(col2 is not null ) +
(col3 is not null ) +
(col4 is not null ) +
(col5 is not null ) +
(col6 is not null ) +
(col7 is not null )
) as cnt_not_nulls_in_row
from t;

COUNT number of NULLs in multiple columns

Try this:

SELECT COUNT(*)-COUNT(column1) As column1, COUNT(*)-COUNT(column2) As column2 FROM my_table; 

Maybe this will work for you. Since like you mentioned, COUNT only counts non-null values, this should give you the sum of null values you need.

How to Count multiple columns of a table, that could have nulls

Use unambiguous dates and avoid trying to find the "end" of the day, because that is prone to breaking if the data types change anywhere. Much safer to use less than the next day (more details here). Anyway one way to solve the problem you're having is to use conditional aggregation:

DECLARE @start date = '20220601',
@end date = '20220818';

SELECT u.Username,
Approved = SUM(CASE WHEN i.ApprovedBy = u.Username THEN 1 ELSE 0 END),
Rejected = SUM(CASE WHEN i.RejectedBy = u.Username THEN 1 ELSE 0 END)
FROM dbo.Users AS u
LEFT OUTER JOIN dbo.tblItems AS i
ON
(
(
u.Username = i.ApprovedBy
AND i.ApprovedOn >= @start
AND i.ApprovedOn < @end
)
OR
(
u.Username = i.RejectedBy
AND i.RejectedOn >= @start
AND i.RejectedOn < @end
)
)
GROUP BY u.Username;

This also corrects a logic error in your current query that uses the rejected date to credit u3 with an approval that happened outside your desired range, and includes rows from Users that don't have any approvals or rejections.

  • Example db<>fiddle

If you don't want to include users that are in the Users table and don't have any matching rows in tblItems, just change LEFT OUTER JOIN to INNER JOIN.

Count null and non-null values in all columns in a table

You can do this -

SELECT COUNT(name) AS name_not_null_count, 
SUM(CASE WHEN name IS NULL THEN 1 ELSE 0 END) AS name_null_count
FROM table

Approach to calculate null count is: mark all the null records with 1 and take SUM.



Related Topics



Leave a reply



Submit