How to Find Which Columns Don't Have Any Data (All Values Are Null)

How to find which columns don't have any data (all values are NULL)?

For a single column, count(ColumnName) returns the number of rows where ColumnName is not null:

select  count(TheColumn)
from YourTable

You can generate a query for all columns. Per Martin's suggestion, you can exclude columns that cannot be null by filtering on is_nullable = 1. For example:

select  'count(' + name + ') as ' + name + ', '
from sys.columns
where object_id = object_id('YourTable')
and is_nullable = 1

If the number of tables is large, you can generate a query for all tables in a similar way. The list of all tables is in sys.tables.
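
If you would rather run the generated query as well, here is a minimal Python sketch using pyodbc (the DSN and YourTable names are placeholders): it fetches the nullable columns, stitches the generated count() pieces into a single query, and reports the columns whose count is zero, i.e. the all-NULL ones.

import pyodbc

conn = pyodbc.connect("DSN=YourDsn")  # placeholder connection string
cursor = conn.cursor()

# Nullable columns of the target table, as in the query above
cursor.execute("""
    select name
    from sys.columns
    where object_id = object_id('YourTable')
      and is_nullable = 1
""")
cols = [row[0] for row in cursor.fetchall()]

# Stitch the generated pieces into one runnable query (no trailing comma)
query = "select " + ", ".join(f"count([{c}]) as [{c}]" for c in cols) + " from YourTable"
counts = cursor.execute(query).fetchone()

# count() skips NULLs, so a count of 0 means the column has no data at all
print([c for c, n in zip(cols, counts) if n == 0])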

Select columns with NULL values only

Here is a version for SQL Server 2005 or later. Replace ADDR_Address with your table name.

declare @col varchar(255), @cmd varchar(max)

-- Cursor over every column of the target table
DECLARE getinfo CURSOR FOR
SELECT c.name
FROM sys.tables t
JOIN sys.columns c ON t.object_id = c.object_id
WHERE t.name = 'ADDR_Address'

OPEN getinfo

FETCH NEXT FROM getinfo INTO @col

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Print the column name if no row has a non-NULL value in that column
    SELECT @cmd = 'IF NOT EXISTS (SELECT TOP 1 * FROM ADDR_Address WHERE [' + @col + '] IS NOT NULL) BEGIN PRINT ''' + @col + ''' END'
    EXEC(@cmd)

    FETCH NEXT FROM getinfo INTO @col
END

CLOSE getinfo
DEALLOCATE getinfo
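
The same per-column check can also be driven from client code instead of a T-SQL cursor. A sketch, again assuming the pyodbc connection conn from the earlier example (the NOT EXISTS probe becomes a check for an empty result set):

cursor = conn.cursor()
cursor.execute("""
    SELECT c.name
    FROM sys.tables t
    JOIN sys.columns c ON t.object_id = c.object_id
    WHERE t.name = 'ADDR_Address'
""")
columns = [row[0] for row in cursor.fetchall()]

for col in columns:
    # TOP 1 stops at the first non-NULL value, so columns with data are cheap;
    # only all-NULL columns force a full scan
    cursor.execute(f"SELECT TOP 1 1 FROM ADDR_Address WHERE [{col}] IS NOT NULL")
    if cursor.fetchone() is None:
        print(col)  # this column holds NULLs only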

How to find which columns contain any NaN value in Pandas dataframe

UPDATE: using Pandas 0.22.0

Newer Pandas versions have the methods DataFrame.isna() and DataFrame.notna():

In [71]: df
Out[71]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [72]: df.isna().any()
Out[72]:
a     True
b     True
c    False
dtype: bool

As a list of columns:

In [74]: df.columns[df.isna().any()].tolist()
Out[74]: ['a', 'b']

To select those columns (containing at least one NaN value):

In [73]: df.loc[:, df.isna().any()]
Out[73]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0
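
Note the distinction: .any() flags columns with at least one NaN. For the all-values-missing case this page opened with, use .all() instead:

In [75]: df.columns[df.isna().all()].tolist()
Out[75]: []

(Empty here, because no column of this sample frame is entirely NaN.)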

OLD answer:

Try using isnull():

In [97]: df
Out[97]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [98]: pd.isnull(df).sum() > 0
Out[98]:
a     True
b     True
c    False
dtype: bool

Or, as @root proposed, a clearer version:

In [5]: df.isnull().any()
Out[5]:
a     True
b     True
c    False
dtype: bool

In [7]: df.columns[df.isnull().any()].tolist()
Out[7]: ['a', 'b']

To select a subset (all columns containing at least one NaN value):

In [31]: df.loc[:, df.isnull().any()]
Out[31]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0

SQL query to find columns having at least one non null value

I would not recommend using count(distinct) because it incurs overhead for removing duplicate values. You can just use count().

You can construct the query for counts using a query like this:

select count(col1) as col1_cnt, count(col2) as col2_cnt, . . .
from t;

If you have a list of columns, you can do this as dynamic SQL. Something like this:

declare @list nvarchar(max) = 'col1,col2,col3';  -- your comma-separated column list
declare @sql nvarchar(max);

select @sql = concat('select ',
                     string_agg(concat('count(', quotename(s.value), ') as cnt_', s.value), ', '),
                     ' from t')
from string_split(@list, ',') s;

exec sp_executesql @sql;

This might not quite work if your columns have special characters in them, but it illustrates the idea.
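
For comparison, a hypothetical Python sketch that assembles the same counts query from a plain column list (all names are placeholders):

columns = ["col1", "col2", "col3"]   # your column list
table = "t"                          # your table name

select_list = ", ".join(
    f"count([{c}]) as cnt_{c}" for c in columns  # [] plays the role of quotename()
)
print(f"select {select_list} from {table};")
# select count([col1]) as cnt_col1, count([col2]) as cnt_col2, count([col3]) as cnt_col3 from t;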

Find all those columns which have only null values, in a MySQL table

You can avoid using a procedure by dynamically creating (from the INFORMATION_SCHEMA.COLUMNS table) a string that contains the SQL you wish to execute, then preparing a statement from that string and executing it.

The SQL we wish to build will look like:

SELECT * FROM (
SELECT 'tableA' AS `table`,
IF(COUNT(`column_a`), NULL, 'column_a') AS `column`
FROM tableA
UNION ALL
SELECT 'tableB' AS `table`,
IF(COUNT(`column_b`), NULL, 'column_b') AS `column`
FROM tableB
UNION ALL
-- etc.
) t WHERE `column` IS NOT NULL

This can be done using the following:

SET group_concat_max_len = 4294967295; -- to overcome default 1KB limitation

SELECT CONCAT(
'SELECT * FROM ('
, GROUP_CONCAT(
'SELECT ', QUOTE(TABLE_NAME), ' AS `table`,'
, 'IF('
, 'COUNT(`', REPLACE(COLUMN_NAME, '`', '``'), '`),'
, 'NULL,'
, QUOTE(COLUMN_NAME)
, ') AS `column` '
, 'FROM `', REPLACE(TABLE_NAME, '`', '``'), '`'
SEPARATOR ' UNION ALL '
)
, ') t WHERE `column` IS NOT NULL'
)
INTO @sql
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = DATABASE();

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
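
The build-and-execute step can also be driven from client code. A sketch using mysql-connector-python (connection parameters are placeholders; identifiers containing backticks would still need the REPLACE escaping shown above), trading the single UNION ALL statement for one small COUNT query per column:

import mysql.connector

conn = mysql.connector.connect(user="user", password="pw", database="mydb")
cur = conn.cursor()

# Every column of the current schema, from the same catalog view
cur.execute("""
    SELECT TABLE_NAME, COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = DATABASE()
""")

for table, column in cur.fetchall():
    # COUNT(col) ignores NULLs: a count of 0 means the column is all NULL
    cur.execute(f"SELECT COUNT(`{column}`) FROM `{table}`")
    if cur.fetchone()[0] == 0:
        print(table, column)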


Pandas select all columns without NaN

You can create a DataFrame containing only the non-NaN columns using

df = df[df.columns[~df.isnull().all()]]

Or

null_cols = df.columns[df.isnull().all()]
df.drop(null_cols, axis = 1, inplace = True)

If you wish to remove columns based on a certain percentage of NaNs, say columns with more than 90% of their values null:

cols_to_delete = df.columns[df.isnull().sum()/len(df) > .90]
df.drop(cols_to_delete, axis = 1, inplace = True)
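
A quick demonstration on a toy frame (the values are invented for illustration):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],           # complete column
    "b": [np.nan, np.nan, np.nan],  # all NaN
    "c": [np.nan, np.nan, 3.0],     # ~67% NaN
})

# Drop only the all-NaN column
print(df[df.columns[~df.isnull().all()]].columns.tolist())  # ['a', 'c']

# Drop columns that are more than 90% NaN ('b' again; 'c' survives)
cols_to_delete = df.columns[df.isnull().sum() / len(df) > 0.90]
print(df.drop(cols_to_delete, axis=1).columns.tolist())     # ['a', 'c']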

how to select rows with no null values (in any column) in SQL?

You need to explicitly list each column. I would recommend:

select t.*
from t
where col1 is not null and col2 is not null and . . .

Some people might prefer a more concise (but slower) method such as:

where concat(col1, col2, col3, . . . ) is not null

There is not actually a simple way to express this, although you can construct the query using a metadata table or a spreadsheet.
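
A hypothetical sketch of that construction step in Python (the column list is a placeholder you would pull from your metadata table or spreadsheet):

columns = ["col1", "col2", "col3"]
where = " and ".join(f"{c} is not null" for c in columns)
print(f"select t.* from t where {where};")
# select t.* from t where col1 is not null and col2 is not null and col3 is not null;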
