Find All Rows with Null Value(S) in Any Column

In SQL Server you can borrow the idea from this answer

;WITH XMLNAMESPACES('http://www.w3.org/2001/XMLSchema-instance' AS ns)
SELECT *
FROM Analytics
WHERE (SELECT Analytics.*
       FOR xml path('row'), elements xsinil, type
      ).value('count(//*[local-name() != "colToIgnore"]/@ns:nil)', 'int') > 0


Constructing a query that explicitly checks all 67 columns would likely be more efficient, but this approach saves typing and avoids the need for dynamic SQL to generate the column list.
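If you do go the dynamic-SQL route, the WHERE clause is easy to generate from a column list. A minimal Python sketch, assuming the column names have already been fetched (e.g. from sys.columns); the names here are hypothetical placeholders:

```python
# Build an "any column IS NULL" WHERE clause from a list of column names.
# The column names below are hypothetical placeholders.
columns = ["c_00", "c_01", "c_02"]

where_clause = " OR ".join(f"[{c}] IS NULL" for c in columns)
sql = f"SELECT * FROM Analytics WHERE {where_clause}"
print(sql)
```

The same loop scales to 67 columns without any extra typing.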

How to get all rows with null value in any column in pyspark

Filter by chaining multiple OR conditions: c_00 IS NULL OR c_01 IS NULL OR ...

You can use python functools.reduce to construct the filter expression dynamically from the dataframe columns:

from functools import reduce
from pyspark.sql import functions as F

df = spark.createDataFrame([
    (None, 0.141, 0.141), (0.17, 0.17, 0.17),
    (0.25, None, 0.25), (0.135, 0.135, 0.135)
], ["c_00", "c_01", "c_02"])

cols = [F.col(c) for c in df.columns]
filter_expr = reduce(lambda a, b: a | b.isNull(), cols[1:], cols[0].isNull())

df.filter(filter_expr).show()
#+----+-----+-----+
#|c_00| c_01| c_02|
#+----+-----+-----+
#|null|0.141|0.141|
#|0.25| null| 0.25|
#+----+-----+-----+

Or, using the exists function over an array of the columns (available since Spark 3.1):

filter_expr = F.exists(F.array(*df.columns), lambda x: x.isNull())
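Outside of Spark, the same reduce-over-columns idea works on plain Python rows; a small sketch using None to stand in for null (an illustration of the pattern, not Spark code):

```python
from functools import reduce

# Rows as dicts; None plays the role of SQL NULL.
rows = [
    {"c_00": None,  "c_01": 0.141, "c_02": 0.141},
    {"c_00": 0.17,  "c_01": 0.17,  "c_02": 0.17},
    {"c_00": 0.25,  "c_01": None,  "c_02": 0.25},
    {"c_00": 0.135, "c_01": 0.135, "c_02": 0.135},
]
cols = ["c_00", "c_01", "c_02"]

# OR together "is this column null?" predicates, mirroring the Spark filter.
has_null = lambda row: reduce(lambda acc, c: acc or row[c] is None, cols, False)

flagged = [r for r in rows if has_null(r)]
```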

How to find all rows with a NULL value in any column using PostgreSQL

You can use NOT(<table> IS NOT NULL).

From the documentation:

If the expression is row-valued, then IS NULL is true when the row
expression itself is null or when all the row's fields are null, while
IS NOT NULL is true when the row expression itself is non-null and all
the row's fields are non-null.

So :

SELECT * FROM t;
┌────────┬────────┐
│   f1   │   f2   │
├────────┼────────┤
│ (null) │      1 │
│      2 │ (null) │
│ (null) │ (null) │
│      3 │      4 │
└────────┴────────┘
(4 rows)

SELECT * FROM t WHERE NOT (t IS NOT NULL);
┌────────┬────────┐
│   f1   │   f2   │
├────────┼────────┤
│ (null) │      1 │
│      2 │ (null) │
│ (null) │ (null) │
└────────┴────────┘
(3 rows)
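The quoted semantics can be mimicked in plain Python: NOT (t IS NOT NULL) means "not every field is non-null". A sketch over the same sample rows (an illustration, not a PostgreSQL client):

```python
# The sample rows from the table above; None plays the role of NULL.
rows = [(None, 1), (2, None), (None, None), (3, 4)]

# NOT (row IS NOT NULL): keep the row unless every field is non-null.
matches = [r for r in rows if not all(f is not None for f in r)]
```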

how to select rows with no null values (in any column) in SQL?

You need to explicitly list each column. I would recommend:

select t.*
from t
where col1 is not null and col2 is not null and . . .

Some people might prefer a more concise (but slower) method such as:

where concat(col1, col2, col3, . . . ) is not null

There is no simpler way to express this, although you can construct the query using a metadata table or a spreadsheet.
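Generating the predicate from metadata is itself only a few lines. A Python sketch building the explicit column list (the column and table names here are hypothetical):

```python
# Column names as they might come from a metadata table or spreadsheet.
columns = ["col1", "col2", "col3"]

# AND together "IS NOT NULL" checks, one per column.
predicate = " AND ".join(f"{c} IS NOT NULL" for c in columns)
sql = f"SELECT t.* FROM t WHERE {predicate}"
print(sql)
```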

Show a dataframe with all rows that have null values

If you don't care about which columns are null, you can use a loop to create a filtering condition:

from pyspark.sql import SparkSession
from pyspark.sql import functions as func

q1_df = spark\
    .createDataFrame([(None, 1, 2), (3, None, 4), (5, 6, None), (7, 8, 9)],
                     ['a', 'b', 'c'])
q1_df.show(5, False)
+----+----+----+
|a   |b   |c   |
+----+----+----+
|null|1   |2   |
|3   |null|4   |
|5   |6   |null|
|7   |8   |9   |
+----+----+----+

condition = (func.lit(False))
for col in q1_df.columns:
    condition = condition | (func.col(col).isNull())
q1_df.filter(condition).show(3, False)
+----+----+----+
|a   |b   |c   |
+----+----+----+
|null|1   |2   |
|3   |null|4   |
|5   |6   |null|
+----+----+----+

Since you're looking for rows where any one column is null, you can chain the checks with OR.


Edit on: 2022-08-01

The reason I first declare condition as func.lit(False) is simply to create a "base" condition to build on; the literal False itself has no effect on the filtering. When you inspect the condition, you will see:

Column<'(((false OR (a IS NULL)) OR (b IS NULL)) OR (c IS NULL))'>

You can build the condition in other ways as well. For example:

for idx, col in enumerate(q1_df.columns):
    if idx == 0:
        condition = (func.col(col).isNull())
    else:
        condition = condition | (func.col(col).isNull())

condition
Column<'(((a IS NULL) OR (b IS NULL)) OR (c IS NULL))'>

Alternatively, if you want to keep only the rows where all columns are not null, I would write:

condition = (func.lit(True))
for col in q1_df.columns:
    condition = condition & (func.col(col).isNotNull())

As long as you can build the complete filtering condition another way, you can eliminate the func.lit(False). One reminder: if you create a "base" condition as I did, don't use the Python built-in bool as below, since a Python boolean and a Spark Column are not the same type:

condition = False

for col in q1_df.columns:
    condition = condition | (func.col(col).isNull())

How to filter in rows where any column is null in pyspark dataframe

To include rows that have a null in any column:

sparkDf.filter(F.greatest(*[F.col(i).isNull() for i in sparkDf.columns])).show(5)

To exclude them:

sparkDf.na.drop(how='any').show(5)
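F.greatest over the boolean isNull columns is effectively an OR across them. The same include/exclude pair in plain Python, with None standing in for null (a sketch, not Spark code):

```python
# Sample rows; None plays the role of null.
rows = [(None, 1, 2), (3, None, 4), (5, 6, None), (7, 8, 9)]

# greatest(isNull(c), ...) == "any column is null" (True > False under max)
with_null = [r for r in rows if max(v is None for v in r)]

# na.drop(how='any') == keep rows where no column is null
no_null = [r for r in rows if not any(v is None for v in r)]
```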

How to select rows with NaN in particular column?

Try the following:

df[df['Col2'].isnull()]

Find index of all rows with null values in a particular column in pandas dataframe

Supposing you need the indices as a list, one option would be:

df[df['A'].isnull()].index.tolist()
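Both pandas one-liners can be exercised on a tiny frame; a quick sketch reusing the column names from the two answers above:

```python
import numpy as np
import pandas as pd

# Small frame with one NaN in each column.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "Col2": [np.nan, 5.0, 6.0]})

# Rows where Col2 is NaN.
rows_with_nan = df[df["Col2"].isnull()]

# Index labels of rows where A is NaN, as a list.
null_index = df[df["A"].isnull()].index.tolist()
```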

MYSQL get all rows with only NULL in all columns

By referring to INFORMATION_SCHEMA and using a PREPARE statement, one solution is shown below, with a full demo.

The solution refers to: Select all columns except one in MySQL?

SQL:

-- data
create table t1(description char(20), colA char(20), colB char(20));
insert into t1 values
  ('Peter',  'bla', NULL),
  ('Frank',  NULL,  NULL),
  ('George', NULL,  'blub');
SELECT * FROM t1;

-- Query wanted
SET @sql = CONCAT(
    'SELECT description FROM t1 WHERE COALESCE(',
    (SELECT REPLACE(GROUP_CONCAT(COLUMN_NAME), 'description,', '')
     FROM INFORMATION_SCHEMA.COLUMNS
     WHERE TABLE_NAME = 't1' AND TABLE_SCHEMA = 'test'),
    ') IS NULL');
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;

Output:

mysql> SELECT * FROM t1;
+-------------+------+------+
| description | colA | colB |
+-------------+------+------+
| Peter       | bla  | NULL |
| Frank       | NULL | NULL |
| George      | NULL | blub |
+-------------+------+------+
3 rows in set (0.00 sec)

mysql>
mysql> SET @sql = CONCAT(
-> 'SELECT description FROM t1 WHERE COALESCE(',
-> (SELECT REPLACE(GROUP_CONCAT(COLUMN_NAME), 'description,', '')
-> FROM INFORMATION_SCHEMA.COLUMNS
-> WHERE TABLE_NAME = 't1' AND TABLE_SCHEMA = 'test'),
-> ') IS NULL');
Query OK, 0 rows affected (0.00 sec)

mysql> PREPARE stmt1 FROM @sql;
Query OK, 0 rows affected (0.00 sec)
Statement prepared

mysql> EXECUTE stmt1;
+-------------+
| description |
+-------------+
| Frank       |
+-------------+
1 row in set (0.00 sec)

To elaborate the SET statement before PREPARE:

  1. The SET statement generates a string of the following form.

    SELECT description FROM t1 WHERE COALESCE( < list of all columns, except description > ) IS NULL

  2. The column list (every column except description) is queried from INFORMATION_SCHEMA.COLUMNS, using the method in the reference link.

To use in your own environment, you need to

  1. Change table name 't1' to your own table name;

  2. Change TABLE_SCHEMA 'test' to your own database name.
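The string the SET statement assembles can be mimicked in Python, which makes step 1 concrete. A sketch assuming the column list has already been read from INFORMATION_SCHEMA.COLUMNS:

```python
# Columns of t1, as INFORMATION_SCHEMA.COLUMNS would return them.
all_columns = ["description", "colA", "colB"]

# Drop the description column, then wrap the rest in COALESCE(...) IS NULL.
check_cols = [c for c in all_columns if c != "description"]
sql = (
    "SELECT description FROM t1 WHERE COALESCE("
    + ",".join(check_cols)
    + ") IS NULL"
)
print(sql)
```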


