Find All Rows With Null Value(s) in Any Column
In SQL Server you can borrow the idea of serializing each row to XML and checking for nil elements:
;WITH XMLNAMESPACES('http://www.w3.org/2001/XMLSchema-instance' as ns)
SELECT *
FROM Analytics
WHERE (SELECT Analytics.*
FOR xml path('row'), elements xsinil, type
).value('count(//*[local-name() != "colToIgnore"]/@ns:nil)', 'int') > 0
SQL Fiddle
Constructing a query that lists all 67 columns explicitly will likely be more efficient, but this approach saves typing and avoids the need for dynamic SQL to generate the column list.
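If you do opt for the explicit query, generating it need not be tedious. A small sketch in Python (the table and column names here are placeholders; in practice you would read them from INFORMATION_SCHEMA.COLUMNS):

```python
# Build an explicit "any column IS NULL" WHERE clause from a column list.
# Column names are placeholders; in practice, query INFORMATION_SCHEMA.COLUMNS.
def any_null_where(columns, table="Analytics", ignore=("colToIgnore",)):
    checks = [f"[{c}] IS NULL" for c in columns if c not in ignore]
    return f"SELECT * FROM {table} WHERE " + " OR ".join(checks)

cols = ["c_00", "c_01", "colToIgnore"]
print(any_null_where(cols))
# SELECT * FROM Analytics WHERE [c_00] IS NULL OR [c_01] IS NULL
```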
How to get all rows with null value in any column in pyspark
Filter by chaining multiple OR conditions: c_00 IS NULL OR c_01 IS NULL OR ...
You can use Python's functools.reduce
to construct the filter expression dynamically from the dataframe columns:
from functools import reduce
from pyspark.sql import functions as F
df = spark.createDataFrame([
(None, 0.141, 0.141), (0.17, 0.17, 0.17),
(0.25, None, 0.25), (0.135, 0.135, 0.135)
], ["c_00", "c_01", "c_02"])
cols = [F.col(c) for c in df.columns]
filter_expr = reduce(lambda a, b: a | b.isNull(), cols[1:], cols[0].isNull())
df.filter(filter_expr).show()
#+----+-----+-----+
#|c_00| c_01| c_02|
#+----+-----+-----+
#|null|0.141|0.141|
#|0.25| null| 0.25|
#+----+-----+-----+
Or, using array with the exists
function (available in PySpark 3.1+):
filter_expr = F.exists(F.array(*df.columns), lambda x: x.isNull())
How to find all rows with a NULL value in any column using PostgreSQL
You can use NOT (<table> IS NOT NULL).
From the documentation:
If the expression is row-valued, then IS NULL is true when the row
expression itself is null or when all the row's fields are null, while
IS NOT NULL is true when the row expression itself is non-null and all
the row's fields are non-null.
So:
SELECT * FROM t;
┌────────┬────────┐
│ f1 │ f2 │
├────────┼────────┤
│ (null) │ 1 │
│ 2 │ (null) │
│ (null) │ (null) │
│ 3 │ 4 │
└────────┴────────┘
(4 rows)
SELECT * FROM t WHERE NOT (t IS NOT NULL);
┌────────┬────────┐
│ f1 │ f2 │
├────────┼────────┤
│ (null) │ 1 │
│ 2 │ (null) │
│ (null) │ (null) │
└────────┴────────┘
(3 rows)
how to select rows with no null values (in any column) in SQL?
You need to explicitly list each column. I would recommend:
select t.*
from t
where col1 is not null and col2 is not null and . . .
Some people might prefer a more concise (but slower) method such as:
where concat(col1, col2, col3, . . . ) is not null
There is no shorter way to express this, although you can construct the query using a metadata table or a spreadsheet.
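Note that the concise method depends on concatenation semantics, which vary by database: MySQL's CONCAT returns NULL if any argument is NULL, while SQL Server's CONCAT treats NULL as an empty string, so the trick would not filter anything there. A quick illustration with SQLite's standard || operator, which does propagate NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("a", "b"), (None, "b"), ("a", None)])

# In SQLite (and standard SQL), || yields NULL if either operand is NULL,
# so "concatenation IS NOT NULL" keeps only rows with no NULLs.
rows = conn.execute(
    "SELECT * FROM t WHERE col1 || col2 IS NOT NULL").fetchall()
print(rows)  # [('a', 'b')]
```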
Show a dataframe with all rows that have null values
If you don't care about which columns are null, you can use a loop to create a filtering condition:
from pyspark.sql import SparkSession
from pyspark.sql import functions as func
q1_df = spark\
.createDataFrame([(None, 1, 2), (3, None, 4), (5, 6, None), (7, 8, 9)],
['a', 'b', 'c'])
q1_df.show(5, False)
+----+----+----+
|a |b |c |
+----+----+----+
|null|1 |2 |
|3 |null|4 |
|5 |6 |null|
|7 |8 |9 |
+----+----+----+
condition = (func.lit(False))
for col in q1_df.columns:
    condition = condition | (func.col(col).isNull())
q1_df.filter(condition).show(3, False)
+----+----+----+
|a |b |c |
+----+----+----+
|null|1 |2 |
|3 |null|4 |
|5 |6 |null|
+----+----+----+
Since you're looking for rows where any one column is null, you can chain the conditions with OR.
Edit on: 2022-08-01
The reason I first declare condition as func.lit(False)
is just to simplify the code: it serves as a "base" condition to build on and has no effect on the filtering itself. If you inspect condition
, you will see:
Column<'(((false OR (a IS NULL)) OR (b IS NULL)) OR (c IS NULL))'>
You can build the condition in other ways as well. For example:
for idx, col in enumerate(q1_df.columns):
    if idx == 0:
        condition = (func.col(col).isNull())
    else:
        condition = condition | (func.col(col).isNull())
condition
Column<'(((a IS NULL) OR (b IS NULL)) OR (c IS NULL))'>
Alternatively, if you want the opposite (only rows where every column is non-null), I would write:
condition = (func.lit(True))
for col in q1_df.columns:
    condition = condition & (func.col(col).isNotNull())
As long as you can build the complete filtering condition some other way, you can eliminate the func.lit(False)
. Just a reminder: if you create a "base" condition like mine, don't use the Python built-in bool as the seed, as below, since a Python boolean
and a Spark Column
are different types:
condition = False
for col in q1_df.columns:
    condition = condition | (func.col(col).isNull())
How to filter in rows where any column is null in pyspark dataframe
To include rows having null in any column:
sparkDf.filter(F.greatest(*[F.col(i).isNull() for i in sparkDf.columns])).show(5)
For excluding the same:
sparkDf.na.drop(how='any').show(5)
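For comparison, a hypothetical pandas equivalent of the two filters above: isnull().any(axis=1) selects rows with at least one null, and dropna(how='any') excludes them:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"a": [np.nan, 3.0, 5.0],
                   "b": [1.0, np.nan, 6.0]})

any_null = df[df.isnull().any(axis=1)]   # rows with at least one null
no_null = df.dropna(how="any")           # rows with no nulls
print(len(any_null), len(no_null))       # 2 1
```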
How to select rows with NaN in particular column?
Try the following:
df[df['Col2'].isnull()]
Find index of all rows with null values in a particular column in pandas dataframe
Supposing you need the indices as a list, one option would be:
df[df['A'].isnull()].index.tolist()
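A minimal, self-contained illustration (the column name and data are made up for the sketch):

```python
import pandas as pd
import numpy as np

# Hypothetical frame with nulls at positions 1 and 3 of column A
df = pd.DataFrame({"A": [1.0, np.nan, 3.0, np.nan]})
idx = df[df["A"].isnull()].index.tolist()
print(idx)  # [1, 3]
```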
MYSQL get all rows with only NULL in all columns
By referring to INFORMATION_SCHEMA and using a PREPARE statement, one solution is shown here, with a full demo provided.
The solution builds on: Select all columns except one in MySQL?
SQL:
-- data
create table t1(description char(20), colA char(20), colB char(20));
insert into t1 values
( 'Peter' , 'bla', NULL),
( 'Frank' , NULL , NULL),
( 'George' , NULL , 'blub');
SELECT * FROM t1;
-- Query wanted
SET @sql = CONCAT(
'SELECT description FROM t1 WHERE COALESCE(',
(SELECT REPLACE(GROUP_CONCAT(COLUMN_NAME), 'description,', '')
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 't1' AND TABLE_SCHEMA = 'test'),
') IS NULL');
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
Output:
mysql> SELECT * FROM t1;
+-------------+------+------+
| description | colA | colB |
+-------------+------+------+
| Peter | bla | NULL |
| Frank | NULL | NULL |
| George | NULL | blub |
+-------------+------+------+
3 rows in set (0.00 sec)
mysql>
mysql> SET @sql = CONCAT(
-> 'SELECT description FROM t1 WHERE COALESCE(',
-> (SELECT REPLACE(GROUP_CONCAT(COLUMN_NAME), 'description,', '')
-> FROM INFORMATION_SCHEMA.COLUMNS
-> WHERE TABLE_NAME = 't1' AND TABLE_SCHEMA = 'test'),
-> ') IS NULL');
Query OK, 0 rows affected (0.00 sec)
mysql> PREPARE stmt1 FROM @sql;
Query OK, 0 rows affected (0.00 sec)
Statement prepared
mysql> EXECUTE stmt1;
+-------------+
| description |
+-------------+
| Frank |
+-------------+
1 row in set (0.00 sec)
To elaborate on the SET statement before PREPARE:
The SET generates a string like the following:
SELECT description FROM t1 WHERE COALESCE( <list of all columns except description> ) IS NULL
The column list is queried from INFORMATION_SCHEMA.COLUMNS, using the method in the reference link.
To use this in your own environment, you need to:
Change the table name 't1' to your own table name;
Change the TABLE_SCHEMA 'test' to your own database name.
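The string the SET/PREPARE pair assembles can be sketched in any host language. A hypothetical Python version, with the column list hard-coded where the real solution reads INFORMATION_SCHEMA.COLUMNS at runtime:

```python
# Build the COALESCE query that the SET statement generates.
# Columns are hard-coded here; the MySQL solution fetches them
# from INFORMATION_SCHEMA.COLUMNS.
def all_null_query(columns, keep="description", table="t1"):
    others = ",".join(c for c in columns if c != keep)
    return f"SELECT {keep} FROM {table} WHERE COALESCE({others}) IS NULL"

cols = ["description", "colA", "colB"]
print(all_null_query(cols))
# SELECT description FROM t1 WHERE COALESCE(colA,colB) IS NULL
```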