Find Names of Columns Which Contain Missing Values

Find names of columns which contain missing values

Like this?

colnames(mymatrix)[colSums(is.na(mymatrix)) > 0]
# [1] "aa" "ee"

Or as suggested by @thelatemail:

names(which(colSums(is.na(mymatrix)) > 0))
# [1] "aa" "ee"

Pandas: print column name with missing values

df.isnull().any() returns a boolean Series indexed by column name (True if the column has a missing value, False otherwise). You can use it to index into df.columns:

df.columns[df.isnull().any()]

returns an Index of the columns which have missing values.


import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, np.nan],
                   'C': [4, 5, 6],
                   'D': [np.nan, np.nan, np.nan]})

df
Out:
   A    B  C   D
0  1  1.0  4 NaN
1  2  2.0  5 NaN
2  3  NaN  6 NaN

df.columns[df.isnull().any()]
Out: Index(['B', 'D'], dtype='object')

df.columns[df.isnull().any()].tolist() # to get a list instead of an Index object
Out: ['B', 'D']

How to find which columns contain any NaN value in Pandas dataframe

UPDATE: using Pandas 0.22.0

Newer pandas versions provide the methods DataFrame.isna() and DataFrame.notna(), which are aliases for isnull() and notnull().

In [71]: df
Out[71]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [72]: df.isna().any()
Out[72]:
a     True
b     True
c    False
dtype: bool

as list of columns:

In [74]: df.columns[df.isna().any()].tolist()
Out[74]: ['a', 'b']

to select those columns (containing at least one NaN value):

In [73]: df.loc[:, df.isna().any()]
Out[73]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0

OLD answer:

Try to use isnull():

In [97]: df
Out[97]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [98]: pd.isnull(df).sum() > 0
Out[98]:
a     True
b     True
c    False
dtype: bool

or, as @root proposed, a clearer version:

In [5]: df.isnull().any()
Out[5]:
a     True
b     True
c    False
dtype: bool

In [7]: df.columns[df.isnull().any()].tolist()
Out[7]: ['a', 'b']

to select a subset - all columns containing at least one NaN value:

In [31]: df.loc[:, df.isnull().any()]
Out[31]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0

How to find columns that contain N/A values

If those are the only cases, something like this could work:

# Count, per column, the values that are empty strings, NA, or the literal 'N/A'
na_cols = sapply(df, function(x) sum(x == '' | is.na(x) | x == 'N/A'))
names(na_cols[na_cols > 0])

If there were more "NA" conditions, you'd need to add them to the logical expression.
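For comparison, the same idea can be sketched in pandas. The data frame and the list of placeholder tokens below are made up for illustration; extending the list is how you would cover more "NA" conditions.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; column names and placeholder tokens are assumptions.
df = pd.DataFrame({
    "x": ["a", "", "c"],
    "y": ["N/A", "b", "c"],
    "z": [1.0, 2.0, 3.0],
    "w": [np.nan, 1.0, 2.0],
})

# Strings that should be treated as missing, in addition to real NaN values
placeholders = ["", "N/A"]

# True wherever a cell is NaN or equals one of the placeholder strings
is_missing = df.isna() | df.isin(placeholders)

cols_with_missing = df.columns[is_missing.any()].tolist()
print(cols_with_missing)
```

Keeping the placeholders in a list means new conditions are added in one place instead of growing the boolean expression.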

make a list of the variables that contain missing values - pandas

Here the error means there are duplicated column names, so df[var] returns a DataFrame, not a Series.

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 1], 'b': [1, 1], 'c': [np.nan, np.nan]})

# Rename so that two columns share the label 'a'
df.columns = ['a', 'a', 's']
print (df['a'])
     a  a
0  NaN  1
1  1.0  1


vars_with_na = [var for var in df.columns if df[var].isnull().sum() > 0]
print (vars_with_na)

A possible solution is to deduplicate the column names first.
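A minimal sketch of the deduplication step, assuming it is acceptable to keep only the first column for each repeated label (the answer does not say which copy to keep, so that choice is an assumption):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 1], "b": [1, 1], "c": [np.nan, np.nan]})
df.columns = ["a", "a", "s"]

# Keep only the first occurrence of each column label
deduped = df.loc[:, ~df.columns.duplicated()]

# Now deduped[var] is always a Series, so the comparison yields a single bool
vars_with_na = [var for var in deduped.columns if deduped[var].isnull().sum() > 0]
print(vars_with_na)

# Alternative that works on the original frame without dropping any column
vars_with_na_alt = df.columns[df.isnull().any()].tolist()
print(vars_with_na_alt)
```

Note that dropping duplicates discards the second 'a' column entirely, so the alternative is safer when every column has to be checked.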

Checking all columns in data frame for missing values in R

The anyNA function is built for this. You can apply it to all columns of a data frame with sapply(books, anyNA). To count NA values, akrun's suggestion of colSums(is.na(books)) is good.
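For readers working in pandas, a rough counterpart of both calls might look like this. The `books` frame here is invented for the example and is not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `books` data frame
books = pd.DataFrame({
    "title": ["A", None, "C"],
    "pages": [100, np.nan, np.nan],
    "year": [2001, 2002, 2003],
})

# Counterpart of sapply(books, anyNA): one boolean per column
has_na = books.isna().any()

# Counterpart of colSums(is.na(books)): number of missing values per column
na_counts = books.isna().sum()

print(has_na)
print(na_counts)
```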

Return list of column names with missing (NA) data for each row of a data frame in R

Here is a tidyverse solution.

library(readr) # provides read_table()

df <- read_table("
ID Var1 Var2 Var3 Var4 Var5
1 10 T NA 2 NA
2 15 F 50 2 NA
3 12 NA 41 2 NA
4 NA NA NA 1 NA
5 NA F NA NA NA", col_names = TRUE)

library(dplyr)
library(tidyr)
df %>%
  mutate(across(starts_with("var"), is.na)) %>% # replace all NA with TRUE and else FALSE
  pivot_longer(-ID, names_to = "var") %>% # pivot longer
  filter(value) %>% # remove the FALSE rows
  group_by(ID) %>% # group by the ID
  summarise(`Missing Variables` = toString(var)) # convert the variable names to a string column

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 5 x 2
     ID `Missing Variables`
  <dbl> <chr>
1     1 Var3, Var5
2     2 Var5
3     3 Var2, Var5
4     4 Var1, Var2, Var3, Var5
5     5 Var1, Var3, Var4, Var5
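A comparable per-row listing can be sketched in pandas. The small frame below is invented for the example, and unlike the filter() step above this version keeps rows with nothing missing (they would get an empty string).

```python
import numpy as np
import pandas as pd

# Hypothetical data, loosely modeled on the table above
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Var1": [10, 15, np.nan],
    "Var2": ["T", np.nan, np.nan],
    "Var3": [np.nan, 50, 41],
})

# Boolean mask of missing cells, excluding the ID column
mask = df.drop(columns="ID").isna()

# For each row, join the names of the columns where the mask is True
missing = mask.apply(lambda row: ", ".join(row.index[row.to_numpy()]), axis=1)

result = pd.DataFrame({"ID": df["ID"], "Missing Variables": missing})
print(result)
```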

Find columns with all missing values

This is easy enough to do with sapply and a small anonymous function:

sapply(test1, function(x) all(is.na(x)))
   X1    X2    X3
FALSE FALSE FALSE

sapply(test2, function(x) all(is.na(x)))
   X1    X2    X3
FALSE  TRUE FALSE

And inside a function:

na.test <- function(x) {
  # TRUE for every column that consists only of NA values
  w <- sapply(x, function(x) all(is.na(x)))
  if (any(w)) {
    stop(paste("All NA in columns", paste(which(w), collapse = ", ")))
  }
}

na.test(test1)

na.test(test2)
Error in na.test(test2) : All NA in columns 2
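The same check could be written in pandas roughly as follows. The function name `check_all_na` and the two test frames are made up for this sketch (the original test1 and test2 are not shown), and it reports column names rather than positions.

```python
import numpy as np
import pandas as pd

def check_all_na(frame):
    """Raise ValueError if any column consists only of missing values."""
    all_na = frame.isna().all()  # one boolean per column
    if all_na.any():
        names = ", ".join(map(str, frame.columns[all_na]))
        raise ValueError(f"All NA in columns {names}")

# Hypothetical stand-ins for test1 and test2
test1 = pd.DataFrame({"X1": [1, 2], "X2": [np.nan, 3], "X3": [4, 5]})
test2 = pd.DataFrame({"X1": [1, 2], "X2": [np.nan, np.nan], "X3": [4, 5]})

check_all_na(test1)  # passes silently: no column is entirely missing

message = None
try:
    check_all_na(test2)
except ValueError as err:
    message = str(err)
    print(message)
```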

Find column names of missing values based on list from other dataset

I think I found a way after a bit more research. I build a list of all the required points from the nominal table, join it with the measurement table on side, and then take the difference between the two array columns. It seems to work fine.
Here is an example, given the nominals table and the actuals table.
First, create a list of all the nominal points per side:

from pyspark.sql import functions as F

nominal_list = (
    nominals
    .groupby("side")
    .agg(F.collect_list(F.col("point_name")))
    .withColumnRenamed("collect_list(point_name)", "nominal_points")
)

then join over a similar dataset created with measurements with the additional id column (note the order of the arrays when calling F.array_except).

missing = (
    actuals
    .groupby(["id", "side"])
    .agg(F.collect_list(F.col("point_name")))
    .withColumnRenamed("collect_list(point_name)", "measured_points")
    .join(nominal_list, "side")
    .withColumn('missing', F.array_except('nominal_points', 'measured_points'))
)

Note that this is slightly different from what I asked in the beginning as I have the side as additional column and not hidden in the column name, i.e. missing_L and missing_R.
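To make the logic easier to follow without a Spark session, here is a plain-Python sketch of the same group, join, and difference steps. The sample tuples are invented, and the list comprehension only approximates F.array_except (it does not remove duplicate entries).

```python
from collections import defaultdict

# Hypothetical rows: (side, point_name) and (id, side, point_name)
nominals = [("L", "p1"), ("L", "p2"), ("R", "p1"), ("R", "p3")]
actuals = [(1, "L", "p1"), (1, "R", "p1"), (1, "R", "p3"), (2, "L", "p2")]

# Step 1: collect the nominal points per side
nominal_points = defaultdict(list)
for side, point in nominals:
    nominal_points[side].append(point)

# Step 2: collect the measured points per (id, side)
measured_points = defaultdict(list)
for id_, side, point in actuals:
    measured_points[(id_, side)].append(point)

# Step 3: "join" on side and keep nominal points that were not measured.
# The order matters: nominal minus measured, not the other way round.
missing = {
    key: [p for p in nominal_points[key[1]] if p not in measured]
    for key, measured in measured_points.items()
}
print(missing)
```

As with the inner join above, an id and side combination with no measurements at all (here id 2, side R) does not appear in the result.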

Viewing all column names with any NA in R

Another acrobatic solution (just for fun) :

colnames(df)[!complete.cases(t(df))]
[1] "b" "c"

The idea is: getting the columns of A that have at least one NA is equivalent to getting the rows of t(A) that have at least one NA.
complete.cases, by definition, flags the rows without any missing value (and it is very efficient, since it is just a call to a C function), so negating it gives the rows that contain an NA.


