How to Test If a Column Exists and Is Not Null in a Dataframe

if (logsDF['column6'] in rddstats and logsDF['column6'].isNotNull)

I'm pretty sure that will throw a KeyError if column6 does not exist.

You could do something like:

if 'column6' in logsDF.columns:
    if logsDF['column6'].notnull().any():
        logsDF.select("column1","column2","column3","column4","column5","column6")
    else:
        logsz84statsDF.select("column1","column2","column3","column4","column5","column7")
else:
    logsz84statsDF.select("column1","column2","column3","column4","column5","column7")

Check whether column6 exists in logsDF.columns first.
If it does, use any() to check whether at least one value is not null.

Column7 is used if column6 does not exist, or if column6 exists but all values are null.


Editing my own comment:
Since Python does not evaluate the second operand of `and` when the first is False, you can actually do:

if 'column6' in logsDF.columns and logsDF['column6'].notnull().any():
    logsDF.select("column1","column2","column3","column4","column5","column6")
else:
    logsz84statsDF.select("column1","column2","column3","column4","column5","column7")

As long as the 'column6' in logsDF.columns check comes first, logsDF['column6'] is never evaluated when column6 doesn't exist, so no KeyError is thrown.
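A minimal pandas sketch of the short-circuit check described above. The helper name `choose_columns`, the `BASE` list, and the sample frames are hypothetical, and selecting with a list of column names is assumed here as a stand-in for the `select` calls in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical set of columns that are always selected.
BASE = ["column1", "column2"]

def choose_columns(df, preferred="column6", fallback="column7"):
    """Return BASE plus `preferred` if it exists and has a non-null value, else BASE plus `fallback`."""
    # `and` short-circuits: df[preferred] is only evaluated when the name
    # is present, so a missing column never raises KeyError.
    if preferred in df.columns and df[preferred].notnull().any():
        return df[BASE + [preferred]]
    return df[BASE + [fallback]]

# Three hypothetical cases: column6 has data, is all null, or is missing.
with_data = pd.DataFrame(
    {"column1": [1], "column2": [2], "column6": [3.0], "column7": [4.0]}
)
all_null = with_data.assign(column6=np.nan)
missing = with_data.drop(columns="column6")

for name, frame in [("with_data", with_data), ("all_null", all_null), ("missing", missing)]:
    print(name, list(choose_columns(frame).columns))
```

Only the first frame ends up with column6; the other two fall back to column7, matching the behaviour described in the answer.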

How to check if a column exists in Pandas

This will work:

if 'A' in df:

But for clarity, I'd probably write it as:

if 'A' in df.columns:
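A short sketch of the two spellings on a hypothetical frame (the frame and variable names are made up for illustration). It also shows why the explicit form may read more clearly: `in` on a DataFrame looks at column labels, not cell values.

```python
import pandas as pd

# Hypothetical frame with columns "A" and "B".
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Both forms test membership in the column labels.
has_a = "A" in df
has_a_explicit = "A" in df.columns
has_c = "C" in df.columns

# 1 is a cell value in column "A", not a column label, so this is False.
value_is_label = 1 in df

print(has_a, has_a_explicit, has_c, value_is_label)
```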

To find whether a column exists in data frame or not

Assuming that the name of your data frame is dat and that your column name to check is "d", you can use the %in% operator:

if ("d" %in% colnames(dat)) {
  cat("Yep, it's in there!\n")
}

Pandas: Check if column exists in df from a list of columns

Here is how I would approach:

import numpy as np

for col in column_list:
    if col not in df.columns:
        df[col] = np.nan
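The loop above could be wrapped in a small helper, sketched here on a hypothetical frame. The name `ensure_columns` is made up, and working on a copy is my own choice so that the caller's frame is left untouched; the original loop modifies `df` in place.

```python
import numpy as np
import pandas as pd

def ensure_columns(df, column_list):
    """Return a copy of df in which every name in column_list exists, filling new columns with NaN."""
    out = df.copy()
    for col in column_list:
        # Only add columns that are missing; existing data is kept as-is.
        if col not in out.columns:
            out[col] = np.nan
    return out

# Hypothetical input: "a" exists, "b" and "c" do not.
df = pd.DataFrame({"a": [1, 2]})
result = ensure_columns(df, ["a", "b", "c"])

print(list(result.columns))
print(result["b"].isna().all())
```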

check if pandas data frame column (string/object) is numeric (ignore empty/NULL/NAN)

You could strip whitespace, convert empty strings to NaN and drop them, then test whether everything that remains converts to a number:

out = pd.to_numeric(df['col1'].str.strip().replace('', pd.NA).dropna(), errors='coerce').notna().all().item()

Output:

True

This test returns False for the following input:

df = pd.DataFrame({'col1':['1', 's']})
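For reuse, the one-liner can be wrapped in a function. This is only a sketch: the name `is_numeric_ignoring_blanks` and the sample Series are hypothetical, and it assumes the column holds strings (or missing values) so that the `.str` accessor is available.

```python
import pandas as pd

def is_numeric_ignoring_blanks(series):
    """True if every non-blank, non-null string in the Series parses as a number."""
    # Strip whitespace, turn empty strings into missing values, then drop them.
    cleaned = series.str.strip().replace("", pd.NA).dropna()
    # errors="coerce" maps unparseable strings to NaN, which notna() then flags.
    return bool(pd.to_numeric(cleaned, errors="coerce").notna().all())

# Hypothetical inputs: blanks and None are ignored, "s" is not numeric.
ok = pd.Series(["1", " 2.5 ", "", None])
bad = pd.Series(["1", "s"])

print(is_numeric_ignoring_blanks(ok))
print(is_numeric_ignoring_blanks(bad))
```

Note that a column containing only blanks or nulls passes this test, since nothing is left to fail the conversion.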