How to Check If a Column Exists in Pandas

This will work:

if 'A' in df:

But for clarity, I'd probably write it as:

if 'A' in df.columns:
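
Both forms behave the same, since membership-testing a DataFrame checks its column labels. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Toy frame (hypothetical data) to demonstrate the membership test.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

print('A' in df)          # membership on the frame checks column labels
print('A' in df.columns)  # the more explicit spelling
print('C' in df.columns)
```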

Pandas - Check if value from a column exists in any index of a MultiIndex dataframe

Use MultiIndex.to_frame with DataFrame.eq to compare the column against all index levels, and DataFrame.any to test whether at least one level matches:

df1 = df[df.index.to_frame().eq(df['Column'], axis=0).any(axis=1)]
print(df1)
              Column
index1 index2
a      b           b
f      e           e
       h           f

Or use a list comprehension with in to test whether the value of the column exists in the index tuple:

df1 = df[[v in k for k, v in df['Column'].items()]]
print(df1)
              Column
index1 index2
a      b           b
f      e           e
       h           f
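
Both approaches can be seen end to end on a small hypothetical MultiIndex frame:

```python
import pandas as pd

# Hypothetical MultiIndex frame: keep rows whose Column value appears
# in one of that row's index levels.
idx = pd.MultiIndex.from_tuples(
    [('a', 'b'), ('c', 'd'), ('f', 'e')], names=['index1', 'index2'])
df = pd.DataFrame({'Column': ['b', 'x', 'e']}, index=idx)

# Method 1: compare every index level against the column, row-wise.
mask1 = df.index.to_frame().eq(df['Column'], axis=0).any(axis=1)

# Method 2: iterate (index_tuple, value) pairs and test membership.
mask2 = pd.Series([v in k for k, v in df['Column'].items()], index=df.index)

print(df[mask1])  # rows ('a','b') and ('f','e') survive
```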

To check if a few values in a dataframe column exist in another dataframe column

You can compare columns:

print(df1['one'].isin(df2['one']))
0    True
1    True
2    True
3    True
Name: one, dtype: bool

Or convert the values of the DataFrame to a 1d array and then to a list:

print(df1.isin(df2.to_numpy().ravel().tolist()))
    one
0  True
1  True
2  True
3  True
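
A runnable sketch of both variants, with hypothetical frames:

```python
import pandas as pd

# Hypothetical frames: which values of df1['one'] appear in df2['one']?
df1 = pd.DataFrame({'one': [1, 2, 3, 4]})
df2 = pd.DataFrame({'one': [2, 4, 1, 3, 5]})

# Column-to-column check.
mask = df1['one'].isin(df2['one'])

# Flattened-array variant: check against every value in df2, any column.
mask_all = df1.isin(df2.to_numpy().ravel().tolist())

print(mask.tolist())  # [True, True, True, True]
```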

Pandas: Check if column exists in df from a list of columns

Here is how I would approach it:

import numpy as np

for col in column_list:
    if col not in df.columns:
        df[col] = np.nan
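
In context (the frame and the required-column list below are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2]})
column_list = ['a', 'b', 'c']  # hypothetical list of required columns

# Add any column from the list that the frame is missing, filled with NaN.
for col in column_list:
    if col not in df.columns:
        df[col] = np.nan

print(df.columns.tolist())  # ['a', 'b', 'c']
```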

How do I check if pandas df column value exists based on value in another column?

Compare Year to 2018, then test per ID whether all of the group's values are 2018:

mask = df['Year'].eq(2018).groupby(df['ID']).transform('all')

Another idea: test whether Year is not 2018, collect the IDs that have at least one non-2018 row, and finally invert the mask with ~ to keep only the all-2018 groups:

mask = ~df['ID'].isin(df.loc[df['Year'].ne(2018), 'ID'])

Finally, convert the mask to integers:

df['ID_only_in_2018'] = mask.astype(int)

Or:

df['ID_only_in_2018'] = np.where(mask, 1, 0)

Or:

df['ID_only_in_2018'] = mask.view('i1')


print(df)
   Year  ID  Value  ID_only_in_2018
0  2016   1    100                0
1  2017   1    102                0
2  2017   1    105                0
3  2018   1     98                0
4  2016   2    121                0
5  2016   2    101                0
6  2016   2    133                0
7  2018   3    102                1
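
The grouped-transform approach, reproduced on the same data:

```python
import pandas as pd

# Data matching the printed frame above.
df = pd.DataFrame({
    'Year':  [2016, 2017, 2017, 2018, 2016, 2016, 2016, 2018],
    'ID':    [1, 1, 1, 1, 2, 2, 2, 3],
    'Value': [100, 102, 105, 98, 121, 101, 133, 102],
})

# True only for IDs whose every row has Year == 2018.
mask = df['Year'].eq(2018).groupby(df['ID']).transform('all')
df['ID_only_in_2018'] = mask.astype(int)

print(df['ID_only_in_2018'].tolist())  # [0, 0, 0, 0, 0, 0, 0, 1]
```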

Check if values in a column exist elsewhere in a dataframe row

Try:

out = (df[['a','b','c']].T==df['d']).any()

Output:

0    False
1     True
2    False
3     True
dtype: bool
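
The trick is that transposing `a`/`b`/`c` makes each original row a column, so comparing against the `d` Series broadcasts per row. A sketch with hypothetical data that reproduces the output above:

```python
import pandas as pd

# Hypothetical frame: does each row's 'd' value appear in its own a/b/c cells?
df = pd.DataFrame({
    'a': [1, 5, 2, 7],
    'b': [3, 6, 4, 8],
    'c': [9, 5, 6, 7],
    'd': [2, 5, 1, 8],
})

# Transpose a/b/c so each column is one original row, then broadcast 'd'.
out = (df[['a', 'b', 'c']].T == df['d']).any()
print(out.tolist())  # [False, True, False, True]
```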

Pyspark - Check if a column exists for a specific record

As per the Spark documentation, when reading a JSON file without providing a schema, Spark first scans the data to infer one. So once you read the JSON, rows that lack a given field end up with a null value for that field.

I hope the following snippet helps identify the fields that are explicitly null in the original JSON:

>>> from pyspark.sql import functions as F, types as T
>>> import json
>>> schema = spark.read.json('test.json').schema
>>> df = spark.read.text('test.json')
>>> null_fields_udf = F.UserDefinedFunction(lambda raw: [key for key, val in json.loads(raw).items() if val is None], T.ArrayType(T.StringType()))
>>> df = df.withColumn('fields_with_null', null_fields_udf(df.value))

>>> df = df.withColumn("value", F.from_json(df.value, schema))
>>> df = df.select("fields_with_null", "value.*")
>>> df.show()

+----------------+-------+-------+---+
|fields_with_null|field_a|field_b| id|
+----------------+-------+-------+---+
|              []|   test|   null|  1|
|              []|   test|      z|  2|
|       [field_b]|   test|   null|  3|
+----------------+-------+-------+---+

>>>
>>>
>>> column_names = [col for col in df.columns if col != "fields_with_null"]
>>> column_names
['field_a', 'field_b', 'id']
>>>
>>>
>>> for col in column_names:
...     df = df.withColumn("%s_was_null" % col, F.array_contains(df.fields_with_null, col))
...
>>> df.show()
+----------------+-------+-------+---+----------------+----------------+-----------+
|fields_with_null|field_a|field_b| id|field_a_was_null|field_b_was_null|id_was_null|
+----------------+-------+-------+---+----------------+----------------+-----------+
|              []|   test|   null|  1|           false|           false|      false|
|              []|   test|      z|  2|           false|           false|      false|
|       [field_b]|   test|   null|  3|           false|            true|      false|
+----------------+-------+-------+---+----------------+----------------+-----------+

>>> df = df.drop(df.fields_with_null)
>>> df.show()
+-------+-------+---+----------------+----------------+-----------+
|field_a|field_b| id|field_a_was_null|field_b_was_null|id_was_null|
+-------+-------+---+----------------+----------------+-----------+
|   test|   null|  1|           false|           false|      false|
|   test|      z|  2|           false|           false|      false|
|   test|   null|  3|           false|            true|      false|
+-------+-------+---+----------------+----------------+-----------+
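
The heart of that UDF can be sketched in plain Python, without Spark (the records below are hypothetical):

```python
import json

# Hypothetical JSON lines, one record per line.
lines = [
    '{"id": 1, "field_a": "test", "field_b": null}',
    '{"id": 2, "field_a": "test", "field_b": "z"}',
]

def null_fields(raw):
    # List the keys of a JSON record whose value is explicitly null.
    return [key for key, val in json.loads(raw).items() if val is None]

print([null_fields(line) for line in lines])  # [['field_b'], []]
```

Note that a field that is entirely absent from a record (rather than explicitly null) does not appear in the list, which is why row 1 in the output above shows field_b as null yet has an empty fields_with_null.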

How to check if a value exists in another pandas column which contains several values separated by commas

df['Column_C'] = df.apply(lambda x: x.Column_A in x.Column_B.split(','), axis=1)
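
For example (hypothetical data), splitting Column_B on commas and testing membership row by row:

```python
import pandas as pd

# Hypothetical frame: Column_B holds comma-separated values.
df = pd.DataFrame({
    'Column_A': ['x', 'y'],
    'Column_B': ['x,z', 'a,b'],
})

df['Column_C'] = df.apply(lambda x: x.Column_A in x.Column_B.split(','), axis=1)
print(df['Column_C'].tolist())  # [True, False]
```

Splitting first matters: a plain substring test like `'x' in 'ax,z'` would produce false positives.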

