How to check if a column exists in Pandas
This will work:
if 'A' in df:
But for clarity, I'd probably write it as:
if 'A' in df.columns:
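As a minimal sketch of both spellings, assuming a small hypothetical frame (the column names and the `required` list are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical sample frame; column names are illustrative only.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Membership on a DataFrame tests the column labels, not the cell values.
has_a = "A" in df
has_a_explicit = "A" in df.columns
has_c = "C" in df.columns

# To require several columns at once, compare as a set.
required = ["A", "B"]
has_all = set(required).issubset(df.columns)

print(has_a, has_a_explicit, has_c, has_all)
```

Both forms give the same answer; the `df.columns` spelling just makes it obvious to a reader that labels are being tested.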
Pandas - Check if value from a column exists in any index of a MultiIndex dataframe
Use MultiIndex.to_frame with DataFrame.eq to compare every index level against the column, and DataFrame.any to test whether at least one level matches:
df1 = df[df.index.to_frame().eq(df['Column'], axis=0).any(axis=1)]
print (df1)
              Column
index1 index2
a      b           b
f      e           e
       h           f
Or use a list comprehension with in to test whether each value of the column appears in its own index tuple:
df1 = df[[v in k for k, v in df['Column'].items()]]
print (df1)
              Column
index1 index2
a      b           b
f      e           e
       h           f
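To try both approaches end to end, here is a small self-contained sketch. The sample frame is an assumption (the original data was not shown); it is chosen so that three rows match, as in the printed result.

```python
import pandas as pd

# Hypothetical data: a two-level index and one column of labels.
index = pd.MultiIndex.from_tuples(
    [("a", "b"), ("c", "d"), ("f", "e"), ("f", "h")],
    names=["index1", "index2"],
)
df = pd.DataFrame({"Column": ["b", "x", "e", "f"]}, index=index)

# Approach 1: turn the index levels into columns and compare row-wise.
levels = df.index.to_frame()
mask = levels.eq(df["Column"], axis=0).any(axis=1)
df1 = df[mask]

# Approach 2: each index entry k is a tuple, so `v in k` checks all levels.
mask_list = [v in k for k, v in df["Column"].items()]
df2 = df[mask_list]

print(df1)
```

The row `("c", "d")` is dropped because `"x"` appears in neither level of its index.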
To check if few values in dataframe column exists in another dataframe column
You can compare columns:
print(df1['one'].isin(df2['one']))
0    True
1    True
2    True
3    True
Name: one, dtype: bool
Or flatten the values of the second DataFrame into a 1-D array, convert it to a list, and test the whole first DataFrame against it:
print(df1.isin(df2.to_numpy().ravel().tolist()))
    one
0  True
1  True
2  True
3  True
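A short sketch with made-up data (the values below are assumptions) shows how the boolean result can be used to answer "are they all present?" and "which ones are missing?":

```python
import pandas as pd

# Hypothetical frames; only the column name "one" comes from the answer.
df1 = pd.DataFrame({"one": ["a", "b", "c", "z"]})
df2 = pd.DataFrame({"one": ["a", "b", "c", "d"]})

# Element-wise membership test of df1["one"] in the values of df2["one"].
in_other = df1["one"].isin(df2["one"])

# Collapse to a single answer, or list the values that were not found.
all_present = bool(in_other.all())
missing = df1.loc[~in_other, "one"].tolist()

print(in_other.tolist(), all_present, missing)
```

Note that `isin` compares values only; the row labels of the two frames do not need to line up.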
Pandas: Check if column exists in df from a list of columns
Here is how I would approach it:
import numpy as np
# Add every column from column_list that df does not have yet, filled with NaN.
for col in column_list:
    if col not in df.columns:
        df[col] = np.nan
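For a runnable version of the loop, here is a sketch with a hypothetical `df` and `column_list`. It also shows `DataFrame.reindex` as a possible one-line alternative, with the caveat that `reindex` keeps only the listed columns.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs for illustration.
df = pd.DataFrame({"a": [1, 2]})
column_list = ["a", "b", "c"]

# Loop version: existing columns are left alone, missing ones become NaN.
for col in column_list:
    if col not in df.columns:
        df[col] = np.nan

# Alternative: reindex creates missing columns as NaN in one call,
# but it also drops any column that is not in column_list (here "x").
other = pd.DataFrame({"a": [1, 2], "x": [0, 0]})
reindexed = other.reindex(columns=column_list)

print(df.columns.tolist(), reindexed.columns.tolist())
```

If extra columns must survive, the loop is the safer choice.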
How do I check if pandas df column value exists based on value in another column?
Compare Year with 2018 and then test, per ID group, whether all values are 2018:
mask = df['Year'].eq(2018).groupby(df['ID']).transform('all')
Another idea is to find the rows where Year is not 2018, collect the ID values that have at least one such row, and finally invert the mask with ~ to keep only the groups that contain 2018 alone:
mask = ~df['ID'].isin(df.loc[df['Year'].ne(2018), 'ID'])
Finally, convert the mask to integers:
df['ID_only_in_2018'] = mask.astype(int)
Or:
df['ID_only_in_2018'] = np.where(mask, 1, 0)
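Or (note that Series.view is deprecated as of pandas 2.2, so astype is the safer choice on recent versions):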
df['ID_only_in_2018'] = mask.view('i1')
print (df)
   Year  ID  Value  ID_only_in_2018
0  2016   1    100                0
1  2017   1    102                0
2  2017   1    105                0
3  2018   1     98                0
4  2016   2    121                0
5  2016   2    101                0
6  2016   2    133                0
7  2018   3    102                1
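The two masks can be checked against each other with a self-contained sketch. The input values below are read back from the printed result above, so the frame itself is a reconstruction rather than the asker's original file.

```python
import pandas as pd

# Data reconstructed from the printed output.
df = pd.DataFrame({
    "Year": [2016, 2017, 2017, 2018, 2016, 2016, 2016, 2018],
    "ID": [1, 1, 1, 1, 2, 2, 2, 3],
    "Value": [100, 102, 105, 98, 121, 101, 133, 102],
})

# Mask A: within each ID group, are all Year values equal to 2018?
mask_a = df["Year"].eq(2018).groupby(df["ID"]).transform("all")

# Mask B: exclude every ID that has at least one non-2018 row.
mask_b = ~df["ID"].isin(df.loc[df["Year"].ne(2018), "ID"])

df["ID_only_in_2018"] = mask_a.astype(int)
print(df)
```

Only ID 3 qualifies: ID 1 has a 2018 row but also earlier years, and ID 2 has no 2018 row at all.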
Check if values in a column exist elsewhere in a dataframe row
Try:
out = (df[['a','b','c']].T==df['d']).any()
Output:
0 False
1 True
2 False
3 True
dtype: bool
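Here is a runnable sketch with invented numbers (the original data was not shown) that yields the same boolean pattern. It also includes an equivalent spelling with `DataFrame.eq` and `axis=0`, which avoids the transpose.

```python
import pandas as pd

# Hypothetical data; only the column names a, b, c, d come from the answer.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [5, 6, 7, 8],
    "c": [9, 10, 11, 12],
    "d": [0, 6, 0, 4],
})

# Transposed comparison: columns of the transpose are the original rows,
# so df["d"] lines up with them label by label.
out = (df[["a", "b", "c"]].T == df["d"]).any()

# Same test without transposing: compare along the row axis.
out_eq = df[["a", "b", "c"]].eq(df["d"], axis=0).any(axis=1)

print(out.tolist())
```

Row 1 matches through column `b` (6) and row 3 through column `a` (4).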
Pyspark - Check if a column exists for a specific record
According to the Spark documentation, when you read a JSON file without providing a schema, Spark first scans the data to infer the schema. As a result, once the JSON is loaded, any row that lacks a given field gets a null value for that field, so a missing field looks the same as an explicit null.
The following snippet may help identify the fields that were explicitly null in the original JSON:
>>> from pyspark.sql import functions as F, types as T
>>> import json
>>> schema = spark.read.json('test.json').schema
>>> df = spark.read.text('test.json')
>>> null_fields_udf = F.udf(lambda raw: [key for key, val in json.loads(raw).items() if val is None], T.ArrayType(T.StringType()))
>>> df = df.withColumn('fields_with_null', null_fields_udf(df.value))
>>> df = df.withColumn("value", F.from_json(df.value, schema))
>>> df = df.select("fields_with_null", "value.*")
>>> df.show()
+----------------+-------+-------+---+
|fields_with_null|field_a|field_b| id|
+----------------+-------+-------+---+
|              []|   test|   null|  1|
|              []|   test|      z|  2|
|       [field_b]|   test|   null|  3|
+----------------+-------+-------+---+
>>>
>>>
>>> column_names = [col for col in df.columns if col != "fields_with_null"]
>>> column_names
['field_a', 'field_b', 'id']
>>>
>>>
>>> for col in column_names:
...     df = df.withColumn("%s_was_null" % col, F.array_contains(df.fields_with_null, col))
...
>>> df.show()
+----------------+-------+-------+---+----------------+----------------+-----------+
|fields_with_null|field_a|field_b| id|field_a_was_null|field_b_was_null|id_was_null|
+----------------+-------+-------+---+----------------+----------------+-----------+
|              []|   test|   null|  1|           false|           false|      false|
|              []|   test|      z|  2|           false|           false|      false|
|       [field_b]|   test|   null|  3|           false|            true|      false|
+----------------+-------+-------+---+----------------+----------------+-----------+
>>> df = df.drop(df.fields_with_null)
>>> df.show()
+-------+-------+---+----------------+----------------+-----------+
|field_a|field_b| id|field_a_was_null|field_b_was_null|id_was_null|
+-------+-------+---+----------------+----------------+-----------+
|   test|   null|  1|           false|           false|      false|
|   test|      z|  2|           false|           false|      false|
|   test|   null|  3|           false|            true|      false|
+-------+-------+---+----------------+----------------+-----------+
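The per-record logic inside the UDF can be tried without a Spark session. The sketch below uses plain Python; the three JSON lines are an assumption about what test.json might contain (the file was not shown), and `missing_fields` is a hypothetical helper added to show the opposite case.

```python
import json

# Assumed contents of test.json, one record per line.
lines = [
    '{"id": 1, "field_a": "test"}',
    '{"id": 2, "field_a": "test", "field_b": "z"}',
    '{"id": 3, "field_a": "test", "field_b": null}',
]

def fields_with_null(raw):
    """Keys that are present in the record with an explicit null value."""
    record = json.loads(raw)
    return [key for key, val in record.items() if val is None]

def missing_fields(raw, all_fields):
    """Hypothetical helper: keys from the schema that the record lacks."""
    record = json.loads(raw)
    return [name for name in all_fields if name not in record]

all_fields = ["field_a", "field_b", "id"]
nulls = [fields_with_null(line) for line in lines]
missing = [missing_fields(line, all_fields) for line in lines]

print(nulls)
print(missing)
```

Under these assumptions, record 1 lacks `field_b` entirely while record 3 carries it as null, which is the distinction the `field_b_was_null` column preserves.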
How to check if a value exist in other pandas columns which contains several values separated by comma
df['Column_C'] = df.apply(lambda x: x.Column_A in x.Column_B.split(','), axis=1)
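A small sketch with invented rows shows the one-liner in action. One thing to watch, as an assumption about how real data might look: if the comma-separated values contain spaces after the commas, the plain `split(',')` leaves them in, so a stripped variant is shown too.

```python
import pandas as pd

# Hypothetical data; only the column names come from the answer.
df = pd.DataFrame({
    "Column_A": ["x", "y", "z"],
    "Column_B": ["x,y", "a,b", "q, z"],
})

# The answer's approach: exact match against the split parts.
df["Column_C"] = df.apply(
    lambda x: x.Column_A in x.Column_B.split(","), axis=1
)

# Variant that strips whitespace around each part before comparing.
df["Column_C_stripped"] = [
    a in [part.strip() for part in b.split(",")]
    for a, b in zip(df["Column_A"], df["Column_B"])
]

print(df)
```

The last row differs between the two columns because `"q, z".split(",")` yields `" z"` with a leading space.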