How to Replace NaN Values Where the Other Columns Meet Certain Criteria

How to replace NaN values where the other columns meet certain criteria?

I am trying to replace the unknown age of male, 1st class passengers
with the average age of male, 1st class passengers.

You can split the problem into 2 steps. First calculate the average age of male, 1st class passengers:

mask = (df['Pclass'] == 1) & (df['Sex'] == 'male')
avg_filler = df.loc[mask, 'Age'].mean()

Then update values satisfying your criteria:

df.loc[df['Age'].isnull() & mask, 'Age'] = avg_filler
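
As a minimal, self-contained sketch of the two steps above: the toy rows here are invented for illustration, and only the column names (Pclass, Sex, Age) come from the question.

```python
import numpy as np
import pandas as pd

# Toy data (assumed); only the column names come from the question.
df = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 1],
    "Sex": ["male", "male", "male", "male", "female"],
    "Age": [40.0, np.nan, 60.0, np.nan, np.nan],
})

# Step 1: average age of male, 1st class passengers (mean() skips NaN).
mask = (df["Pclass"] == 1) & (df["Sex"] == "male")
avg_filler = df.loc[mask, "Age"].mean()

# Step 2: fill only the rows that match the criteria AND have a missing age.
df.loc[df["Age"].isnull() & mask, "Age"] = avg_filler
print(df)
```

Rows outside the mask (the 3rd class male and the 1st class female) keep their NaN, which is the point of combining both conditions.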

How to replace nan values of a column based on certain values of other column

You can make a dictionary here:

rep_nan = {
    'bachelor': 'tacher',
    'blabla': 'blabla',
    'high school': 'actor'
}

Then we can replace the nan values with:

df.loc[df['col2'].isnull(), 'col2'] = df[df['col2'].isnull()]['col1'].replace(rep_nan)

For example:

>>> df
          col1   col2
0     bachelor   None
1     bachelor  clown
2       blabla   None
3  high school   None
>>> df.loc[df['col2'].isnull(), 'col2'] = df[df['col2'].isnull()]['col1'].replace(rep_nan)
>>> df
          col1    col2
0     bachelor  tacher
1     bachelor   clown
2       blabla  blabla
3  high school   actor
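
A possible variant, sketched here with Series.map swapped in for replace. The difference matters when col1 holds a value that is not in the dictionary: replace copies that value through unchanged, while map leaves the cell as NaN. The frame below rebuilds the example data shown above.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["bachelor", "bachelor", "blabla", "high school"],
    "col2": [None, "clown", None, None],
})
rep_nan = {"bachelor": "tacher", "blabla": "blabla", "high school": "actor"}

# Look up a replacement only for the rows where col2 is missing.
missing = df["col2"].isnull()
df.loc[missing, "col2"] = df.loc[missing, "col1"].map(rep_nan)
print(df)
```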

Replace NaN Values with the Means of other Cols based on Condition

You could implement the function like this:

def replace_missing_with_conditional_mean(df, condition_cols, cols):
    # Per-row group mean of each target column, aligned to df's index
    s = df.groupby(condition_cols)[cols].transform('mean')
    return df.fillna(s.to_dict('series'))


res = replace_missing_with_conditional_mean(df, ['Col1', 'Col2'], ['Col3'])
print(res)

Output

  Col1 Col2  Col3
0    A    c   1.0
1    A    c   3.0
2    B    c   5.0
3    A    d   6.0
4    A    c   2.0
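
For a single column, the same idea can be sketched without a helper function. The input frame is an assumption: it is one dataset that would produce the output above, with the last Col3 value missing.

```python
import numpy as np
import pandas as pd

# Assumed input: row 4 is missing and belongs to group (A, c).
df = pd.DataFrame({
    "Col1": ["A", "A", "B", "A", "A"],
    "Col2": ["c", "c", "c", "d", "c"],
    "Col3": [1.0, 3.0, 5.0, 6.0, np.nan],
})

# transform('mean') returns one group mean per row, aligned to df's index,
# so it can be passed straight to fillna.
group_mean = df.groupby(["Col1", "Col2"])["Col3"].transform("mean")
res = df.assign(Col3=df["Col3"].fillna(group_mean))
print(res)
```

Group (A, c) contains 1.0 and 3.0, so the missing value is filled with 2.0.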

Python Pandas replace NaN in one column with value from corresponding row of second column

Assuming your DataFrame is in df:

df['Temp_Rating'] = df['Temp_Rating'].fillna(df['Farheit'])
del df['Farheit']
df.columns = 'File heat Observations'.split()

First replace any NaN values in Temp_Rating with the corresponding value of df.Farheit. Then delete the 'Farheit' column and rename the remaining columns.
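
A runnable sketch of those three steps. The original data is not shown, so the rows and the File and Observations column names below are assumptions; only Temp_Rating and Farheit come from the answer.

```python
import numpy as np
import pandas as pd

# Assumed data; the question's real rows are not available.
df = pd.DataFrame({
    "File": ["a", "b", "c"],
    "Temp_Rating": [np.nan, 71.0, np.nan],
    "Observations": ["x", "y", "z"],
    "Farheit": [68.0, 70.0, 65.0],
})

# Fill missing Temp_Rating values row by row from Farheit (aligned on index).
df["Temp_Rating"] = df["Temp_Rating"].fillna(df["Farheit"])
df = df.drop(columns="Farheit")
df.columns = ["File", "heat", "Observations"]
print(df)
```

Assigning the result back to the column, rather than calling fillna with inplace=True on an attribute, avoids modifying a temporary copy.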

Filling nan if row in another column meets condition

Assuming the blanks are np.nan (if not, you can convert them first with df = df.replace('', np.nan)), you can use numpy.where() for faster results:

df.country=np.where(df.state.isin(states),df.country.fillna('USA'),df.country)
print(df)

  state country
0    AL     USA
1    WI     USA
2    FL     USA
3    NJ     USA
4    BM     NaN
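
The same fill can also be sketched with a boolean mask and .loc, which some readers find easier to follow than nested np.where arguments. The states list and the input rows are assumptions chosen to match the output above.

```python
import numpy as np
import pandas as pd

states = ["AL", "WI", "FL", "NJ"]  # assumed list of US state codes
df = pd.DataFrame({
    "state": ["AL", "WI", "FL", "NJ", "BM"],
    "country": [np.nan, "USA", np.nan, "USA", np.nan],
})

# Fill only where the state is in the list AND the country is missing.
us_state = df["state"].isin(states)
df.loc[us_state & df["country"].isnull(), "country"] = "USA"
print(df)
```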

How can I replace NaN values based on multiple conditions?

You could create a replacement_value: index_mask mapping using a dictionary and then iterate over it, like so:

>>> masks = {1: (df['B'] >= 10) & (df['B'] < 20) & df['C'].isnull(), 2: (df['B'] >= 20) & (df['B'] < 30) & df['C'].isnull(), 3: (df['B'] >= 30) & df['C'].isnull()}
>>> masks
{1: 0    False
1    False
2     True
3    False
dtype: bool, 2: 0    False
1     True
2    False
3    False
dtype: bool, 3: 0    False
1    False
2    False
3    False
dtype: bool}
>>> for replacement_value, mask in masks.items():
...     df.loc[mask, 'C'] = replacement_value
...
>>> df
    A   B    C
0  10  12  1.0
1  12  24  2.0
2  30  16  1.0
3  21  31  4.0

Note that I made the between conditions exclusive on the upper bound: to be replaced with 1, the value of df['B'] needs to be in the range [10, 20); to be replaced with 2, in [20, 30); and so on. Otherwise the bounds would overlap.
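
Here is the same loop as a standalone script. The starting frame is an assumption reconstructed from the masks and the final output: rows 1 and 2 start with a missing C, while rows 0 and 3 already have values.

```python
import numpy as np
import pandas as pd

# Assumed starting data, consistent with the masks printed above.
df = pd.DataFrame({
    "A": [10, 12, 30, 21],
    "B": [12, 24, 16, 31],
    "C": [1.0, np.nan, np.nan, 4.0],
})

missing = df["C"].isnull()
masks = {
    1: (df["B"] >= 10) & (df["B"] < 20) & missing,  # B in [10, 20)
    2: (df["B"] >= 20) & (df["B"] < 30) & missing,  # B in [20, 30)
    3: (df["B"] >= 30) & missing,                   # B in [30, inf)
}

for replacement_value, mask in masks.items():
    df.loc[mask, "C"] = replacement_value
print(df)
```

Row 3 has B = 31 but keeps its existing 4.0, because every mask also requires C to be missing.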

pandas replace np.nan based on multiple conditions

Try rewriting your np.where statement:

df['is_less'] = np.where(
    df['A'].isnull() | df['B'].isnull(), np.nan,  # check if A or B are np.nan
    np.where(df['B'].ge(df['A']), 'no', 'yes'))   # check if B >= A

prints:

      A     B is_less
0   NaN  10.0     nan
1  10.0   NaN     nan
2   1.0   5.0      no
3   5.0   1.0     yes

Reference: pandas.Series.ge (greater than or equal)
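
One thing to watch: the lowercase nan in the printed column suggests that mixing np.nan with strings inside np.where turns the missing marker into the text 'nan', which isnull() would not detect. A hedged alternative, using Series.mask instead of the outer np.where, keeps a real missing value. The input frame is rebuilt from the output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [np.nan, 10.0, 1.0, 5.0],
    "B": [10.0, np.nan, 5.0, 1.0],
})

# Label every row first, then blank out the rows where A or B is missing.
labels = pd.Series(np.where(df["B"].ge(df["A"]), "no", "yes"), index=df.index)
df["is_less"] = labels.mask(df["A"].isnull() | df["B"].isnull())
print(df)
print(df["is_less"].isnull().tolist())
```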

Pandas: How to replace values to np.nan based on Condition for multiple columns

First we create a dictionary from your two lists using zip:

replace_dict = dict(zip(list1,list2))

Then we loop over it to handle your assignments:

for k, v in replace_dict.items():
    df.loc[df[k] == 0, v] = np.nan

print(df)
   I  A  B  C    D  E    F
0  1  9  4  0    T  F  NaN
1  2  0  5  1  NaN  X    J
2  3  1  8  0    G  G  NaN

Another method is to use np.where with your lists:

df[list2] = np.where(df[list1].eq(0), np.nan,df[list2])

print(df)

   I  A  B  C    D  E    F
0  1  9  4  0    T  F  NaN
1  2  0  5  1  NaN  X    J
2  3  1  8  0    G  G  NaN
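
To try both methods side by side, here is a sketch with assumed inputs. The contents of list1 and list2 are inferred from the output (a 0 in A, B or C blanks the paired D, E or F), and the letters that end up replaced by NaN are placeholders, since the original values are not shown.

```python
import numpy as np
import pandas as pd

list1 = ["A", "B", "C"]  # columns checked for 0 (assumed)
list2 = ["D", "E", "F"]  # columns to blank, paired by position (assumed)

df = pd.DataFrame({
    "I": [1, 2, 3],
    "A": [9, 0, 1],
    "B": [4, 5, 8],
    "C": [0, 1, 0],
    "D": ["T", "Q", "G"],  # "Q" is a placeholder for the unknown original
    "E": ["F", "X", "G"],
    "F": ["K", "J", "M"],  # "K" and "M" are placeholders
})

# Method 1: loop over the column pairs.
looped = df.copy()
for k, v in dict(zip(list1, list2)).items():
    looped.loc[looped[k] == 0, v] = np.nan

# Method 2: one vectorised np.where over all pairs at once.
vectorised = df.copy()
vectorised[list2] = np.where(df[list1].eq(0), np.nan, df[list2])

print(looped)
print(vectorised)
```

np.where works positionally here, so the pairing relies on list1 and list2 having the same length and order.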

Python: how to replace NaN with conditions in a dataframe?

You can use groupby to do this:

fill_value = df.groupby("node_i")["value_j"].mean().fillna(1.0)
df["w"] = fill_value.reindex(df["node_i"]).values
# Keep the original value wherever it is present (single .loc, no chained assignment)
known = df["value_j"].notnull()
df.loc[known, "w"] = df.loc[known, "value_j"]
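
A shorter sketch of the same result, using transform so the group means are already aligned to the rows. The data is invented for illustration; only the node_i, value_j and w names and the 1.0 fallback come from the answer.

```python
import numpy as np
import pandas as pd

# Assumed data: group "b" has no known values at all.
df = pd.DataFrame({
    "node_i": ["a", "a", "b", "b", "c"],
    "value_j": [2.0, np.nan, np.nan, np.nan, 4.0],
})

# Per-row group mean; groups with no known value yield NaN here.
group_mean = df.groupby("node_i")["value_j"].transform("mean")

# Prefer the original value, then the group mean, then the fallback 1.0.
df["w"] = df["value_j"].fillna(group_mean).fillna(1.0)
print(df)
```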

How to replace a dataframe column values with NaN based on a condition?

Using where and between:

df.Age = df.Age.where(df.Age.between(5, 100))
df
   ID   Age
0  10   NaN
1  20   NaN
2  30  25.0
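
As a standalone sketch: the two out-of-range ages below are assumptions, since the original input is not shown. Series.where keeps values where the condition is True and writes NaN elsewhere, and between(5, 100) includes both endpoints by default.

```python
import pandas as pd

# Assumed input: two implausible ages and one valid age.
df = pd.DataFrame({"ID": [10, 20, 30], "Age": [3, 120, 25]})

# Keep ages within [5, 100]; everything else becomes NaN.
df["Age"] = df["Age"].where(df["Age"].between(5, 100))
print(df)
```

The column becomes float because NaN cannot be stored in an integer column.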

