How to replace NaN values where the other columns meet a certain criteria?
I am trying to replace the unknown age of male, 1st class passengers
with the average age of male, 1st class passengers.
You can split the problem into 2 steps. First calculate the average age of male, 1st class passengers:
mask = (df['Pclass'] == 1) & (df['Sex'] == 'male')
avg_filler = df.loc[mask, 'Age'].mean()
Then update values satisfying your criteria:
df.loc[df['Age'].isnull() & mask, 'Age'] = avg_filler
How to replace nan values of a column based on certain values of other column
You can make a dictionary here:
rep_nan = {
'bachelor': 'tacher',
'blabla': 'blabla',
'high school': 'actor'
}
Then we can replace the nan values with:
df.loc[df['col2'].isnull(), 'col2'] = df[df['col2'].isnull()]['col1'].replace(rep_nan)
For example:
>>> df
col1 col2
0 bachelor None
1 bachelor clown
2 blabla None
3 high school None
>>> df.loc[df['col2'].isnull(), 'col2'] = df[df['col2'].isnull()]['col1'].replace(rep_nan)
>>> df
col1 col2
0 bachelor tacher
1 bachelor clown
2 blabla blabla
3 high school actor
Replace NaN Values with the Means of other Cols based on Condition
You could implement the function like this:
def replace_missing_with_conditional_mean(df, condition_cols, cols):
s = df.groupby(condition_cols)[cols].transform('mean')
return df.fillna(s.to_dict('series'))
res = replace_missing_with_conditional_mean(df, ['Col1', 'Col2'], ['Col3'])
print(res)
Output
Col1 Col2 Col3
0 A c 1.0
1 A c 3.0
2 B c 5.0
3 A d 6.0
4 A c 2.0
Python Pandas replace NaN in one column with value from corresponding row of second column
Assuming your DataFrame is in df
:
df.Temp_Rating.fillna(df.Farheit, inplace=True)
del df['Farheit']
df.columns = 'File heat Observations'.split()
First replace any NaN
values with the corresponding value of df.Farheit
. Delete the 'Farheit'
column. Then rename the columns. Here's the resulting DataFrame
:
Filling nan if row in another column meets condition
Assumung the spaces are np.nan
, if not, you can replace then by df=df.replace('',np.nan)
you can use numpy.where()
for faster results:
df.country=np.where(df.state.isin(states),df.country.fillna('USA'),df.country)
print(df)
state country
0 AL USA
1 WI USA
2 FL USA
3 NJ USA
4 BM NaN
How can I replace NaN values based on multiple conditions?
You could create a replacement_value: index_mask
mapping using a dictionary and then iterate over it, like so:
>>> masks = {1: (df['B'] >= 10) & (df['B'] < 20) & df['C'].isnull(), 2: (df['B'] >= 20) & (df['B'] < 30) & df['C'].isnull(), 3: (df['B'] >= 30) & df['C'].isnull()}
>>> masks
{1: 0 False
1 False
2 True
3 False
dtype: bool, 2: 0 False
1 True
2 False
3 False
dtype: bool, 3: 0 False
1 False
2 False
3 False
dtype: bool}
>>> for replacement_value, mask in masks.items():
... df.loc[mask, 'C'] = replacement_value
...
>>> df
A B C
0 10 12 1.0
1 12 24 2.0
2 30 16 1.0
3 21 31 4.0
Note that I made the between conditions exclusive on the upper bound, i.e. to replace with 1 the value for df['B']
needs to be in the range [10, 20)]
; to replace with 2 [20, 30)
, etc., because otherwise you have overlapping bounds.
pandas replace np.nan based on multiple conditions
Try rewriting your np.where
statement:
df['is_less'] = np.where( (df['A'].isnull()) | (df['B'].isnull() ),np.nan, # check if A or B are np.nan
np.where(df['B'].ge(df['A']),'no','yes')) # check if B >= A
prints:
A B is_less
0 NaN 10.0 nan
1 10.0 NaN nan
2 1.0 5.0 no
3 5.0 1.0 yes
Greater than or equal
pandas.ge
Pandas: How to replace values to np.nan based on Condition for multiple columns
First we create a dictionary from your two lists using zip
replace_dict = dict(zip(list1,list2))
then we loop over it to handle your assignments,
for k,v in replace_dict.items():
df.loc[df[k] == 0, v] = np.nan
print(df)
I A B C D E F
0 1 9 4 0 T F NaN
1 2 0 5 1 NaN X J
2 3 1 8 0 G G NaN
another method is to use np.where
with your lists.
df[list2] = np.where(df[list1].eq(0), np.nan,df[list2])
print(df)
I A B C D E F
0 1 9 4 0 T F NaN
1 2 0 5 1 NaN X J
2 3 1 8 0 G G NaN
Python: how to replace NaN with conditions in a dataframe?
you can use groupby
to do this:
fill_value = df.groupby("node_i")["value_j"].mean().fillna(1.0)
df["w"] = fill_value.reindex(df["node_i"]).values
df["w"][df["value_j"].notnull()] = df["value_j"][df["value_j"].notnull()]
How to replace a dataframe column values with NaN based on a condition?
Using where
and between
df.Age=df.Age.where(df.Age.between(5,100))
df
ID Age
0 10 NaN
1 20 NaN
2 30 25.0
Related Topics
Check If Key Exists in a Python Dict in Jinja2 Templates
Centering Text in Ipython Notebook Markdown/Heading Cells
How to Save a Numpy Array as a 16-Bit Image Using "Normal" (Enthought) Python
Python - How to Extract Elements from an Array Based on an Array of Indices
Drop Non-Numeric Columns from a Pandas Dataframe
Index 0 Is Out of Bounds for Axis 0 With Size 0
How to Append Data Using Openpyxl Python to Excel File from a Specified Row
How to Count Values Greater Than the Group Mean in Pandas
Dice Rolling Simulator in Python
How to Save Plotly Offline Graph in Format Png
How to Split an Array According to Conditional Statement
Python Opencv Cv2 - How to Increase the Brightness and Contrast of an Image by 100%
Django - How to Retrieve Data in Database in Dropdownlist
How to Add Thousand Separator to Numbers in Python Pandas Dataframe
How to Find the Longest Word in a Text File