How to replace NaN values by Zeroes in a column of a Pandas Dataframe?
I believe DataFrame.fillna() will do this for you.
See the docs for DataFrame.fillna and for Series.fillna.
Example:
In [7]: df
Out[7]:
0 1
0 NaN NaN
1 -0.494375 0.570994
2 NaN NaN
3 1.876360 -0.229738
4 NaN NaN
In [8]: df.fillna(0)
Out[8]:
0 1
0 0.000000 0.000000
1 -0.494375 0.570994
2 0.000000 0.000000
3 1.876360 -0.229738
4 0.000000 0.000000
To fill the NaNs in only one column, select just that column. In this case I'm using inplace=True to actually change the contents of df.
In [12]: df[1].fillna(0, inplace=True)
Out[12]:
0 0.000000
1 0.570994
2 0.000000
3 -0.229738
4 0.000000
Name: 1
In [13]: df
Out[13]:
0 1
0 NaN 0.000000
1 -0.494375 0.570994
2 NaN 0.000000
3 1.876360 -0.229738
4 NaN 0.000000
EDIT:
To avoid a SettingWithCopyWarning, use the built-in column-specific functionality:
df.fillna({1:0}, inplace=True)
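A minimal, self-contained sketch of the column-specific fill (toy values assumed, integer column labels as in the example above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [np.nan, -0.5, np.nan],
                   1: [np.nan, 0.6, np.nan]})

# Fill NaNs in column 1 only; column 0 keeps its NaNs.
df.fillna({1: 0}, inplace=True)
print(df)
```

The dict keys are column labels, so no chained indexing is involved and no SettingWithCopyWarning is raised.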
Function to replace all NaN values with zero:
Use a boolean mask.
Suppose the following dataframe:
>>> df
A B C
0 0.0 1 2.0
1 NaN 4 5.0 # <- NaN should be replaced by 0.1
2 6.0 7 NaN # <- NaN should be replaced by 0
m1 = df.isna().any() # Is there a NaN in columns (not mandatory)
m2 = df.eq(0).any() # Is there a 0 in columns
# Replace by 0
df.update(df.loc[:, m1 & ~m2].fillna(0))
# Replace by 0.1
df.update(df.loc[:, m1 & m2].fillna(0.1))
Only the second mask is strictly necessary; filling a column that has no NaNs is a no-op, so m1 merely limits the update to columns that need it.
Output result:
>>> df
A B C
0 0.0 1 2.0
1 0.1 4 5.0
2 6.0 7 0.0
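Put together, the mask-based approach runs as follows (same toy frame as above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0.0, np.nan, 6.0],
                   'B': [1, 4, 7],
                   'C': [2.0, 5.0, np.nan]})

m1 = df.isna().any()   # columns containing at least one NaN
m2 = df.eq(0).any()    # columns containing at least one 0

# Columns with a NaN but no 0 -> fill with 0
df.update(df.loc[:, m1 & ~m2].fillna(0))
# Columns with both a NaN and a 0 -> fill with 0.1
df.update(df.loc[:, m1 & m2].fillna(0.1))
print(df)
```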
I want to replace NaN values with 0 but am not able to with the code below
In your code you passed to_replace="NaN".
Note that you actually passed a string containing just these 3 characters.
In Pandas you can pass np.nan, but only as the value to be assigned
to a cell in a DataFrame. The same pertains to a NumPy array.
You cannot pass to_replace=np.nan here, because the comparison rules are
that one np.nan is NOT equal to another np.nan.
One of possible solutions is to run:
df2 = df2.where(~df2.isna(), 0)
Another, simpler solution, as richardec suggested, is to use fillna,
but the argument should be 0 (the number zero), not "0" (a string):
df2 = df2.fillna(0)
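Both fixes can be checked side by side (toy data assumed):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'x': [1.0, np.nan, 3.0]})

# Keep values where they are not NaN, otherwise insert 0
a = df2.where(~df2.isna(), 0)
# Equivalent, simpler spelling
b = df2.fillna(0)
print(a.equals(b))  # prints True; both give [1.0, 0.0, 3.0]
```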
Replacing nan values in a Pandas data frame with lists
You have to handle the three cases (empty string, NaN, NaN inside a list) separately.
For a NaN inside a list you need to loop over the list and replace the elements one by one.
For the empty strings, replace them with NaN, then fillna.
NB. applymap is slow, so if you know in advance which columns to use, you can subset them.
sub = 'X'
(df.applymap(lambda x: [sub if (pd.isna(e) or e=='')
else e for e in x]
if isinstance(x, list) else x)
.replace('', float('nan'))
.fillna(sub)
)
Output:
col1 col2 col3 col4
0 X Jhon [X, 1, 2] [k, j]
1 1.0 X [1, 1, 5] 3
2 2.0 X X X
3 3.0 Samy [1, 1, X] [b, X]
Used input:
from numpy import nan
df = pd.DataFrame({'col1': {0: nan, 1: 1.0, 2: 2.0, 3: 3.0},
'col2': {0: 'Jhon', 1: nan, 2: '', 3: 'Samy'},
'col3': {0: [nan, 1, 2], 1: [1, 1, 5], 2: nan, 3: [1, 1, nan]},
'col4': {0: ['k', 'j'], 1: '3', 2: nan, 3: ['b', '']}})
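On pandas 2.1+, applymap is deprecated in favour of the element-wise DataFrame.map; the same idea can be sketched as below (with a fallback to applymap on older versions, and a reduced two-column input for brevity):

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame({'col1': {0: nan, 1: 1.0},
                   'col2': {0: ['k', nan], 1: ''}})

sub = 'X'
mapper = getattr(df, 'map', df.applymap)  # DataFrame.map on pandas >= 2.1
out = (mapper(lambda x: [sub if (pd.isna(e) or e == '') else e for e in x]
              if isinstance(x, list) else x)   # fix NaNs/empties inside lists
       .replace('', float('nan'))              # empty strings -> NaN
       .fillna(sub))                           # scalar NaNs -> sub
print(out)
```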
Pandas replace NaN values with zeros after pivot operation
I think the problem is that the NaN entries are strings, so they cannot be replaced directly; first try converting the values to numeric:
df['Rain (mm)'] = pd.to_numeric(df['Rain (mm)'], errors='coerce')
df = df.pivot_table(index=['Month', 'Day'], columns='Year',
values='Rain (mm)', aggfunc='first').fillna(0)
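A runnable sketch of the coerce-then-pivot idea, with a toy frame standing in for the real data (column names taken from the answer):

```python
import pandas as pd

df = pd.DataFrame({'Month': [1, 1, 2, 2],
                   'Day':   [1, 1, 1, 1],
                   'Year':  [2020, 2021, 2020, 2021],
                   'Rain (mm)': ['3.2', 'NaN', '1.1', '2.0']})  # strings!

# 'NaN' here is a 3-letter string, so coerce to real numbers first
df['Rain (mm)'] = pd.to_numeric(df['Rain (mm)'], errors='coerce')
out = df.pivot_table(index=['Month', 'Day'], columns='Year',
                     values='Rain (mm)', aggfunc='first').fillna(0)
print(out)
```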
Replace null values in pandas data frame column with 2D np.zeros() array
It is caused by the object data type; there is a way with fillna:
df.val.fillna(dict(zip(df.index[df['val'].isnull()], [z] * df['val'].isnull().sum())), inplace=True)
df
val
0 [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
1 2.0
2 3.0
3 [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
4 5.0
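Since dict-based fillna on an object column is somewhat obscure, an alternative sketch replaces the missing cells with the zero array via a plain list comprehension (z stands in for the question's 2D array):

```python
import numpy as np
import pandas as pd

z = np.zeros((2, 3))  # stand-in for the question's 2D array
df = pd.DataFrame({'val': [np.nan, 2.0, 3.0, np.nan, 5.0]})

# Replace scalar NaNs with the array; leave everything else untouched
df['val'] = [z if np.isscalar(v) and pd.isna(v) else v for v in df['val']]
print(df)
```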
How to conditionally replace NaN values in a dataframe?
fillna can take a Series of replacement values; non-NaN values are left untouched.
Replace the month numbers with the values from your dictionary with map, then pass the result to fillna:
df["WL1"] = df.WL1.fillna(df.Month.map(dictionary["WL1"]))
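A self-contained sketch with made-up month values and dictionary (names from the answer, data hypothetical):

```python
import numpy as np
import pandas as pd

dictionary = {"WL1": {1: 10.0, 2: 20.0}}  # hypothetical per-month defaults
df = pd.DataFrame({"Month": [1, 2, 1],
                   "WL1": [np.nan, 5.0, 7.0]})

# Map each month to its default, then use that series to fill only the NaNs
df["WL1"] = df.WL1.fillna(df.Month.map(dictionary["WL1"]))
print(df)
```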
Replace 0 with NaN for selected columns only if all values are 0 in Pandas
Use mask:
df[cols] = df[cols].mask(df[cols].eq(0).all(axis=1))
mask automatically sets a row to NaN if the condition (df[cols].eq(0).all(axis=1)) is True.
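A runnable version of the one-liner (column names assumed from the question):

```python
import pandas as pd

cols = ['value1', 'value2', 'value3']
df = pd.DataFrame({'id': [0, 1, 2],
                   'value1': [22.0, 0.0, 5.0],
                   'value2': [1.0, 0.0, 0.0],
                   'value3': [7.0, 0.0, 24.0]})

# Rows where every selected column is 0 become NaN; others are untouched
df[cols] = df[cols].mask(df[cols].eq(0).all(axis=1))
print(df)
```

Note that row 2 keeps its single 0 in value2, because not all of its selected columns are 0.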
Original answer:
I'd prefer mask:
>>> df.set_index('id').mask(df[cols].eq(0).all(axis=1))
value1 value2 value3
id
0 22.0 1.0 7.0
1 NaN NaN NaN
2 NaN NaN NaN
3 4.0 1.0 25.0
4 5.0 0.0 24.0
5 0.0 0.0 3.0
With resetting index:
>>> df.set_index('id').mask(df[cols].eq(0).all(axis=1)).reset_index()
id value1 value2 value3
0 0 22.0 1.0 7.0
1 1 NaN NaN NaN
2 2 NaN NaN NaN
3 3 4.0 1.0 25.0
4 4 5.0 0.0 24.0
5 5 0.0 0.0 3.0