Function to replace all NaN values with zero:
Use boolean mask.
Suppose the following dataframe:
>>> df
A B C
0 0.0 1 2.0
1 NaN 4 5.0 # <- NaN should be replaced by 0.1
2 6.0 7 NaN # <- NaN should be replaced by 0
m1 = df.isna().any() # Is there a NaN in columns (not mandatory)
m2 = df.eq(0).any() # Is there a 0 in columns
# Replace by 0
df.update(df.loc[:, m1 & ~m2].fillna(0))
# Replace by 0.1
df.update(df.loc[:, m1 & m2].fillna(0.1))
Only the second mask (m2) is really needed: fillna leaves columns without NaN unchanged, so m1 is optional.
Output result:
>>> df
A B C
0 0.0 1 2.0
1 0.1 4 5.0
2 6.0 7 0.0
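Since the title asks for a function, the mask logic above could be wrapped roughly as follows. This is only a sketch: the function name fill_nan_by_zero_presence and its parameters with_zero and without_zero are made up for illustration, and it returns a new DataFrame instead of calling update in place.

```python
import numpy as np
import pandas as pd


def fill_nan_by_zero_presence(df, with_zero=0.1, without_zero=0.0):
    """Fill NaN per column: `with_zero` if the column already holds a 0,
    `without_zero` otherwise. Returns a new DataFrame (hypothetical helper)."""
    has_zero = df.eq(0).any()  # one boolean per column, like m2 above
    # Map each column to its fill value and let fillna apply it per column.
    fill_values = {col: (with_zero if has_zero[col] else without_zero)
                   for col in df.columns}
    return df.fillna(fill_values)


df = pd.DataFrame({"A": [0.0, np.nan, 6.0],
                   "B": [1, 4, 7],
                   "C": [2.0, 5.0, np.nan]})
result = fill_nan_by_zero_presence(df)
print(result)
```

Passing a dict to fillna avoids building m1 at all, because columns without NaN are simply left as they are.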
convert NaN values to 0.0 in data frame
If your data frame looks like this (with your second column called Cat2):
df <- data.frame(Category = c("HR", "Wing", "Soft"), Cat2 = c(NaN, NaN, NaN), N = c(0, 0, 0))
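You can do this if you just want the second column to read 0.0 wherever it contains NaN (is.na() is also TRUE for NaN). Note that assigning a string turns Cat2 into a character column: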
df$Cat2[is.na(df$Cat2)] <- "0.0"
Or, if you want the percentage symbol after it too:
df$Cat2[is.na(df$Cat2)] <- "0.0%"
How to replace NaN values by Zeroes in a column of a Pandas Dataframe?
I believe DataFrame.fillna() will do this for you.
See the pandas documentation for DataFrame.fillna and for Series.fillna.
Example:
In [7]: df
Out[7]:
0 1
0 NaN NaN
1 -0.494375 0.570994
2 NaN NaN
3 1.876360 -0.229738
4 NaN NaN
In [8]: df.fillna(0)
Out[8]:
0 1
0 0.000000 0.000000
1 -0.494375 0.570994
2 0.000000 0.000000
3 1.876360 -0.229738
4 0.000000 0.000000
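To fill the NaNs in only one column, select just that column. In this case I'm using inplace=True to actually change the contents of df. (The session below is from an older pandas release; recent versions return None from an inplace call and may warn about, or ignore, this chained form.)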
In [12]: df[1].fillna(0, inplace=True)
Out[12]:
0 0.000000
1 0.570994
2 0.000000
3 -0.229738
4 0.000000
Name: 1
In [13]: df
Out[13]:
0 1
0 NaN 0.000000
1 -0.494375 0.570994
2 NaN 0.000000
3 1.876360 -0.229738
4 NaN 0.000000
EDIT:
To avoid a SettingWithCopyWarning, use the built-in column-specific functionality:
df.fillna({1:0}, inplace=True)
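As a rough illustration of the column-specific form, here is a small sketch that passes the same kind of dict but assigns the result instead of using inplace=True. The sample values are shortened from the example above and the name filled is just a placeholder.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [np.nan, -0.494375, np.nan],
                   1: [np.nan, 0.570994, np.nan]})

# Fill NaN only in the column labelled 1; column 0 is left untouched.
filled = df.fillna({1: 0})
print(filled)
```

Here df itself is not modified; assign the result back to df if that is what you want.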
I want to replace NaN values with 0 but not able to with the below code
In your code you passed to_replace="NaN".
Note that this is a string containing just these three letters, not the floating-point value np.nan, so it matches nothing in a numeric column.
Keep in mind that one np.nan is NOT equal to another np.nan (np.nan == np.nan is False), so an ordinary equality test never finds missing values. The same pertains to a NumPy array.
pandas does special-case np.nan in replace, so to_replace=np.nan works, but testing with isna() or filling with fillna states the intent more directly.
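A quick way to see both points, as a sketch only: the column name x and the variable unchanged are invented for the demonstration, while df2 follows the naming used in this answer.

```python
import math

import numpy as np
import pandas as pd

# NaN never compares equal to anything, including itself.
print(math.nan == math.nan)   # False
print(np.isnan(np.nan))       # True: use a dedicated test instead of ==

df2 = pd.DataFrame({"x": [1.0, np.nan, 3.0]})

# The string "NaN" is not the float NaN, so nothing matches here.
unchanged = df2.replace(to_replace="NaN", value=0)
print(unchanged["x"].isna().sum())   # still 1 missing value

# isna() finds the missing cells reliably.
print(df2["x"].isna().tolist())      # [False, True, False]
```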
One possible solution is to run:
df2 = df2.where(~df2.isna(), 0)
Another, simpler solution, as richardec suggested, is to use fillna,
but the argument should be 0 (zero), not "o" (a character):
df2 = df2.fillna(0)
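If you want to convince yourself that the two solutions agree, a small check along these lines should do. The column names a and b and the variables via_where and via_fillna are illustrative only.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})

via_where = df2.where(~df2.isna(), 0)   # keep non-missing cells, else 0
via_fillna = df2.fillna(0)              # same intent, stated directly

# Compare the two results cell by cell.
print(via_where.equals(via_fillna))
```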
Replace NaN value in table from dataframe without changed other values
The problem here is the dtype (data type) of the column, or more exactly of the underlying NumPy array. I assume that in your table dataframe, the column containing NaN values has a floating-point type (float64).
If you replaced NaN with 0., all would be fine, but if you write an empty string there, pandas changes the dtype to object.
BTW, 3.0176e+06 is just a different representation of 3017601.99; the value itself has not changed. pandas simply uses a different representation for np.float64 columns and object columns.
You can ask it to use the default str conversion for float values in object columns by setting the relevant option: pd.set_option('display.float_format', str)
Demo:
>>> pd.set_option('display.float_format', None) # reset option
>>> df = pd.DataFrame([[3017601.99], [np.nan]], columns=['A'])
>>> df
A
0 3017601.99
1 NaN
>>> df1 = df.fillna('')
>>> df1
A
0 3.0176e+06
1
>>> pd.set_option('display.float_format', str) # set the option
>>> df1
A
0 3017601.99
1
>>> df.loc[0,'A'] == df1.loc[0,'A']
True
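The dtype change itself can be checked directly. The sketch below reuses the frame from the demo; the names as_number and as_text are placeholders.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[3017601.99], [np.nan]], columns=["A"])

as_number = df.fillna(0.0)   # stays numeric
as_text = df.fillna("")      # mixes float and str in one column

print(as_number["A"].dtype)  # float64
print(as_text["A"].dtype)    # object

# The stored float is the same in both frames; only the display differs.
print(as_text.loc[0, "A"] == df.loc[0, "A"])
```

If the column has to stay numeric for later calculations, filling with 0.0 (or leaving NaN in place) is the safer choice.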