How to Replace NaN Values with Zero in a Huge Data Frame

Function to replace all NaN values with zero:

Use a boolean mask.

Suppose the following dataframe, where NaN should become 0.1 in columns that already contain a 0, and 0 otherwise:

>>> df
     A  B    C
0  0.0  1  2.0
1  NaN  4  5.0  # <- NaN should be replaced by 0.1
2  6.0  7  NaN  # <- NaN should be replaced by 0

m1 = df.isna().any()  # Is there a NaN in each column? (not mandatory)
m2 = df.eq(0).any()   # Is there a 0 in each column?

# Replace by 0
df.update(df.loc[:, m1 & ~m2].fillna(0))

# Replace by 0.1
df.update(df.loc[:, m1 & m2].fillna(0.1))

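Strictly speaking, only the second mask (m2) is needed: fillna leaves columns without NaN unchanged, so m1 only narrows the selection.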

Output result:

>>> df
     A  B    C
0  0.0  1  2.0
1  0.1  4  5.0
2  6.0  7  0.0
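
For reference, the same rule can be written as one self-contained script. This is only a sketch under assumptions: the sample frame is rebuilt by hand from the example above, the helper name fill_nan_by_column is made up for illustration, and it swaps the update() calls for a per-column dict passed to fillna, which gives the same result here.

```python
import numpy as np
import pandas as pd


def fill_nan_by_column(df, default=0, alt=0.1):
    """Return a copy with NaN filled per column (hypothetical helper).

    Columns that already contain a 0 get `alt`; all other columns get `default`.
    """
    has_zero = df.eq(0).any()  # boolean Series indexed by column label
    # One fill value per column; fillna applies them column-wise.
    fill_values = {col: (alt if has_zero[col] else default) for col in df.columns}
    return df.fillna(fill_values)


# Sample frame rebuilt by hand from the example above.
df = pd.DataFrame({"A": [0.0, np.nan, 6.0], "B": [1, 4, 7], "C": [2.0, 5.0, np.nan]})
result = fill_nan_by_column(df)
print(result)
```

Because fillna returns a new frame, the original df is left untouched; assign the result back if you want to overwrite it.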

Convert NaN values to 0.0 in a data frame

If your data frame looks like this (with your second column called Cat2):

df <- data.frame(Category = c("HR", "Wing", "Soft"), Cat2 = c(NaN, NaN, NaN), N = c(0, 0, 0))

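You can do this if you just want the second column to read 0.0 wherever it contains a NaN value (is.na() is TRUE for NaN as well as NA). Note that assigning a string turns Cat2 into a character column: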

df$Cat2[is.na(df$Cat2)] <- "0.0"

Or, if you want the percentage symbol after it too:

df$Cat2[is.na(df$Cat2)] <- "0.0%"

How to replace NaN values by Zeroes in a column of a Pandas Dataframe?

I believe DataFrame.fillna() will do this for you.

See the pandas documentation for DataFrame.fillna and for Series.fillna.

Example:

In [7]: df
Out[7]:
          0         1
0       NaN       NaN
1 -0.494375  0.570994
2       NaN       NaN
3  1.876360 -0.229738
4       NaN       NaN

In [8]: df.fillna(0)
Out[8]:
          0         1
0  0.000000  0.000000
1 -0.494375  0.570994
2  0.000000  0.000000
3  1.876360 -0.229738
4  0.000000  0.000000

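To fill the NaNs in only one column, select just that column. In this case I'm using inplace=True to actually change the contents of df. (The output shown is from an older pandas release; current versions return None from an inplace call, and recent ones may warn that this chained form will not update df.)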

In [12]: df[1].fillna(0, inplace=True)
Out[12]:
0 0.000000
1 0.570994
2 0.000000
3 -0.229738
4 0.000000
Name: 1

In [13]: df
Out[13]:
          0         1
0       NaN  0.000000
1 -0.494375  0.570994
2       NaN  0.000000
3  1.876360 -0.229738
4       NaN  0.000000

EDIT:

To avoid a SettingWithCopyWarning, use the built-in column-specific functionality:

df.fillna({1:0}, inplace=True)
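
A small self-contained sketch of the column-specific form, assuming a hand-built frame with integer column labels 0 and 1 like the one above (the values are made up for illustration). It assigns the result to a new name instead of using inplace=True, so the original frame stays available for comparison.

```python
import numpy as np
import pandas as pd

# Hand-built sample; the numbers are illustrative only.
df = pd.DataFrame({0: [np.nan, -0.5, np.nan], 1: [np.nan, 0.5, np.nan]})

# The dict maps column label -> fill value; columns not listed are left alone.
filled = df.fillna({1: 0})

print(filled)
```

Column 0 keeps its NaNs because it has no entry in the dict.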

I want to replace NaN values with 0 but am not able to with the code below

In your code you passed to_replace="NaN".

Note that what you actually passed here is a string containing just these three letters, not a missing value.

In Pandas you can pass np.nan as the value to be assigned
to a cell in a DataFrame. The same pertains to a Numpy array.

But you cannot find NaN cells by comparing against np.nan, because the
comparison rules are that one np.nan is NOT equal to another np.nan.
Use a NaN-aware method such as isna or fillna instead.
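
The comparison behaviour is easy to check directly. The snippet below is a minimal sketch on a made-up Series (the name s and its values are assumptions, not taken from the question):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])  # illustrative data only

nan_equals_itself = np.nan == np.nan      # NaN never compares equal, even to itself
found_by_equality = (s == np.nan).any()   # so an equality test finds nothing
found_by_string = (s == "NaN").any()      # and the string "NaN" is not a missing value
found_by_isna = s.isna().any()            # isna() is the NaN-aware test

print(nan_equals_itself, found_by_equality, found_by_string, found_by_isna)
```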

One possible solution is to run:

df2 = df2.where(~df2.isna(), 0)

Another, simpler solution, as richardec suggested, is to use fillna,
but the argument should be 0 (zero), not "o" (a letter):

df2 = df2.fillna(0)
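
As a quick check that the two solutions agree, here is a sketch on a hypothetical stand-in for df2 (the column names and values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2; values are illustrative only.
df2 = pd.DataFrame({"x": [1.0, np.nan], "y": [np.nan, 4.0]})

via_where = df2.where(df2.notna(), 0)  # keep non-missing cells, put 0 elsewhere
via_fillna = df2.fillna(0)             # same idea, shorter

print(via_where.equals(via_fillna))
```

df2.notna() is the same mask as ~df2.isna(); either spelling works in the where call.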

Replace NaN values in a table from a dataframe without changing other values

The problem here is the dtype (data type) of the column, or more exactly of the underlying numpy array. I assume that in your table dataframe, the column containing NaN values has a floating point type (float64).

If you replace NaN with 0., all is fine, but if you write an empty string there, Pandas changes the dtype to object.
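
The dtype change can be observed directly. This is a minimal sketch, assuming a one-column frame built by hand with the value from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3017601.99, np.nan]})

zero_filled = df.fillna(0.0)   # stays numeric
blank_filled = df.fillna("")   # now holds a float and a string

print(df["A"].dtype)            # float64
print(zero_filled["A"].dtype)   # float64
print(blank_filled["A"].dtype)  # object
```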

BTW, 3.0176e+06 is just a different representation of 3017601.99; the value itself has not changed. Pandas simply uses a different display format for np.float64 columns than for object columns.

You can ask it to use the default str conversion for float values in object columns by setting the relevant option: pd.set_option('display.float_format', str)

Demo:

>>> import numpy as np
>>> import pandas as pd
>>> pd.set_option('display.float_format', None)  # reset option
>>> df = pd.DataFrame([[3017601.99], [np.nan]], columns=['A'])
>>> df
            A
0  3017601.99
1         NaN
>>> df1 = df.fillna('')
>>> df1
            A
0  3.0176e+06
1
>>> pd.set_option('display.float_format', str)   # set the option
>>> df1
            A
0  3017601.99
1
>>> df.loc[0,'A'] == df1.loc[0,'A']
True

