Dropping Infinite Values from Dataframes in Pandas

Dropping infinite values from dataframes in pandas?

First replace() infs with NaN:

df.replace([np.inf, -np.inf], np.nan, inplace=True)

and then drop NaNs via dropna():

df.dropna(subset=["col1", "col2"], how="all", inplace=True)

For example:

>>> df = pd.DataFrame({"col1": [1, np.inf, -np.inf], "col2": [2, 3, np.nan]})
>>> df
   col1  col2
0   1.0   2.0
1   inf   3.0
2  -inf   NaN

>>> df.replace([np.inf, -np.inf], np.nan, inplace=True)
>>> df
   col1  col2
0   1.0   2.0
1   NaN   3.0
2   NaN   NaN

>>> df.dropna(subset=["col1", "col2"], how="all", inplace=True)
>>> df
   col1  col2
0   1.0   2.0
1   NaN   3.0

The same method also works for Series.
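As a minimal sketch of the Series case (the data and the name `clean` are made up for illustration), the same replace-then-drop chain might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical example data
s = pd.Series([1.0, np.inf, -np.inf, 4.0], name="values")

# Replace +/-inf with NaN, then drop the NaNs.
# The result is assigned to a new name instead of using inplace=True.
clean = s.replace([np.inf, -np.inf], np.nan).dropna()

print(clean.tolist())  # [1.0, 4.0]
```

Note that the original index labels (0 and 3 here) are kept; call reset_index(drop=True) if you need a fresh index.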

Python pandas: how to remove nan and -inf values

Use pd.DataFrame.isin to flag the unwanted values, then pd.DataFrame.any(axis=1) to find the rows that contain at least one of them. Finally, negate that boolean Series and use it to filter the dataframe.

df[~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)]

             time    X    Y  X_t0     X_tp0   X_t1     X_tp1   X_t2     X_tp2
4        0.037389    3   10     3  0.333333    2.0  0.500000    1.0  1.000000
5        0.037393    4   10     4  0.250000    3.0  0.333333    2.0  0.500000
1030308  9.962213  256  268   256  0.000000  256.0  0.003906  255.0  0.003922
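For a frame that contains only numeric columns, a possible alternative is to build the row mask with np.isfinite, which is False for NaN, +inf and -inf alike. This is only a sketch on made-up data; the names `df` and `kept` are illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric frame
df = pd.DataFrame({
    "X": [1.0, np.inf, 3.0, 4.0],
    "Y": [0.5, 2.0, np.nan, -np.inf],
})

# np.isfinite returns a boolean frame; keep rows where every cell is finite
kept = df[np.isfinite(df).all(axis=1)]

print(kept.index.tolist())  # [0]
```

This variant assumes numeric dtypes throughout; np.isfinite is not defined for string or object columns.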

Remove nan, +inf, -inf values columns from a dataframe

First replace() inf and -inf with nan:

df = pd.DataFrame({'a':[1,2,3], 'b':[4,np.nan,6], 'c':[7,8,np.inf]})
df = df.replace([np.inf, -np.inf], np.nan)

#    a    b    c
# 0  1  4.0  7.0
# 1  2  NaN  8.0
# 2  3  6.0  NaN

Then use the axis param of dropna() to switch between row- and column-based behavior:

df.dropna() # default axis=0 is row-based

#    a    b    c
# 0  1  4.0  7.0

df.dropna(axis=1) # axis=1 or axis='columns' is column-based

#    a
# 0  1
# 1  2
# 2  3

Replace all inf, -inf values with NaN in a pandas dataframe

TL;DR

  • df.replace is fastest for replacing ±inf
  • but you can avoid replacing altogether by just setting mode.use_inf_as_na


Replacing inf and -inf

df = df.replace([np.inf, -np.inf], np.nan)

Note that inplace=True is possible but generally discouraged; assigning the result back, as above, is the clearer style.

Slower element-wise df.applymap options (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):

  • df = df.applymap(lambda x: np.nan if x in [np.inf, -np.inf] else x)
  • df = df.applymap(lambda x: np.nan if np.isinf(x) else x)
  • df = df.applymap(lambda x: x if np.isfinite(x) else np.nan)
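Besides replace and the element-wise options, a vectorized sketch is to combine np.isinf with DataFrame.mask, which puts NaN wherever the condition is True. The data below is made up, and this assumes numeric columns only:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame
df = pd.DataFrame({"a": [1.0, np.inf], "b": [-np.inf, 4.0]})

# np.isinf is evaluated on the whole frame at once;
# mask() replaces the flagged cells with NaN.
result = df.mask(np.isinf(df))

print(result)
```

Here `result` holds NaN in place of both infinities and leaves the finite values untouched.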


Setting mode.use_inf_as_na

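Note that we don't actually have to modify df at all. Setting mode.use_inf_as_na will simply change the way inf and -inf are interpreted (be aware that this option is deprecated as of pandas 2.1, so check your version before relying on it):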

True means treat None, nan, -inf, inf as null

False means None and nan are null, but inf, -inf are not null (default)

  • Either enable globally

    pd.set_option('mode.use_inf_as_na', True)
  • Or locally via context manager

    with pd.option_context('mode.use_inf_as_na', True):
        ...
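If you would rather not depend on a global option, a similar "treat inf as null" check can be sketched by combining isna() with np.isinf. The series and the name `null_like` are illustrative, and the first output assumes the default option settings:

```python
import numpy as np
import pandas as pd

# Hypothetical example data
s = pd.Series([1.0, np.nan, np.inf, -np.inf])

# With default settings, only NaN counts as null
print(s.isna().tolist())   # [False, True, False, False]

# Treat +/-inf as null too, without changing any pandas option
null_like = s.isna() | np.isinf(s)
print(null_like.tolist())  # [False, True, True, True]
```

The resulting boolean Series can then be used for filtering, e.g. s[~null_like].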

How do you detect and delete infinite values from a time series in a pandas dataframe?

You can filter out the infinite values by comparing against numpy.inf (note that this comparison removes only +inf; -inf needs a second condition). For example:

import numpy as np
perc_df[perc_df.variable != np.inf].variable.mean()
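To both detect and delete infinite values of either sign, one option is np.isinf. The sketch below uses a made-up stand-in for perc_df with a daily DatetimeIndex; only the column name `variable` comes from the snippet above:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for perc_df: a small daily time series
perc_df = pd.DataFrame(
    {"variable": [0.1, np.inf, 0.3, -np.inf, 0.2]},
    index=pd.date_range("2020-01-01", periods=5, freq="D"),
)

# Detect: boolean Series that is True for +inf and -inf
is_inf = np.isinf(perc_df["variable"])
print(int(is_inf.sum()))  # 2

# Delete: keep only the rows whose value is not infinite
finite = perc_df[~is_inf]
mean_value = finite["variable"].mean()
```

The timestamps of the offending rows are available as perc_df.index[is_inf] if you want to inspect them before dropping.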

Python Pandas: For Loop to drop rows from dataframes where values are the same in before/after cases

Make a list containing the dataframes and iterate:

df_list = [df0, df1]  # your list of dataframes

for df in df_list:
    new_df = df[df['before'] != df['after']]

Then you can append new_df to a new list, or do whatever else you need with it.
If all your dataframes are in a dictionary, you can iterate over its items instead:

df_dict = {key0: df0, key1: df1}  # ... and so on
for key, df in df_dict.items():
    new_df = df[df['before'] != df['after']]

or, less pythonically:

for key in df_dict.keys():
    df = df_dict[key]
    new_df = df[df['before'] != df['after']]

You can even convert your dictionary values to a list and use the first method:

df_list = list(df_dict.values())
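If you want to keep the filtered frames rather than overwrite new_df on every pass, a dict comprehension is one way to collect them. This is a sketch with made-up frames; the keys and the name `changed` are illustrative:

```python
import pandas as pd

# Hypothetical input: two small frames with 'before' and 'after' columns
df_dict = {
    "first": pd.DataFrame({"before": [1, 2, 3], "after": [1, 5, 3]}),
    "second": pd.DataFrame({"before": [7, 8], "after": [7, 8]}),
}

# Keep, per key, only the rows where the value actually changed
changed = {
    key: df[df["before"] != df["after"]]
    for key, df in df_dict.items()
}

print({key: len(df) for key, df in changed.items()})  # {'first': 1, 'second': 0}
```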

Replacing -inf values to np.nan in a feature pandas.series

The problem may be that you are not assigning back to the original series.

Note that pd.Series.replace is not an in-place operation by default. The below code is a minimal example.

df = pd.DataFrame({'feature': [1, 2, -np.inf, 3, 4]})

df['feature'] = df['feature'].replace(-np.inf, np.nan)

print(df)

#    feature
# 0      1.0
# 1      2.0
# 2      NaN
# 3      3.0
# 4      4.0

Bug: impossible to delete infinite values from DataFrame

Your question is similar to "Dropping infinite values from dataframes in pandas?".
Did you try:

df.replace([np.inf, -np.inf], np.nan).dropna(subset=["col1", "col2"], how="all")

Note that np.nan is not considered finite either, so if every value must be finite, replace the infinities with a finite number instead. For example:


import pandas as pd
import numpy as np

# One row containing a finite value, +inf and -inf
df = pd.DataFrame([[1.0, np.inf, -np.inf]], columns=list('ABC'))
print(df)

print(np.all(np.isfinite(df)))

# Replacing inf with NaN still leaves non-finite values
df_nan = df.replace([np.inf, -np.inf], np.nan).dropna(subset=df.columns, how="all")
print(df_nan)

print(np.all(np.isfinite(df_nan)))

# Replacing inf with a finite number makes every value finite
df_0 = df.replace([np.inf, -np.inf], 0).dropna(subset=df.columns, how="all")
print(df_0)

print(np.all(np.isfinite(df_0)))

Result:

     A    B    C
0  1.0  inf -inf
False
     A   B   C
0  1.0 NaN NaN
False
     A    B    C
0  1.0  0.0  0.0
True
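The distinction between "finite", "NaN" and "infinite" behind this result can be checked directly with NumPy. A small sketch on a made-up array:

```python
import numpy as np

# Hypothetical values covering each case
values = np.array([1.0, np.nan, np.inf, -np.inf])

print(np.isfinite(values).tolist())  # [True, False, False, False]
print(np.isnan(values).tolist())     # [False, True, False, False]
print(np.isinf(values).tolist())     # [False, False, True, True]
```

Only the first element passes np.isfinite, which is why replacing inf with NaN alone does not make the whole frame finite.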

