How to Replace Negative Numbers in Pandas Data Frame by Zero

If all your columns are numeric, you can use boolean indexing:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

In [3]: df
Out[3]:
   a  b
0  0 -3
1 -1  2
2  2  1

In [4]: df[df < 0] = 0

In [5]: df
Out[5]:
   a  b
0  0  0
1  0  2
2  2  1
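If you would rather not modify df in place, a minimal sketch using the public clip method returns a new DataFrame instead. It assumes, as above, that every column is numeric; the name clipped is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

# clip(lower=0) raises every value below 0 up to 0 and returns a new
# DataFrame; the original df is left unchanged.
clipped = df.clip(lower=0)

print(clipped)
```

This form also fits into a method chain, because nothing is assigned in place.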

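For the more general case, where some columns are not numeric, you can use the private method _get_numeric_data, which returns only the numeric columns. Because it is private, it may change between pandas versions: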

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
   ...:                    'c': ['foo', 'goo', 'bar']})

In [3]: df
Out[3]:
   a  b    c
0  0 -3  foo
1 -1  2  goo
2  2  1  bar

In [4]: num = df._get_numeric_data()

In [5]: num[num < 0] = 0

In [6]: df
Out[6]:
   a  b    c
0  0  0  foo
1  0  2  goo
2  2  1  bar
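A sketch of the same idea using only public API, assuming you are free to assign the result back to df: select_dtypes picks out the numeric column names, and clip is applied to just those columns. The name num_cols is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
                   'c': ['foo', 'goo', 'bar']})

# Names of the numeric columns only ('a' and 'b' here).
num_cols = df.select_dtypes(include='number').columns

# Clip just those columns and assign the result back by column name,
# so the string column 'c' is never touched.
df[num_cols] = df[num_cols].clip(lower=0)

print(df)
```

Assigning back by column name avoids relying on a selection of df being a view of it, which is not guaranteed.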

With timedelta columns, boolean indexing against 0 seems to work on individual columns but not on the whole DataFrame, so you can loop over the columns:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
   ...:                    'b': pd.to_timedelta([-3, 2, 1], 'd')})

In [3]: df
Out[3]:
        a       b
0  0 days -3 days
1 -1 days  2 days
2  2 days  1 days

In [4]: for k, v in df.items():
   ...:     v[v < 0] = 0
   ...:

In [5]: df
Out[5]:
       a      b
0 0 days 0 days
1 0 days 2 days
2 2 days 1 days

Update: comparison with a pd.Timedelta works on the whole DataFrame:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
   ...:                    'b': pd.to_timedelta([-3, 2, 1], 'd')})

In [3]: df[df < pd.Timedelta(0)] = 0

In [4]: df
Out[4]:
       a      b
0 0 days 0 days
1 0 days 2 days
2 2 days 1 days
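As a hedged variation on the comparison above, the sketch below uses where with a pd.Timedelta(0) replacement value, so that both the comparison and the fill value are timedeltas and nothing is assigned in place. The names zero and result are only for illustration, and behavior may differ slightly between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'D'),
                   'b': pd.to_timedelta([-3, 2, 1], 'D')})

zero = pd.Timedelta(0)

# where() keeps each value for which the condition is True and
# substitutes `zero` everywhere else; df itself is not modified.
result = df.where(df >= zero, zero)

print(result)
```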

Replace dataframe column negative values with nan, in method chain

If assign counts as a method on df, you can recalculate the column b and assign it to df to replace the old column:

df = pd.DataFrame({'a': [1, 2] , 'b': [-3, 4], 'c': [5, -6]})

df.assign(b = df.b.where(df.b.ge(0)))
#    a    b  c
# 0  1  NaN  5
# 1  2  4.0 -6

For better chaining behavior, you can pass a lambda function to assign, so that column b is computed from the DataFrame as it exists at that point in the chain:

df.assign(b = lambda x: x.b.where(x.b.ge(0)))
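To illustrate why the lambda form chains better, here is a small sketch with made-up data in which a filtering step comes first. The lambda receives the already-filtered frame, so it never refers back to the original df.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [-3, 4, -5], 'c': [5, -6, 7]})

result = (
    df.loc[df['a'] > 1]                               # keep rows where a > 1
      .assign(b=lambda x: x['b'].where(x['b'] >= 0))  # negatives in b become NaN
)

print(result)
```

Column b becomes float in the result because NaN cannot be stored in an integer column.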

What is the fastest way to replace negative values with 0 and values greater than 1 with 1 in an array using Python?

You want to use np.clip:

>>> import numpy as np
>>> list_values = [-0.01, 0, 0.5, 0.9, 1.0, 1.01]
>>> arr = np.array(list_values)
>>> np.clip(arr, 0.0, 1.0)
array([0. , 0. , 0.5, 0.9, 1. , 1. ])

This is likely the fastest approach if you can ignore the cost of converting the list to an array, and the advantage should grow with larger lists or arrays.

Involving pandas in this operation isn't the way to go unless you eventually want a pandas data structure.
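If the data is already a float array and you do not need the original values, a minimal sketch of clipping in place with the out parameter of np.clip avoids allocating a second array:

```python
import numpy as np

arr = np.array([-0.01, 0, 0.5, 0.9, 1.0, 1.01])

# Passing out=arr writes the clipped values back into the same array
# instead of creating a new one.
np.clip(arr, 0.0, 1.0, out=arr)

print(arr)
```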

Replacing positive, negative, and zero values by 1, -1, and 0 respectively

There's a sign function in numpy:

df["trade_sign"] = np.sign(df["diff"])

If you want integers:

df["trade_sign"] = np.sign(df["diff"]).astype(int)
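A self-contained sketch with made-up values in the diff column. Note the assumption that the column has no missing values: np.sign returns NaN for NaN, and astype(int) will then raise an error.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column name "diff" comes from the question.
df = pd.DataFrame({"diff": [1.5, -2.0, 0.0, 3.0]})

# np.sign maps positives to 1.0, negatives to -1.0 and zero to 0.0;
# astype(int) then converts the float result to integers.
df["trade_sign"] = np.sign(df["diff"]).astype(int)

print(df)
```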

How to replace zeros in Pandas Data Frame by negative 1

I believe your data has an 8-bit unsigned integer dtype (uint8). That type cannot represent negative numbers, so -1 wraps around to the largest value it can hold.

df = pd.DataFrame([[0, 1], [1, 0]], dtype=np.uint8)

df.replace(0, -1)

     0    1
0  255    1
1    1  255

Here 255 is the largest value a uint8 can hold:

np.iinfo(np.uint8).max

255

Instead, convert to a signed integer type first:

df.astype(int).replace(0, -1)

   0  1
0 -1  1
1  1 -1
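As a rough sketch of how you might check for this situation before replacing, np.iinfo reports the range of the column's dtype, and any signed type wide enough for your values will do. The choice of int16 here is an assumption made for the example, not a requirement.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1], [1, 0]], dtype=np.uint8)

# Inspect the representable range of the first column's dtype.
info = np.iinfo(df.dtypes.iloc[0])
print(info.min, info.max)

# -1 lies below info.min, so convert to a signed type before replacing.
signed = df.astype(np.int16).replace(0, -1)

print(signed)
```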

