Replacing all negative values in certain columns by another value in Pandas
You can use boolean indexing: build a condition on the chosen columns and assign through it.
cols = ['T1','T2','T3','T4']
df[df[cols] < 0] = -5
Output
In [35]: df
Out[35]:
   T1  T2  T3  T4
0  20  -5   4   3
1  85  -5  34  21
2  -5  22  31  75
3  -5   5   7  -5
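For a version that runs end to end, here is a minimal sketch of the boolean-indexing approach above. The negative input values are made up for illustration (the question's original negatives are not shown here), and the frame is assumed to hold only the four T columns.

```python
import pandas as pd

# Hypothetical input: the negative values are invented for this sketch
df = pd.DataFrame({
    "T1": [20, 85, -10, -3],
    "T2": [-7, -1, 22, 5],
    "T3": [4, 34, 31, 7],
    "T4": [3, 21, 75, -8],
})

cols = ["T1", "T2", "T3", "T4"]

# df[cols] < 0 is a boolean DataFrame; assigning through it
# overwrites only the cells where the condition is True
df[df[cols] < 0] = -5

print(df)
```

Every cell that was below zero now holds -5, and the non-negative cells keep their values.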
In your example you are only replacing the value of a variable, not the value stored in the DataFrame. To change a single cell, use the at method.
# Loop over T1 and the three columns that follow it
start = df.columns.get_loc("T1")
for col in df.columns[start:start + 4]:
    for index, value in df[col].items():
        if value < 0:
            df.at[index, col] = -5  # write back to the cell itself
Replacing all negative values in all columns by zero in Python
Use pandas.DataFrame.clip:
df.iloc[:, 1:] = df.iloc[:, 1:].clip(0)
print(df)
Output:
             date  T1  T2  T3  T4
0  1-1-2010 00:10  20   0   4   3
1  1-1-2010 00:20  85   0  34  21
2  1-1-2010 00:30   0  22  31  75
3  1-1-2010 00:40   0   5   7   0
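A self-contained sketch of the clip approach, for anyone who wants to run it. The negative inputs are invented, and the numeric columns are selected by name here rather than by position, which is a stylistic choice and not part of the answer above.

```python
import pandas as pd

# Hypothetical input: the negative values are invented for this sketch
df = pd.DataFrame({
    "date": ["1-1-2010 00:10", "1-1-2010 00:20",
             "1-1-2010 00:30", "1-1-2010 00:40"],
    "T1": [20, 85, -10, -3],
    "T2": [-7, -1, 22, 5],
    "T3": [4, 34, 31, 7],
    "T4": [3, 21, 75, -8],
})

num_cols = ["T1", "T2", "T3", "T4"]

# clip(lower=0) raises every value below 0 up to 0;
# the date column is not selected, so it is left alone
df[num_cols] = df[num_cols].clip(lower=0)

print(df)
```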
clip is faster than mask not only on your sample but also on a larger dataset:
# Your sample -> 3x faster
%timeit df.iloc[:, 1:].clip(0)
# 1.74 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.iloc[:,1:].mask(df.iloc[:,1:] < 0, 0)
# 5.25 ms ± 573 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Large Sample -> 1,000,000 elements --> about 30x
import numpy as np  # pd.np was removed in pandas 2.0
large_df = pd.DataFrame(np.random.randint(-5, 5, (1000, 1000)))
%timeit large_df.clip(0)
# 17.2 ms ± 2.44 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit large_df.mask(large_df< 0, 0)
# 498 ms ± 47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
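The timings are only a fair comparison if both calls give the same result. A small sketch that checks this on random integers; the seed and the shape are arbitrary choices, not taken from the benchmark above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
sample = pd.DataFrame(rng.integers(-5, 5, size=(200, 200)))

clipped = sample.clip(lower=0)        # floor every value at 0
masked = sample.mask(sample < 0, 0)   # replace values where the condition holds

# Compare the underlying arrays element by element
same = bool((clipped.to_numpy() == masked.to_numpy()).all())
print(same)
```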
Changing Negative Values to 0, Without Changing other Columns
The statement transaction_df_clean.loc[transaction_df_clean['customer_price'] < 0] = 0 applies the condition as a row filter on the entire dataframe, and the = 0 is then broadcast to every value in those rows. You are telling pandas to select all rows where customer_price is less than 0 and then set every column of those rows to 0.
Besides applying the condition, you also have to select the column/series that you want to change.
The way I remember .loc is df.loc[row filter/selection, column filter/selection].
So one way to do it would be:
transaction_df_clean.loc[transaction_df_clean['customer_price'] < 0,'customer_price'] = 0
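To see the difference between the two forms side by side, here is a minimal sketch. Only the customer_price column name comes from the question; the other columns and all values are hypothetical.

```python
import pandas as pd

# Hypothetical data; only the customer_price name comes from the question
raw = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "customer_price": [9.5, -2.0, 4.0],
    "quantity": [1, 3, 2],
})

negative = raw["customer_price"] < 0

# Row filter only: every column of the matching rows becomes 0
whole_row = raw.copy()
whole_row.loc[negative] = 0

# Row filter plus column label: only customer_price changes
one_column = raw.copy()
one_column.loc[negative, "customer_price"] = 0

print(whole_row)
print(one_column)
```

In whole_row the second row loses its customer_id and quantity as well, while one_column keeps them and only resets the price.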
The docs have a good section on this called Setting Values:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html
Python Pandas: How to replace negative numbers by prior non-negative numbers?
Similar to your last question:
df['Value'] = df['Value'].where(df['Value'].ge(0)).ffill()
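A runnable sketch of that one-liner on made-up values, to show what each step does:

```python
import pandas as pd

df = pd.DataFrame({"Value": [3, -1, 5, -2, -7, 4]})  # made-up values

# where() keeps values that satisfy the condition and turns the rest into NaN;
# ffill() then copies the last valid value forward into each NaN
df["Value"] = df["Value"].where(df["Value"].ge(0)).ffill()

print(df["Value"].tolist())
```

The column comes back as floats (3.0, 3.0, 5.0, 5.0, 5.0, 4.0) because NaN is introduced along the way. A negative value in the very first row would stay NaN, since there is no earlier value to fill from.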