Pandas: Starting from the second row, subtract from the previous row and use the result as the value for the next subtraction
NumPy: cumsum with alternating sign
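import numpy as np

i = np.arange(len(df))  # row positions
j = np.arange(2)        # the two possible sign phases
# Sign mask times VALUE: column 0 starts with +1, column 1 starts with -1
a = np.where(
    (i[:, None] + j) % 2 == 0, 1, -1
) * df.VALUE.values[:, None]
# Cumulative sum per column, then take column 0 on even rows and column 1 on odd rows
b = a.cumsum(0)[i, i % 2]
df.assign(VALUE=b)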
ID VALUE
0 0 1
1 1 9
2 2 21
3 3 24
4 4 54
Explanation
The first thing to notice is that each new value is an alternating-sign sum of the original values up to that row:
X0 -> X0
X1 -> X1 - X0
X2 -> X2 - X1 + X0
X3 -> X3 - X2 + X1 - X0
X4 -> X4 - X3 + X2 - X1 + X0
So I want to multiply every other row by -1, and I need to do this twice: once for each choice of which rows get the negative sign.
I need to generate a mask that alternates between +1 and -1 for both options
i = np.arange(len(df))
j = np.arange(2)
m = np.where(
(i[:, None] + j) % 2 == 0, 1, -1
)
m
array([[ 1, -1],
[-1, 1],
[ 1, -1],
[-1, 1],
[ 1, -1]])
Now I need to broadcast multiply this across my df.VALUE
a = m * df.VALUE.values[:, None]
a
array([[ 1, -1],
[-10, 10],
[ 30, -30],
[-45, 45],
[ 78, -78]])
Notice the pattern. Now I cumsum
a.cumsum(0)
array([[ 1, -1],
[ -9, 9],
[ 21, -21],
[-24, 24],
[ 54, -54]])
But I only need the entries where the current row's own term is positive: column 0 on even rows and column 1 on odd rows. So I index the columns with i % 2
b = a.cumsum(0)[i, i % 2]
b
array([ 1, 9, 21, 24, 54])
This is what I ended up assigning to the existing column
df.assign(VALUE=b)
ID VALUE
0 0 1
1 1 9
2 2 21
3 3 24
4 4 54
This produces a copy of df and overwrites the VALUE column with b.
To persist the result, assign it to a new name (or back to df if you prefer).
df_new = df.assign(VALUE=b)
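As a cross-check, the same recurrence can be written as a plain loop, and the two-column mask can be collapsed into a single sign vector. This is only a sketch: the input values are inferred from the intermediate arrays shown above, and the function names alternating_running_diff_loop and alternating_running_diff are made up for illustration.

```python
import numpy as np
import pandas as pd

# Input inferred from the intermediate arrays above (assumption).
df = pd.DataFrame({"ID": range(5), "VALUE": [1, 10, 30, 45, 78]})


def alternating_running_diff_loop(values):
    """Reference loop: out[k] = values[k] - out[k-1], with out[0] = values[0]."""
    out = []
    prev = 0
    for v in values:
        prev = v - prev
        out.append(prev)
    return np.array(out)


def alternating_running_diff(values):
    """Vectorized form: multiply by (-1)**k, cumsum, then undo the sign."""
    values = np.asarray(values)
    sign = np.where(np.arange(len(values)) % 2 == 0, 1, -1)
    return sign * np.cumsum(sign * values)


loop_result = alternating_running_diff_loop(df["VALUE"])
vec_result = alternating_running_diff(df["VALUE"])
df_new = df.assign(VALUE=vec_result)
```

The single sign vector works because the sign attached to X_j in row k is (-1)**(k - j), which factors into (-1)**k times (-1)**j.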
pandas subtracting value in another column from previous row
Here is one potential way to do this.
First create a boolean mask, then use numpy.where and Series.shift to create the column date_difference:
mask = df.duplicated(['identifier', 'id_number'])
df['date_difference'] = np.where(
    mask,
    (df['contract_year_month'] - df['collection_year_month'].shift(1)).dt.days,
    np.nan,
)
[output]
identifier id_number contract_year_month collection_year_month date_difference
0 K001 1 2018-01-03 2018-01-09 NaN
1 K001 1 2018-01-08 2018-01-10 -1.0
2 K001 2 2018-01-01 2018-01-05 NaN
3 K001 2 2018-01-15 2018-01-18 10.0
4 K002 4 2018-01-04 2018-01-07 NaN
5 K002 4 2018-01-09 2018-01-15 2.0
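For a version that can be run on its own, the sketch below rebuilds the frame from the output table above and compares the mask approach with a per-group shift. The groupby variant is not part of the original answer; it is shown as one possible alternative that should agree with the mask approach when the rows of each group are adjacent.

```python
import numpy as np
import pandas as pd

# Data reconstructed from the output table above.
df = pd.DataFrame({
    "identifier": ["K001", "K001", "K001", "K001", "K002", "K002"],
    "id_number": [1, 1, 2, 2, 4, 4],
    "contract_year_month": pd.to_datetime(
        ["2018-01-03", "2018-01-08", "2018-01-01",
         "2018-01-15", "2018-01-04", "2018-01-09"]),
    "collection_year_month": pd.to_datetime(
        ["2018-01-09", "2018-01-10", "2018-01-05",
         "2018-01-18", "2018-01-07", "2018-01-15"]),
})

keys = ["identifier", "id_number"]

# Approach from the answer: shift over the whole column,
# then blank out the first row of every (identifier, id_number) group.
mask = df.duplicated(keys)
prev_global = df["collection_year_month"].shift(1)
masked = np.where(mask, (df["contract_year_month"] - prev_global).dt.days, np.nan)

# Variant: shift within each group, so the first row of a group is NaT by itself.
prev_in_group = df.groupby(keys)["collection_year_month"].shift(1)
grouped = (df["contract_year_month"] - prev_in_group).dt.days

df["date_difference"] = grouped
```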
Conditional shift: Subtract 'previous row value' from 'current row value' with multiple conditions in pandas
You may try something like this:
# A new block starts whenever the month is not the previous row's month + 1
blocks = df.MonthStart.dt.month.ne(df.MonthStart.dt.month.shift() + 1).cumsum()
df['DiffHeartRate'] = (df.groupby(['Disease', 'State', blocks])['HeartRate']
                       .diff().fillna(df.HeartRate))
Disease HeartRate State MonthStart MonthEnd DiffHeartRate
0 Covid 89 Texas 2020-02-28 2020-03-31 89.0
1 Covid 91 Texas 2020-03-31 2020-04-30 2.0
2 Covid 87 Texas 2020-07-31 2020-08-30 87.0
3 Cancer 90 Texas 2020-02-28 2020-03-31 90.0
4 Cancer 88 Florida 2020-03-31 2020-04-30 88.0
5 Covid 89 Florida 2020-02-28 2020-03-31 89.0
6 Covid 87 Florida 2020-03-31 2020-04-30 -2.0
7 Flu 90 Florida 2020-02-28 2020-03-31 90.0
The logic is the same as in the other answers, just expressed differently.
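To make the grouping key easier to follow, here is a step-by-step sketch on data rebuilt from the output table above (MonthEnd is left out because it is not used). The intermediate names month, starts_new_block and block_id are introduced here for illustration only.

```python
import pandas as pd

# Data reconstructed from the output table above.
df = pd.DataFrame({
    "Disease": ["Covid", "Covid", "Covid", "Cancer",
                "Cancer", "Covid", "Covid", "Flu"],
    "HeartRate": [89, 91, 87, 90, 88, 89, 87, 90],
    "State": ["Texas", "Texas", "Texas", "Texas",
              "Florida", "Florida", "Florida", "Florida"],
    "MonthStart": pd.to_datetime([
        "2020-02-28", "2020-03-31", "2020-07-31", "2020-02-28",
        "2020-03-31", "2020-02-28", "2020-03-31", "2020-02-28",
    ]),
})

month = df["MonthStart"].dt.month
# True wherever the month is NOT the previous row's month + 1.
starts_new_block = month.ne(month.shift() + 1)
# A running count of block starts gives each consecutive run its own id.
block_id = starts_new_block.cumsum()

# Difference to the previous row within (Disease, State, block);
# the first row of every group has no predecessor and keeps its own value.
diff_in_group = df.groupby(["Disease", "State", block_id])["HeartRate"].diff()
df["DiffHeartRate"] = diff_in_group.fillna(df["HeartRate"])
```

Note that the month comparison does not treat December followed by January as consecutive, since 12 + 1 is not equal to 1.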
Subtract previous row value from the current row value in a Pandas column
Use pandas.Series.diff with fillna:
import pandas as pd
s = pd.Series([11,15,22,27,36,69,77])
s.diff().fillna(s)
Output:
0 11.0
1 4.0
2 7.0
3 5.0
4 9.0
5 33.0
6 8.0
dtype: float64
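The result above is float64 because diff() puts NaN in the first position before fillna replaces it. If an integer result is preferred, one possible alternative (not from the original answer) is to subtract a shifted copy filled with 0:

```python
import pandas as pd

s = pd.Series([11, 15, 22, 27, 36, 69, 77])

# diff() leaves NaN in the first slot, which forces a float dtype.
via_diff = s.diff().fillna(s)

# Shifting with fill_value=0 avoids the NaN, so the integer dtype is kept.
via_shift = s - s.shift(1, fill_value=0)
```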
How do I subtract the previous row from the current row in a pandas dataframe and apply it to every row; without using a loop?
You can use the pct_change() and/or diff() methods.
Demo:
In [138]: df.Close.pct_change() * 100
Out[138]:
0 NaN
1 0.469484
2 0.467290
3 -0.930233
4 0.469484
5 0.467290
6 0.000000
7 -3.255814
8 -3.365385
9 -0.497512
Name: Close, dtype: float64
In [139]: df.Close.diff()
Out[139]:
0 NaN
1 0.125
2 0.125
3 -0.250
4 0.125
5 0.125
6 0.000
7 -0.875
8 -0.875
9 -0.125
Name: Close, dtype: float64
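The original df is not shown, so the sketch below uses hypothetical closing prices chosen to be consistent with the output above. It illustrates how the two methods relate: pct_change() is the row-to-row difference divided by the previous row's value.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, chosen to match the demo output (assumption).
df = pd.DataFrame({"Close": [26.625, 26.75, 26.875, 26.625, 26.75,
                             26.875, 26.875, 26.0, 25.125, 25.0]})

change = df["Close"].diff()            # current row minus previous row
pct = df["Close"].pct_change() * 100   # the same change as a percentage

# pct_change() computed by hand: diff() divided by the previous value.
manual_pct = change / df["Close"].shift(1) * 100
```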
How to subtract rows between two different dataframes and replace original value?
The first solution is to give df2 an index holding the BankName, so that it aligns with df1 and the correct row is subtracted:
df1.set_index('BankName').sub(df2.set_index([['Bank1']]), fill_value=0)
df1.set_index('BankName').sub(df2.set_index([['Bank2']]), fill_value=0)
You need to add a BankName column to df2 and convert BankName to the index in both DataFrames, so that the subtraction is applied to the matching row only:
df22 = df2.assign(BankName = 'Bank1').set_index('BankName')
df = df1.set_index('BankName').sub(df22, fill_value=0).reset_index()
print (df)
BankName Value1 Value2
0 Bank1 7.0 53.0
1 Bank2 15.0 65.0
2 Bank3 14.0 54.0
To subtract from the Bank2 row instead:
df22 = df2.assign(BankName = 'Bank2').set_index('BankName')
df = df1.set_index('BankName').sub(df22, fill_value=0).reset_index()
print (df)
BankName Value1 Value2
0 Bank1 10.0 55.0
1 Bank2 12.0 63.0
2 Bank3 14.0 54.0
Another solution is to filter by BankName:
m = df1['BankName']=='Bank1'
df1.loc[m, df2.columns] = df1.loc[m, df2.columns].sub(df2.iloc[0])
print (df1)
BankName Value1 Value2
0 Bank1 7 53
1 Bank2 15 65
2 Bank3 14 54
m = df1['BankName']=='Bank2'
df1.loc[m, df2.columns] = df1.loc[m, df2.columns].sub(df2.iloc[0])
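The filter-based approach can be wrapped in a small helper so the original frame is left untouched. This is a sketch under assumptions: df1 and df2 are reconstructed from the outputs above, and subtract_for_bank is a hypothetical name, not part of pandas.

```python
import pandas as pd

# Frames reconstructed from the outputs above (assumption).
df1 = pd.DataFrame({
    "BankName": ["Bank1", "Bank2", "Bank3"],
    "Value1": [10, 15, 14],
    "Value2": [55, 65, 54],
})
df2 = pd.DataFrame({"Value1": [3], "Value2": [2]})


def subtract_for_bank(df1, df2, bank):
    """Subtract the single row of df2 from the df1 row whose BankName is `bank`."""
    out = df1.copy()  # work on a copy so df1 itself is not modified
    m = out["BankName"] == bank
    # df2.iloc[0] is a Series indexed by column name, so sub() aligns on columns.
    out.loc[m, df2.columns] = out.loc[m, df2.columns].sub(df2.iloc[0])
    return out


bank1_result = subtract_for_bank(df1, df2, "Bank1")
bank2_result = subtract_for_bank(df1, df2, "Bank2")
```

Working on a copy means both calls start from the same unmodified df1, whereas assigning into df1 directly would carry the first subtraction over into the second.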
Python Pandas Conditional Sum and subtract previous row
You can use .cumsum() to calculate a cumulative sum of the column:
import pandas as pd

df = pd.DataFrame({
    'column1': [50, 100, 30, 0, 30, 80, 0],
    'column2': [0, 0, 0, 10, 0, 0, 30],
})
# Running total of column1 minus running total of column2
df['column3'] = df['column1'].cumsum() - df['column2'].cumsum()
This results in:
column1 column2 column3
0 50 0 50
1 100 0 150
2 30 0 180
3 0 10 170
4 30 0 200
5 80 0 280
6 0 30 250
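Because summation is linear, the same column can also be computed with a single cumulative sum over the row-wise net change. This is a small sketch of that equivalence for data without missing values; the names two_cumsums and one_cumsum are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": [50, 100, 30, 0, 30, 80, 0],
    "column2": [0, 0, 0, 10, 0, 0, 30],
})

# Difference of two cumulative sums, as in the answer.
two_cumsums = df["column1"].cumsum() - df["column2"].cumsum()

# Equivalent: cumulative sum of the per-row net change.
one_cumsum = (df["column1"] - df["column2"]).cumsum()
```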