Pandas: Starting from the Second Row. Subtract from Previous Row and Use It as Value to the Next Subtraction

Numpy, cumsum with alternating sign

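import numpy as np

i = np.arange(len(df))
j = np.arange(2)

# Two sign patterns (+, -, +, ... and -, +, -, ...), each scaled by VALUE
a = np.where(
    (i[:, None] + j) % 2 == 0, 1, -1
) * df.VALUE.values[:, None]

# Cumulative sum per pattern; row k reads column k % 2
b = a.cumsum(0)[i, i % 2]

df.assign(VALUE=b)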

   ID  VALUE
0   0      1
1   1      9
2   2     21
3   3     24
4   4     54

Explanation

The first thing to notice is that each output row expands into an alternating sum of the inputs:

X0 ->                     X0
X1 ->                X1 - X0
X2 ->           X2 - X1 + X0
X3 ->      X3 - X2 + X1 - X0
X4 -> X4 - X3 + X2 - X1 + X0
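
Read row by row, the expansion above is the recurrence Y0 = X0, Yk = Xk - Y(k-1). Here is a minimal loop-based sketch of that recurrence, which may be useful as a reference to check a vectorized version against. The function name `alternating_running_diff` is made up for illustration, and the sample values are the ones that appear in the arrays of this answer.

```python
def alternating_running_diff(values):
    """Reference loop: out[0] = values[0], out[k] = values[k] - out[k - 1]."""
    out = []
    for k, x in enumerate(values):
        # The first row is kept as is; every later row subtracts the
        # previously computed result, not the previous raw value.
        out.append(x if k == 0 else x - out[-1])
    return out


result = alternating_running_diff([1, 10, 30, 45, 78])
print(result)  # [1, 9, 21, 24, 54]
```

The loop is easy to read but runs in Python, which is why the answer looks for a NumPy formulation.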

So I want to multiply every other row by negative one. But which rows get the minus sign depends on whether the output row is even or odd, so I need both alternations: one starting with +1 and one starting with -1.

I generate a mask that alternates between +1 and -1 for both options:

i = np.arange(len(df))
j = np.arange(2)

m = np.where(
    (i[:, None] + j) % 2 == 0, 1, -1
)

m

array([[ 1, -1],
       [-1,  1],
       [ 1, -1],
       [-1,  1],
       [ 1, -1]])

Now I need to broadcast multiply this across my df.VALUE

a = m * df.VALUE.values[:, None]

a

array([[  1,  -1],
       [-10,  10],
       [ 30, -30],
       [-45,  45],
       [ 78, -78]])

Notice the pattern. Now I cumsum

a.cumsum(0)

array([[  1,  -1],
       [ -9,   9],
       [ 21, -21],
       [-24,  24],
       [ 54, -54]])

But I need the positive ones... more specifically, row k needs the value in column k % 2. So I index with a modded arange

b = a.cumsum(0)[i, i % 2]
b

array([ 1, 9, 21, 24, 54])

This is what I ended up assigning to the existing column

df.assign(VALUE=b)

   ID  VALUE
0   0      1
1   1      9
2   2     21
3   3     24
4   4     54

This produces a copy of df with the VALUE column overwritten by b; df itself is left unchanged.

To keep the result, assign it to a new name (or back to df if you prefer).

df_new = df.assign(VALUE=b)
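
For reference, the whole answer can be run end to end. The sketch below assumes the input frame has ID 0 to 4 and VALUE 1, 10, 30, 45, 78; these are read back from the intermediate arrays above, since the original frame is not shown. It also cross-checks the two-column mask against a single sign vector. That variant is not part of the original answer: it relies on (-1)^(k-j) = (-1)^k * (-1)^j, so the same result should come out of sign * cumsum(sign * x).

```python
import numpy as np
import pandas as pd

# Assumed input, reconstructed from the arrays printed in this answer.
df = pd.DataFrame({"ID": range(5), "VALUE": [1, 10, 30, 45, 78]})

i = np.arange(len(df))
j = np.arange(2)

# Column 0 starts with +1, column 1 starts with -1.
m = np.where((i[:, None] + j) % 2 == 0, 1, -1)
a = m * df["VALUE"].to_numpy()[:, None]

# Even rows read column 0 of the cumulative sum, odd rows read column 1.
b = a.cumsum(0)[i, i % 2]

# Variant (not from the original answer): one sign vector, applied twice.
sign = np.where(i % 2 == 0, 1, -1)
b_alt = sign * (sign * df["VALUE"].to_numpy()).cumsum()

df_new = df.assign(VALUE=b)
print(df_new)
```

Both `b` and `b_alt` should hold the same numbers; the single-vector form avoids building the intermediate two-column array.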

pandas subtracting value in another column from previous row

Here is one potential way to do this.

First create a boolean mask, then use numpy.where and Series.shift to create the column date_difference:

mask = df.duplicated(['identifier', 'id_number'])

df['date_difference'] = np.where(
    mask,
    (df['contract_year_month'] - df['collection_year_month'].shift(1)).dt.days,
    np.nan,
)

[output]

   identifier  id_number  contract_year_month  collection_year_month  date_difference
0        K001          1           2018-01-03             2018-01-09              NaN
1        K001          1           2018-01-08             2018-01-10             -1.0
2        K001          2           2018-01-01             2018-01-05              NaN
3        K001          2           2018-01-15             2018-01-18             10.0
4        K002          4           2018-01-04             2018-01-07              NaN
5        K002          4           2018-01-09             2018-01-15              2.0
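
A self-contained sketch of the same approach, with the input rebuilt from the table above. Note that the global shift(1) only gives the right answer when rows of the same (identifier, id_number) group sit next to each other, as they do here. The per-group shift at the end is an alternative that is not part of the original answer; it is included only as a cross-check.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "identifier": ["K001", "K001", "K001", "K001", "K002", "K002"],
    "id_number": [1, 1, 2, 2, 4, 4],
    "contract_year_month": pd.to_datetime(
        ["2018-01-03", "2018-01-08", "2018-01-01",
         "2018-01-15", "2018-01-04", "2018-01-09"]),
    "collection_year_month": pd.to_datetime(
        ["2018-01-09", "2018-01-10", "2018-01-05",
         "2018-01-18", "2018-01-07", "2018-01-15"]),
})

# True for every row whose (identifier, id_number) pair was already seen.
mask = df.duplicated(["identifier", "id_number"])

# Days between this row's contract date and the previous row's collection date.
gap = (df["contract_year_month"] - df["collection_year_month"].shift(1)).dt.days

# Keep the gap only where the previous row belongs to the same group.
df["date_difference"] = np.where(mask, gap, np.nan)

# Cross-check (alternative, not from the original answer): shift within groups.
prev_collection = (
    df.groupby(["identifier", "id_number"])["collection_year_month"].shift(1)
)
per_group = (df["contract_year_month"] - prev_collection).dt.days

print(df)
```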

Conditional shift: Subtract 'previous row value' from 'current row value' with multiple conditions in pandas

You may try something like this:

df['DiffHeartRate'] = (
    df.groupby(['Disease', 'State',
                (df.MonthStart.dt.month.ne(df.MonthStart.dt.month.shift() + 1)).cumsum()])['HeartRate']
    .diff()
    .fillna(df.HeartRate)
)


   Disease  HeartRate    State  MonthStart    MonthEnd  DiffHeartRate
0    Covid         89    Texas  2020-02-28  2020-03-31           89.0
1    Covid         91    Texas  2020-03-31  2020-04-30            2.0
2    Covid         87    Texas  2020-07-31  2020-08-30           87.0
3   Cancer         90    Texas  2020-02-28  2020-03-31           90.0
4   Cancer         88  Florida  2020-03-31  2020-04-30           88.0
5    Covid         89  Florida  2020-02-28  2020-03-31           89.0
6    Covid         87  Florida  2020-03-31  2020-04-30           -2.0
7      Flu         90  Florida  2020-02-28  2020-03-31           90.0

The logic is the same as in the other answers, just expressed differently.
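
To make the grouping key easier to follow, here is a sketch that builds it as a separate, named step. The data are copied from the table above (MonthEnd is left out because the calculation does not use it), and the name `block` is introduced here for illustration. The sketch calls diff() on the grouped column directly rather than through apply, which should give the same values.

```python
import pandas as pd

df = pd.DataFrame({
    "Disease": ["Covid", "Covid", "Covid", "Cancer", "Cancer", "Covid", "Covid", "Flu"],
    "HeartRate": [89, 91, 87, 90, 88, 89, 87, 90],
    "State": ["Texas", "Texas", "Texas", "Texas",
              "Florida", "Florida", "Florida", "Florida"],
    "MonthStart": pd.to_datetime(
        ["2020-02-28", "2020-03-31", "2020-07-31", "2020-02-28",
         "2020-03-31", "2020-02-28", "2020-03-31", "2020-02-28"]),
})

month = df["MonthStart"].dt.month

# A new block starts whenever the month is not the previous row's month + 1,
# so only consecutive months end up in the same block.
block = month.ne(month.shift() + 1).cumsum().rename("block")

# Difference within each (Disease, State, block) group; the first row of
# each group has no predecessor, so it keeps its own HeartRate.
df["DiffHeartRate"] = (
    df.groupby(["Disease", "State", block])["HeartRate"]
    .diff()
    .fillna(df["HeartRate"])
)

print(df)
```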

Subtract previous row value from the current row value in a Pandas column

Use pandas.Series.diff with fillna:

import pandas as pd

s = pd.Series([11,15,22,27,36,69,77])
s.diff().fillna(s)

Output:

0    11.0
1     4.0
2     7.0
3     5.0
4     9.0
5    33.0
6     8.0
dtype: float64

How do I subtract the previous row from the current row in a pandas dataframe and apply it to every row; without using a loop?

You can use the pct_change() and/or diff() methods.

Demo:

In [138]: df.Close.pct_change() * 100
Out[138]:
0         NaN
1    0.469484
2    0.467290
3   -0.930233
4    0.469484
5    0.467290
6    0.000000
7   -3.255814
8   -3.365385
9   -0.497512
Name: Close, dtype: float64

In [139]: df.Close.diff()
Out[139]:
0      NaN
1    0.125
2    0.125
3   -0.250
4    0.125
5    0.125
6    0.000
7   -0.875
8   -0.875
9   -0.125
Name: Close, dtype: float64
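
The two outputs are related: pct_change() is diff() divided by the previous row's value. The sketch below illustrates that on a standalone Series. The original Close column is not shown, so the prices here are an assumption, back-computed to be consistent with the printed diff() and pct_change() outputs.

```python
import pandas as pd

# Assumed prices (not given in the original): back-computed from the
# diff() and pct_change() outputs printed above.
close = pd.Series(
    [26.625, 26.75, 26.875, 26.625, 26.75,
     26.875, 26.875, 26.0, 25.125, 25.0],
    name="Close",
)

absolute = close.diff()              # current row minus previous row
relative = close.pct_change() * 100  # the same change, in percent of the previous row

# pct_change() written out by hand: diff() over the previous value.
by_hand = close.diff() / close.shift() * 100

print(pd.DataFrame({"diff": absolute, "pct": relative, "by_hand": by_hand}))
```

In both cases the first row is NaN because it has no previous row to compare against.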

How to subtract rows between two different dataframes and replace original value?

The first solution is to give df2 an index holding the bank name, so that it aligns with df1 and the subtraction hits the correct row:

df.set_index('BankName').sub(df2.set_index([['Bank1']]), fill_value=0)

df.set_index('BankName').sub(df2.set_index([['Bank2']]), fill_value=0)

To do this more explicitly, add a BankName column to df2 and set BankName as the index in both DataFrames, so that the subtraction is aligned on that row:

df22 = df2.assign(BankName = 'Bank1').set_index('BankName')
df = df1.set_index('BankName').sub(df22, fill_value=0).reset_index()
print (df)
  BankName  Value1  Value2
0    Bank1     7.0    53.0
1    Bank2    15.0    65.0
2    Bank3    14.0    54.0

Subtract by Bank2:

df22 = df2.assign(BankName = 'Bank2').set_index('BankName')
df = df1.set_index('BankName').sub(df22, fill_value=0).reset_index()
print (df)

  BankName  Value1  Value2
0    Bank1    10.0    55.0
1    Bank2    12.0    63.0
2    Bank3    14.0    54.0

Another solution, filtering rows by BankName with a boolean mask:

m = df1['BankName']=='Bank1'
df1.loc[m, df2.columns] = df1.loc[m, df2.columns].sub(df2.iloc[0])
print (df1)
  BankName  Value1  Value2
0    Bank1       7      53
1    Bank2      15      65
2    Bank3      14      54

m = df1['BankName']=='Bank2'
df1.loc[m, df2.columns] = df1.loc[m, df2.columns].sub(df2.iloc[0])
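
The index-alignment approach can be wrapped in a small helper so the bank name becomes a parameter. This is only a sketch: the input frames are not shown in the answer, so `df1` and `df2` below are assumptions chosen to be consistent with the printed outputs, and the helper name `subtract_from_bank` is made up for illustration.

```python
import pandas as pd

# Assumed inputs (not given in the original), consistent with the outputs above.
df1 = pd.DataFrame({
    "BankName": ["Bank1", "Bank2", "Bank3"],
    "Value1": [10, 15, 14],
    "Value2": [55, 65, 54],
})
df2 = pd.DataFrame({"Value1": [3], "Value2": [2]})


def subtract_from_bank(df1, df2, bank):
    """Subtract the single row of df2 from the row of df1 labelled `bank`."""
    # Label df2's row with the target bank so both frames share an index.
    labelled = df2.assign(BankName=bank).set_index("BankName")
    # Banks missing from `labelled` are treated as 0, i.e. left unchanged.
    return df1.set_index("BankName").sub(labelled, fill_value=0).reset_index()


out1 = subtract_from_bank(df1, df2, "Bank1")
out2 = subtract_from_bank(df1, df2, "Bank2")
print(out1)
print(out2)
```

Unlike the `.loc` version, this returns a new frame and leaves `df1` untouched, at the cost of the values becoming floats.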

Python Pandas Conditional Sum and subtract previous row

You can use .cumsum() to calculate a cumulative sum of the column:

import pandas as pd

df = pd.DataFrame({
    'column1': [50, 100, 30, 0, 30, 80, 0],
    'column2': [0, 0, 0, 10, 0, 0, 30],
})

df['column3'] = df['column1'].cumsum() - df['column2'].cumsum()

This results in:

   column1  column2  column3
0       50        0       50
1      100        0      150
2       30        0      180
3        0       10      170
4       30        0      200
5       80        0      280
6        0       30      250

