
# Pandas: Starting from the Second Row. Subtract from Previous Row and Use It as Value to the Next Subtraction

## Pandas: Starting from the second row, subtract from the previous row and use the result in the next subtraction

### NumPy, `cumsum` with alternating sign

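```python
import numpy as np

# df is the question's DataFrame with columns ID and VALUE
i = np.arange(len(df))   # row positions
j = np.arange(2)         # the two possible alternating-sign patterns
a = np.where(
    (i[:, None] + j) % 2 == 0, 1, -1) * df.VALUE.values[:, None]
b = a.cumsum(0)[i, i % 2]
df.assign(VALUE=b)
```

```
   ID  VALUE
0   0      1
1   1      9
2   2     21
3   3     24
4   4     54
```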
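For comparison, a plain loop that applies the rule directly (each result is the current value minus the previous *result*) is easy to check the vectorized version against. This is a sketch: `alternating_subtract_loop` is a hypothetical helper name, and the sample `VALUE` column `[1, 10, 30, 45, 78]` is reconstructed from the intermediate `a` array shown in the explanation, not taken from the question.

```python
import pandas as pd

def alternating_subtract_loop(values):
    """Hypothetical helper: result[0] = values[0], result[k] = values[k] - result[k - 1]."""
    out = []
    prev = 0
    for v in values:
        prev = v - prev  # subtract the previous result, not the previous input
        out.append(prev)
    return out

# Sample data reconstructed from the intermediate arrays in the explanation (assumption)
df = pd.DataFrame({'ID': range(5), 'VALUE': [1, 10, 30, 45, 78]})
print(alternating_subtract_loop(df['VALUE'].tolist()))  # [1, 9, 21, 24, 54]
```

The loop is O(n) in pure Python, so it is mainly useful as a reference for small inputs.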

### Explanation

The first thing to notice is that each result is an alternating sum of the current value and all the values before it:

```
X0 ->                     X0
X1 ->                X1 - X0
X2 ->           X2 - X1 + X0
X3 ->      X3 - X2 + X1 - X0
X4 -> X4 - X3 + X2 - X1 + X0
```
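In symbols, writing `X_k` for the original value in row `k` and `Y_k` for the result (notation chosen here for illustration), the rule and its unrolled form are:

```latex
\begin{aligned}
Y_0 &= X_0, \qquad Y_k = X_k - Y_{k-1} \quad (k \ge 1) \\
Y_k &= \sum_{m=0}^{k} (-1)^{k-m} X_m = (-1)^k \sum_{m=0}^{k} (-1)^m X_m
\end{aligned}
```

The last form is why an alternating-sign cumulative sum works: the inner sum is a running total of the sign-flipped values, and the factor `(-1)^k` restores the correct sign for each row.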

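So I wanted to multiply every other row by -1 and then take a cumulative sum. But which rows get the minus sign depends on which row's result is being computed, so I needed to do this twice, once for each choice of alternating rows.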

I generate a mask that alternates between +1 and -1, with one column for each of the two options:

```python
i = np.arange(len(df))
j = np.arange(2)
m = np.where(
    (i[:, None] + j) % 2 == 0, 1, -1)
m
```

```
array([[ 1, -1],
       [-1,  1],
       [ 1, -1],
       [-1,  1],
       [ 1, -1]])
```

Now I broadcast-multiply this mask across `df.VALUE` (turned into a column vector with `[:, None]`):

```python
a = m * df.VALUE.values[:, None]
a
```

```
array([[  1,  -1],
       [-10,  10],
       [ 30, -30],
       [-45,  45],
       [ 78, -78]])
```

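Notice the pattern: each column holds the values with alternating signs, and one column is the negative of the other. Now I take the `cumsum` down the rows: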

```python
a.cumsum(0)
```

```
array([[  1,  -1],
       [ -9,   9],
       [ 21, -21],
       [-24,  24],
       [ 54, -54]])
```

But I need the positive results; more precisely, I need column 0 on even rows and column 1 on odd rows. So I index the rows with `i` and the columns with the modded `arange`, `i % 2`:

```python
b = a.cumsum(0)[i, i % 2]
b
```

```
array([ 1,  9, 21, 24, 54])
```

This is what I ended up assigning to the existing column

```python
df.assign(VALUE=b)
```

```
   ID  VALUE
0   0      1
1   1      9
2   2     21
3   3     24
4   4     54
```

This produces a copy of `df` and overwrites the `VALUE` column with `b`.

To keep the result, assign it to a new name (or back to `df`):

```python
df_new = df.assign(VALUE=b)
```
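Because the two columns of `a` are exact negatives of each other, a single column is enough: flip the sign of every other value, take the `cumsum`, and multiply by the same signs again to undo the flip row by row. This is a variation on the answer above, not the original author's code, shown as a sketch on the same reconstructed sample data (an assumption, not the question's actual frame):

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the intermediate arrays above (assumption)
df = pd.DataFrame({'ID': range(5), 'VALUE': [1, 10, 30, 45, 78]})

sign = np.where(np.arange(len(df)) % 2 == 0, 1, -1)  # +1, -1, +1, ...
# cumsum of the sign-flipped values, then re-apply the signs so every row comes out positive-led
b = sign * np.cumsum(sign * df['VALUE'].to_numpy())
df_new = df.assign(VALUE=b)
print(df_new['VALUE'].tolist())  # [1, 9, 21, 24, 54]
```

This avoids building the two-column array, at the cost of being slightly less explicit about the two sign patterns.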

## pandas subtracting value in another column from previous row

Here is one potential way to do this.

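First create a boolean mask that is `True` for every row whose (`identifier`, `id_number`) pair has already appeared, i.e. every row except the first of its group. Then use `numpy.where` and `Series.shift` to build the `date_difference` column: on masked rows it subtracts the previous row's `collection_year_month` from the current `contract_year_month` (in days); on the other rows it is `NaN`: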

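```python
import numpy as np

# True for every row except the first occurrence of each (identifier, id_number) pair
mask = df.duplicated(['identifier', 'id_number'])
# days between this contract date and the previous row's collection date; NaN on first rows
df['date_difference'] = np.where(
    mask,
    (df['contract_year_month'] - df['collection_year_month'].shift(1)).dt.days,
    np.nan,
)
```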

Output:

```
  identifier  id_number contract_year_month collection_year_month  date_difference
0       K001          1          2018-01-03            2018-01-09              NaN
1       K001          1          2018-01-08            2018-01-10             -1.0
2       K001          2          2018-01-01            2018-01-05              NaN
3       K001          2          2018-01-15            2018-01-18             10.0
4       K002          4          2018-01-04            2018-01-07              NaN
5       K002          4          2018-01-09            2018-01-15              2.0
```
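The mask-and-`shift` approach relies on the rows of each (`identifier`, `id_number`) pair being next to each other. A variant that does not need the groups to be contiguous (a swapped-in technique, not part of the answer) is `groupby(...).shift()`, which always takes the previous collection date from the same group. This is a sketch with sample data rebuilt from the output table above; the first row of each group gets `NaT`, which `.dt.days` turns into `NaN`.

```python
import pandas as pd

# Sample data rebuilt from the output table above
df = pd.DataFrame({
    'identifier': ['K001', 'K001', 'K001', 'K001', 'K002', 'K002'],
    'id_number': [1, 1, 2, 2, 4, 4],
    'contract_year_month': pd.to_datetime(['2018-01-03', '2018-01-08', '2018-01-01',
                                           '2018-01-15', '2018-01-04', '2018-01-09']),
    'collection_year_month': pd.to_datetime(['2018-01-09', '2018-01-10', '2018-01-05',
                                             '2018-01-18', '2018-01-07', '2018-01-15']),
})

# Previous collection date within the same (identifier, id_number) group; NaT on each group's first row
prev_collection = df.groupby(['identifier', 'id_number'])['collection_year_month'].shift()
df['date_difference'] = (df['contract_year_month'] - prev_collection).dt.days
print(df)
```

On this data the `date_difference` column matches the table above; it still assumes rows within each group are in chronological order.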

## Conditional shift: Subtract 'previous row value' from 'current row value' with multiple conditions in pandas

You may try something like this:

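```python
df['DiffHeartRate'] = (
    df.groupby(['Disease', 'State',
                # new group whenever this month is not the previous row's month + 1
                df.MonthStart.dt.month.ne(df.MonthStart.dt.month.shift() + 1).cumsum()])['HeartRate']
      .diff()                # difference from the previous row within each group
      .fillna(df.HeartRate)  # first row of each group keeps its own HeartRate
)
```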

```
  Disease  HeartRate    State MonthStart   MonthEnd  DiffHeartRate
0   Covid         89    Texas 2020-02-28 2020-03-31           89.0
1   Covid         91    Texas 2020-03-31 2020-04-30            2.0
2   Covid         87    Texas 2020-07-31 2020-08-30           87.0
3  Cancer         90    Texas 2020-02-28 2020-03-31           90.0
4  Cancer         88  Florida 2020-03-31 2020-04-30           88.0
5   Covid         89  Florida 2020-02-28 2020-03-31           89.0
6   Covid         87  Florida 2020-03-31 2020-04-30           -2.0
7     Flu         90  Florida 2020-02-28 2020-03-31           90.0
```

The logic is the same as in the other answers, just expressed differently.
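A self-contained sketch of the same idea, using sample data rebuilt from the output table above (`MonthEnd` omitted). Two details are assumptions of this sketch rather than part of the answer: it calls `groupby(...).diff()` directly, and it detects consecutive months with a `year * 12 + month` counter, so that December followed by January also counts as consecutive (comparing `dt.month` alone would treat 12 → 1 as a break).

```python
import pandas as pd

# Sample data rebuilt from the output table above (MonthEnd omitted)
df = pd.DataFrame({
    'Disease': ['Covid', 'Covid', 'Covid', 'Cancer', 'Cancer', 'Covid', 'Covid', 'Flu'],
    'HeartRate': [89, 91, 87, 90, 88, 89, 87, 90],
    'State': ['Texas', 'Texas', 'Texas', 'Texas', 'Florida', 'Florida', 'Florida', 'Florida'],
    'MonthStart': pd.to_datetime(['2020-02-28', '2020-03-31', '2020-07-31', '2020-02-28',
                                  '2020-03-31', '2020-02-28', '2020-03-31', '2020-02-28']),
})

# Running month number, so Dec -> Jan of the next year differs by exactly 1
month_no = df['MonthStart'].dt.year * 12 + df['MonthStart'].dt.month
# Start a new run whenever this row's month is not the previous row's month + 1
run_id = month_no.ne(month_no.shift() + 1).cumsum()

df['DiffHeartRate'] = (
    df.groupby(['Disease', 'State', run_id])['HeartRate']
      .diff()                   # NaN on the first row of each run
      .fillna(df['HeartRate'])  # first row of a run keeps its own HeartRate
)
print(df['DiffHeartRate'].tolist())  # [89.0, 2.0, 87.0, 90.0, 88.0, 89.0, -2.0, 90.0]
```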

## Subtract previous row value from the current row value in a Pandas column

Use `pandas.Series.diff` with `fillna`:

```python
import pandas as pd

s = pd.Series([11, 15, 22, 27, 36, 69, 77])
# diff() leaves NaN in the first row; fill it with the original first value
s.diff().fillna(s)
```

Output:

```
0    11.0
1     4.0
2     7.0
3     5.0
4     9.0
5    33.0
6     8.0
dtype: float64
```
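The result is `float64` because `diff` starts the first row as `NaN`. If the integer dtype matters, one alternative (not from the original answer) is to subtract a shifted copy created with `fill_value=0`, so the first row is compared with 0 instead of `NaN`:

```python
import pandas as pd

s = pd.Series([11, 15, 22, 27, 36, 69, 77])
# Shift down by one row; the vacated first position is filled with 0 instead of NaN
result = s - s.shift(fill_value=0)
print(result.tolist())  # [11, 4, 7, 5, 9, 33, 8]
print(result.dtype)     # int64
```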

## How do I subtract the previous row from the current row in a pandas dataframe and apply it to every row; without using a loop?

You can use the `pct_change()` and/or `diff()` methods:

Demo:

```
In [138]: df.Close.pct_change() * 100
Out[138]:
0         NaN
1    0.469484
2    0.467290
3   -0.930233
4    0.469484
5    0.467290
6    0.000000
7   -3.255814
8   -3.365385
9   -0.497512
Name: Close, dtype: float64

In [139]: df.Close.diff()
Out[139]:
0      NaN
1    0.125
2    0.125
3   -0.250
4    0.125
5    0.125
6    0.000
7   -0.875
8   -0.875
9   -0.125
Name: Close, dtype: float64
```
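The two methods are related: `pct_change()` is the `diff()` divided by the previous row's value. A small check, using a `Close` series reconstructed from the displayed differences (the starting value 26.625 is inferred from the first percentage, so treat the data as an assumption):

```python
import numpy as np
import pandas as pd

# Close prices reconstructed from the diff / pct_change output above (assumption)
close = pd.Series([26.625, 26.75, 26.875, 26.625, 26.75, 26.875, 26.875, 26.0, 25.125, 25.0],
                  name='Close')

diff = close.diff()                       # current - previous
pct = close.pct_change() * 100            # percent change vs previous
manual_pct = diff / close.shift() * 100   # same thing, spelled out

# First row is NaN in both; the rest agree
print(np.allclose(pct[1:], manual_pct[1:]))  # True
print(diff.tolist()[1:])
```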

## How to subtract rows between two different dataframes and replace original value?

The first solution gives `df2` an index equal to the target `BankName`, so that `sub` aligns it with the matching row of `df1` when subtracting (`fill_value=0` leaves the other rows unchanged):

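```python
# df2 has a single row; indexing it as 'Bank1' aligns it with that row of df1
df1.set_index('BankName').sub(df2.set_index([['Bank1']]), fill_value=0)
# same, but subtracting from the Bank2 row
df1.set_index('BankName').sub(df2.set_index([['Bank2']]), fill_value=0)
```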

Alternatively, add a `BankName` column to `df2`, then make `BankName` the index of both `DataFrame`s, so that `sub` subtracts from the matching row:

```python
df22 = df2.assign(BankName='Bank1').set_index('BankName')
df = df1.set_index('BankName').sub(df22, fill_value=0).reset_index()
print(df)
```

```
  BankName  Value1  Value2
0    Bank1     7.0    53.0
1    Bank2    15.0    65.0
2    Bank3    14.0    54.0
```

Subtracting from the `Bank2` row instead:

```python
df22 = df2.assign(BankName='Bank2').set_index('BankName')
df = df1.set_index('BankName').sub(df22, fill_value=0).reset_index()
print(df)
```

```
  BankName  Value1  Value2
0    Bank1    10.0    55.0
1    Bank2    12.0    63.0
2    Bank3    14.0    54.0
```

Another solution is to select the row with a boolean mask on `BankName` and subtract `df2`'s single row from it in place:

```python
m = df1['BankName'] == 'Bank1'
df1.loc[m, df2.columns] = df1.loc[m, df2.columns].sub(df2.iloc[0])
print(df1)
```

```
  BankName  Value1  Value2
0    Bank1       7      53
1    Bank2      15      65
2    Bank3      14      54
```

```python
m = df1['BankName'] == 'Bank2'
df1.loc[m, df2.columns] = df1.loc[m, df2.columns].sub(df2.iloc[0])
```
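The mask approach can be wrapped in a small function that returns a new frame instead of modifying `df1` in place, which avoids accidentally subtracting twice when cells are re-run. This is a sketch: `subtract_from_bank` is a hypothetical name, the sample `df1`/`df2` are reconstructed from the printed outputs above, and `df2` is assumed to hold a single row with only the `Value1` and `Value2` columns.

```python
import pandas as pd

# Sample data reconstructed from the outputs above (assumption)
df1 = pd.DataFrame({'BankName': ['Bank1', 'Bank2', 'Bank3'],
                    'Value1': [10, 15, 14],
                    'Value2': [55, 65, 54]})
df2 = pd.DataFrame({'Value1': [3], 'Value2': [2]})

def subtract_from_bank(df1, df2, bank):
    """Hypothetical helper: subtract df2's single row from the row(s) of df1 named `bank`."""
    out = df1.copy()                 # leave the caller's frame untouched
    m = out['BankName'] == bank
    # DataFrame.sub(Series) aligns the Series index (df2's columns) with the column labels
    out.loc[m, df2.columns] = out.loc[m, df2.columns].sub(df2.iloc[0])
    return out

print(subtract_from_bank(df1, df2, 'Bank1'))
print(subtract_from_bank(df1, df2, 'Bank2'))
```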

## Python Pandas Conditional Sum and subtract previous row

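You can use `.cumsum()` to compute running totals: `column3` is the running sum of `column1` minus the running sum of `column2`, which is the same as taking the previous row's `column3`, adding `column1`, and subtracting `column2`: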

```python
import pandas as pd

df = pd.DataFrame({
    'column1': [50, 100, 30, 0, 30, 80, 0],
    'column2': [0, 0, 0, 10, 0, 0, 30],
})
# running total of additions minus running total of subtractions
df['column3'] = df['column1'].cumsum() - df['column2'].cumsum()
```

This results in:

```
   column1  column2  column3
0       50        0       50
1      100        0      150
2       30        0      180
3        0       10      170
4       30        0      200
5       80        0      280
6        0       30      250
```
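Since the difference of two running sums equals the running sum of the per-row differences, a single `cumsum` gives the same column. The sketch below checks that on the same data against a row-by-row loop that applies "previous total + `column1` - `column2`" directly (the loop is only there for comparison):

```python
import pandas as pd

df = pd.DataFrame({
    'column1': [50, 100, 30, 0, 30, 80, 0],
    'column2': [0, 0, 0, 10, 0, 0, 30],
})

# One cumulative sum of each row's net change
df['column3'] = (df['column1'] - df['column2']).cumsum()

# Row-by-row version of the same rule, for comparison
running = 0
expected = []
for c1, c2 in zip(df['column1'], df['column2']):
    running = running + c1 - c2  # previous total + this row's addition - this row's subtraction
    expected.append(int(running))

print(df['column3'].tolist())              # [50, 150, 180, 170, 200, 280, 250]
print(df['column3'].tolist() == expected)  # True
```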