Python - Use Previous Row's Value to Update the New Rows' Values

python - use previous row's value to update the new rows values

You can do this using apply and a nested function (a closure) that remembers the previous result.

import pandas as pd
ID = [2001980,2001980,2001980,2001980,2001980,2001980,2001980,2001980,2001980,2001980,2002222,2002222,2002222,2002222,2002222,2002222,2002222,2002222,]
Date = ["10/30/2017","10/29/2017","10/28/2017","10/27/2017","10/26/2017","10/25/2017","10/24/2017","10/23/2017","10/22/2017","10/21/2017","10/21/2017","10/20/2017","10/19/2017","10/18/2017","10/17/2017","10/16/2017","10/15/2017","10/14/2017",]
current = [1 ,0 ,0 ,40,39,0 ,0 ,60,0 ,0 ,0 ,0 ,16,0 ,0 ,20,19,18,]

df = pd.DataFrame({"ID": ID, "Date": Date, "current": current})

Then create the function to update the frame

Python 3.X

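def update_frame(df):
    last_expected = None  # running result carried over from the previous row

    def apply_logic(row):
        nonlocal last_expected
        # row.name is the index label; this assumes the default 0..n-1 index
        if row.name == 0:
            last_expected = row["current"]
            return last_expected
        last_row = df.iloc[row.name - 1]
        # same ID: count down from the previous result unless "current" is larger;
        # new ID: restart from this row's "current"
        if last_row["ID"] == row["ID"]:
            last_expected = max(last_expected - 1, row["current"])
        else:
            last_expected = row["current"]
        return last_expected

    return apply_logic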

Python 2.X

def update_frame(df):
    # Python 2 has no nonlocal, so keep the running result in a mutable dict
    sd = {"last_expected": None}

    def apply_logic(row):
        # row.name is the index label; this assumes the default 0..n-1 index
        if row.name == 0:
            sd['last_expected'] = row["current"]
            return sd['last_expected']
        last_row = df.iloc[row.name - 1]
        if last_row['ID'] == row['ID']:
            sd['last_expected'] = max(sd['last_expected'] - 1, row['current'])
        else:
            sd['last_expected'] = row['current']
        return sd['last_expected']

    return apply_logic

Then apply the function row by row:

df['expected'] = df.apply(update_frame(df), axis=1)

The output is as expected
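
As a cross-check, the same rule can be sketched as a plain loop over the two columns, which avoids the row lookup inside apply. This is only a sketch: the helper name expected_countdown and the small demo frame are made up for illustration, and it assumes the rows are already in the order in which they should be processed.

```python
import pandas as pd

def expected_countdown(df, id_col="ID", value_col="current"):
    """Carry the previous result forward minus one, restarting on each new ID."""
    result = []
    prev_id, prev_val = None, None
    for row_id, cur in zip(df[id_col], df[value_col]):
        if row_id != prev_id:
            prev_val = cur                      # first row of a new ID
        else:
            prev_val = max(prev_val - 1, cur)   # count down unless current is larger
        result.append(prev_val)
        prev_id = row_id
    return pd.Series(result, index=df.index)

# hypothetical demo data, not the frame from the question
demo = pd.DataFrame({"ID": [1, 1, 1, 1, 2, 2, 2],
                     "current": [0, 40, 0, 0, 16, 0, 20]})
demo["expected"] = expected_countdown(demo)
print(demo)
```

Because it only compares each ID with the one before it, this version does not depend on the index being 0..n-1.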

How to update pandas DataFrame based on the previous row information

To help you understand how shift(-1) works, please review the solution below. I recreated the raw DataFrame from the image in the question.

import pandas as pd
import numpy as np
df = pd.DataFrame({'Dates':['2021-02-04 19:00:00','2021-02-04 20:00:00',
                            '2021-02-04 21:00:00','2021-02-04 22:00:00',
                            '2021-02-04 23:00:00','2021-02-05 00:00:00',
                            '2021-02-05 01:00:00','2021-02-05 02:00:00'],
                   'Close':[1.19661,1.19660,1.19611,1.19643,1.19664,
                            1.19692,1.19662,1.19542],
                   'High' :[1.19679,1.19678,1.19680,1.19679,1.19688,
                            1.19721,1.19694,1.19682],
                   'Low'  :[1.19577,1.19637,1.19604,1.19590,1.19632,
                            1.19634,1.19622,1.19537],
                   'Open' :[1.19630,1.19662,1.19665,1.19613,1.19646,
                            1.19662,1.19690,1.19665],
                   'Status':['ok']*8,
                   'Volume':[2579,1858,1399,788,1437,2435,2898,2641],
                   'HH'   :[np.nan]*5+[1.19721]+[np.nan]*2,  # np.NaN was removed in NumPy 2.0
                   'LL'   :[np.nan]*8})
print (df)

#make a copy of df['High'] into df['NewHigh']
df['NewHigh'] = df['High']

#if next row in 'HH' is greater than 'High', then update 'NewHigh' with next row from 'HH'
df.loc[df['HH'].shift(-1) > df['High'],'NewHigh'] = df['HH'].shift(-1)

print (df[['Dates','High','HH','NewHigh']])

The output of this will be:

                 Dates     High       HH  NewHigh
0  2021-02-04 19:00:00  1.19679      NaN  1.19679
1  2021-02-04 20:00:00  1.19678      NaN  1.19678
2  2021-02-04 21:00:00  1.19680      NaN  1.19680
3  2021-02-04 22:00:00  1.19679      NaN  1.19679
4  2021-02-04 23:00:00  1.19688      NaN  1.19721 # <- This got updated
5  2021-02-05 00:00:00  1.19721  1.19721  1.19721
6  2021-02-05 01:00:00  1.19694      NaN  1.19694
7  2021-02-05 02:00:00  1.19682      NaN  1.19682

Note: I created a new column only to make the change visible. To update High in place, use 'High' instead of 'NewHigh' on the df.loc line.
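
A minimal sketch of the in-place variant mentioned in the note, on a tiny made-up frame (the values are hypothetical, chosen so that exactly one row changes). Comparisons against NaN are False, so rows whose next HH is missing are left alone.

```python
import numpy as np
import pandas as pd

# hypothetical data: only row 2 has an HH value
df = pd.DataFrame({"High": [1.0, 2.0, 3.0, 2.5],
                   "HH":   [np.nan, np.nan, 3.5, np.nan]})

next_hh = df["HH"].shift(-1)   # each row now sees the HH of the row below it
mask = next_hh > df["High"]    # NaN > x is False, so only row 1 qualifies

# overwrite High directly instead of writing to a separate NewHigh column
df.loc[mask, "High"] = next_hh
print(df)
```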

Update row value based on the most recent value of the previous row

You can use replace and ffill here: turn every 'photo' into NaN, forward-fill so each photo row inherits the most recent non-photo page, then check whether that filled value is 'list'.

s = df['PageName'].replace('photo',np.nan).ffill().eq('list')|df['OfInterest']
df['OfInterest'] = s

print(df)

   RowNum PageName  OfInterest
0       0     home       False
1       1    photo       False
2       2     list        True
3       3    photo        True
4       4    photo        True
5       5    photo        True
6       6     home       False
7       7    photo       False
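
For reference, here is a self-contained sketch of the same steps. The starting OfInterest column is an assumption (only the 'list' row set to True), since the question's input data is not shown here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "RowNum": range(8),
    "PageName": ["home", "photo", "list", "photo", "photo", "photo", "home", "photo"],
    # assumed starting flags: only the 'list' row is of interest
    "OfInterest": [False, False, True, False, False, False, False, False],
})

# hide the photo rows, then let each one inherit the last non-photo page
last_page = df["PageName"].replace("photo", np.nan).ffill()

# a row is of interest if it follows a 'list' page or was already flagged
df["OfInterest"] = last_page.eq("list") | df["OfInterest"]
print(df)
```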

Pandas Dataframe update the row values by previous one based on condition

Below is what I came up with. (I added 3 extra rows with IMEI 55674 just for testing.)

Removing consecutive 0s with a group of 3 (which needs no action) and slicing on the dataframe:

import itertools
import numpy as np
import pandas as pd
def consecutive(data, stepsize=1):
    return np.split(data, np.where(np.diff(data) != stepsize)[0]+1)

a = np.array(df[df.KVA == 0.00].index)
l = consecutive(a)
to_exclude=list(itertools.chain.from_iterable([i.tolist() for i in l if len(i)==3]))
# take an explicit copy so that assigning to df1 below does not raise SettingWithCopyWarning
df1 = df.loc[~df.index.isin(to_exclude)].copy()
>>df1
    IMEI    KVA     KwH
0   55647   1307.65 1020.33
1   55468   2988.00 1109.05
5   55469   1888.97 933.48
6   55647   1338.65 1120.33
7   55468   2088.00 1019.05
8   55647   0.00    977.87
9   55469   1455.28 1388.25
10  55648   2144.38 445.37
11  55469   1888.97 933.48
12  55674   0.00    6433.00
13  55674   1345.00 6542.00
14  55674   3456.00 6541.00

Replace the leftover 0s with np.nan, then fill them with the per-IMEI mean, computed with groupby and transform (the zeros count as 0 in that mean):

df1['KVA'] = df1['KVA'].replace(0, np.nan)
df1['KVA'] = df1['KVA'].fillna(df1.fillna(0).groupby(['IMEI'])['KVA'].transform('mean'))
>>df1
    IMEI    KVA          KwH
0   55647   1307.650000 1020.33
1   55468   2988.000000 1109.05
5   55469   1888.970000 933.48
6   55647   1338.650000 1120.33
7   55468   2088.000000 1019.05
8   55647   882.100000  977.87
9   55469   1455.280000 1388.25
10  55648   2144.380000 445.37
11  55469   1888.970000 933.48
12  55674   1600.333333 6433.00
13  55674   1345.000000 6542.00
14  55674   3456.000000 6541.00

Then concat the rows we left out earlier back in and restore the original order with sort_index:

pd.concat([df1,df.loc[df.index.isin(to_exclude)]]).sort_index()

    IMEI    KVA         KwH
0   55647   1307.650000 1020.33
1   55468   2988.000000 1109.05
2   55647   0.000000    977.87
3   55467   0.000000    1388.25
4   55647   0.000000    445.37
5   55469   1888.970000 933.48
6   55647   1338.650000 1120.33
7   55468   2088.000000 1019.05
8   55647   882.100000  977.87
9   55469   1455.280000 1388.25
10  55648   2144.380000 445.37
11  55469   1888.970000 933.48
12  55674   1600.333333 6433.00
13  55674   1345.000000 6542.00
14  55674   3456.000000 6541.00
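
To see what the consecutive helper does on its own, here is a small sketch using the index positions where KVA is 0 in the frame above (2, 3, 4, 8 and 12). The names zero_idx and runs are only for illustration.

```python
import numpy as np

def consecutive(data, stepsize=1):
    # split wherever the gap between neighbouring values differs from stepsize
    return np.split(data, np.where(np.diff(data) != stepsize)[0] + 1)

zero_idx = np.array([2, 3, 4, 8, 12])   # index labels of the rows with KVA == 0
runs = [run.tolist() for run in consecutive(zero_idx)]

# keep only the runs of exactly three consecutive zeros; these are left untouched
to_exclude = [i for run in runs if len(run) == 3 for i in run]
print(runs, to_exclude)
```

Note that the length check is an exact match, so a run of four or more zeros would not be excluded by this filter.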

Update pandas dataframe current row attribute based on its value in the previous row for each row

The problem is that you want to calculate an array whose elements depend on each other: element 2 depends on element 1, element 3 depends on element 2, and so on.

Whether there is a simple solution depends on the formula you use, i.e., on whether you can vectorize it. There is a good explanation of that topic in: Is it possible to vectorize recursive calculation of a NumPy array where each element depends on the previous one?

In your case a simple loop should do it:

balance = np.empty(len(df.index))
balance[0] = 100
for i in range(1, len(df.index)):
    balance[i] = balance[i-1] + 1  # or whatever formula you want to use

Please note that the loop above is the general solution. Your particular formula can be vectorized, so the same array can also be generated using:

balance = 100 + np.arange(0, len(df.index))
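
A small sketch comparing the loop with the vectorized forms. The length n stands in for len(df.index), and the deposits array is a hypothetical example of a recurrence with varying increments, which can be vectorized with a cumulative sum.

```python
import numpy as np

n = 5            # stands in for len(df.index)
start = 100.0

# general solution: each element is computed from the previous one
balance_loop = np.empty(n)
balance_loop[0] = start
for i in range(1, n):
    balance_loop[i] = balance_loop[i - 1] + 1

# constant increment: the recurrence collapses to start + 0, 1, 2, ...
balance_vec = start + np.arange(n)

# varying increments (hypothetical): balance[i] = balance[i-1] + deposits[i]
deposits = np.array([0.0, 10.0, -5.0, 20.0, 0.0])
balance_cumsum = start + np.cumsum(deposits)

print(balance_loop, balance_vec, balance_cumsum)
```

Recurrences that are not a running sum or product generally still need the loop.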

In Pandas, how do I update the previous row in an iterator?

Without example data, it's unclear what you're trying to do. But using the operations in your for loop, it could probably be done like this instead, without any loop:

myVal = df['myCol']  # the column you wanted, or any other calculation on it
df['myCol'] = df['myCol'].shift() - myVal

Depending on what you're trying to do, one of these should be what you want:

# starting with this df
   myCol  otherCol
0      2         6
1      9         3
2      4         8
3      2         8
4      1         7

# next row minus current row
df['myCol'] = df['myCol'].shift(-1) - df['myCol']
df
# result:
   myCol  otherCol
0    7.0         6
1   -5.0         3
2   -2.0         8
3   -1.0         8
4    NaN         7

# previous row minus current row (again starting from the original df)
df['myCol'] = df['myCol'].shift() - df['myCol']
df
# result:
   myCol  otherCol
0    NaN         6
1   -7.0         3
2    5.0         8
3    2.0         8
4    1.0         7

And myVal can be anything, like some mathematical operations vectorised over an entire column:

myVal = df['myCol'] * 2 + 3
# myVal is:
0     7
1    21
2    11
3     7
4     5
Name: myCol, dtype: int32

df['myCol'] = df['myCol'].shift(-1) - myVal
df
   myCol  otherCol
0    2.0         6
1  -17.0         3
2   -9.0         8
3   -6.0         8
4    NaN         7
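
The three variants above can be reproduced side by side from the same starting frame. This sketch keeps each result in its own variable (the variable names are only for illustration) rather than overwriting myCol, so that every calculation starts from the original values.

```python
import pandas as pd

df = pd.DataFrame({"myCol": [2, 9, 4, 2, 1], "otherCol": [6, 3, 8, 8, 7]})

# shift(-1) pulls the next row up; shift() pushes the previous row down
next_minus_current = df["myCol"].shift(-1) - df["myCol"]
prev_minus_current = df["myCol"].shift() - df["myCol"]

# myVal can be any calculation vectorised over the column
myVal = df["myCol"] * 2 + 3
next_minus_myval = df["myCol"].shift(-1) - myVal

print(pd.DataFrame({"next-cur": next_minus_current,
                    "prev-cur": prev_minus_current,
                    "next-myVal": next_minus_myval}))
```

The row with no neighbour (the last row for shift(-1), the first for shift()) comes out as NaN, which is also why the results are floats.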
