Python - Use Previous Row'S Value to Update the New Rows Values

python - use previous row's value to update the new rows values

So you can do this used apply and nested functions

import pandas as pd
ID = [2001980,2001980,2001980,2001980,2001980,2001980,2001980,2001980,2001980,2001980,2002222,2002222,2002222,2002222,2002222,2002222,2002222,2002222,]
Date = ["10/30/2017","10/29/2017","10/28/2017","10/27/2017","10/26/2017","10/25/2017","10/24/2017","10/23/2017","10/22/2017","10/21/2017","10/21/2017","10/20/2017","10/19/2017","10/18/2017","10/17/2017","10/16/2017","10/15/2017","10/14/2017",]
current = [1 ,0 ,0 ,40,39,0 ,0 ,60,0 ,0 ,0 ,0 ,16,0 ,0 ,20,19,18,]

df = pd.DataFrame({"ID": ID, "Date": Date, "current": current})

Then create the function to update the frame

Python 3.X

def update_frame(df):
last_expected = None
def apply_logic(row):
nonlocal last_expected
last_row_id = row.name - 1
if row.name == 0:
last_expected = row["current"]
return last_expected
last_row = df.iloc[[last_row_id]].iloc[0].to_dict()
last_expected = max(last_expected-1,row['current']) if last_row['ID'] == row['ID'] else row['current']
return last_expected
return apply_logic

Python 2.X

def update_frame(df):
sd = {"last_expected": None}
def apply_logic(row):
last_row_id = row.name - 1
if row.name == 0:
sd['last_expected'] = row["current"]
return sd['last_expected']
last_row = df.iloc[[last_row_id]].iloc[0].to_dict()
sd['last_expected'] = max(sd['last_expected'] - 1,row['current']) if last_row['ID'] == row['ID'] else row['current']
return sd['last_expected']
return apply_logic

And run the function like below

df['expected'] = df.apply(update_frame(df), axis=1)

The output is as expected

Output

How to update pandas DataFrame based on the previous row information

To help you understand how the shift(-1) works, please review the below solution. I looked at the image and created the raw DataFrame.

import pandas as pd
import numpy as np
df = pd.DataFrame({'Dates':['2021-02-04 19:00:00','2021-02-04 20:00:00',
'2021-02-04 21:00:00','2021-02-04 22:00:00',
'2021-02-04 23:00:00','2021-02-05 00:00:00',
'2021-02-05 01:00:00','2021-02-05 02:00:00'],
'Close':[1.19661,1.19660,1.19611,1.19643,1.19664,
1.19692,1.19662,1.19542],
'High' :[1.19679,1.19678,1.19680,1.19679,1.19688,
1.19721,1.19694,1.19682],
'Low' :[1.19577,1.19637,1.19604,1.19590,1.19632,
1.19634,1.19622,1.19537],
'Open' :[1.19630,1.19662,1.19665,1.19613,1.19646,
1.19662,1.19690,1.19665],
'Status':['ok']*8,
'Volume':[2579,1858,1399,788,1437,2435,2898,2641],
'HH' :[np.NaN]*5+[1.19721]+[np.NaN]*2,
'LL' :[np.NaN]*8})
print (df)

#make a copy of df['High'] into df'NewHigh']
df['NewHigh'] = df['High']

#if next row in 'HH' is greater than 'High', then update 'NewHigh' with next row from 'HH'
df.loc[df['HH'].shift(-1) > df['High'],'NewHigh'] = df['HH'].shift(-1)

print (df[['Dates','High','HH','NewHigh']])

The output of this will be:

                 Dates     High       HH  NewHigh
0 2021-02-04 19:00:00 1.19679 NaN 1.19679
1 2021-02-04 20:00:00 1.19678 NaN 1.19678
2 2021-02-04 21:00:00 1.19680 NaN 1.19680
3 2021-02-04 22:00:00 1.19679 NaN 1.19679
4 2021-02-04 23:00:00 1.19688 NaN 1.19721 # <- This got updated
5 2021-02-05 00:00:00 1.19721 1.19721 1.19721
6 2021-02-05 01:00:00 1.19694 NaN 1.19694
7 2021-02-05 02:00:00 1.19682 NaN 1.19682

Note: I created a new column to show you the changes. You can directly update High. Instead of 'NewHigh' on the df.loc line, you can give 'High'. That should do the trick.

Update row value based on the most recent value of the previous row

You can try replace and ffill here , then just compare if the ffilled value is 'list'

s = df['PageName'].replace('photo',np.nan).ffill().eq('list')|df['OfInterest']
df['OfInterest'] = s


print(df)

RowNum PageName OfInterest
0 0 home False
1 1 photo False
2 2 list True
3 3 photo True
4 4 photo True
5 5 photo True
6 6 home False
7 7 photo False

Pandas Dataframe update the row values by previous one based on condition

Below is what i came up with:(I have added 3 extra rows with IMEI : 55674 just for testing)

Removing consecutive 0s with a group of 3 (which needs no action) and slicing on the dataframe:

import itertools
def consecutive(data, stepsize=1):
return np.split(data, np.where(np.diff(data) != stepsize)[0]+1)

a = np.array(df[df.KVA == 0.00].index)
l = consecutive(a)
to_exclude=list(itertools.chain.from_iterable([i.tolist() for i in l if len(i)==3]))
pd.options.mode.chained_assignment = None
df1 = df.loc[~df.index.isin(to_exclude)]
>>df1
IMEI KVA KwH
0 55647 1307.65 1020.33
1 55468 2988.00 1109.05
5 55469 1888.97 933.48
6 55647 1338.65 1120.33
7 55468 2088.00 1019.05
8 55647 0.00 977.87
9 55469 1455.28 1388.25
10 55648 2144.38 445.37
11 55469 1888.97 933.48
12 55674 0.00 6433.00
13 55674 1345.00 6542.00
14 55674 3456.00 6541.00

Assigning the leftover 0s with np.nan and doing a groupby with transform and fillna with the mean

df1['KVA'] = df1['KVA'].replace(0, np.nan)
df1['KVA'] = df1['KVA'].fillna(df1.fillna(0).groupby(['IMEI'])['KVA'].transform('mean'))
>>df1
IMEI KVA KwH
0 55647 1307.650000 1020.33
1 55468 2988.000000 1109.05
5 55469 1888.970000 933.48
6 55647 1338.650000 1120.33
7 55468 2088.000000 1019.05
8 55647 882.100000 977.87
9 55469 1455.280000 1388.25
10 55648 2144.380000 445.37
11 55469 1888.970000 933.48
12 55674 1600.333333 6433.00
13 55674 1345.000000 6542.00
14 55674 3456.000000 6541.00

Then just concat and sort_index those which we had left out earlier:

pd.concat([df1,df.loc[df.index.isin(to_exclude)]]).sort_index()

IMEI KVA KwH
0 55647 1307.650000 1020.33
1 55468 2988.000000 1109.05
2 55647 0.000000 977.87
3 55467 0.000000 1388.25
4 55647 0.000000 445.37
5 55469 1888.970000 933.48
6 55647 1338.650000 1120.33
7 55468 2088.000000 1019.05
8 55647 882.100000 977.87
9 55469 1455.280000 1388.25
10 55648 2144.380000 445.37
11 55469 1888.970000 933.48
12 55674 1600.333333 6433.00
13 55674 1345.000000 6542.00
14 55674 3456.000000 6541.00

Update pandas dataframe current row attribute based on its value in the previous row for each row

The problem you have is, that you want to calculate an array and the elements are dependent on each other. So, e.g., element 2 depends on elemen 1 in your array. Element 3 depends on element 2, and so on.

If there is a simple solution, depends on the formula you use, i.e., if you can vectorize it. Here is a good explanation on that topic: Is it possible to vectorize recursive calculation of a NumPy array where each element depends on the previous one?

In your case a simple loop should do it:

balance = np.empty(len(df.index))
balance[0] = 100
for i in range(1, len(df.index)):
balance[i] = balance[i-1] + 1 # or whatever formula you want to use

Please note, that above is the general solution. Your formula can be vectorized, thus also be generated using:

balance = 100 + np.arange(0, len(df.index))

In Pandas, how do I update the previous row in a iterator?

Without example data, it's unclear what you're trying. But using the operations in your for loop, it could probably be done like this instead, without any loop:

myValue = df['myCol']  # the column you wanted and other calculations
df['myCol'] = df['myCol'].shift() - myValue

Depending on what you're trying, one of these should be what you want:

# starting with this df
myCol otherCol
0 2 6
1 9 3
2 4 8
3 2 8
4 1 7

# next row minus current row
df['myCol'] = df['myCol'].shift(-1) - df['myCol']
df
# result:
myCol otherCol
0 7.0 6
1 -5.0 3
2 -2.0 8
3 -1.0 8
4 NaN 7

or

# previous row minus current row
df['myCol'] = df['myCol'].shift() - df['myCol']
df
# result:
myCol otherCol
0 NaN 6
1 -7.0 3
2 5.0 8
3 2.0 8
4 1.0 7

And myVal can be anything, like some mathematical operations vectorised over an entire column:

myVal = df['myCol'] * 2 + 3
# myVal is:
0 7
1 21
2 11
3 7
4 5
Name: myCol, dtype: int32

df['myCol'] = df['myCol'].shift(-1) - myVal
df
myCol otherCol
0 2.0 6
1 -17.0 3
2 -9.0 8
3 -6.0 8
4 NaN 7


Related Topics



Leave a reply



Submit