Is There a Way in Pandas to Use Previous Row Value in Dataframe.Apply When Previous Value Is Also Calculated in the Apply

Is there a way in Pandas to use previous row value in dataframe.apply when previous value is also calculated in the apply?

First, create the derived value:

df.loc[0, 'C'] = df.loc[0, 'D']

Then iterate through the remaining rows and fill the calculated values:

for i in range(1, len(df)):
    df.loc[i, 'C'] = df.loc[i-1, 'C'] * df.loc[i, 'A'] + df.loc[i, 'B']

   Index_Date   A   B    C    D
0  2015-01-31  10  10   10   10
1  2015-02-01   2   3   23   22
2  2015-02-02  10  60  290  280
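
Because each value of C is the previous value times A plus B, this particular recurrence can also be written without a loop, using cumulative products and sums. The following is a minimal sketch rather than part of the original answer. It assumes every multiplier in A after the first row is nonzero, and it can lose precision or overflow on long series because it divides by a running product.

```python
import numpy as np
import pandas as pd

# Sample data taken from the table above
df = pd.DataFrame({
    "A": [10, 2, 10],
    "B": [10, 3, 60],
    "D": [10, 22, 280],
})

a = df["A"].to_numpy(dtype=float)
b = df["B"].to_numpy(dtype=float)

# Row 0 is the seed (C[0] = D[0]), so neutralise its multiplier and offset.
a[0] = 1.0
b[0] = 0.0

# C[n] = P[n] * (C[0] + sum over k <= n of B[k] / P[k]),
# where P is the running product of A.
running_product = np.cumprod(a)
df["C"] = running_product * (df.loc[0, "D"] + np.cumsum(b / running_product))

print(df["C"].tolist())  # [10.0, 23.0, 290.0]
```

For short frames the explicit loop is usually easier to read; the closed form is mainly worth considering when the loop becomes a bottleneck.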

Is there a way in Pandas to use previous row values in dataframe.apply where previous values are also calculated in the apply?

I wouldn't recommend using apply in this case.

Why not simply use two loops, one for each differently defined range:

for i in df.index[1:5]:
    df.loc[i, 'Y'] = df.W.loc[i] + df.Y.loc[i-1]
for i in df.index[5:]:
    df.loc[i, 'Y'] = df.W.loc[i] + df.Y.loc[i-1] + df.Y.loc[i-4] - df.Y.loc[i-5]

This is straightforward, and you will still know next week what the code does.
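
If element-wise .loc access becomes slow on a large frame, one option is to run the same two ranges over a plain NumPy array and assign the column once at the end. This is a minimal sketch under assumptions that do not come from the original question: a default integer index, a seed of Y[0] = W[0], and made-up values for W.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"W": [1.0] * 8})  # hypothetical sample data

w = df["W"].to_numpy()
y = np.empty(len(df))
y[0] = w[0]  # assumed seed value

# Rows 1-4: plain running sum.
for i in range(1, min(5, len(df))):
    y[i] = w[i] + y[i - 1]

# Rows 5 onward: running sum plus the difference of two earlier results.
for i in range(5, len(df)):
    y[i] = w[i] + y[i - 1] + y[i - 4] - y[i - 5]

# Write the whole column in one assignment.
df["Y"] = y
print(df["Y"].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 9.0, 11.0]
```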

Use previous row value to calculate next value in Pandas MultiIndex DataFrame

IIUC, what you want is a cumprod where you initialize the value to 100. The rest is just indexing:

START = 100
df[('foo', 'one')] = (df[('bar', 'one')]
                      .add(1)
                      .shift(fill_value=START)
                      .cumprod()
                      )

output:

first        bar                    foo
second       one       two          one
A       1.764052  0.400157   100.000000
B       0.978738  2.240893   276.405235
C       1.867558 -0.977278   546.933537
D       0.950088 -0.151357  1568.363633

Indexing

Independently of your goal, to index a MultiIndex you would need to use:

df.loc['A', ('bar', 'one')]

or, for a mix of names and relative indexing:

df[('bar', 'one')].iloc[0]
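
To see that the shifted cumprod really matches a row-by-row calculation, it can be compared against an explicit loop on a small frame. This is a sketch with made-up values for the bar columns (chosen so the arithmetic is easy to follow); only the column layout and START come from the answer above.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two")], names=["first", "second"]
)
# Hypothetical sample values
df = pd.DataFrame(
    [[0.5, 0.1], [1.0, 0.2], [-0.5, 0.3], [0.0, 0.4]],
    index=list("ABCD"),
    columns=columns,
)

START = 100
df[("foo", "one")] = df[("bar", "one")].add(1).shift(fill_value=START).cumprod()

# Same values computed row by row:
# foo[0] = START, foo[i] = foo[i-1] * (1 + bar_one[i-1])
expected = [float(START)]
for r in df[("bar", "one")].iloc[:-1]:
    expected.append(expected[-1] * (1 + r))

assert np.allclose(df[("foo", "one")], expected)

# The two indexing styles return the same cell.
assert df.loc["A", ("bar", "one")] == df[("bar", "one")].iloc[0]
print(df[("foo", "one")].tolist())  # [100.0, 150.0, 300.0, 150.0]
```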

Is there a way in Pandas to use the previous row value to compute new values for a row?

You could try the for loop below, which builds up the column row by row, much like a fill-down formula in Excel:

import pandas as pd

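df = pd.DataFrame({
    'Date': ['', '2020-01-13', '2020-01-14', '2020-01-15', '2020-01-16', '2020-01-17'],
    'AAPL': ['', 0.021364, -0.013503, -0.004286, 0.012526, 0.011071]})
# Start from a float so later float assignments do not clash with an integer column
df['Portfolio'] = 1.0

# Each row compounds the previous portfolio value and then adds a fixed 3
for i in range(1, len(df)):
    df.loc[i, 'Portfolio'] = df.loc[i-1, 'Portfolio'] * (1 + df.loc[i, 'AAPL']) + 3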

print(df)

Output

         Date       AAPL  Portfolio
0                          1.000000
1  2020-01-13   0.021364   4.021364
2  2020-01-14  -0.013503   6.967064
3  2020-01-15  -0.004286   9.937203
4  2020-01-16   0.012526  13.061676
5  2020-01-17   0.011071  16.206282
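
The same running calculation can also be expressed with itertools.accumulate from the standard library, which carries the previous result forward without indexing into the DataFrame on every step. This is a minimal sketch, not the original answer's method; it assumes Python 3.8 or later (for the initial argument) and stores the empty first AAPL entry as NaN instead of an empty string.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({
    "Date": ["", "2020-01-13", "2020-01-14", "2020-01-15", "2020-01-16", "2020-01-17"],
    "AAPL": [float("nan"), 0.021364, -0.013503, -0.004286, 0.012526, 0.011071],
})

# Skip the empty first row; the seed value 1.0 fills that position instead.
returns = df["AAPL"].iloc[1:]

# accumulate passes the previous result as `prev` and the next return as `r`.
df["Portfolio"] = list(
    accumulate(returns, lambda prev, r: prev * (1 + r) + 3, initial=1.0)
)

print(df)
```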

Is there a way to use the previous calculated row value with the sum of a different column in a Pandas Dataframe?

We can define a function fast_sum to perform the required calculation and then use just-in-time (JIT) compilation to compile it to machine code, so that it runs at C-like speed:

import numba
import numpy as np

@numba.jit(nopython=True)
def fast_sum(a):
    b = np.zeros_like(a)
    b[0] = a[0]
    # Each value is a 5:1 weighted mix of the previous result and the current input
    for i in range(1, len(a)):
        b[i] = (b[i - 1] * 5 + a[i]) / 6
    return b

df['B'] = fast_sum(df['A'].fillna(0).to_numpy())


                         A         B
2021-05-19 07:00:00   0.00  0.000000
2021-05-19 07:30:00   0.00  0.000000
2021-05-19 08:00:00   0.00  0.000000
2021-05-19 08:30:00   0.00  0.000000
2021-05-19 09:00:00  19.91  3.318333
2021-05-19 09:30:00   0.11  2.783611
2021-05-19 10:00:00   0.00  2.319676
2021-05-19 10:30:00  22.99  5.764730
2021-05-19 11:00:00   0.00  4.803942

Performance test on sample dataframe with 90000 rows

df = pd.concat([df] * 10000, ignore_index=True)

%%timeit
df['B'] = fast_sum(df['A'].fillna(0).to_numpy())
# 1.62 ms ± 93.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
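
If adding Numba as a dependency is not an option, note that this specific recurrence, b[i] = (5 * b[i-1] + a[i]) / 6, is an exponentially weighted mean with alpha = 1/6, which pandas provides through ewm with adjust=False. The sketch below is an alternative to the answer above, not a replacement for it; it rebuilds column A from the sample output and checks the result against a plain Python loop.

```python
import numpy as np
import pandas as pd

# Column A from the sample output above (timestamps omitted for brevity)
df = pd.DataFrame({"A": [0.0, 0.0, 0.0, 0.0, 19.91, 0.11, 0.0, 22.99, 0.0]})

# With adjust=False: y[0] = x[0] and y[i] = (1 - alpha) * y[i-1] + alpha * x[i]
df["B"] = df["A"].fillna(0).ewm(alpha=1 / 6, adjust=False).mean()

# Reference: the same recurrence as fast_sum, written as a plain loop
expected = [df["A"].iloc[0]]
for x in df["A"].iloc[1:]:
    expected.append((expected[-1] * 5 + x) / 6)

assert np.allclose(df["B"], expected)
print(df)
```

This only works because the recurrence happens to have the exponentially weighted form; a general recurrence still needs a loop or a compiled function such as fast_sum.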

Pandas dataframe : Applying function to row value and value from the previous row

You can use .apply() on each row, as follows:

Here, .apply() passes the scalar values row by row to the custom function, so you can reuse a custom function that was designed to work on scalar values. Otherwise, you may need to modify the custom function to accept vectorized Pandas arrays.

To cater for the .shift() entries, one workaround is to define new columns for them first, so that they can be passed to the .apply() function.

# Take the previous entry with shift() and fill the first row with its own value
# (in case the custom function cannot handle the NaN that shift() leaves on the first row)
df['lat_shift'] = df['latitude'].shift().fillna(df['latitude'])
df['lon_shift'] = df['longitude'].shift().fillna(df['longitude'])
df['alt_shift'] = df['altitude'].shift().fillna(df['altitude'])

df['distances'] = df.apply(lambda x: eukarney(x['latitude'], x['longitude'], x['altitude'], x['lat_shift'], x['lon_shift'], x['alt_shift']), axis=1).fillna(0)
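
Since eukarney is not defined in this answer, the following self-contained sketch shows the same shift-then-apply pattern with a hypothetical stand-in, scalar_distance, which simply computes a straight-line distance between two 3-D points. The function name, its formula, and the sample coordinates are assumptions for illustration only.

```python
import math

import pandas as pd


def scalar_distance(lat, lon, alt, prev_lat, prev_lon, prev_alt):
    """Hypothetical stand-in for a scalar-only distance function."""
    return math.sqrt(
        (lat - prev_lat) ** 2 + (lon - prev_lon) ** 2 + (alt - prev_alt) ** 2
    )


# Made-up sample coordinates
df = pd.DataFrame({
    "latitude": [0.0, 3.0, 3.0],
    "longitude": [0.0, 4.0, 4.0],
    "altitude": [0.0, 0.0, 12.0],
})

# Previous-row values; the first row falls back to its own value.
for col, shifted in [("latitude", "lat_shift"),
                     ("longitude", "lon_shift"),
                     ("altitude", "alt_shift")]:
    df[shifted] = df[col].shift().fillna(df[col])

# Pass current and previous values to the scalar function, one row at a time.
df["distances"] = df.apply(
    lambda row: scalar_distance(
        row["latitude"], row["longitude"], row["altitude"],
        row["lat_shift"], row["lon_shift"], row["alt_shift"],
    ),
    axis=1,
)

print(df["distances"].tolist())  # [0.0, 5.0, 12.0]
```

Because the first row is compared with itself, its distance comes out as 0 without any extra handling.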

