Is there a way in Pandas to use previous row value in dataframe.apply when previous value is also calculated in the apply?
First, seed the first row of the derived column C with the value from D:
df.loc[0, 'C'] = df.loc[0, 'D']
Then iterate through the remaining rows and fill the calculated values:
for i in range(1, len(df)):
    df.loc[i, 'C'] = df.loc[i-1, 'C'] * df.loc[i, 'A'] + df.loc[i, 'B']
   Index_Date   A   B    C    D
0  2015-01-31  10  10   10   10
1  2015-02-01   2   3   23   22
2  2015-02-02  10  60  290  280
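For reference, the seed step and the loop can be combined into one self-contained sketch. The frame is rebuilt from the table shown here, and the code assumes the default 0..n-1 integer index, which is what makes df.loc[i - 1, ...] point at the previous row.

```python
import pandas as pd

# Sample data rebuilt from the table above (column C is computed below).
df = pd.DataFrame({
    'Index_Date': ['2015-01-31', '2015-02-01', '2015-02-02'],
    'A': [10, 2, 10],
    'B': [10, 3, 60],
    'D': [10, 22, 280],
})

# Seed the first row of C from D.
df.loc[0, 'C'] = df.loc[0, 'D']

# Each later row depends on the C value computed for the row before it.
for i in range(1, len(df)):
    df.loc[i, 'C'] = df.loc[i - 1, 'C'] * df.loc[i, 'A'] + df.loc[i, 'B']

print(df)
```

With a non-default index (dates, for example), positional access through iloc would probably be the safer choice.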
Is there a way in Pandas to use previous row values in dataframe.apply where previous values are also calculated in the apply?
I wouldn't recommend using apply in this case.
Why not simply use two loops, one for each differently defined range:
for i in df.index[1:5]:
    df.loc[i, 'Y'] = df.W.loc[i] + df.Y.loc[i-1]
for i in df.index[5:]:
    df.loc[i, 'Y'] = df.W.loc[i] + df.Y.loc[i-1] + df.Y.loc[i-4] - df.Y.loc[i-5]
This is straightforward, and you will still know next week what the code does.
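The two loops assume that Y already exists and that row 0 has been seeded. A minimal runnable sketch follows; the W values and the choice of seeding Y from the first W are made up for illustration, and a default integer index is assumed.

```python
import pandas as pd

# Hypothetical input: W holds the per-row increments, Y is the derived column.
df = pd.DataFrame({'W': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]})
df['Y'] = 0.0
df.loc[0, 'Y'] = df.loc[0, 'W']  # assumed starting value for the recurrence

# Rows 1-4: W plus the previously computed Y.
for i in df.index[1:5]:
    df.loc[i, 'Y'] = df.loc[i, 'W'] + df.loc[i - 1, 'Y']

# Rows 5 onward: also add Y from 4 rows back and subtract Y from 5 rows back.
for i in df.index[5:]:
    df.loc[i, 'Y'] = (df.loc[i, 'W'] + df.loc[i - 1, 'Y']
                      + df.loc[i - 4, 'Y'] - df.loc[i - 5, 'Y'])

print(df)
```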
Use previous row value to calculate next value in Pandas MultiIndex DataFrame
IIUC, what you want is a cumprod where you initialize the value to 100. The rest is just indexing:
START = 100
df[('foo', 'one')] = (df[('bar', 'one')]
                      .add(1)
                      .shift(fill_value=START)
                      .cumprod()
                      )
output:
first        bar                    foo
second       one       two          one
A       1.764052  0.400157   100.000000
B       0.978738  2.240893   276.405235
C       1.867558 -0.977278   546.933537
D       0.950088 -0.151357  1568.363633
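To see why the shift-then-cumprod chain matches a row-by-row calculation, here is a small sketch that compares it against an explicit loop. The frame is rebuilt from the output table; the loop and the name expected are only there as a cross-check and are not part of the original answer.

```python
import pandas as pd

# Values copied from the output table above.
cols = pd.MultiIndex.from_tuples([('bar', 'one'), ('bar', 'two')],
                                 names=['first', 'second'])
df = pd.DataFrame(
    [[1.764052, 0.400157],
     [0.978738, 2.240893],
     [1.867558, -0.977278],
     [0.950088, -0.151357]],
    index=list('ABCD'), columns=cols,
)

START = 100
df[('foo', 'one')] = (df[('bar', 'one')]
                      .add(1)                   # growth factor 1 + bar
                      .shift(fill_value=START)  # row 0 becomes the start value
                      .cumprod())               # running product

# Cross-check: the same recurrence written as an explicit loop.
expected = [float(START)]
for growth in df[('bar', 'one')].iloc[:-1]:
    expected.append(expected[-1] * (1 + growth))

print(df)
```

The shift moves each growth factor down one row, so every row is multiplied by the previous row's factor, and the fill value supplies the starting point.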
Indexing
Independently of your goal, to index a MultiIndex you would need to use:
df.loc['A', ('bar', 'one')]
or, for a mix of names and relative indexing:
df[('bar', 'one')].iloc[0]
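Both indexing forms can be tried on a two-row cut of the same data. This is only a sketch; the names by_label and by_position are illustrative.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([('bar', 'one'), ('bar', 'two')],
                                 names=['first', 'second'])
df = pd.DataFrame([[1.764052, 0.400157], [0.978738, 2.240893]],
                  index=['A', 'B'], columns=cols)

by_label = df.loc['A', ('bar', 'one')]    # row label plus full column tuple
by_position = df[('bar', 'one')].iloc[0]  # column tuple, then row position

print(by_label, by_position)
```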
Is there a way in Pandas to use previous row value to compute new values for a row?
You could try the for-loop below, which fills the column row by row, much like dragging a formula down a column in Excel:
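import pandas as pd

df = pd.DataFrame({
    'Date': ['', '2020-01-13', '2020-01-14', '2020-01-15', '2020-01-16', '2020-01-17'],
    'AAPL': ['', 0.021364, -0.013503, -0.004286, 0.012526, 0.011071]})

# Seed value for row 0; a float, so the float results assigned below fit the column dtype
df['Portfolio'] = 1.0

# Each row depends on the Portfolio value computed for the previous row
for i in range(1, len(df)):
    df.loc[i, 'Portfolio'] = df.loc[i-1, 'Portfolio'] * (1 + df.loc[i, 'AAPL']) + 3

print(df)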
Output:
         Date      AAPL  Portfolio
0                         1.000000
1  2020-01-13  0.021364   4.021364
2  2020-01-14 -0.013503   6.967064
3  2020-01-15 -0.004286   9.937203
4  2020-01-16  0.012526  13.061676
5  2020-01-17  0.011071  16.206282
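As an alternative sketch (not part of the original answer), the same recurrence can be expressed with itertools.accumulate from the standard library, which threads the previous result into each step. The helper name step is illustrative, and the seed here is passed through the initial argument instead of occupying a blank first row.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({
    'Date': ['2020-01-13', '2020-01-14', '2020-01-15'],
    'AAPL': [0.021364, -0.013503, -0.004286],
})

def step(prev, ret):
    """Next portfolio value from the previous value and the current return."""
    return prev * (1 + ret) + 3

# initial=1.0 is the seed; it comes back as the first element, so drop it
# to line the results up with the rows of df.
values = list(accumulate(df['AAPL'], step, initial=1.0))
df['Portfolio'] = values[1:]

print(df)
```

This still runs in Python row by row, so it is mainly a readability choice and not a speed-up.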
Is there a way to use the previous calculated row value with the sum of a different column in a Pandas Dataframe?
We can define a function fast_sum to perform the required calculation, then use just-in-time compilation to compile it to machine code so that it runs at C-like speeds:
import numba
import numpy as np

@numba.jit(nopython=True)
def fast_sum(a):
    # b[0] is seeded from a[0]; each later b[i] depends on b[i - 1]
    b = np.zeros_like(a)
    b[0] = a[0]
    for i in range(1, len(a)):
        b[i] = (b[i - 1] * 5 + a[i]) / 6
    return b

df['B'] = fast_sum(df['A'].fillna(0).to_numpy())
                         A         B
2021-05-19 07:00:00   0.00  0.000000
2021-05-19 07:30:00   0.00  0.000000
2021-05-19 08:00:00   0.00  0.000000
2021-05-19 08:30:00   0.00  0.000000
2021-05-19 09:00:00  19.91  3.318333
2021-05-19 09:30:00   0.11  2.783611
2021-05-19 10:00:00   0.00  2.319676
2021-05-19 10:30:00  22.99  5.764730
2021-05-19 11:00:00   0.00  4.803942
Performance test on a sample dataframe with 90000 rows:
df = pd.concat([df] * 10000, ignore_index=True)
%%timeit
df['B'] = fast_sum(df['A'].fillna(0).to_numpy())
# 1.62 ms ± 93.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
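If installing numba is not an option, this particular recurrence, b[i] = (5 * b[i-1] + a[i]) / 6 with b[0] = a[0], has the same form as an exponentially weighted mean with alpha = 1/6 and adjust=False, so pandas' built-in ewm should give the same numbers. The sketch below checks that against a plain-Python loop on a few sample values; slow_sum is an illustrative name for an uncompiled reference, not part of the original answer.

```python
import numpy as np
import pandas as pd

a = pd.Series([0.0, 0.0, 19.91, 0.11, 0.0, 22.99, 0.0])

def slow_sum(values):
    """Uncompiled reference for b[i] = (5 * b[i-1] + a[i]) / 6."""
    out = np.zeros_like(values)
    out[0] = values[0]
    for i in range(1, len(values)):
        out[i] = (out[i - 1] * 5 + values[i]) / 6
    return out

reference = slow_sum(a.to_numpy())

# adjust=False gives y[0] = x[0] and y[t] = (1 - alpha) * y[t-1] + alpha * x[t].
ewm_result = a.ewm(alpha=1 / 6, adjust=False).mean()

print(np.allclose(reference, ewm_result.to_numpy()))
```

The equivalence relies on there being no NaN values, so the fillna(0) step from the answer would still need to come first. For recurrences that do not fit this form, the compiled loop remains the more general tool.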
Pandas dataframe : Applying function to row value and value from the previous row
You can use .apply() on each row. Here, .apply() passes the scalar values row by row to the custom function, which lets you reuse a custom function that was designed to work on scalar values. Otherwise, you may need to modify the custom function to support vectorized Pandas array values.
To cater for the .shift() entries, one workaround is to define new columns for them first, so that they can be passed to .apply():
# Take the previous entry with shift and `fillna` the first row with its own value
# (in case the custom function cannot handle the `NaN` that shift leaves on the first row)
df['lat_shift'] = df['latitude'].shift().fillna(df['latitude'])
df['lon_shift'] = df['longitude'].shift().fillna(df['longitude'])
df['alt_shift'] = df['altitude'].shift().fillna(df['altitude'])

# Pass current and previous values to the scalar function, one row at a time
df['distances'] = df.apply(
    lambda x: eukarney(x['latitude'], x['longitude'], x['altitude'],
                       x['lat_shift'], x['lon_shift'], x['alt_shift']),
    axis=1).fillna(0)
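Since eukarney is the asker's own function and is not shown, the sketch below swaps in a hypothetical stand-in, distance_3d, that treats the three columns as plain Cartesian coordinates. It is only meant to show the shift-then-apply pattern end to end and is not a geodesic distance.

```python
import math

import pandas as pd

def distance_3d(x1, y1, z1, x2, y2, z2):
    """Hypothetical stand-in for the custom scalar function."""
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)

# Made-up sample points.
df = pd.DataFrame({
    'latitude': [0.0, 3.0, 3.0],
    'longitude': [0.0, 4.0, 4.0],
    'altitude': [0.0, 0.0, 12.0],
})

# Previous-row values; the first row falls back to its own value,
# so its distance comes out as 0.
for col in ['latitude', 'longitude', 'altitude']:
    df[col + '_prev'] = df[col].shift().fillna(df[col])

df['distances'] = df.apply(
    lambda r: distance_3d(r['latitude'], r['longitude'], r['altitude'],
                          r['latitude_prev'], r['longitude_prev'],
                          r['altitude_prev']),
    axis=1,
)

print(df[['latitude', 'longitude', 'altitude', 'distances']])
```

Note that this pattern only works because the previous-row inputs are known up front. When the previous row's computed result is needed, as in the earlier sections, a loop is still required.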