Calculate the slope at every point in a time series
I assume we start with:
>>> df
                   dt         prod
0 2013-01-01 00:00:00  4732.154785
1 2013-01-01 00:15:00  4709.465820
2 2013-01-01 00:30:00  4646.984863
3 2013-01-01 00:45:00  4569.866211
4 2013-01-01 01:00:00  4559.160156
5 2013-01-01 01:15:00  4467.170898
6 2013-01-01 01:30:00  4413.409180
7 2013-01-01 01:45:00  4316.044922
8 2013-01-01 02:00:00  4279.421875
>>> df.dtypes
dt      datetime64[ns]
prod           float64
dtype: object
If your data is not sorted, start with:
df = df.sort_values("dt", ascending=True)
Then get the previous and next values, in "dt" order:
>>> df["prod_prev"] = df["prod"].shift(1)
>>> df["prod_next"] = df["prod"].shift(-1)
>>> df
                   dt         prod    prod_prev    prod_next
0 2013-01-01 00:00:00  4732.154785          NaN  4709.465820
1 2013-01-01 00:15:00  4709.465820  4732.154785  4646.984863
2 2013-01-01 00:30:00  4646.984863  4709.465820  4569.866211
3 2013-01-01 00:45:00  4569.866211  4646.984863  4559.160156
4 2013-01-01 01:00:00  4559.160156  4569.866211  4467.170898
5 2013-01-01 01:15:00  4467.170898  4559.160156  4413.409180
6 2013-01-01 01:30:00  4413.409180  4467.170898  4316.044922
7 2013-01-01 01:45:00  4316.044922  4413.409180  4279.421875
8 2013-01-01 02:00:00  4279.421875  4316.044922          NaN
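The answer stops at the shifted columns, so here is one way they could be turned into a slope. This is only a sketch: it assumes a central difference (next value minus previous value, divided by the time elapsed between those two rows) is an acceptable estimate. The name `elapsed` and the `slope` column are hypothetical, and the unit is "prod per second".

```python
import pandas as pd

# First four rows of the example frame from the answer.
df = pd.DataFrame({
    "dt": pd.date_range("2013-01-01", periods=4, freq="15min"),
    "prod": [4732.154785, 4709.465820, 4646.984863, 4569.866211],
})

# Neighbouring values (NaN at the two edges).
df["prod_prev"] = df["prod"].shift(1)
df["prod_next"] = df["prod"].shift(-1)

# Seconds between the previous and the next timestamp (NaN at the edges).
elapsed = (df["dt"].shift(-1) - df["dt"].shift(1)).dt.total_seconds()

# Central difference: rise over run across the two neighbours.
df["slope"] = (df["prod_next"] - df["prod_prev"]) / elapsed
print(df[["dt", "prod", "slope"]])
```

The first and last rows have no slope because one of their neighbours is missing. A one-sided difference could be used there instead if edge values are needed.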
Calculate slope in dataframe
To "calculate the slope at each point in the data," the simplest approach is to compute "rise over run" for each pair of adjacent rows using Series.diff(), as follows. The resulting Series gives (an estimate of) the instantaneous rate of change (IROC) between the previous and current row.
iroc = original[app].diff() / original['date'].diff()
Also, you don't need apply. Thanks to numpy vectorization, scalar - array behaves as expected:
delta = slope - iroc
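Since the question's data was never shown, the following is a self-contained sketch of the two lines above on made-up data. The frame `original`, the column name stored in `app`, and the reference value `slope` are all hypothetical stand-ins. It also assumes the 'date' column holds datetimes, in which case one way to get a numeric "run" is to convert the time differences to seconds first.

```python
import pandas as pd

# Hypothetical stand-ins for the question's `original` frame and `app` column name.
original = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04"]),
    "value": [1.0, 3.0, 9.0],
})
app = "value"

# Run in seconds, so the division is between two numeric series.
run = original["date"].diff().dt.total_seconds()
iroc = original[app].diff() / run

# Scalar minus Series is applied element-wise, so no apply is needed.
slope = 2.0 / 86400  # hypothetical reference slope: 2 units per day
delta = slope - iroc
print(delta)
```

The first element of `iroc` (and therefore of `delta`) is NaN because the first row has no previous row to difference against.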
Hope this works. As Wen-Ben commented, it would really help to see actual data and your expected output.
Pandas - Rolling slope calculation
It seems that what you want is rolling with a specific step size.
When this answer was written, the pandas documentation stated that step size was not supported in rolling. (Newer releases, pandas 1.5 and later, accept a step argument for fixed-length windows.)
If the data size is not too large, just perform rolling on all data and select the results using indexing.
Here's a sample dataset. For simplicity, the time column is represented using integers.
data = pd.DataFrame(np.random.rand(500, 1) * 10, columns=['a'])
a
0 8.714074
1 0.985467
2 9.101299
3 4.598044
4 4.193559
.. ...
495 9.736984
496 2.447377
497 5.209420
498 2.698441
499 3.438271
Then, roll and calculate slopes,
def calc_slope(x):
    slope = np.polyfit(range(len(x)), x, 1)[0]
    return slope
# set min_periods=2 to allow subsets less than 60.
# use [4::5] to select the results you need.
result = data.rolling(60, min_periods=2).apply(calc_slope)[4::5]
The result will be,
a
4 -0.542845
9 0.084953
14 0.155297
19 -0.048813
24 -0.011947
.. ...
479 -0.004792
484 -0.003714
489 0.022448
494 0.037301
499 0.027189
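For reference, a self-contained version of the same idea with a sanity check might look like the sketch below. The seeded generator and the names `full` and `expected_last` are additions for illustration, so the numbers will not match the output printed above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((500, 1)) * 10, columns=["a"])

def calc_slope(x):
    # Slope of a degree-1 least-squares fit against positions 0..len(x)-1.
    return np.polyfit(np.arange(len(x)), x, 1)[0]

# raw=True passes plain NumPy arrays to calc_slope instead of Series.
full = data.rolling(60, min_periods=2).apply(calc_slope, raw=True)

# Keep every 5th result, starting at row 4.
result = full[4::5]

# Spot check: the last retained value is the slope of the final 60 rows.
expected_last = calc_slope(data["a"].to_numpy()[-60:])
print(result.tail())
```

The cost of this approach is that slopes are computed for all 500 rows and most of them are thrown away, which is why it is only suggested for data that is not too large.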
Alternatively, see the post "step size in pandas.DataFrame.rolling"; its first answer provides a numpy way to achieve this.
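One possible numpy formulation is sketched below. It is not necessarily the approach used in the referenced post. It assumes NumPy 1.20 or later for `sliding_window_view`, and unlike the pandas version with `min_periods=2` it only produces slopes for full 60-element windows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
values = rng.random(500) * 10

window, step = 60, 5

# All full-length windows as a (441, 60) view, then every 5th window.
windows = sliding_window_view(values, window)[::step]

# polyfit fits each column of a 2-D y against the shared x, so transpose.
x = np.arange(window)
slopes = np.polyfit(x, windows.T, 1)[0]
print(slopes.shape)
```

Here the k-th slope belongs to the window ending at position 59 + 5k, so the last one covers rows 440 to 499.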
Fit a line with groupby in a pandas time series and get the slope
Do you want something like this?
(foo_so.groupby('id')
       .apply(lambda x: 'Slope: %.3f' % np.polyfit(np.arange(len(x)),
                                                   x['y'],
                                                   deg=1)[0])
)
output:
id
a Slope: 2.190
b Slope: 0.410
Actually, if this is a time series, it probably makes more sense to use time as x:
(foo_so.groupby('id')
       .apply(lambda x: 'Slope: %.3f' % np.polyfit(x['time'],
                                                   x['y'],
                                                   deg=1)[0])
)
output:
id
a Slope: 0.654
b Slope: 0.410
dtype: object
And if you only want to print:
for name, group in foo_so.groupby('id'):
    slope = np.polyfit(np.arange(len(group)), group['y'], deg=1)[0]
    print(f'Slope for {name}: {slope:.3f}')
output:
Slope for a: 2.190
Slope for b: 0.410
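If the slopes are needed as numbers rather than formatted strings, and the time column holds real datetimes, something along these lines might work. The sample frame and the helper `slope_per_day` are made up for illustration, since the original `foo_so` data is not shown. The sketch assumes days are a sensible unit for x.

```python
import numpy as np
import pandas as pd

# Hypothetical data in the shape the answer assumes: columns id, time, y.
foo_so = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b", "b"],
    "time": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-04",
                            "2021-01-01", "2021-01-02", "2021-01-03"]),
    "y": [1.0, 3.0, 7.0, 5.0, 5.5, 6.0],
})

def slope_per_day(group):
    # Days elapsed since the group's first timestamp, as floats.
    days = (group["time"] - group["time"].min()).dt.total_seconds() / 86400
    return np.polyfit(days, group["y"], deg=1)[0]

# Select the needed columns so the helper never sees the grouping column.
slopes = foo_so.groupby("id")[["time", "y"]].apply(slope_per_day)
print(slopes)
```

The result is a float Series indexed by id, which can be formatted afterwards if a printed label is still wanted.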
Linear Regression of Time-Series Data
This piece of code should give you the idea:
df = df.astype(float)
df.index = pd.to_datetime(df.index)
slopes = []
for col in df:
    x = df.index.month.values
    y = df[col].values
    b = (len(x) * (x * y).sum() - (x.sum() * y.sum())) / (len(x) * (x ** 2).sum() - x.sum() ** 2)
    slopes.append(b)
Slopes:
[-5.565429999999997,
0.40302000000000004,
-2.5999877,
-3.1999877,
-4.699987700000003]
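The loop uses the standard least-squares formula for the slope of a simple linear regression, where n is the number of observations:
b = (n Σxy − Σx Σy) / (n Σx² − (Σx)²)
Or, with numpy.polyfit: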
df = df.astype(float)
df.index = pd.to_datetime(df.index)
x = df.index.month.values
y = df.values
slopes, offsets = np.polyfit(x, y, deg=1)
Slopes: array([-5.56543 , 0.40302 , -2.5999877, -3.1999877, -4.6999877])
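To see that the two approaches agree, the sketch below runs both on a small made-up frame (the data and column names `s1` and `s2` are hypothetical). Note that using the month number as x, as the answer does, presumes the data stays within a single calendar year; otherwise the month values repeat.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data: one row per month, one column per series.
index = pd.date_range("2020-01-01", periods=6, freq="MS")
df = pd.DataFrame({"s1": [10.0, 8.0, 6.5, 4.0, 2.5, 0.0],
                   "s2": [1.0, 1.5, 1.9, 2.6, 3.0, 3.4]}, index=index)

x = df.index.month.values.astype(float)
y = df.values
n = len(x)

# Closed-form least-squares slope, vectorised over the columns of y.
closed_form = (n * (x[:, None] * y).sum(axis=0) - x.sum() * y.sum(axis=0)) / (
    n * (x ** 2).sum() - x.sum() ** 2
)

# polyfit accepts a 2-D y and fits each column independently.
slopes, offsets = np.polyfit(x, y, deg=1)
print(closed_form, slopes)
```

Both arrays hold one slope per column, in column order.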