How to Get Slope from Timeseries Data in Pandas

Calculate the slope at every point in a time series

I assume we start with:

>>> df
dt prod
0 2013-01-01 00:00:00 4732.154785
1 2013-01-01 00:15:00 4709.465820
2 2013-01-01 00:30:00 4646.984863
3 2013-01-01 00:45:00 4569.866211
4 2013-01-01 01:00:00 4559.160156
5 2013-01-01 01:15:00 4467.170898
6 2013-01-01 01:30:00 4413.409180
7 2013-01-01 01:45:00 4316.044922
8 2013-01-01 02:00:00 4279.421875
>>> df.dtypes
dt datetime64[ns]
prod float64
dtype: object

If your data is not sorted, start with:

df = df.sort_values("dt")

Then, to get the previous and next values (ordered by "dt"):

>>> df["prod_prev"] = df["prod"].shift(1)
>>> df["prod_next"] = df["prod"].shift(-1)
>>> df
dt prod prod_prev prod_next
0 2013-01-01 00:00:00 4732.154785 NaN 4709.465820
1 2013-01-01 00:15:00 4709.465820 4732.154785 4646.984863
2 2013-01-01 00:30:00 4646.984863 4709.465820 4569.866211
3 2013-01-01 00:45:00 4569.866211 4646.984863 4559.160156
4 2013-01-01 01:00:00 4559.160156 4569.866211 4467.170898
5 2013-01-01 01:15:00 4467.170898 4559.160156 4413.409180
6 2013-01-01 01:30:00 4413.409180 4467.170898 4316.044922
7 2013-01-01 01:45:00 4316.044922 4413.409180 4279.421875
8 2013-01-01 02:00:00 4279.421875 4316.044922 NaN
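The neighbour columns alone don't give a slope yet. One way to finish (an assumption, not part of the original answer) is a central difference: (prod_next - prod_prev) divided by the time between the two neighbours. The sketch below rebuilds the sample frame shown above and measures time in minutes, which is an arbitrary choice of unit. np.gradient is included as a cross-check: it gives the same interior values and uses one-sided differences at the two ends instead of NaN.

```python
import numpy as np
import pandas as pd

# rebuild the sample frame from above (15-minute steps)
df = pd.DataFrame({
    "dt": pd.date_range("2013-01-01 00:00", periods=9, freq="15min"),
    "prod": [4732.154785, 4709.465820, 4646.984863, 4569.866211, 4559.160156,
             4467.170898, 4413.409180, 4316.044922, 4279.421875],
})
df = df.sort_values("dt")
df["prod_prev"] = df["prod"].shift(1)
df["prod_next"] = df["prod"].shift(-1)

# elapsed time of each row in minutes, as floats
minutes = (df["dt"] - df["dt"].iloc[0]) / pd.Timedelta(minutes=1)

# central difference: (next - prev) / (t_next - t_prev); NaN at both ends
df["slope"] = (df["prod_next"] - df["prod_prev"]) / (minutes.shift(-1) - minutes.shift(1))

# np.gradient: same interior values, one-sided differences at the ends
df["slope_np"] = np.gradient(df["prod"].to_numpy(), minutes.to_numpy())

print(df[["dt", "prod", "slope", "slope_np"]])
```

The slope is in units of prod per minute; dividing by pd.Timedelta(hours=1) instead would give a per-hour rate.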


calculate slope in dataframe

To "calculate the slope at each point in the data," the simplest approach is to compute "rise over run" between adjacent rows with Series.diff(), as follows. The resulting Series is an estimate of the instantaneous rate of change (IROC) between the previous and the current row; its first value is NaN because the first row has no predecessor.

iroc = original[app].diff() / original['date'].diff()

Also, you don't need apply. Thanks to NumPy-style broadcasting, subtracting a Series from a scalar works element-wise, as expected:

delta = slope - iroc

Hope this works. As Wen-Ben commented, it would really help to see actual data and your expected output.
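The diff approach above assumes original['date'] is numeric. If it holds datetimes, the differences are timedeltas, and they need converting to a number before dividing. A minimal sketch, assuming a hypothetical frame with a datetime "date" column and a value column "value" standing in for the question's app, and a hypothetical reference slope of 1.5; the time unit (days) is also an arbitrary choice:

```python
import pandas as pd

# hypothetical data; "value" stands in for the question's `app` column
original = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04", "2020-01-05"]),
    "value": [10.0, 12.0, 11.0, 15.0],
})
app = "value"

# elapsed days between consecutive rows as floats (first row is NaN)
days = original["date"].diff().dt.total_seconds() / 86400

# rise over run between the previous and the current row
iroc = original[app].diff() / days

slope = 1.5  # hypothetical reference slope
delta = slope - iroc  # scalar minus Series broadcasts element-wise

print(pd.DataFrame({"iroc": iroc, "delta": delta}))
```

Here iroc is NaN, 2.0, -0.5, 4.0 (per day), and delta is NaN, -0.5, 2.0, -2.5.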

Pandas - Rolling slope calculation

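It seems that what you want is rolling with a specific step size.
However, according to the pandas documentation at the time, a step size was not supported in rolling. (Since pandas 1.5.0, rolling accepts a step argument for fixed-size windows. It evaluates the window only at every step-th row starting from the first one, which is equivalent to slicing the full result with [::step], so it yields rows 0, 5, 10, ... rather than 4, 9, 14, ...)

If the data is not too large, just compute the rolling result for all rows and select the ones you need by indexing.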

Here's a sample dataset. For simplicity, time is represented by the integer row index.

data = pd.DataFrame(np.random.rand(500, 1) * 10, columns=['a'])
            a
0 8.714074
1 0.985467
2 9.101299
3 4.598044
4 4.193559
.. ...
495 9.736984
496 2.447377
497 5.209420
498 2.698441
499 3.438271

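Then, apply a rolling window and calculate the slope within each window:

import numpy as np

def calc_slope(x):
    # slope of a degree-1 least-squares fit against positions 0..len(x)-1
    slope = np.polyfit(range(len(x)), x, 1)[0]
    return slope

# set min_periods=2 to allow windows with fewer than 60 rows (at the start).
# raw=True passes NumPy arrays to calc_slope, which is faster than Series.
# use [4::5] to select the results you need.
result = data.rolling(60, min_periods=2).apply(calc_slope, raw=True)[4::5]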

The result will be (your numbers will differ, since the data is random):

            a
4 -0.542845
9 0.084953
14 0.155297
19 -0.048813
24 -0.011947
.. ...
479 -0.004792
484 -0.003714
489 0.022448
494 0.037301
499 0.027189

Alternatively, see the post "step size in pandas.DataFrame.rolling"; its first answer provides a NumPy way to achieve this.
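If rolling(...).apply(calc_slope) is too slow on larger data, another option (a swapped-in technique, not from the original answer) is the closed-form least-squares slope, slope = cov(x, y) / var(x), computed with pandas' built-in rolling cov and var so no Python function runs per window. Using the global row position as x is fine because an OLS slope does not change when x is shifted by a constant. rolling_slope is a hypothetical helper name; this is a sketch, not a drop-in guarantee of bit-identical results.

```python
import numpy as np
import pandas as pd

def rolling_slope(y: pd.Series, window: int, min_periods: int = 2) -> pd.Series:
    """Hypothetical helper: least-squares slope of y vs. row position per trailing window."""
    # global row position as x; a constant shift of x leaves the slope unchanged,
    # so this matches fitting against 0..len(window)-1 inside each window
    x = pd.Series(np.arange(len(y), dtype=float), index=y.index)
    rolled = x.rolling(window, min_periods=min_periods)
    # sample cov / sample var (both ddof=1) is the OLS slope
    return rolled.cov(y) / rolled.var()

data = pd.DataFrame(np.random.rand(500, 1) * 10, columns=['a'])
result = rolling_slope(data['a'], 60).iloc[4::5]
print(result)
```

Note that .iloc is used for the positional slice, which is unambiguous regardless of the index type.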

Fit a line with groupby in a pandas time series and get the slope

Do you want something like this?

(foo_so.groupby('id')
       .apply(lambda x: 'Slope: %.3f' % np.polyfit(np.arange(len(x)),
                                                  x['y'],
                                                  deg=1)[0])
)

output:

id
a Slope: 2.190
b Slope: 0.410

Actually, if this is a time series, it probably makes more sense to use time as x:

(foo_so.groupby('id')
       .apply(lambda x: 'Slope: %.3f' % np.polyfit(x['time'],
                                                  x['y'],
                                                  deg=1)[0])
)

output:

id
a Slope: 0.654
b Slope: 0.410
dtype: object

And if you only want to print:

for name, group in foo_so.groupby('id'):
    slope = np.polyfit(np.arange(len(group)), group['y'], deg=1)[0]
    print(f'Slope for {name}: {slope:.3f}')

output:

Slope for a: 2.190
Slope for b: 0.410
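The answers above return formatted strings. If you need the per-group slopes as numbers, one option (a swapped-in technique, not from the answer) is the closed-form least-squares slope, sum(xc*yc) / sum(xc**2) with x and y centred within each group, computed with groupby transform instead of apply. Since the question's data isn't shown, foo_so below is a hypothetical stand-in with id, time and y columns, built so the true slopes are 2.0 and -0.5; slope_by_group is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for the question's foo_so
foo_so = pd.DataFrame({
    'id':   ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'time': [0, 1, 2, 3, 0, 2, 4],
    'y':    [1.0, 3.0, 5.0, 7.0, 3.0, 2.0, 1.0],
})

def slope_by_group(df, key, x, y):
    """Hypothetical helper: OLS slope of y on x for each value of df[key]."""
    # centre x and y within each group
    xc = df[x] - df.groupby(key)[x].transform('mean')
    yc = df[y] - df.groupby(key)[y].transform('mean')
    # slope = sum(xc * yc) / sum(xc ** 2), summed per group
    num = (xc * yc).groupby(df[key]).sum()
    den = (xc ** 2).groupby(df[key]).sum()
    return num / den

slopes = slope_by_group(foo_so, 'id', 'time', 'y')
print(slopes)
```

The result is a float Series indexed by id, so it can be merged back or filtered without parsing strings.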

Linear Regression of Time-Series Data

This piece of code should give you the idea:

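import pandas as pd

df = df.astype(float)
df.index = pd.to_datetime(df.index)
slopes = []
for col in df:
    # month number (1-12) as x; assumes all rows fall within a single year
    x = df.index.month.values
    y = df[col].values
    # closed-form least-squares slope: (n*sum(x*y) - sum(x)*sum(y)) / (n*sum(x**2) - sum(x)**2)
    b = (len(x) * (x * y).sum() - (x.sum() * y.sum())) / (len(x) * (x ** 2).sum() - x.sum() ** 2)
    slopes.append(b)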

Slopes:
[-5.565429999999997,
0.40302000000000004,
-2.5999877,
-3.1999877,
-4.699987700000003]

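For n points (x_i, y_i), the least-squares slope b (computed in the loop above) and intercept a are:

$$ b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad a = \frac{\sum y_i - b\sum x_i}{n} $$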

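Or, with numpy.polyfit, which fits every column at once when y is 2-D:

import numpy as np
import pandas as pd

df = df.astype(float)
df.index = pd.to_datetime(df.index)
x = df.index.month.values
y = df.values
# polyfit returns one row per coefficient (highest degree first), one column per series
slopes, offsets = np.polyfit(x, y, deg=1)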

Slopes: array([-5.56543 , 0.40302 , -2.5999877, -3.1999877, -4.6999877])
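Because df.index.month restarts at 1 every January, both versions above give a wrong fit if the data spans more than one year. One alternative (an assumption about your data, not part of the original answer) is to use the elapsed time since the first timestamp as x. The sketch below uses hypothetical monthly data over two years, built with known slopes of 2.0 and -0.1 per day, so the fit can be checked; the resulting slopes are per day.

```python
import numpy as np
import pandas as pd

# hypothetical monthly timestamps spanning two years
idx = pd.date_range("2020-01-01", periods=24, freq="MS")

# elapsed days since the first timestamp, as floats
x = ((idx - idx[0]) / pd.Timedelta(days=1)).to_numpy()

# hypothetical series with known slopes (per day) and intercepts
df = pd.DataFrame({"a": 2.0 * x + 5.0, "b": -0.1 * x + 1.0}, index=idx)

# one fit per column: first row of coefficients are slopes, second are intercepts
slopes, offsets = np.polyfit(x, df.to_numpy(), deg=1)
print(slopes, offsets)
```

Multiplying the per-day slopes by about 30.44 gives an approximate per-month rate, if that is the unit you want.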


