Calculate Time Difference Between Pandas Dataframe Indices

Calculate time differences between consecutive rows using pandas

Try this example:

import pandas as pd
import io

s = io.StringIO('''
dates,nums
2017-02-01T00:00:01,1
2017-02-01T00:00:01,2
2017-02-01T00:00:06,3
2017-02-01T00:00:07,4
2017-02-01T00:00:10,5
''')

df = pd.read_csv(s)

Currently the frame looks like this (nums carries no meaning; it is only there as a second column of "something"):

                 dates  nums
0  2017-02-01T00:00:01     1
1  2017-02-01T00:00:01     2
2  2017-02-01T00:00:06     3
3  2017-02-01T00:00:07     4
4  2017-02-01T00:00:10     5

Carrying on:

# format as datetime
df['dates'] = pd.to_datetime(df['dates'])

# shift the dates up and into a new column
df['dates_shift'] = df['dates'].shift(-1)

# work out the diff
df['time_diff'] = (df['dates_shift'] - df['dates']) / pd.Timedelta(seconds=1)

# remove the temp column
del df['dates_shift']

# see what you've got
print(df)

                dates  nums  time_diff
0 2017-02-01 00:00:01     1        0.0
1 2017-02-01 00:00:01     2        5.0
2 2017-02-01 00:00:06     3        1.0
3 2017-02-01 00:00:07     4        3.0
4 2017-02-01 00:00:10     5        NaN

To get absolute values, change this line from the code above:

df['time_diff'] = (df['dates_shift'] - df['dates']) / pd.Timedelta(seconds=1)

To:

df['time_diff'] = (df['dates_shift'] - df['dates']).abs() / pd.Timedelta(seconds=1)
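As a possible shortcut, the temporary column can be avoided with Series.diff. This is a minimal sketch that rebuilds the same sample data; the names csv, indexed and index_diff are only for illustration. It also shows the same idea when the timestamps are the index rather than a column.

```python
import io

import pandas as pd

csv = io.StringIO(
    "dates,nums\n"
    "2017-02-01T00:00:01,1\n"
    "2017-02-01T00:00:01,2\n"
    "2017-02-01T00:00:06,3\n"
    "2017-02-01T00:00:07,4\n"
    "2017-02-01T00:00:10,5\n"
)
df = pd.read_csv(csv, parse_dates=["dates"])

# diff() is "current minus previous"; shifting it up by one row gives
# "next minus current", matching the shift(-1) approach above
df["time_diff"] = df["dates"].diff().shift(-1).dt.total_seconds()

# when the timestamps are the index, turn the index into a Series first
indexed = df.set_index("dates")
index_diff = indexed.index.to_series().diff().dt.total_seconds()

print(df)
print(index_diff)
```

Note that index_diff is "current minus previous", so its NaN sits in the first row instead of the last.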

Calculate the time difference between two consecutive rows in pandas

The problem is that diff needs datetimes or timedeltas, so first convert the column with to_timedelta, then call total_seconds and divide by 60 to get minutes:

df['Time_diff'] = pd.to_timedelta(df['Time'].astype(str)).diff(-1).dt.total_seconds().div(60)
#alternative
#df['Time_diff'] = pd.to_datetime(df['Time'].astype(str)).diff(-1).dt.total_seconds().div(60)
print (df)
   Dev_id      Time  Time_diff
0   88345  13:40:31  19.966667
1   87556  13:20:33  15.550000
2   88955  13:05:00  49.533333
3   85678  12:15:28        NaN

If you want to floor (or round) to whole minutes:

# 'min' is used here because the older 'T' alias is deprecated in recent pandas
df['Time_diff'] = (pd.to_timedelta(df['Time'].astype(str))
                     .diff(-1)
                     .dt.floor('min')
                     .dt.total_seconds()
                     .div(60))
print (df)
   Dev_id      Time  Time_diff
0   88345  13:40:31       19.0
1   87556  13:20:33       15.0
2   88955  13:05:00       49.0
3   85678  12:15:28        NaN
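To compare flooring with rounding, here is a small sketch. It rebuilds the frame from the printed output above and puts both variants side by side; the column names floor_min and round_min are made up for the example.

```python
import pandas as pd

# frame rebuilt from the printed output above
df = pd.DataFrame({
    "Dev_id": [88345, 87556, 88955, 85678],
    "Time": ["13:40:31", "13:20:33", "13:05:00", "12:15:28"],
})

# current row minus next row, as a timedelta
delta = pd.to_timedelta(df["Time"]).diff(-1)

# floor drops the leftover seconds, round goes to the nearest minute
df["floor_min"] = delta.dt.floor("min").dt.total_seconds().div(60)
df["round_min"] = delta.dt.round("min").dt.total_seconds().div(60)

print(df)
```

For the first row the raw difference is 19 minutes 58 seconds, so flooring reports 19 while rounding reports 20.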

Calculate Time Difference Between Two Pandas Columns in Hours and Minutes

Subtracting two pandas timestamp columns returns a Series of timedelta64 values (pandas Timedelta objects). To express the result in whole hours, divide the total seconds by 3600. Older answers use .astype('timedelta64[h]') for this, but pandas 2.0 and later reject that conversion, so the version below avoids it:

import pandas as pd

df = pd.DataFrame({
    'to': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
    'fr': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')],
})
# whole hours between the two columns
(df['fr'] - df['to']).dt.total_seconds() // 3600

which yields:

0    58.0
1     3.0
2     8.0
dtype: float64
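Since the question asks for hours and minutes, one way to get both is to split the total number of minutes. This is a sketch on the same timestamps; the hours and minutes column names are assumptions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "to": pd.to_datetime([
        "2014-01-24 13:03:12.050000",
        "2014-01-27 11:57:18.240000",
        "2014-01-23 10:07:47.660000",
    ]),
    "fr": pd.to_datetime([
        "2014-01-26 23:41:21.870000",
        "2014-01-27 15:38:22.540000",
        "2014-01-23 18:50:41.420000",
    ]),
})

delta = df["fr"] - df["to"]

# whole minutes elapsed, then split into hours and leftover minutes
total_minutes = delta.dt.total_seconds() // 60
df["hours"] = (total_minutes // 60).astype(int)
df["minutes"] = (total_minutes % 60).astype(int)

print(df[["hours", "minutes"]])
```

This assumes fr is never earlier than to; negative differences would need separate handling because floor division rounds toward negative infinity.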

How to calculate the time difference in a data frame in python?

Assuming this is your dataset:

data = {'date': ['2020/06/24', '2020/06/25', '2020/06/27', '2020/06/30'], 
'time': ['23:00:28', '09:10:55', '03:42:58','16:45:51']}
df = pd.DataFrame(data)
print(df)
date time
0 2020/06/24 23:00:28
1 2020/06/25 09:10:55
2 2020/06/27 03:42:58
3 2020/06/30 16:45:51

You can use pandas .diff after converting your data to a proper datetime dtype with pd.to_datetime:

df['date_time'] = pd.to_datetime(df['date'] + ' ' + df['time'])
df['time_diff'] = df['date_time'].diff()
print(df)
         date      time           date_time       time_diff
0  2020/06/24  23:00:28 2020-06-24 23:00:28             NaT
1  2020/06/25  09:10:55 2020-06-25 09:10:55 0 days 10:10:27
2  2020/06/27  03:42:58 2020-06-27 03:42:58 1 days 18:32:03
3  2020/06/30  16:45:51 2020-06-30 16:45:51 3 days 13:02:53
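If a plain number is more convenient than a Timedelta, dividing by a unit Timedelta should do it. A minimal sketch on the same data follows; the explicit format string and the hours_diff column name are my additions.

```python
import pandas as pd

data = {
    "date": ["2020/06/24", "2020/06/25", "2020/06/27", "2020/06/30"],
    "time": ["23:00:28", "09:10:55", "03:42:58", "16:45:51"],
}
df = pd.DataFrame(data)

# an explicit format avoids any guessing about day/month order
df["date_time"] = pd.to_datetime(
    df["date"] + " " + df["time"], format="%Y/%m/%d %H:%M:%S"
)

# dividing a timedelta by a one-hour Timedelta gives float hours
df["hours_diff"] = df["date_time"].diff() / pd.Timedelta(hours=1)

print(df[["date_time", "hours_diff"]])
```

Swapping pd.Timedelta(hours=1) for pd.Timedelta(minutes=1) or pd.Timedelta(seconds=1) changes the unit.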

Measure the time difference of dataframe in accordance to index

Instead of sorting, you can just take max minus min for each index group:

# set index
df = df.set_index(df['Index'])

# make sure you have datetime dtype
df['Time'] = pd.to_datetime(df['Time'])

# group by index
grouped = df.groupby(df.index)
# ... and take max-min
ptp = (grouped['Time'].max()-grouped['Time'].min()).dt.total_seconds()/60
ptp
Out[29]:
Index
1 300.0
3 88.0
Name: Time, dtype: float64

Note that I modified the sample data slightly so that the effect of grouping by Index is visible:

Index Time 
1 2020-03-30T13:00:00
1 2020-03-30T14:00:00
1 2020-03-30T15:55:00
1 2020-03-30T18:00:00
3 2020-04-03T09:00:00
3 2020-04-03T09:50:00
3 2020-04-03T10:28:00
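For reference, the whole thing can be run end to end on the sample above. This is a sketch that groups on the Index column directly rather than setting it as the index first; bounds and span_minutes are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Index": [1, 1, 1, 1, 3, 3, 3],
    "Time": pd.to_datetime([
        "2020-03-30T13:00:00",
        "2020-03-30T14:00:00",
        "2020-03-30T15:55:00",
        "2020-03-30T18:00:00",
        "2020-04-03T09:00:00",
        "2020-04-03T09:50:00",
        "2020-04-03T10:28:00",
    ]),
})

# earliest and latest timestamp per group, in one pass
bounds = df.groupby("Index")["Time"].agg(["min", "max"])

# span of each group in minutes
span_minutes = (bounds["max"] - bounds["min"]).dt.total_seconds() / 60

print(span_minutes)
```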

Python pandas date difference between rows

from datetime import datetime
import time

t1 = '09/06/2020 13:30:11.359497'
t2 = '10/06/2020 09:30:12.352452'

# convert t1, t2 to type datetime
date_time_t1 = datetime.strptime(t1, '%d/%m/%Y %H:%M:%S.%f')
date_time_t2 = datetime.strptime(t2, '%d/%m/%Y %H:%M:%S.%f')

# convert date_time_t1, date_time_t2 to Unix timestamp
timestamp_1 = time.mktime(date_time_t1.timetuple())
timestamp_2 = time.mktime(date_time_t2.timetuple())

# the difference in minutes
print(int(timestamp_2 - timestamp_1) / 60)
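The snippet above goes through Unix timestamps, which drops the microseconds (timetuple has none) and depends on the local time zone (mktime interprets its input as local time). Since the question is about pandas rows, a sketch that stays inside pandas might look like this; the times and minutes names are assumptions.

```python
import pandas as pd

# the same two timestamps, parsed as day/month/year
times = pd.Series(pd.to_datetime(
    ["09/06/2020 13:30:11.359497", "10/06/2020 09:30:12.352452"],
    format="%d/%m/%Y %H:%M:%S.%f",
))

# row-to-row difference in minutes, keeping sub-second precision
minutes = times.diff().dt.total_seconds() / 60

print(minutes)
```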

Calculate time difference in a pandas DataFrame

Parsing your dates was non-trivial; strptime could probably do it, but it didn't work for me. In your example above, the times are just strings, not datetimes.

In [140]: from dateutil import parser

In [130]: def parse(x):
   .....:     date, hh, mm, ss = x.split(':')
   .....:     dd, mo, yyyy = date.split('/')
   .....:     return parser.parse("%s %s %s %s:%s:%s" % (yyyy,mo,dd,hh,mm,ss))
   .....:

In [131]: list(map(parse,idx))
Out[131]:
[datetime.datetime(2013, 5, 16, 23, 56, 43),
datetime.datetime(2013, 5, 16, 23, 56, 42),
datetime.datetime(2013, 5, 16, 23, 56, 43),
datetime.datetime(2013, 5, 17, 23, 54, 45),
datetime.datetime(2013, 5, 17, 23, 54, 45),
datetime.datetime(2013, 5, 17, 23, 54, 45)]

In [132]: pd.to_datetime(list(map(parse,idx)))
Out[132]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-16 23:56:43, ..., 2013-05-17 23:54:45]
Length: 6, Freq: None, Timezone: None

In [133]: df = pd.DataFrame(dict(time = pd.to_datetime(list(map(parse,idx)))))

In [134]: df
Out[134]:
time
0 2013-05-16 23:56:43
1 2013-05-16 23:56:42
2 2013-05-16 23:56:43
3 2013-05-17 23:54:45
4 2013-05-17 23:54:45
5 2013-05-17 23:54:45

In [138]: df['delta'] = (df['time']-df['time'].shift()).fillna(pd.Timedelta(0))

In [139]: df
Out[139]:
time delta
0 2013-05-16 23:56:43 00:00:00
1 2013-05-16 23:56:42 -00:00:01
2 2013-05-16 23:56:43 00:00:01
3 2013-05-17 23:54:45 23:58:02
4 2013-05-17 23:54:45 00:00:00
5 2013-05-17 23:54:45 00:00:00
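With a current pandas, an explicit format string is probably enough and the custom parser is not needed. The sketch below assumes the input strings look like dd/mm/yyyy:hh:mm:ss, which is only inferred from how parse splits them; the idx values are reconstructed from the parsed output above, not taken from the original question.

```python
import pandas as pd

# hypothetical raw strings, reconstructed from the parsed values shown above
idx = [
    "16/05/2013:23:56:43",
    "16/05/2013:23:56:42",
    "16/05/2013:23:56:43",
    "17/05/2013:23:54:45",
    "17/05/2013:23:54:45",
    "17/05/2013:23:54:45",
]

# the colon between date and time is part of the format string
df = pd.DataFrame({"time": pd.to_datetime(idx, format="%d/%m/%Y:%H:%M:%S")})

# difference to the previous row; the first row has no predecessor, so fill with zero
df["delta"] = df["time"].diff().fillna(pd.Timedelta(0))

print(df)
```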

How to calculate time difference between specific row values in dataframe using python?

Sample:

times = [
'2019-05-18 01:15:28',
'2019-05-18 01:28:11',
'2019-05-18 01:36:36',
'2019-05-18 01:39:47',
'2019-05-18 01:53:32',
'2019-05-18 02:05:37'
]

a = [7, 7, 12, 7, 12, 7]

df = pd.DataFrame({'times': pd.to_datetime(times), 'A':a})
print (df)
times A
0 2019-05-18 01:15:28 7
1 2019-05-18 01:28:11 7
2 2019-05-18 01:36:36 12
3 2019-05-18 01:39:47 7
4 2019-05-18 01:53:32 12
5 2019-05-18 02:05:37 7

First create a default index and keep only the rows where A is 7 or 12:

df = df.reset_index(drop=True)
df1 = df[df['A'].isin([7, 12])]

Then keep only the first row of each run of consecutive equal values by comparing with the shifted column:

df1 = df1[df1['A'].ne(df1['A'].shift())]
print (df1)
times A
0 2019-05-18 01:15:28 7
2 2019-05-18 01:36:36 12
3 2019-05-18 01:39:47 7
4 2019-05-18 01:53:32 12
5 2019-05-18 02:05:37 7

Then keep each 7 that is followed by a 12, together with that 12:

m1 = df1['A'].eq(7) & df1['A'].shift(-1).eq(12)
m2 = df1['A'].eq(12) & df1['A'].shift().eq(7)

df2 = df1[m1 | m2]
print (df2)
times A
0 2019-05-18 01:15:28 7
2 2019-05-18 01:36:36 12
3 2019-05-18 01:39:47 7
4 2019-05-18 01:53:32 12

Split the result into the 7 rows (even positions) and the 12 rows (odd positions):

out7 = df2.iloc[::2]
out12 = df2.iloc[1::2]

Finally, subtract:

df['Time_difference'] = out12['times'] - out7['times'].to_numpy()
df['Time_difference'] = df['Time_difference'].fillna(pd.Timedelta(0))
print (df)
times A Time_difference
0 2019-05-18 01:15:28 7 00:00:00
1 2019-05-18 01:28:11 7 00:00:00
2 2019-05-18 01:36:36 12 00:21:08
3 2019-05-18 01:39:47 7 00:00:00
4 2019-05-18 01:53:32 12 00:13:45
5 2019-05-18 02:05:37 7 00:00:00
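The steps above can be condensed into one script. This is a sketch under the same assumptions as the answer (A contains only 7 and 12, and each 7-run is timed up to the next 12); the names runs, starts, ends and elapsed are illustrative.

```python
import pandas as pd

times = [
    "2019-05-18 01:15:28",
    "2019-05-18 01:28:11",
    "2019-05-18 01:36:36",
    "2019-05-18 01:39:47",
    "2019-05-18 01:53:32",
    "2019-05-18 02:05:37",
]
a = [7, 7, 12, 7, 12, 7]
df = pd.DataFrame({"times": pd.to_datetime(times), "A": a})

# first row of every run of equal consecutive values
runs = df[df["A"].ne(df["A"].shift())]

# a 7 directly followed by a 12, and the 12 that follows it
starts = runs["A"].eq(7) & runs["A"].shift(-1).eq(12)
ends = runs["A"].eq(12) & runs["A"].shift().eq(7)

# subtract positionally; the result keeps the index labels of the 12 rows
elapsed = runs.loc[ends, "times"] - runs.loc[starts, "times"].to_numpy()

# spread back over the full frame, zero where no pair ends
df["Time_difference"] = elapsed.reindex(df.index, fill_value=pd.Timedelta(0))

print(df)
```

Every start is immediately followed by an end in runs, so the two selections always have the same length and the positional subtraction lines up.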

