Calculate Time Difference Between Pandas Dataframe Indices

How to calculate time differences between consecutive rows using pandas?

Try this example:

import pandas as pd
import io

s = io.StringIO('''
dates,nums
2017-02-01T00:00:01,1
2017-02-01T00:00:01,2
2017-02-01T00:00:06,3
2017-02-01T00:00:07,4
2017-02-01T00:00:10,5
''')

df = pd.read_csv(s)

The nums column carries no meaning; it is only there as a second column of "something".

Currently the frame looks like this:

                 dates  nums
0  2017-02-01T00:00:01     1
1  2017-02-01T00:00:01     2
2  2017-02-01T00:00:06     3
3  2017-02-01T00:00:07     4
4  2017-02-01T00:00:10     5

Carrying on:

# format as datetime
df['dates'] = pd.to_datetime(df['dates'])

# shift the dates up and into a new column
df['dates_shift'] = df['dates'].shift(-1)

# work out the diff
df['time_diff'] = (df['dates_shift'] - df['dates']) / pd.Timedelta(seconds=1)

# remove the temp column
del df['dates_shift']

# see what you've got
print(df)

                dates  nums  time_diff
0 2017-02-01 00:00:01     1        0.0
1 2017-02-01 00:00:01     2        5.0
2 2017-02-01 00:00:06     3        1.0
3 2017-02-01 00:00:07     4        3.0
4 2017-02-01 00:00:10     5        NaN

To get absolute values, change this line from above:

df['time_diff'] = (df['dates_shift'] - df['dates']) / pd.Timedelta(seconds=1)

To:

df['time_diff'] = (df['dates_shift'] - df['dates']).abs() / pd.Timedelta(seconds=1)
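As a possible shortcut, the temporary shifted column can be avoided with Series.diff. The sketch below reuses the sample data from this answer; the column names since_prev and until_next are made up for illustration, and the last lines show the same idea when the timestamps are the index rather than a column.

```python
import io
import pandas as pd

csv = io.StringIO("""dates,nums
2017-02-01T00:00:01,1
2017-02-01T00:00:01,2
2017-02-01T00:00:06,3
2017-02-01T00:00:07,4
2017-02-01T00:00:10,5
""")
df = pd.read_csv(csv, parse_dates=["dates"])

# Seconds elapsed since the previous row (first row has no predecessor -> NaN)
df["since_prev"] = df["dates"].diff().dt.total_seconds()

# Seconds until the next row, matching the shift(-1) approach above
df["until_next"] = df["since_prev"].shift(-1)

# If the timestamps are the index, turn the index into a Series first
index_gaps = df.set_index("dates").index.to_series().diff().dt.total_seconds()

print(df)
```

Here until_next should match the time_diff column from the shift-based version, while since_prev is the same gap attributed to the later row.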

Calculate the time difference between two consecutive rows in pandas

The problem is that diff only gives a time difference when the column holds datetimes or timedeltas, so first convert the column with to_timedelta, then take total_seconds and divide by 60 to get minutes:

df['Time_diff'] = pd.to_timedelta(df['Time'].astype(str)).diff(-1).dt.total_seconds().div(60)
#alternative
#df['Time_diff'] = pd.to_datetime(df['Time'].astype(str)).diff(-1).dt.total_seconds().div(60)
print (df)
   Dev_id      Time  Time_diff
0   88345  13:40:31  19.966667
1   87556  13:20:33  15.550000
2   88955  13:05:00  49.533333
3   85678  12:15:28        NaN

If you want the result floored or rounded to whole minutes:

df['Time_diff'] = (pd.to_timedelta(df['Time'].astype(str))
                     .diff(-1)
                     .dt.floor('min')
                     .dt.total_seconds()
                     .div(60))
print (df)
   Dev_id      Time  Time_diff
0   88345  13:40:31       19.0
1   87556  13:20:33       15.0
2   88955  13:05:00       49.0
3   85678  12:15:28        NaN
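For comparison, here is a small self-contained sketch that rebuilds the frame shown above and computes both the floored and the rounded variant. The column names floor_min and round_min are illustrative only, and the 'min' frequency alias is used because newer pandas releases deprecate 'T'.

```python
import pandas as pd

df = pd.DataFrame({
    "Dev_id": [88345, 87556, 88955, 85678],
    "Time": ["13:40:31", "13:20:33", "13:05:00", "12:15:28"],
})

# Current row minus the next row, as a timedelta
gaps = pd.to_timedelta(df["Time"]).diff(-1)

# Floor drops the leftover seconds; round goes to the nearest minute
df["floor_min"] = gaps.dt.floor("min").dt.total_seconds().div(60)
df["round_min"] = gaps.dt.round("min").dt.total_seconds().div(60)

print(df)
```

With this data the two columns differ on every row, since each gap has more than 30 leftover seconds.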

Calculate Time Difference Between Two Pandas Columns in Hours and Minutes

Subtracting two pandas timestamp columns returns a Series of timedelta64[ns] values (pandas Timedelta objects). Older pandas versions could convert this to whole hours with astype('timedelta64[h]'), but pandas 2.x rejects that unit, so take the total seconds and floor-divide by 3600 instead, like so

import pandas
df = pandas.DataFrame({
    'to': [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')],
    'fr': [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')],
})
# whole hours between the two columns
(df.fr - df.to).dt.total_seconds() // 3600

to yield,

0    58.0
1     3.0
2     8.0
dtype: float64
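Since the question asks for hours and minutes, one way to get both is to work from the total number of whole minutes. This is only a sketch using the same three timestamp pairs; the variable names are illustrative.

```python
import pandas as pd

to = pd.Series(pd.to_datetime([
    "2014-01-24 13:03:12.05", "2014-01-27 11:57:18.24", "2014-01-23 10:07:47.66",
]))
fr = pd.Series(pd.to_datetime([
    "2014-01-26 23:41:21.87", "2014-01-27 15:38:22.54", "2014-01-23 18:50:41.42",
]))

delta = fr - to

# Whole minutes in each difference (leftover seconds are dropped)
total_minutes = delta.dt.total_seconds() // 60

hours = (total_minutes // 60).astype(int)    # includes full days as 24 h each
minutes = (total_minutes % 60).astype(int)   # remainder after whole hours

print(pd.DataFrame({"hours": hours, "minutes": minutes}))
```

The integer cast assumes there are no missing timestamps; with NaT values present it would have to be skipped or replaced by a nullable integer dtype.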

How to calculate the time difference in a data frame in python?

Assuming this is your dataset:

data = {'date': ['2020/06/24', '2020/06/25', '2020/06/27', '2020/06/30'], 
         'time': ['23:00:28', '09:10:55', '03:42:58','16:45:51']}
df = pd.DataFrame(data)
print(df)
         date      time
0  2020/06/24  23:00:28
1  2020/06/25  09:10:55
2  2020/06/27  03:42:58
3  2020/06/30  16:45:51

You can use pandas .diff after converting your data to a proper datetime format with pd.to_datetime:

df['date_time'] = pd.to_datetime(df['date'] + ' ' + df['time'])
df['time_diff'] = df['date_time'].diff()
print(df)
         date      time           date_time       time_diff
0  2020/06/24  23:00:28 2020-06-24 23:00:28             NaT
1  2020/06/25  09:10:55 2020-06-25 09:10:55 0 days 10:10:27
2  2020/06/27  03:42:58 2020-06-27 03:42:58 1 days 18:32:03
3  2020/06/30  16:45:51 2020-06-30 16:45:51 3 days 13:02:53
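If a plain number is more convenient than a Timedelta, the difference can be divided by a unit. The following sketch repeats the setup above and adds an hours column; the explicit format string and the column name hours are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2020/06/24", "2020/06/25", "2020/06/27", "2020/06/30"],
    "time": ["23:00:28", "09:10:55", "03:42:58", "16:45:51"],
})

# An explicit format avoids any guessing about day/month order
df["date_time"] = pd.to_datetime(df["date"] + " " + df["time"],
                                 format="%Y/%m/%d %H:%M:%S")
df["time_diff"] = df["date_time"].diff()

# Dividing a timedelta column by a Timedelta gives floats in that unit
df["hours"] = df["time_diff"] / pd.Timedelta(hours=1)

print(df[["date_time", "time_diff", "hours"]])
```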

Measure the time difference of dataframe in accordance to index

Instead of sorting, you can just take max - min for each index group:

# set index
df = df.set_index(df['Index'])

# make sure you have datetime dtype
df['Time'] = pd.to_datetime(df['Time'])

# group by index
grouped = df.groupby(df.index)
# ... and take max-min
ptp = (grouped['Time'].max()-grouped['Time'].min()).dt.total_seconds()/60

ptp
Out[29]: 
Index
1    300.0
3     88.0
Name: Time, dtype: float64

Note that I have modified the sample data slightly, so that the effect of grouping by Index is visible:

Index Time 
1 2020-03-30T13:00:00 
1 2020-03-30T14:00:00 
1 2020-03-30T15:55:00 
1 2020-03-30T18:00:00 
3 2020-04-03T09:00:00 
3 2020-04-03T09:50:00 
3 2020-04-03T10:28:00
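To make this reproducible, here is a sketch that loads the modified sample data and runs the same max - min calculation, grouping on the Index column directly instead of setting it as the index first. The per-row step column at the end is an extra illustration, not part of the answer above.

```python
import io
import pandas as pd

raw = io.StringIO("""Index,Time
1,2020-03-30T13:00:00
1,2020-03-30T14:00:00
1,2020-03-30T15:55:00
1,2020-03-30T18:00:00
3,2020-04-03T09:00:00
3,2020-04-03T09:50:00
3,2020-04-03T10:28:00
""")
df = pd.read_csv(raw, parse_dates=["Time"])

grouped = df.groupby("Index")["Time"]

# Span of each group in minutes (latest minus earliest timestamp)
span_minutes = (grouped.max() - grouped.min()).dt.total_seconds() / 60

# Gap to the previous row within the same group (NaN at each group start)
df["step_minutes"] = grouped.diff().dt.total_seconds() / 60

print(span_minutes)
print(df)
```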

Python pandas date difference between rows

from datetime import datetime
import time

t1 = '09/06/2020 13:30:11.359497'
t2 = '10/06/2020 09:30:12.352452'

# convert t1, t2 to type datetime
date_time_t1 = datetime.strptime(t1, '%d/%m/%Y %H:%M:%S.%f')
date_time_t2 = datetime.strptime(t2, '%d/%m/%Y %H:%M:%S.%f')

# convert date_time_t1, date_time_t2 to Unix timestamp
timestamp_1 = time.mktime(date_time_t1.timetuple())
timestamp_2 = time.mktime(date_time_t2.timetuple())

# the difference in minutes
print(int(timestamp_2 - timestamp_1) / 60)
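The snippet above uses only the standard library. Note that timetuple() drops the microseconds and time.mktime interprets the values as local time. A pandas version of the same calculation, sketched with the same two strings, might look like this:

```python
import pandas as pd

fmt = "%d/%m/%Y %H:%M:%S.%f"  # day first, with fractional seconds

t1 = pd.to_datetime("09/06/2020 13:30:11.359497", format=fmt)
t2 = pd.to_datetime("10/06/2020 09:30:12.352452", format=fmt)

# Subtracting two Timestamps gives a Timedelta; keep the fractional part
minutes = (t2 - t1).total_seconds() / 60

print(minutes)
```

Because the fractional seconds are kept here, the result can differ slightly from the mktime-based value.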

Calculate time difference pandas dataframe

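Parsing your dates is non-trivial. strptime could probably do it, but it didn't work for me. Note that in your example the times are just strings, not datetimes. (The session below was recorded on Python 2 with an older pandas; on Python 3, map returns an iterator, so wrap it in list(...).)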

In [140]: from dateutil import parser

In [130]: def parse(x):
   .....:     date, hh, mm, ss = x.split(':')
   .....:     dd, mo, yyyy = date.split('/')
   .....:     return parser.parse("%s %s %s %s:%s:%s" % (yyyy,mo,dd,hh,mm,ss))
   .....: 

In [131]: map(parse,idx)
Out[131]: 
[datetime.datetime(2013, 5, 16, 23, 56, 43),
 datetime.datetime(2013, 5, 16, 23, 56, 42),
 datetime.datetime(2013, 5, 16, 23, 56, 43),
 datetime.datetime(2013, 5, 17, 23, 54, 45),
 datetime.datetime(2013, 5, 17, 23, 54, 45),
 datetime.datetime(2013, 5, 17, 23, 54, 45)]

In [132]: pd.to_datetime(map(parse,idx))
Out[132]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-16 23:56:43, ..., 2013-05-17 23:54:45]
Length: 6, Freq: None, Timezone: None

In [133]: df = DataFrame(dict(time = pd.to_datetime(map(parse,idx))))

In [134]: df
Out[134]: 
                 time
0 2013-05-16 23:56:43
1 2013-05-16 23:56:42
2 2013-05-16 23:56:43
3 2013-05-17 23:54:45
4 2013-05-17 23:54:45
5 2013-05-17 23:54:45

In [138]: df['delta'] = (df['time']-df['time'].shift()).fillna(0)

In [139]: df
Out[139]: 
                 time     delta
0 2013-05-16 23:56:43  00:00:00
1 2013-05-16 23:56:42 -00:00:01
2 2013-05-16 23:56:43  00:00:01
3 2013-05-17 23:54:45  23:58:02
4 2013-05-17 23:54:45  00:00:00
5 2013-05-17 23:54:45  00:00:00
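On a current pandas, the custom parser can probably be replaced by an explicit format string. The sketch below assumes the input strings look like 'dd/mm/yyyy:HH:MM:SS', which is the layout implied by the parse function above; the idx values are reconstructed from the printed datetimes and are hypothetical, not taken from the original question.

```python
import pandas as pd

# Hypothetical raw strings in the 'dd/mm/yyyy:HH:MM:SS' layout implied by parse()
idx = [
    "16/05/2013:23:56:43", "16/05/2013:23:56:42", "16/05/2013:23:56:43",
    "17/05/2013:23:54:45", "17/05/2013:23:54:45", "17/05/2013:23:54:45",
]

df = pd.DataFrame({"time": pd.to_datetime(idx, format="%d/%m/%Y:%H:%M:%S")})

# diff() is the same as time - time.shift(); fill the first row with a zero Timedelta
df["delta"] = df["time"].diff().fillna(pd.Timedelta(0))

print(df)
```

Filling with pd.Timedelta(0) rather than a bare 0 keeps the column as a timedelta dtype.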

How to calculate time difference between specific row values in dataframe using python?

Sample:

times = [
    '2019-05-18 01:15:28',
    '2019-05-18 01:28:11',
    '2019-05-18 01:36:36',
    '2019-05-18 01:39:47',
    '2019-05-18 01:53:32',
    '2019-05-18 02:05:37'
]

a = [7, 7, 12, 7, 12, 7]

df = pd.DataFrame({'times': pd.to_datetime(times), 'A':a})
print (df)
                times   A
0 2019-05-18 01:15:28   7
1 2019-05-18 01:28:11   7
2 2019-05-18 01:36:36  12
3 2019-05-18 01:39:47   7
4 2019-05-18 01:53:32  12
5 2019-05-18 02:05:37   7

First create a default index and keep only the rows where A is 7 or 12:

df = df.reset_index(drop=True)
df1 = df[df['A'].isin([7, 12])]

Then keep only the first row of each run of consecutive equal values by comparing with the shifted column:

df1 = df1[df1['A'].ne(df1['A'].shift())]
print (df1)
                times   A
0 2019-05-18 01:15:28   7
2 2019-05-18 01:36:36  12
3 2019-05-18 01:39:47   7
4 2019-05-18 01:53:32  12
5 2019-05-18 02:05:37   7

Then keep each 7 that is immediately followed by a 12, together with that 12:

m1 = df1['A'].eq(7) & df1['A'].shift(-1).eq(12)
m2 = df1['A'].eq(12) & df1['A'].shift().eq(7)

df2 = df1[m1 | m2]
print (df2)
                times   A
0 2019-05-18 01:15:28   7
2 2019-05-18 01:36:36  12
3 2019-05-18 01:39:47   7
4 2019-05-18 01:53:32  12

Split the result into the 7 rows (even positions) and the 12 rows (odd positions):

out7 = df2.iloc[::2]
out12 = df2.iloc[1::2]

Finally, subtract:

df['Time_difference'] = out12['times'] - out7['times'].to_numpy()
df['Time_difference'] = df['Time_difference'].fillna(pd.Timedelta(0))
print (df)
                times   A Time_difference
0 2019-05-18 01:15:28   7        00:00:00
1 2019-05-18 01:28:11   7        00:00:00
2 2019-05-18 01:36:36  12        00:21:08
3 2019-05-18 01:39:47   7        00:00:00
4 2019-05-18 01:53:32  12        00:13:45
5 2019-05-18 02:05:37   7        00:00:00
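The steps above could be wrapped into one helper. This is a sketch only: the function name time_between and its parameters are invented for illustration, and it assumes the frame is already sorted by time and has a unique index.

```python
import pandas as pd

def time_between(df, start, end, value_col="A", time_col="times"):
    """Time from each `start` row to the next `end` row, placed on the `end` row."""
    # Keep only rows holding one of the two values of interest
    s = df.loc[df[value_col].isin([start, end])]
    # Keep the first row of each run of consecutive equal values
    s = s[s[value_col].ne(s[value_col].shift())]
    # A start row counts only if an end row follows it directly, and vice versa
    is_start = s[value_col].eq(start) & s[value_col].shift(-1).eq(end)
    is_end = s[value_col].eq(end) & s[value_col].shift().eq(start)
    starts = s.loc[is_start, time_col]
    ends = s.loc[is_end, time_col]
    # Positional subtraction; the result keeps the index of the end rows
    diff = ends - starts.to_numpy()
    # Rows without a pair get a zero Timedelta
    return diff.reindex(df.index).fillna(pd.Timedelta(0))

times = pd.to_datetime([
    "2019-05-18 01:15:28", "2019-05-18 01:28:11", "2019-05-18 01:36:36",
    "2019-05-18 01:39:47", "2019-05-18 01:53:32", "2019-05-18 02:05:37",
])
df = pd.DataFrame({"times": times, "A": [7, 7, 12, 7, 12, 7]})

df["Time_difference"] = time_between(df, start=7, end=12)
print(df)
```

On the sample data this should give the same Time_difference column as the step-by-step version.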