calculate time differences between consecutive rows using pandas?
Try this example:
import pandas as pd
import io
s = io.StringIO('''
dates,nums
2017-02-01T00:00:01,1
2017-02-01T00:00:01,2
2017-02-01T00:00:06,3
2017-02-01T00:00:07,4
2017-02-01T00:00:10,5
''')
df = pd.read_csv(s)
Currently the frame looks like this (nums carries no meaning; it is just there as a secondary column of "something"):
dates nums
0 2017-02-01T00:00:01 1
1 2017-02-01T00:00:01 2
2 2017-02-01T00:00:06 3
3 2017-02-01T00:00:07 4
4 2017-02-01T00:00:10 5
Carrying on:
# format as datetime
df['dates'] = pd.to_datetime(df['dates'])
# shift the dates up and into a new column
df['dates_shift'] = df['dates'].shift(-1)
# work out the diff
df['time_diff'] = (df['dates_shift'] - df['dates']) / pd.Timedelta(seconds=1)
# remove the temp column
del df['dates_shift']
# see what you've got
print(df)
dates nums time_diff
0 2017-02-01 00:00:01 1 0.0
1 2017-02-01 00:00:01 2 5.0
2 2017-02-01 00:00:06 3 1.0
3 2017-02-01 00:00:07 4 3.0
4 2017-02-01 00:00:10 5 NaN
To get the absolute values change this line above:
df['time_diff'] = (df['dates_shift'] - df['dates']) / pd.Timedelta(seconds=1)
To:
df['time_diff'] = (df['dates_shift'] - df['dates']).abs() / pd.Timedelta(seconds=1)
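As a possible shortcut, the temporary shifted column can be skipped by calling diff on the datetime column directly. This is a minimal sketch, assuming the same sample data as above. Note that diff(-1) computes the current row minus the next row, so its sign is the opposite of the shift version, and abs() is used here to make the two agree.

```python
import io

import pandas as pd

csv = io.StringIO("""dates,nums
2017-02-01T00:00:01,1
2017-02-01T00:00:01,2
2017-02-01T00:00:06,3
2017-02-01T00:00:07,4
2017-02-01T00:00:10,5
""")

# parse the dates column as datetimes while reading
df = pd.read_csv(csv, parse_dates=['dates'])

# current row minus next row, made non-negative, expressed in seconds
df['time_diff'] = df['dates'].diff(-1).abs().dt.total_seconds()

print(df)
```

The last row has no following row, so its value stays NaN, as in the shift-based version.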
calculate the time difference between two consecutive rows in pandas
The problem is that pandas needs datetimes or timedeltas for the diff function, so first convert with to_timedelta, then get total_seconds and divide by 60:
df['Time_diff'] = pd.to_timedelta(df['Time'].astype(str)).diff(-1).dt.total_seconds().div(60)
#alternative
#df['Time_diff'] = pd.to_datetime(df['Time'].astype(str)).diff(-1).dt.total_seconds().div(60)
print (df)
Dev_id Time Time_diff
0 88345 13:40:31 19.966667
1 87556 13:20:33 15.550000
2 88955 13:05:00 49.533333
3 85678 12:15:28 NaN
If you want to floor or round to whole minutes:
df['Time_diff'] = (pd.to_timedelta(df['Time'].astype(str))
.diff(-1)
.dt.floor('min')
.dt.total_seconds()
.div(60))
print (df)
Dev_id Time Time_diff
0 88345 13:40:31 19.0
1 87556 13:20:33 15.0
2 88955 13:05:00 49.0
3 85678 12:15:28 NaN
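The answer mentions rounding but only shows floor. A small self-contained sketch of both, side by side, assuming the Time column holds plain 'HH:MM:SS' strings (the column names floor_min and round_min are made up for this illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Dev_id': [88345, 87556, 88955, 85678],
    'Time': ['13:40:31', '13:20:33', '13:05:00', '12:15:28'],
})

# difference between each row and the next one, as timedeltas
gaps = pd.to_timedelta(df['Time']).diff(-1)

# truncate to whole minutes vs. round to the nearest minute
df['floor_min'] = gaps.dt.floor('min').dt.total_seconds().div(60)
df['round_min'] = gaps.dt.round('min').dt.total_seconds().div(60)

print(df)
```

With this data the floored values are 19, 15 and 49 minutes, while the rounded ones are 20, 16 and 50.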
Calculate Time Difference Between Two Pandas Columns in Hours and Minutes
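Subtracting one pandas timestamp column from another returns a Series of timedelta values (timedelta64 dtype), not plain datetime.timedelta objects. In the pandas version this answer was written for, the result can be converted into hours with the astype method, like so (pandas 2.0 and later no longer accept the 'timedelta64[h]' unit in astype):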
import pandas
df = pandas.DataFrame(columns=['to','fr','ans'])
df.to = [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')]
df.fr = [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')]
(df.fr-df.to).astype('timedelta64[h]')
to yield,
0 58
1 3
2 8
dtype: float64
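On newer pandas versions, one alternative that does not rely on astype is to divide the difference by a pd.Timedelta, the same trick used with seconds elsewhere on this page. This is only a sketch using the timestamps above. The variable names (delta, hours, whole_hours, minutes) are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'to': pd.to_datetime(['2014-01-24 13:03:12.050000',
                          '2014-01-27 11:57:18.240000',
                          '2014-01-23 10:07:47.660000']),
    'fr': pd.to_datetime(['2014-01-26 23:41:21.870000',
                          '2014-01-27 15:38:22.540000',
                          '2014-01-23 18:50:41.420000']),
})

delta = df['fr'] - df['to']

# fractional hours and minutes
hours = delta / pd.Timedelta(hours=1)
minutes = delta / pd.Timedelta(minutes=1)

# whole hours, dropping the fractional part
whole_hours = hours // 1

print(whole_hours)
```

The whole-hour values come out as 58, 3 and 8, matching the output shown above.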
How to calculate the time difference in a data frame in python?
Assuming this is your dataset:
import pandas as pd
data = {'date': ['2020/06/24', '2020/06/25', '2020/06/27', '2020/06/30'],
'time': ['23:00:28', '09:10:55', '03:42:58','16:45:51']}
df = pd.DataFrame(data)
print(df)
date time
0 2020/06/24 23:00:28
1 2020/06/25 09:10:55
2 2020/06/27 03:42:58
3 2020/06/30 16:45:51
You can use pandas .diff after converting your data to a proper datetime format using pd.to_datetime:
df['date_time'] = pd.to_datetime(df['date'] + ' ' + df['time'])
df['time_diff'] = df['date_time'].diff()
print(df)
date time date_time time_diff
0 2020/06/24 23:00:28 2020-06-24 23:00:28 NaT
1 2020/06/25 09:10:55 2020-06-25 09:10:55 0 days 10:10:27
2 2020/06/27 03:42:58 2020-06-27 03:42:58 1 days 18:32:03
3 2020/06/30 16:45:51 2020-06-30 16:45:51 3 days 13:02:53
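If the difference is needed as separate days, hours and minutes rather than as a single timedelta, the .dt.components accessor is one option. A minimal sketch on the same data (the variable name parts is introduced here for illustration):

```python
import pandas as pd

data = {'date': ['2020/06/24', '2020/06/25', '2020/06/27', '2020/06/30'],
        'time': ['23:00:28', '09:10:55', '03:42:58', '16:45:51']}
df = pd.DataFrame(data)

df['date_time'] = pd.to_datetime(df['date'] + ' ' + df['time'])
df['time_diff'] = df['date_time'].diff()

# one column per unit (days, hours, minutes, seconds, ...)
parts = df['time_diff'].dt.components
print(parts[['days', 'hours', 'minutes', 'seconds']])
```

The first row has no previous row, so its components are NaN.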
Measure the time difference of dataframe in accordance to index
Instead of sorting, you can just take max - min for each index group:
# set index
df = df.set_index(df['Index'])
# make sure you have datetime dtype
df['Time'] = pd.to_datetime(df['Time'])
# group by index
grouped = df.groupby(df.index)
# ... and take max-min
ptp = (grouped['Time'].max()-grouped['Time'].min()).dt.total_seconds()/60
ptp
Out[29]:
Index
1 300.0
3 88.0
Name: Time, dtype: float64
Note that I have modified the sample data slightly, so that the propagation of Index is visible:
Index Time
1 2020-03-30T13:00:00
1 2020-03-30T14:00:00
1 2020-03-30T15:55:00
1 2020-03-30T18:00:00
3 2020-04-03T09:00:00
3 2020-04-03T09:50:00
3 2020-04-03T10:28:00
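For reference, here is a self-contained variant of the same idea built on the modified sample data. It groups by the Index column directly instead of setting it as the index first, which is a small deviation from the code above. The names bounds and span_minutes are chosen for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'Index': [1, 1, 1, 1, 3, 3, 3],
    'Time': pd.to_datetime([
        '2020-03-30T13:00:00', '2020-03-30T14:00:00',
        '2020-03-30T15:55:00', '2020-03-30T18:00:00',
        '2020-04-03T09:00:00', '2020-04-03T09:50:00',
        '2020-04-03T10:28:00',
    ]),
})

# earliest and latest timestamp per group
bounds = df.groupby('Index')['Time'].agg(['min', 'max'])

# span of each group in minutes
span_minutes = (bounds['max'] - bounds['min']).dt.total_seconds() / 60
print(span_minutes)
```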
Python pandas date difference between rows
from datetime import datetime
import time
t1 = '09/06/2020 13:30:11.359497'
t2 = '10/06/2020 09:30:12.352452'
# convert t1, t2 to type datetime
date_time_t1 = datetime.strptime(t1, '%d/%m/%Y %H:%M:%S.%f')
date_time_t2 = datetime.strptime(t2, '%d/%m/%Y %H:%M:%S.%f')
# convert date_time_t1, date_time_t2 to Unix timestamp
timestamp_1 = time.mktime(date_time_t1.timetuple())
timestamp_2 = time.mktime(date_time_t2.timetuple())
# the difference in minutes
print(int(timestamp_2 - timestamp_1) / 60)
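Note that time.mktime interprets the values in the local time zone and timetuple() drops the microseconds. Where that matters, a possible alternative is to subtract the two datetime objects directly, sketched here with the same two strings:

```python
from datetime import datetime

t1 = '09/06/2020 13:30:11.359497'
t2 = '10/06/2020 09:30:12.352452'

fmt = '%d/%m/%Y %H:%M:%S.%f'
dt1 = datetime.strptime(t1, fmt)
dt2 = datetime.strptime(t2, fmt)

# subtracting datetimes gives a timedelta; convert it to minutes
minutes = (dt2 - dt1).total_seconds() / 60
print(minutes)
```

This keeps the fractional part, giving a little over 1200 minutes for these inputs.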
calculate time difference pandas dataframe
Parsing your dates was non-trivial. I think strptime could probably do it, but it didn't work for me. In your example above, the times are just strings, not datetimes.
In [140]: from dateutil import parser
In [130]: def parse(x):
.....: date, hh, mm, ss = x.split(':')
.....: dd, mo, yyyy = date.split('/')
.....: return parser.parse("%s %s %s %s:%s:%s" % (yyyy,mo,dd,hh,mm,ss))
.....:
In [131]: map(parse,idx)
Out[131]:
[datetime.datetime(2013, 5, 16, 23, 56, 43),
datetime.datetime(2013, 5, 16, 23, 56, 42),
datetime.datetime(2013, 5, 16, 23, 56, 43),
datetime.datetime(2013, 5, 17, 23, 54, 45),
datetime.datetime(2013, 5, 17, 23, 54, 45),
datetime.datetime(2013, 5, 17, 23, 54, 45)]
In [132]: pd.to_datetime(map(parse,idx))
Out[132]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-16 23:56:43, ..., 2013-05-17 23:54:45]
Length: 6, Freq: None, Timezone: None
In [133]: df = DataFrame(dict(time = pd.to_datetime(map(parse,idx))))
In [134]: df
Out[134]:
time
0 2013-05-16 23:56:43
1 2013-05-16 23:56:42
2 2013-05-16 23:56:43
3 2013-05-17 23:54:45
4 2013-05-17 23:54:45
5 2013-05-17 23:54:45
In [138]: df['delta'] = (df['time']-df['time'].shift()).fillna(0)
In [139]: df
Out[139]:
time delta
0 2013-05-16 23:56:43 00:00:00
1 2013-05-16 23:56:42 -00:00:01
2 2013-05-16 23:56:43 00:00:01
3 2013-05-17 23:54:45 23:58:02
4 2013-05-17 23:54:45 00:00:00
5 2013-05-17 23:54:45 00:00:00
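The session above comes from an older Python and pandas setup (map returns a list there). A rough present-day equivalent might pass an explicit format to pd.to_datetime instead of a custom parser. The idx values below are not given in the answer. They are reconstructed from the printed output, assuming a 'dd/mm/yyyy:hh:mm:ss' layout as implied by the parse function.

```python
import pandas as pd

# assumed input strings, reconstructed from the output shown above
idx = [
    '16/05/2013:23:56:43',
    '16/05/2013:23:56:42',
    '16/05/2013:23:56:43',
    '17/05/2013:23:54:45',
    '17/05/2013:23:54:45',
    '17/05/2013:23:54:45',
]

df = pd.DataFrame({'time': pd.to_datetime(idx, format='%d/%m/%Y:%H:%M:%S')})

# difference to the previous row; the first row gets a zero timedelta
df['delta'] = df['time'].diff().fillna(pd.Timedelta(0))
print(df)
```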
How to calculate time difference between specific row values in dataframe using python?
Sample:
times = [
'2019-05-18 01:15:28',
'2019-05-18 01:28:11',
'2019-05-18 01:36:36',
'2019-05-18 01:39:47',
'2019-05-18 01:53:32',
'2019-05-18 02:05:37'
]
a = [7, 7, 12, 7, 12, 7]
df = pd.DataFrame({'times': pd.to_datetime(times), 'A':a})
print (df)
times A
0 2019-05-18 01:15:28 7
1 2019-05-18 01:28:11 7
2 2019-05-18 01:36:36 12
3 2019-05-18 01:39:47 7
4 2019-05-18 01:53:32 12
5 2019-05-18 02:05:37 7
First create a default index and keep only the rows with 7 and 12:
df = df.reset_index(drop=True)
df1 = df[df['A'].isin([7, 12])]
Then keep only the first row of each run of consecutive equal values by comparing with the shifted values:
df1 = df1[df1['A'].ne(df1['A'].shift())]
print (df1)
times A
0 2019-05-18 01:15:28 7
2 2019-05-18 01:36:36 12
3 2019-05-18 01:39:47 7
4 2019-05-18 01:53:32 12
5 2019-05-18 02:05:37 7
Then keep each 7 that is directly followed by a 12, together with that 12:
m1 = df1['A'].eq(7) & df1['A'].shift(-1).eq(12)
m2 = df1['A'].eq(12) & df1['A'].shift().eq(7)
df2 = df1[m1 | m2]
print (df2)
times A
0 2019-05-18 01:15:28 7
2 2019-05-18 01:36:36 12
3 2019-05-18 01:39:47 7
4 2019-05-18 01:53:32 12
Split the result into the even-positioned rows (the 7s) and the odd-positioned rows (the 12s):
out7 = df2.iloc[::2]
out12 = df2.iloc[1::2]
And finally subtract:
df['Time_difference'] = out12['times'] - out7['times'].to_numpy()
df['Time_difference'] = df['Time_difference'].fillna(pd.Timedelta(0))
print (df)
times A Time_difference
0 2019-05-18 01:15:28 7 00:00:00
1 2019-05-18 01:28:11 7 00:00:00
2 2019-05-18 01:36:36 12 00:21:08
3 2019-05-18 01:39:47 7 00:00:00
4 2019-05-18 01:53:32 12 00:13:45
5 2019-05-18 02:05:37 7 00:00:00
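To cross-check the vectorized steps, the same pairing rule can be written as a plain loop: remember the first 7 of a run, and when the next 12 arrives, record the elapsed time on that row. This is only a sketch for verification, not a replacement for the approach above, and the column name Time_difference_check is made up here.

```python
import pandas as pd

times = [
    '2019-05-18 01:15:28',
    '2019-05-18 01:28:11',
    '2019-05-18 01:36:36',
    '2019-05-18 01:39:47',
    '2019-05-18 01:53:32',
    '2019-05-18 02:05:37',
]
a = [7, 7, 12, 7, 12, 7]
df = pd.DataFrame({'times': pd.to_datetime(times), 'A': a})

diffs = [pd.Timedelta(0)] * len(df)
start = None  # time of the first 7 that has not yet been paired with a 12

for pos, (t, value) in enumerate(zip(df['times'], df['A'])):
    if value == 7 and start is None:
        start = t
    elif value == 12 and start is not None:
        diffs[pos] = t - start
        start = None

df['Time_difference_check'] = diffs
print(df)
```

On the sample data this gives 00:21:08 on row 2 and 00:13:45 on row 4, the same values as the vectorized result.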