Parsing Time String in Python

Parsing time string in Python

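datetime.datetime.strptime has limited timezone parsing (in Python 2 it does not support the %z directive at all). Have a look at the dateutil package, which detects the offset without a format string: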

>>> from dateutil import parser
>>> parser.parse("Tue May 08 15:14:45 +0800 2012")
datetime.datetime(2012, 5, 8, 15, 14, 45, tzinfo=tzoffset(None, 28800))
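For comparison, here is a minimal sketch of the same parse with the standard library, assuming Python 3 (where strptime accepts %z) and an English/C locale for the day and month names. The format string is written for this one sample string only and is not taken from the original answer.

```python
from datetime import datetime, timedelta

# Format string assumed for this specific sample; %a and %b expect
# English abbreviations under the default C locale.
SAMPLE_FORMAT = "%a %b %d %H:%M:%S %z %Y"

parsed = datetime.strptime("Tue May 08 15:14:45 +0800 2012", SAMPLE_FORMAT)

# The +0800 offset is kept as a fixed-offset tzinfo.
print(parsed.isoformat())  # 2012-05-08T15:14:45+08:00
```

Unlike dateutil, this only works when the layout of the input is known in advance.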

Parse time from string in UTC format in a way comparable with the current time stamp

I'm not a pro, but I did some research and found this solution using astimezone():

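import pandas as pd
from dateutil import parser
from datetime import datetime, timezone

# Parse the ISO 8601 string and normalize it to UTC
start_time = parser.isoparse('2021-01-01T00Z').astimezone(timezone.utc)
# Timezone-aware "now", so both ends of the range are comparable
end_time = datetime.now(timezone.utc)
timestamps = pd.date_range(start=start_time, end=end_time, freq='H')
timestamps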

which gives:

DatetimeIndex(['2021-01-01 00:00:00+00:00', '2021-01-01 01:00:00+00:00',
'2021-01-01 02:00:00+00:00', '2021-01-01 03:00:00+00:00',
'2021-01-01 04:00:00+00:00', '2021-01-01 05:00:00+00:00',
'2021-01-01 06:00:00+00:00', '2021-01-01 07:00:00+00:00',
'2021-01-01 08:00:00+00:00', '2021-01-01 09:00:00+00:00',
...
'2021-05-12 21:00:00+00:00', '2021-05-12 22:00:00+00:00',
'2021-05-12 23:00:00+00:00', '2021-05-13 00:00:00+00:00',
'2021-05-13 01:00:00+00:00', '2021-05-13 02:00:00+00:00',
'2021-05-13 03:00:00+00:00', '2021-05-13 04:00:00+00:00',
'2021-05-13 05:00:00+00:00', '2021-05-13 06:00:00+00:00'],
dtype='datetime64[ns, UTC]', length=3175, freq='H')
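The reason both values need a timezone is that Python refuses to order an aware datetime against a naive one. A minimal standard-library sketch of that behavior follows; the variable names are illustrative and the start date is built directly rather than parsed.

```python
from datetime import datetime, timezone

# Both values carry tzinfo, so they can be compared and subtracted.
aware_start = datetime(2021, 1, 1, tzinfo=timezone.utc)
aware_now = datetime.now(timezone.utc)
elapsed = aware_now - aware_start
print(aware_start < aware_now)  # True

# Mixing an aware and a naive datetime in an ordering comparison fails.
naive_now = datetime.now()
try:
    aware_start < naive_now
    mixed_comparison_failed = False
except TypeError:
    mixed_comparison_failed = True

print(mixed_comparison_failed)  # True
```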

Convenient time string parsing in python

You could use a regular expression to extract the number/time unit parts and then look up a multiplier in a dictionary. This way, it is a bit shorter and probably a whole lot more readable than your manual parsing and if/elif chain.

>>> import re
>>> mult = {"s": 1, "m": 60, "h": 60*60, "d": 60*60*24}
>>> s = "2d 4h 13m 5.2s"
>>> re.findall(r"(\d+(?:\.\d+)?)([smhd])", s)
[('2', 'd'), ('4', 'h'), ('13', 'm'), ('5.2', 's')]
>>> sum(float(x) * mult[m] for x, m in _)
187985.2

As a function:

import re
from datetime import timedelta

def duration(string):
    # seconds per unit
    mult = {"s": 1, "m": 60, "h": 60*60, "d": 60*60*24}
    # each match is a (number, unit) pair, e.g. ('5.2', 's')
    parts = re.findall(r"(\d+(?:\.\d+)?)([smhd])", string)
    total_seconds = sum(float(x) * mult[m] for x, m in parts)
    return timedelta(seconds=total_seconds)

print(duration("2d 4h 13m 5.2s"))
# 2 days, 4:13:05.200000

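This will also ensure that the number part is actually a valid number (and not just any sequence of digits and dots). Note, however, that re.findall silently skips anything that does not match, so a string with an unsupported time unit does not raise an exception; that part is simply ignored. If you need strict validation, check each token separately, for example with re.fullmatch.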

The function could be further optimized by pre-compiling the regex with re.compile outside of the function. When I tested it with IPython's %timeit, mine turned out to be a bit faster (2.1µs vs. 2.8µs for yours, both without the timedelta creation and with just float instead of Decimal). I would also consider it more readable because the style is more declarative and less imperative, but that is certainly a matter of taste.
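Because re.findall only returns the parts that match, input with an unknown unit is dropped rather than rejected. The following is a sketch of a stricter variant with a pre-compiled pattern, assuming the parts are separated by whitespace; strict_duration, UNIT_SECONDS and TOKEN are illustrative names, not part of the original answer.

```python
import re
from datetime import timedelta

# Seconds per unit; compiled once at import time instead of per call.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 60 * 60, "d": 60 * 60 * 24}
TOKEN = re.compile(r"(\d+(?:\.\d+)?)([smhd])")

def strict_duration(text):
    """Parse e.g. '2d 4h 13m 5.2s', rejecting any malformed token."""
    total_seconds = 0.0
    for token in text.split():
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"invalid duration token: {token!r}")
        number, unit = match.groups()
        total_seconds += float(number) * UNIT_SECONDS[unit]
    return timedelta(seconds=total_seconds)

print(strict_duration("2d 4h 13m 5.2s"))  # 2 days, 4:13:05.200000
```

With this version, an input such as "2w" raises ValueError instead of yielding a zero duration.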

Remove noise(hours) for parsing time in Y/M/D format

If you just need year/month/day columns, there is no need to parse to datetime at all. Just split the strings and rearrange the parts, for example:

import pandas as pd

df = pd.DataFrame({'Startdate': ['December 1, 2021 6:00', 'March 23, 2022 6']})

# split on ", " or on a single space (raw string avoids an invalid escape)
parts = df['Startdate'].str.split(r', | ')

df['year'], df['month'], df['day'] = parts.str[2], parts.str[0], parts.str[1]

print(df)
# Startdate year month day
# 0 December 1, 2021 6:00 2021 December 1
# 1 March 23, 2022 6 2022 March 23
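The same split-and-rearrange idea can be checked on a single string without pandas. This is a minimal sketch using re.split, assuming every value follows the "Month D, YYYY ..." layout shown above; split_start_date is a hypothetical helper name.

```python
import re

def split_start_date(text):
    """Return (year, month, day) as strings, ignoring the trailing hour part."""
    # Split on ", " or on a single space; anything after the year is discarded.
    month, day, year, *_ = re.split(r", | ", text)
    return year, month, day

print(split_start_date("December 1, 2021 6:00"))  # ('2021', 'December', '1')
print(split_start_date("March 23, 2022 6"))       # ('2022', 'March', '23')
```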

How to use parser on multiple time objects

You only get one value back from your to_date function because you return from the function during the first loop iteration. You need to introduce a list that temporarily stores your parsed dates and return it after the loop:

from dateutil import parser

def to_date(date_list):
    parsed_date_list = []
    for date in date_list:
        parsed_date_list.append(parser.parse(date))
    return parsed_date_list

date_list = ['2022-06-01', '2022-02-02']
res = to_date(date_list)

Or using a list comprehension to keep your code more concise:

from dateutil import parser

def to_date(date_list):
    return [parser.parse(date) for date in date_list]

date_list = ['2022-06-01', '2022-02-02']
res = to_date(date_list)

And to format your string, simply use the strftime function, as pointed out by kpie in his comment:

# res = to_date(date_list)

date_format = "%b %d, %Y"
print(f"From: {res[0].strftime(date_format)} | To: {res[1].strftime(date_format)}")

Do not use list as a variable name. list is the name of a built-in type, so assigning to it shadows the built-in and makes it unavailable in that scope.
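A short sketch of what goes wrong when the name is reused; shadowing_demo is a hypothetical function, and the shadowing is confined to its body so the built-in stays intact elsewhere.

```python
def shadowing_demo():
    # Inside this function, "list" now refers to the data, not the type.
    list = ["2022-06-01", "2022-02-02"]
    try:
        list("abc")  # would normally build ['a', 'b', 'c']
    except TypeError as exc:
        return str(exc)
    return None

print(shadowing_demo())  # 'list' object is not callable
```

A descriptive name such as date_list avoids the problem.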


