Parsing time string in Python
datetime.datetime.strptime has problems with timezone parsing. Have a look at the dateutil package:
>>> from dateutil import parser
>>> parser.parse("Tue May 08 15:14:45 +0800 2012")
datetime.datetime(2012, 5, 8, 15, 14, 45, tzinfo=tzoffset(None, 28800))
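For comparison, on Python 3 strptime accepts a %z directive for numeric UTC offsets, so this particular string can probably also be handled by the standard library. A minimal sketch, assuming an English (or C) locale for the %a and %b names; the format string is my own illustration, not part of the original answer:

```python
from datetime import datetime, timedelta, timezone

raw = "Tue May 08 15:14:45 +0800 2012"

# %z consumes the numeric offset (+0800) and produces a timezone-aware datetime.
parsed = datetime.strptime(raw, "%a %b %d %H:%M:%S %z %Y")

print(parsed.utcoffset())               # 8:00:00
print(parsed.astimezone(timezone.utc))  # 2012-05-08 07:14:45+00:00
```

dateutil remains the more forgiving choice when the input format is not known in advance.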
Parse time from string in UTC format in a way comparable with the current time stamp
I'm not a pro, but I did some research and found this solution using astimezone():
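import pandas as pd
from datetime import datetime, timezone
from dateutil import parser

# Parse the ISO 8601 string and normalize it to UTC
start_time = parser.isoparse('2021-01-01T00Z').astimezone(timezone.utc)
# Timezone-aware "now", directly comparable with start_time
end_time = datetime.now(timezone.utc)
timestamps = pd.date_range(start=start_time, end=end_time, freq='H')
timestamps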
which gives:
DatetimeIndex(['2021-01-01 00:00:00+00:00', '2021-01-01 01:00:00+00:00',
'2021-01-01 02:00:00+00:00', '2021-01-01 03:00:00+00:00',
'2021-01-01 04:00:00+00:00', '2021-01-01 05:00:00+00:00',
'2021-01-01 06:00:00+00:00', '2021-01-01 07:00:00+00:00',
'2021-01-01 08:00:00+00:00', '2021-01-01 09:00:00+00:00',
...
'2021-05-12 21:00:00+00:00', '2021-05-12 22:00:00+00:00',
'2021-05-12 23:00:00+00:00', '2021-05-13 00:00:00+00:00',
'2021-05-13 01:00:00+00:00', '2021-05-13 02:00:00+00:00',
'2021-05-13 03:00:00+00:00', '2021-05-13 04:00:00+00:00',
'2021-05-13 05:00:00+00:00', '2021-05-13 06:00:00+00:00'],
dtype='datetime64[ns, UTC]', length=3175, freq='H')
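The reason for making both values timezone-aware is that Python refuses to order an aware datetime against a naive one. A small standard-library sketch of that behaviour (the variable names now_aware and now_naive are mine, and fromisoformat stands in for dateutil here):

```python
from datetime import datetime, timezone

start_time = datetime.fromisoformat("2021-01-01T00:00:00+00:00")  # aware, UTC
now_aware = datetime.now(timezone.utc)                            # aware, UTC
now_naive = datetime.now()                                        # naive, local time

# Both sides carry an offset, so the comparison is well defined.
print(start_time < now_aware)  # True

# Mixing aware and naive values raises a TypeError.
try:
    start_time < now_naive
except TypeError as exc:
    print("cannot compare:", exc)
```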
Convenient time string parsing in python
You could use a regular expression to extract the number/time unit parts and then look up a multiplier in a dictionary. This way, it is a bit shorter and probably a whole lot more readable than your manual parsing and if/elif chain.
>>> import re
>>> mult = {"s": 1, "m": 60, "h": 60*60, "d": 60*60*24}
>>> s = "2d 4h 13m 5.2s"
>>> re.findall(r"(\d+(?:\.\d+)?)([smhd])", s)
[('2', 'd'), ('4', 'h'), ('13', 'm'), ('5.2', 's')]
>>> sum(float(x) * mult[m] for x, m in _)
187985.2
As a function:
import re
from datetime import timedelta

def duration(string):
    mult = {"s": 1, "m": 60, "h": 60*60, "d": 60*60*24}
    # \.\d+ accepts any number of decimal digits, not just one
    parts = re.findall(r"(\d+(?:\.\d+)?)([smhd])", string)
    total_seconds = sum(float(x) * mult[m] for x, m in parts)
    return timedelta(seconds=total_seconds)

print(duration("2d 4h 13m 5.2s"))
# 2 days, 4:13:05.200000
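This also means that only well-formed numbers are picked up (and not just any sequence of digits and dots). Note, however, that re.findall silently skips anything that does not match the pattern, including unknown time units. To raise an exception for those instead, match any letter as the unit (for example [a-z]) so that the dictionary lookup fails with a KeyError.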
The function could be further optimized by pre-compiling the regex with re.compile outside of the function. When I tested it with IPython's %timeit, mine turned out to be a bit faster (2.1µs vs. 2.8µs for yours, both without the timedelta creation and with just float instead of Decimal). Also, I consider this more readable because of its more declarative, less imperative style, but that is certainly a matter of taste and preference.
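A sketch of the pre-compiled variant, combined with one possible way to reject malformed input. The names MULT, PART, WHOLE and duration_strict are hypothetical, and the whole-string check is my own addition rather than something from the answer above:

```python
import re
from datetime import timedelta

# Seconds per unit, same table as in the answer.
MULT = {"s": 1, "m": 60, "h": 60 * 60, "d": 60 * 60 * 24}

# Compiled once at import time instead of on every call.
PART = re.compile(r"(\d+(?:\.\d+)?)([smhd])")
# Assumed strictness rule: the input must consist only of number+unit parts
# separated by optional whitespace.
WHOLE = re.compile(r"\s*(?:\d+(?:\.\d+)?[smhd]\s*)+")


def duration_strict(string):
    """Parse strings such as '2d 4h 13m 5.2s'; raise ValueError otherwise."""
    if not WHOLE.fullmatch(string):
        raise ValueError(f"invalid duration: {string!r}")
    total_seconds = sum(float(x) * MULT[u] for x, u in PART.findall(string))
    return timedelta(seconds=total_seconds)


print(duration_strict("2d 4h 13m 5.2s"))  # 2 days, 4:13:05.200000

try:
    duration_strict("2w 3d")  # "w" is not an allowed unit
except ValueError as exc:
    print(exc)
```

Whether silently ignoring or rejecting unknown parts is preferable depends on where the strings come from; for user input, failing loudly is usually easier to debug.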
Remove noise(hours) for parsing time in Y/M/D format
If you just need year/month/day columns, there's actually no need to parse to datetime. Just work with the strings by splitting and rearranging them, for example:
import pandas as pd

df = pd.DataFrame({'Startdate': ['December 1, 2021 6:00', 'March 23, 2022 6']})
# Split on ", " or a single space (raw string avoids the invalid '\ ' escape)
parts = df['Startdate'].str.split(r', | ', regex=True)
df['year'], df['month'], df['day'] = parts.str[2], parts.str[0], parts.str[1]
print(df)
#                Startdate  year     month day
# 0  December 1, 2021 6:00  2021  December   1
# 1       March 23, 2022 6  2022     March  23
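To see what the split pattern does on a single value, here is the same idea sketched with plain re.split, without pandas. The helper name split_start_date is made up for illustration:

```python
import re


def split_start_date(text):
    # Split on ", " or a single space; anything after the year
    # (the hour "noise") is simply dropped.
    month, day, year = re.split(r", | ", text)[:3]
    return year, month, day


print(split_start_date("December 1, 2021 6:00"))  # ('2021', 'December', '1')
print(split_start_date("March 23, 2022 6"))       # ('2022', 'March', '23')
```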
How to use parser on multiple time objects
You only get one value back from your to_date function because you exit the function in the first loop iteration. You need to introduce a list that temporarily stores your parsed dates:
from dateutil import parser

def to_date(date_list):
    parsed_date_list = []
    for date in date_list:
        parsed_date_list.append(parser.parse(date))
    return parsed_date_list

date_list = ['2022-06-01', '2022-02-02']
res = to_date(date_list)
Or using a list comprehension to keep your code more concise:
from dateutil import parser

def to_date(date_list):
    return [parser.parse(date) for date in date_list]

date_list = ['2022-06-01', '2022-02-02']
res = to_date(date_list)
And to format your string, simply use the strftime function, as pointed out by kpie in his comment:
# res = to_date(date_list)
date_format = "%b %d, %Y"
print(f"From: {res[0].strftime(date_format)} | To: {res[1].strftime(date_format)}")
Do not use list as a variable name. list is the name of Python's built-in list type, so a variable called list shadows the built-in within that scope.
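A small sketch of what goes wrong when the name is shadowed. The function parse_all is hypothetical and names its parameter list on purpose:

```python
def parse_all(list):  # the parameter shadows the built-in list type
    try:
        # Inside this function, "list" is the argument, not the type,
        # so calling it fails.
        return list(item.strip() for item in list)
    except TypeError as exc:
        return f"TypeError: {exc}"


print(parse_all([" 2022-06-01 ", "2022-02-02"]))
# TypeError: 'list' object is not callable
```

Renaming the parameter to something like date_list avoids the problem.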