Parse Date string to datetime with timezone
Using dateutil.parser you can directly parse your date correctly. Note that CST is an ambiguous timezone abbreviation, so you need to specify which one you mean. You can either do this directly in the tzinfos parameter of the parse() call, or you can define a dictionary that maps timezone abbreviations and pass it. In this dict, you can either specify the offset in seconds, e.g.
timezone_info = {
"CDT": -5 * 3600,
"CEST": 2 * 3600,
"CST": 8 * 3600
}
parser.parse(r, tzinfos=timezone_info)
or (using gettz) directly specify a timezone:
timezone_info = {
"CDT": gettz("America/Chicago"),
"CEST": gettz("Europe/Berlin"),
"CST": gettz("Asia/Shanghai")
}
parser.parse(r, tzinfos=timezone_info)
See also the dateutil.parser documentation and the answers to this SO question.
Be aware that the latter approach is tricky if the location observes daylight saving time! Depending on the date you apply it to, gettz("America/Chicago") results in UTC-5 or UTC-6 (as Chicago switches between Central Standard Time and Central Daylight Time). So depending on your input data, the second example may yield an offset that does not match the abbreviation in the string. China currently observes China Standard Time (CST) all year, so for your use case it makes no difference (though this may depend on your date range).
Overall:
from dateutil import parser
from dateutil.tz import gettz
timezone_info = {"CST": gettz("Asia/Shanghai")}
r = 'Thu Dec 17 08:56:41 CST 2020'
d = parser.parse(r, tzinfos=timezone_info)
print(d)
print(d.strftime('%Y-%m-%d %H:%M:%S %Z%z'))
gets you
2020-12-17 08:56:41+08:00
2020-12-17 08:56:41 CST+0800
EDIT: Printing the human-readable timezone name instead of the abbreviated one is a little more complicated with this approach, as dateutil.tz.gettz() gets you a tzfile that has no public attribute holding just the name. However, you can obtain it from the protected _filename attribute using split():
print(d.strftime('%Y-%m-%d %H:%M:%S') + " in " + "/".join(d.tzinfo._filename.split('/')[-2:]))
yields
2020-12-17 08:56:41 in Asia/Shanghai
This of course only works if you used gettz() to set the timezone in the first place.
EDIT 2: If you know that all your dates are in CST anyway, you can also ignore the timezone when parsing. This gets you naive (or unaware) datetimes to which you can later add a human-readable timezone. With dateutil, you can do this using replace() and gettz() as shown above. With the pytz module, use timezone() together with localize(), because passing a pytz timezone to replace() can attach the zone's historical local mean time offset instead of the current one:
from dateutil import parser
from dateutil.tz import gettz
import pytz
r = 'Thu Dec 17 08:56:41 CST 2020'
d = parser.parse(r, ignoretz=True)
d_dateutil = d.replace(tzinfo=gettz('Asia/Shanghai'))
# pytz zones must be attached with localize(), not replace()
d_pytz = pytz.timezone('Asia/Shanghai').localize(d)
Note that depending on which module you use to add the timezone information, the class of tzinfo differs. For the pytz object, there is a more direct way of accessing the timezone in human-readable form:
print(type(d_dateutil.tzinfo))
print("/".join(d_dateutil.tzinfo._filename.split('/')[-2:]))
print(type(d_pytz.tzinfo))
print(d_pytz.tzinfo.zone)
produces
<class 'dateutil.tz.tz.tzfile'>
Asia/Shanghai
<class 'pytz.tzfile.Asia/Shanghai'>
Asia/Shanghai
Parse a date in a specific timezone with Python
There are three steps:
Convert the date string into a naive datetime object:
from datetime import datetime
dt = datetime(*map(int, '2015-01-01'.split('-')))
Get a timezone-aware datetime object:
import pytz # $ pip install pytz
aware = pytz.timezone("US/Mountain").localize(dt, is_dst=None)
is_dst=None raises an exception for ambiguous or non-existent times. For more details about what the is_dst flag is and why you need it, see the "Can I just always set is_dst=True?" section.
Get the POSIX timestamp:
timestamp = aware.timestamp()
.timestamp() is available in Python 3.3+. See multiple solutions for older Python versions.
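The three steps can also be sketched with the standard library only. This is a minimal sketch, not the pytz approach from the answer: it assumes a fixed offset of UTC-7 (Mountain Standard Time in January) instead of the "US/Mountain" zone, so it does not handle daylight saving time transitions.

```python
from datetime import datetime, timedelta, timezone

# Step 1: parse the date string into a naive datetime
naive = datetime.strptime("2015-01-01", "%Y-%m-%d")

# Step 2: attach a time zone.
# Assumption: a fixed UTC-7 offset stands in for "US/Mountain" on this date.
mst = timezone(timedelta(hours=-7), "MST")
aware = naive.replace(tzinfo=mst)

# Step 3: POSIX timestamp (seconds since 1970-01-01 00:00:00 UTC)
timestamp = aware.timestamp()
print(timestamp)  # 1420095600.0
```

Using replace() is fine here because a datetime.timezone object has a single constant offset. For region-based zones with pytz, localize() remains the right call.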
Extract UTC date from timezone aware string
Assuming the format is consistent in your data (length of the strings is constant), you can do a bit of string slicing to separate date/time and UTC offset. Parse the first to datetime and add the latter as a timezone constructed from a timedelta. Then convert to UTC.
Ex:
from datetime import datetime, timedelta, timezone
s = '2021-04-15T21:53:00:000-06'
# first part to datetime
dt = datetime.fromisoformat(s[:-3])
# set time zone
dt = dt.replace(tzinfo=timezone(timedelta(hours=int(s[-3:]))))
# to UTC
dt_utc = dt.astimezone(timezone.utc)
print(dt_utc.date())
# 2021-04-16
Note that this will fail if the format is not consistent, e.g. if some strings have +0530 while others only have e.g. -06.
In that case, another option is to use strptime, but that requires modifying the input as well. %z expects ±HH:MM or ±HHMM, so you can add the minutes like
if len(s) == 26: # minutes missing
s += '00'
dt = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S:%f%z")
and then convert to UTC as described above.
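If the offsets come in several widths, the length check can be replaced by a small normalization step. The following is a sketch under assumptions: the helper name parse_to_utc and the regular expression are illustrative and not part of the answer above, and the offset is assumed to appear at the very end of the string as ±HH, ±HHMM or ±HH:MM.

```python
import re
from datetime import datetime, timezone

# Trailing offset: sign and two hour digits, optionally followed by minutes
OFFSET_RE = re.compile(r"([+-]\d{2})(?::?(\d{2}))?$")

def parse_to_utc(s):
    """Normalize the trailing offset to ±HHMM, parse, and convert to UTC."""
    m = OFFSET_RE.search(s)
    if m is None:
        raise ValueError(f"no UTC offset found in {s!r}")
    hours = m.group(1)
    minutes = m.group(2) or "00"  # minutes missing -> assume zero
    normalized = s[:m.start()] + hours + minutes
    dt = datetime.strptime(normalized, "%Y-%m-%dT%H:%M:%S:%f%z")
    return dt.astimezone(timezone.utc)

print(parse_to_utc("2021-04-15T21:53:00:000-06").date())
print(parse_to_utc("2021-04-15T21:53:00:000+0530"))
```

Because the offset is always rewritten to ±HHMM before parsing, the same format string covers all three input variants.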
Parse Datetime with +0 timezone
Any ideas what I have overseen?
strftime.org describes %z as
UTC offset in the form ±HHMM[SS[.ffffff]] (empty string if the object is naive).
This means that the offset must contain at least 4 digits after + or - (the HHMM part, which is compulsory). Taking this into account, Dec 03 2020 01: +0 does not comply with the format string used, whilst Dec 03 2020 01: +0000 does:
import datetime
dtObj = datetime.datetime.strptime("Dec 03 2020 01: +0000", '%b %d %Y %I: %z')
print(dtObj)
gives output
2020-12-03 01:00:00+00:00
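If you cannot change the data at its source, you could pad the short offset before parsing. This is a minimal sketch: pad_offset is a hypothetical helper, and it assumes that a one- or two-digit offset at the end of the string means whole hours.

```python
import re
import datetime

def pad_offset(s):
    # "+0" -> "+0000", "-5" -> "-0500"; strings already ending in ±HHMM do not match
    return re.sub(
        r"([+-])(\d{1,2})$",
        lambda m: f"{m.group(1)}{int(m.group(2)):02d}00",
        s,
    )

padded = pad_offset("Dec 03 2020 01: +0")
dtObj = datetime.datetime.strptime(padded, '%b %d %Y %I: %z')
print(padded)
print(dtObj)
```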
How to convert date string with timezone to datetime in python
You could use %z to parse timezone info:
>>> from datetime import datetime, timezone
>>> datetime.strptime(str, "%Y-%m-%dT%H:%M:%S%z")
datetime.datetime(2015, 8, 23, 3, 36, 30, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=68400)))
Then, if you want to convert this datetime to UTC (which I assume is your goal since you say you want to compare datetimes), you could use the astimezone method:
>>> datetime.strptime(str, "%Y-%m-%dT%H:%M:%S%z").astimezone(timezone.utc)
datetime.datetime(2015, 8, 23, 8, 36, 30, tzinfo=datetime.timezone.utc)
Back in string format:
>>> datetime.strptime(str, "%Y-%m-%dT%H:%M:%S%z").astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
'2015-08-23 08:36:30'
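The same steps as a standalone script might look like this. The input string is an assumption reconstructed from the output shown above (03:36:30 at UTC-5), and it is named s here so that the built-in str is not shadowed.

```python
from datetime import datetime, timezone

# Assumed input, reconstructed from the output above
s = "2015-08-23T03:36:30-0500"

local = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S%z")
utc = local.astimezone(timezone.utc)

print(utc.strftime("%Y-%m-%d %H:%M:%S"))  # 2015-08-23 08:36:30

# Aware datetimes compare by instant, regardless of their offsets
print(local == utc)  # True
```

Since aware datetimes are compared as points in time, converting to UTC is mainly useful for display or storage, not a precondition for the comparison itself.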
Datetime Time Zone Scraping Python
Note that %z in strptime() is for timezone offsets, not names, and %Z only accepts certain values for time zones. For details, see the API docs.
The simplest option is to use the dateparser module to parse dates with time zone names (e.g. EDT).
import dateparser
s = "Jun 1, 2022 2:49PM EDT"
d = dateparser.parse(s)
print(d)
Output:
2022-06-01 14:49:00-04:00
Other date modules (e.g. dateutil) do not resolve abbreviations such as "EDT" on their own, because such abbreviations can be ambiguous. With these modules, you need to define the abbreviation yourself with its offset, here UTC-04:00 (-14400 seconds).
import dateutil.parser
s = "Jun 1, 2022 2:49PM EDT"
tzinfos = {"EDT": -14400}
d = dateutil.parser.parse(s, tzinfos=tzinfos)
print(d)
Output:
2022-06-01 14:49:00-04:00
Parsing date/time string with timezone abbreviated name in Python?
That probably won't work because those abbreviations aren't unique. See this page for details. You might wind up just having to manually handle it yourself if you're working with a known set of inputs.
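For a known set of inputs, handling it manually could look like the following sketch. The mapping, the helper name parse_with_abbreviation and the sample string are assumptions for illustration. Each abbreviation is tied to one fixed offset, so you have to decide up front which meaning applies to your data.

```python
from datetime import datetime, timedelta, timezone

# Assumed mapping: choose the meaning each abbreviation has in YOUR data
KNOWN_ABBREVIATIONS = {
    "EST": timezone(timedelta(hours=-5), "EST"),
    "EDT": timezone(timedelta(hours=-4), "EDT"),
    "CST": timezone(timedelta(hours=8), "CST"),  # China Standard Time, not US Central
}

def parse_with_abbreviation(text, fmt):
    """Split off a trailing abbreviation, look it up, and parse the rest with fmt."""
    body, abbreviation = text.rsplit(" ", 1)
    if abbreviation not in KNOWN_ABBREVIATIONS:
        raise ValueError(f"unknown time zone abbreviation: {abbreviation!r}")
    tz = KNOWN_ABBREVIATIONS[abbreviation]
    return datetime.strptime(body, fmt).replace(tzinfo=tz)

d = parse_with_abbreviation("Jun 1, 2022 2:49PM EDT", "%b %d, %Y %I:%M%p")
print(d.isoformat())
```

Raising on unknown abbreviations keeps ambiguous or unexpected inputs from silently turning into naive datetimes.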
How can I parse a custom string as a timezone aware datetime?
Hmm How about maybe:
import re
import datetime
foo = "18 January 2022, 14:50 GMT-5"
# pad the offset to ±HHMM; the "+" flag keeps the sign for positive offsets too
bar = re.sub(r"[+-]\d+$", lambda m: "{:+05d}".format(100 * int(m.group())), foo)
print(datetime.datetime.strptime(bar, "%d %B %Y, %H:%M %Z%z" ))
I think that gives you:
2022-01-18 14:50:00-05:00
Convert string with timezone included into datetime object
It is a common misconception that %Z
can parse arbitrary abbreviated time zone names. It cannot. See especially the "Notes" section #6 under technical detail in the docs.
You'll have to do that "by hand" since many of those abbreviations are ambiguous. Here's an option how to deal with it using only the standard lib:
from datetime import datetime
from zoneinfo import ZoneInfo
# we need to define which abbreviation corresponds to which time zone
zoneMapping = {'PDT' : ZoneInfo('America/Los_Angeles'),
'PST' : ZoneInfo('America/Los_Angeles'),
'CET' : ZoneInfo('Europe/Berlin'),
'CEST': ZoneInfo('Europe/Berlin')}
# some example inputs; last should fail
timestrings = ('Jun 8, 2021 PDT', 'Feb 8, 2021 PST', 'Feb 8, 2021 CET',
'Aug 9, 2020 WTF')
for t in timestrings:
# we can split off the time zone abbreviation
s, z = t.rsplit(' ', 1)
# parse the first part to datetime object
# and set the time zone; use dict.get if it should be None if not found
dt = datetime.strptime(s, "%b %d, %Y").replace(tzinfo=zoneMapping[z])
print(t, "->", dt)
gives
Jun 8, 2021 PDT -> 2021-06-08 00:00:00-07:00
Feb 8, 2021 PST -> 2021-02-08 00:00:00-08:00
Feb 8, 2021 CET -> 2021-02-08 00:00:00+01:00
Traceback (most recent call last):
dt = datetime.strptime(s, "%b %d, %Y").replace(tzinfo=zoneMapping[z])
KeyError: 'WTF'