Parsing date/time string with timezone abbreviated name in Python?
That probably won't work because those abbreviations aren't unique. See this page for details. You might wind up just having to manually handle it yourself if you're working with a known set of inputs.
Parse timezone abbreviation to UTC
Before Python 3.9 added the zoneinfo module, the Python standard library did not really implement time zones, so you should use python-dateutil. It provides useful extensions to the standard datetime module, including a time zone implementation and a parser.
You can convert time zone aware datetime objects to UTC with .astimezone(dateutil.tz.tzutc()). For the current time as a timezone aware datetime object, you can use datetime.datetime.utcnow().replace(tzinfo=dateutil.tz.tzutc()).
import datetime
import dateutil.parser
import dateutil.tz
cet = dateutil.tz.gettz('CET')
cesttime = datetime.datetime(2010, 4, 1, 12, 57, tzinfo=cet)
cesttime.isoformat()
'2010-04-01T12:57:00+02:00'
cettime = datetime.datetime(2010, 1, 1, 12, 57, tzinfo=cet)
cettime.isoformat()
'2010-01-01T12:57:00+01:00'
# the parser does not automatically apply the time zone portion, so set it explicitly
dateutil.parser.parse('Feb 25 2010, 16:19:20 CET')\
    .replace(tzinfo=dateutil.tz.gettz('CET'))
Unfortunately this technique will be wrong during the repeated daylight savings time hour.
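To make the repeated hour concrete, here is a minimal sketch. It uses the standard library zoneinfo module (Python 3.9+) instead of dateutil, which is an assumption on my part and not what the answer above used. The wall-clock time 02:30 on 2010-10-31 occurred twice in Berlin, and only the fold attribute tells the two apart:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

berlin = ZoneInfo("Europe/Berlin")

# 02:30 on 2010-10-31 happened twice in Berlin: first in CEST, then in CET
first = datetime(2010, 10, 31, 2, 30, tzinfo=berlin)           # fold=0 (default)
second = datetime(2010, 10, 31, 2, 30, fold=1, tzinfo=berlin)  # second occurrence

print(first.isoformat(), first.tzname())    # 2010-10-31T02:30:00+02:00 CEST
print(second.isoformat(), second.tzname())  # 2010-10-31T02:30:00+01:00 CET

# the two readings are one hour apart in UTC
print(second.astimezone(timezone.utc) - first.astimezone(timezone.utc))  # 1:00:00
```

A plain replace(tzinfo=...) always yields the first reading, so input that means the second occurrence ends up one hour off.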
How to preserve timezone when parsing date/time strings with strptime()?
The datetime module documentation says:
Return a datetime corresponding to date_string, parsed according to format. This is equivalent to datetime(*(time.strptime(date_string, format)[0:6])).
See that [0:6]? That gets you (year, month, day, hour, minute, second). Nothing else. No mention of timezones.
Interestingly, [Win XP SP2, Python 2.6, 2.7] passing your example to time.strptime doesn't work, but if you strip off the " %Z" and the " EST" it does work. Also using "UTC" or "GMT" instead of "EST" works. "PST" and "MEZ" don't work. Puzzling.
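The behaviour described above can be reproduced with a small sketch. The helper name try_parse and the format string are made up for illustration. In Python 3, %Z accepts "UTC", "GMT" and the local zone names only, and even an accepted name leaves the result naive:

```python
from datetime import datetime

def try_parse(text, fmt="%Y-%m-%d %H:%M %Z"):
    """Return the parsed datetime, or None if strptime rejects the input."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None

utc = try_parse("2021-02-08 12:00 UTC")
print(utc, utc.tzinfo)                    # parsed, but tzinfo is None (naive)
print(try_parse("2021-02-08 12:00 WTF"))  # None: unknown abbreviation is rejected
```

Whether an abbreviation such as "EST" or "PST" is accepted depends on the local time zone of the machine running the code, which may explain the puzzling results.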
It's worth noting this has been updated as of version 3.2 and the same documentation now also states the following:
When the %z directive is provided to the strptime() method, an aware datetime object will be produced. The tzinfo of the result will be set to a timezone instance.
Note that this doesn't work with %Z, so the case is important. See the following example:
In [1]: from datetime import datetime
In [2]: start_time = datetime.strptime('2018-04-18-17-04-30-AEST','%Y-%m-%d-%H-%M-%S-%Z')
In [3]: print("TZ NAME: {tz}".format(tz=start_time.tzname()))
TZ NAME: None
In [4]: start_time = datetime.strptime('2018-04-18-17-04-30-+1000','%Y-%m-%d-%H-%M-%S-%z')
In [5]: print("TZ NAME: {tz}".format(tz=start_time.tzname()))
TZ NAME: UTC+10:00
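Once %z has produced an aware datetime, converting it to UTC needs nothing beyond the standard library. A minimal sketch, where the helper name parse_with_offset is hypothetical:

```python
from datetime import datetime, timezone

def parse_with_offset(text, fmt="%Y-%m-%d-%H-%M-%S-%z"):
    """Parse a string carrying a numeric UTC offset into an aware datetime."""
    return datetime.strptime(text, fmt)

local = parse_with_offset("2018-04-18-17-04-30-+1000")
as_utc = local.astimezone(timezone.utc)

print(local.isoformat())   # 2018-04-18T17:04:30+10:00
print(as_utc.isoformat())  # 2018-04-18T07:04:30+00:00
```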
Parse Date string to datetime with timezone
Using dateutil.parser you can directly parse your date correctly.
Note that CST is an ambiguous timezone, so you need to specify which one you mean. You can either do this directly in the tzinfos parameter of the parse() call or you can define a dictionary that has mappings for timezones and pass this. In this dict, you can either specify the offset (in seconds), e.g.
timezone_info = {
"CDT": -5 * 3600,
"CEST": 2 * 3600,
"CST": 8 * 3600
}
parser.parse(r, tzinfos=timezone_info)
or (using gettz) directly specify a timezone:
timezone_info = {
"CDT": gettz("America/Chicago"),
"CEST": gettz("Europe/Berlin"),
"CST": gettz("Asia/Shanghai")
}
parser.parse(r, tzinfos=timezone_info)
See also the dateutil.parser documentation and the answers to this SO question.
Be aware that the latter approach is tricky if you have a location with daylight saving time! Depending on the date you apply it to, gettz("America/Chicago") will have UTC-5 or UTC-6 as a result (as Chicago switches between Central Standard Time and Central Daylight Time). So depending on your input data, the second example may not be correct and may yield the wrong outcome. Currently, China observes China Standard Time (CST) all year, so for your use case it makes no difference (this may depend on your date range, though).
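The date dependence is easy to check. The sketch below uses the standard library zoneinfo module (Python 3.9+) rather than dateutil's gettz, which is an assumption made to keep it dependency-free; the zone data comes from the same IANA database:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

chicago = ZoneInfo("America/Chicago")

# same zone object, different dates: the offset follows the DST rules
winter = datetime(2020, 12, 17, 8, 56, 41, tzinfo=chicago)
summer = datetime(2020, 7, 17, 8, 56, 41, tzinfo=chicago)

print(winter.tzname(), winter.utcoffset().total_seconds() / 3600)  # CST -6.0
print(summer.tzname(), summer.utcoffset().total_seconds() / 3600)  # CDT -5.0
```

So mapping "CDT" to a Chicago zone object silently yields UTC-6 for a winter date, even though the abbreviation in the input said daylight time.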
Overall:
from dateutil import parser
from dateutil.tz import gettz
timezone_info = {"CST": gettz("Asia/Shanghai")}
r = 'Thu Dec 17 08:56:41 CST 2020'
d = parser.parse(r, tzinfos=timezone_info)
print(d)
print(d.strftime('%Y-%m-%d %H:%M:%S %Z%z'))
gets you
2020-12-17 08:56:41+08:00
2020-12-17 08:56:41 CST+0800
EDIT: Printing the human readable timezone name instead of the abbreviated name is just a little more complicated with this approach, as dateutil.tz.gettz() gets you a tzfile that has no attribute holding just the name. However, you can obtain it from the protected _filename attribute using split():
print(d.strftime('%Y-%m-%d %H:%M:%S') + " in " + "/".join(d.tzinfo._filename.split('/')[-2:]))
yields
2020-12-17 08:56:41 in Asia/Shanghai
This of course only works if you used gettz() to set the timezone in the first place.
EDIT 2: If you know that all your dates are in CST anyway, you can also ignore the timezone when parsing. This gets you naive (or unaware) datetimes which you can then later add a human readable timezone to. You can do this using replace() with a timezone from gettz() as shown above, or using timezone() from the pytz module. Note that a pytz timezone has to be attached with its localize() method rather than replace(), otherwise the zone's historical local mean time offset is used:
from dateutil import parser
from dateutil.tz import gettz
import pytz
r = 'Thu Dec 17 08:56:41 CST 2020'
d = parser.parse(r, ignoretz=True)
d_dateutil = d.replace(tzinfo=gettz('Asia/Shanghai'))
d_pytz = pytz.timezone('Asia/Shanghai').localize(d)
Note that depending on which module you use to add the timezone information, the class of tzinfo differs. For the pytz object, there is a more direct way of accessing the timezone in human readable form:
print(type(d_dateutil.tzinfo))
print("/".join(d_dateutil.tzinfo._filename.split('/')[-2:]))
print(type(d_pytz.tzinfo))
print(d_pytz.tzinfo.zone)
produces
<class 'dateutil.tz.tz.tzfile'>
Asia/Shanghai
<class 'pytz.tzfile.Asia/Shanghai'>
Asia/Shanghai
How to convert string including unrecognized timezone to datetime?
Handling time zones can be a bit tricky. One option is to build a map of the time zones of interest from https://www.timeanddate.com/time/zones/ and then use dateutil.parser to parse the date.
Something like:
Using UTC Offset:
from dateutil import parser
# Creating a sample map for the timezone abbreviation and the offset
timezone_info = {
"A": 1 * 3600,
"AT": -4 * 3600,
"AWDT": 9 * 3600,
"AWST": 8 * 3600,
"AZOST": 0 * 3600,
"AZOT": -1 * 3600,
"AZST": 5 * 3600,
"AZT": 4 * 3600,
"AoE": -12 * 3600,
"B": 2 * 3600,
"BNT": 8 * 3600,
"BST": 6 * 3600,
"C": 3 * 3600,
"CAST": 8 * 3600,
"CET": 1 * 3600,
"W": -10 * 3600,
"WEST": 1 * 3600,
"WET": 0 * 3600,
"WST": 14 * 3600,
"Z": 0 * 3600,
}
# Date string
date = "Thu Jul 15 12:57:35 AWST 2021"
# parse the timezone info from the date string
tz = date.split(" ")[-2]  # Assuming the date format is "%a %b %d %H:%M:%S %Z %Y"
parser.parse(date, tzinfos={tz: timezone_info.get(tz)})
Output:
datetime.datetime(2021, 7, 15, 12, 57, 35, tzinfo=tzoffset('AWST', 28800))
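If dateutil is not available, roughly the same idea can be sketched with the standard library only. The names OFFSETS and parse_with_abbreviation are hypothetical, the map is a small subset of the one above, and the fixed input layout is assumed:

```python
from datetime import datetime, timedelta, timezone

# hypothetical subset of the abbreviation -> offset (in seconds) map above
OFFSETS = {"AWST": 8 * 3600, "CET": 1 * 3600, "Z": 0}

def parse_with_abbreviation(text, offsets=OFFSETS):
    """Parse '%a %b %d %H:%M:%S <ABBR> %Y' using a fixed-offset tzinfo."""
    parts = text.split(" ")
    abbr = parts.pop(-2)  # the abbreviation is the second-to-last token
    if abbr not in offsets:
        raise ValueError(f"unknown time zone abbreviation: {abbr!r}")
    tz = timezone(timedelta(seconds=offsets[abbr]), name=abbr)
    naive = datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")
    return naive.replace(tzinfo=tz)

dt = parse_with_abbreviation("Thu Jul 15 12:57:35 AWST 2021")
print(dt.isoformat())  # 2021-07-15T12:57:35+08:00
print(dt.tzname())     # AWST
```

Raising on an unknown abbreviation, rather than silently returning a naive datetime, is a design choice of this sketch.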
Using IANA Time Zone Names :
(As suggested by @MrFuppes in the comments)
from dateutil import parser
from dateutil.tz import gettz
# Creating a sample map for the timezone abbreviation and the IANA time zone name
timezone_info = {
    "AWST": 'Australia/Perth',
    "BNT": 'Asia/Brunei',
    "CAST": 'Antarctica/Casey',
    "CET": 'Europe/Paris'
}
# Date string
date = "Thu Jul 15 12:57:35 AWST 2021"
# parse the timezone info from the date string
tz = date.split(" ")[-2]  # Assuming the date format is "%a %b %d %H:%M:%S %Z %Y"
# resolve the IANA name to a tzinfo object explicitly before handing it to the parser
parser.parse(date, tzinfos={tz: gettz(timezone_info[tz])})
Output (the path shown inside tzfile(...) depends on the platform):
datetime.datetime(2021, 7, 15, 12, 57, 35, tzinfo=tzfile('/usr/share/zoneinfo/Australia/Perth'))
Convert string with timezone included into datetime object
It is a common misconception that %Z can parse arbitrary abbreviated time zone names. It cannot. See especially the "Notes" section #6 under technical detail in the docs.
You'll have to do that "by hand" since many of those abbreviations are ambiguous. Here's one way to deal with it using only the standard lib:
from datetime import datetime
from zoneinfo import ZoneInfo
# we need to define which abbreviation corresponds to which time zone
zoneMapping = {'PDT' : ZoneInfo('America/Los_Angeles'),
               'PST' : ZoneInfo('America/Los_Angeles'),
               'CET' : ZoneInfo('Europe/Berlin'),
               'CEST': ZoneInfo('Europe/Berlin')}
# some example inputs; last should fail
timestrings = ('Jun 8, 2021 PDT', 'Feb 8, 2021 PST', 'Feb 8, 2021 CET',
               'Aug 9, 2020 WTF')
for t in timestrings:
    # we can split off the time zone abbreviation
    s, z = t.rsplit(' ', 1)
    # parse the first part to datetime object
    # and set the time zone; use dict.get if it should be None if not found
    dt = datetime.strptime(s, "%b %d, %Y").replace(tzinfo=zoneMapping[z])
    print(t, "->", dt)
gives
Jun 8, 2021 PDT -> 2021-06-08 00:00:00-07:00
Feb 8, 2021 PST -> 2021-02-08 00:00:00-08:00
Feb 8, 2021 CET -> 2021-02-08 00:00:00+01:00
Traceback (most recent call last):
dt = datetime.strptime(s, "%b %d, %Y").replace(tzinfo=zoneMapping[z])
KeyError: 'WTF'
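A possible variation, sketched under the same mapping: use dict.get so unknown abbreviations return None instead of raising, and additionally check that the abbreviation in the input matches the one the zone actually uses on that date. The names ZONES and parse_checked are hypothetical:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

ZONES = {"PDT": ZoneInfo("America/Los_Angeles"),
         "PST": ZoneInfo("America/Los_Angeles"),
         "CET": ZoneInfo("Europe/Berlin"),
         "CEST": ZoneInfo("Europe/Berlin")}

def parse_checked(text, zones=ZONES, fmt="%b %d, %Y"):
    """Return an aware datetime, or None if the abbreviation is unknown
    or does not match what the zone uses on that date."""
    stamp, abbr = text.rsplit(" ", 1)
    tz = zones.get(abbr)
    if tz is None:
        return None
    dt = datetime.strptime(stamp, fmt).replace(tzinfo=tz)
    return dt if dt.tzname() == abbr else None

print(parse_checked("Jun 8, 2021 PDT"))  # 2021-06-08 00:00:00-07:00
print(parse_checked("Aug 9, 2020 WTF"))  # None (unknown abbreviation)
print(parse_checked("Feb 8, 2021 PDT"))  # None (Los Angeles is on PST in February)
```

The tzname() comparison catches inputs whose abbreviation contradicts the date, which the plain mapping would accept silently.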
Convert non-UTC time string with timezone abbreviation into UTC time in python, while accounting for daylight savings
Using Nas Banov's excellent dictionary mapping timezone abbreviations to UTC offset:
import dateutil.parser
import pytz
# timezone dictionary built here: https://stackoverflow.com/a/4766400/366335
# tzd = {...}
string = 'Jun 20, 4:00PM EDT'
date = dateutil.parser.parse(string, tzinfos=tzd).astimezone(pytz.utc)
Parsing OFX datetime in Python
Some things to note here, first (as commented):
- Python built-in strptime will have a hard time here: %z won't parse a single digit offset hour, and %Z won't parse some (potentially) ambiguous time zone abbreviation.
Then, the OFX Banking Version 2.3 docs (sect. 3.2.8.2 Date and Datetime) leave some questions open to me:
- Is the UTC offset optional ?
- Why is EST called a time zone while it's just an abbreviation ?
- Why in the example the UTC offset is -5 hours while on 1996-10-05, US/Eastern was at UTC-4 ?
- What about offsets that have minutes specified, e.g. +5:30 for Asia/Calcutta ?
- (opinionated) Why re-invent the wheel in the first place instead of using a commonly used standard like ISO 8601 ?
Anyway, here's an attempt at a custom parser:
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo
def parseOFXdatetime(s, tzinfos=None, _tz=None):
    """
    parse OFX datetime string to an aware Python datetime object.
    """
    # first, treat formats that have no UTC offset specified.
    if '[' not in s:
        # just make sure default format is satisfied by filling with zeros if needed
        s = s.ljust(14, '0') + '.000' if '.' not in s else s
        return datetime.strptime(s, "%Y%m%d%H%M%S.%f").replace(tzinfo=timezone.utc)
    # offset and tz are specified, so first get the date/time, offset and tzname components
    s, off = s.strip(']').split('[')
    off, name = off.split(':')
    s = s.ljust(14, '0') + '.000' if '.' not in s else s
    # if tzinfos are specified, map the tz name:
    if tzinfos:
        _tz = tzinfos.get(name)  # this might still leave _tz as None...
    if not _tz:  # ...so we derive a tz from a timedelta
        _tz = timezone(timedelta(hours=int(off)), name=name)
    return datetime.strptime(s, "%Y%m%d%H%M%S.%f").replace(tzinfo=_tz)
# some test strings
t = ["19961005132200.124[-5:EST]", "19961005132200.124", "199610051322", "19961005",
     "199610051322[-5:EST]", "19961005[-5:EST]"]
for s in t:
    print(# normal parsing
          f'{s}\n {repr(parseOFXdatetime(s))}\n'
          # parsing with tzinfo mapping supplied; abbreviation -> timezone object
          f' {repr(parseOFXdatetime(s, tzinfos={"EST": ZoneInfo("US/Eastern")}))}\n\n')
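Regarding the open question about offsets with minutes: the parser above uses int(off), so it only handles whole hours. One possible extension is sketched below. It assumes that fractional offsets are written as decimal hours such as "+5.5", which neither the answer nor the quoted spec section confirms; the helper name offset_to_timezone is hypothetical:

```python
from datetime import timedelta, timezone

def offset_to_timezone(off, name=None):
    """Build a fixed-offset tzinfo from an offset given in hours.

    Assumption: fractional offsets are written as decimal hours, e.g. '+5.5'.
    """
    delta = timedelta(hours=float(off))
    return timezone(delta, name) if name else timezone(delta)

print(offset_to_timezone("-5", "EST"))                    # EST
print(offset_to_timezone("+5.5", "IST").utcoffset(None))  # 5:30:00
```

Swapping this in for timezone(timedelta(hours=int(off)), name=name) would keep the whole-hour behaviour unchanged.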