A faster strptime?
Python 3.7+: fromisoformat()
Since Python 3.7, the datetime class has a method fromisoformat. It should be noted that this can also be applied to this question:
Performance vs. strptime()
Explicit string slicing may give you about a 9x increase in performance compared to normal strptime, but you can get about a 90x increase with the built-in fromisoformat method!
%timeit isofmt(datelist)
569 µs ± 8.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit slice2int(datelist)
5.51 ms ± 48.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit normalstrptime(datelist)
52.1 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
from datetime import datetime, timedelta
base, n = datetime(2000, 1, 1, 1, 2, 3, 420001), 10000
datelist = [(base + timedelta(days=i)).strftime('%Y-%m-%d') for i in range(n)]
def isofmt(l):
    return list(map(datetime.fromisoformat, l))
def slice2int(l):
    def slicer(t):
        return datetime(int(t[:4]), int(t[5:7]), int(t[8:10]))
    return list(map(slicer, l))
def normalstrptime(l):
    return [datetime.strptime(t, '%Y-%m-%d') for t in l]
print(isofmt(datelist[0:1]))
print(slice2int(datelist[0:1]))
print(normalstrptime(datelist[0:1]))
# [datetime.datetime(2000, 1, 1, 0, 0)]
# [datetime.datetime(2000, 1, 1, 0, 0)]
# [datetime.datetime(2000, 1, 1, 0, 0)]
Python 3.8.3rc1 x64 / Win10
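One caveat: before Python 3.11, fromisoformat only accepts strings in the layout that isoformat() itself emits, and other layouts raise ValueError. A minimal sketch of a fast path with a strptime fallback follows; parse_fast and its fallback_fmt parameter are hypothetical names introduced here, and the sketch assumes all non-ISO inputs share one known format.

```python
from datetime import datetime

def parse_fast(s, fallback_fmt='%Y-%m-%d'):
    """Hypothetical helper: try fromisoformat first, then fall back to strptime."""
    try:
        return datetime.fromisoformat(s)
    except ValueError:
        # slow path, only taken for strings that fromisoformat rejects
        return datetime.strptime(s, fallback_fmt)

print(parse_fast('2000-01-01'))
print(parse_fast('01/02/2000', fallback_fmt='%m/%d/%Y'))
```

If most inputs are ISO-formatted, nearly every call stays on the fast path and strptime is only paid for the exceptions.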
Faster datetime.strptime
Let's first assume you have strings in ISO format in a list, here '%Y-%m-%d %H:%M:%S.%f' with a space instead of 'T' as the date/time separator (let's also not consider decoding from a byte array for now):
from datetime import datetime, timedelta
base, n = datetime(2000, 1, 1, 1, 2, 3, 420001), 1000
datelist = [(base + timedelta(days=i)).isoformat(' ') for i in range(n)]
# datelist
# ['2000-01-01 01:02:03.420001'
# ...
# '2002-09-26 01:02:03.420001']
from string to datetime object
Let's define some functions that parse a string to datetime, using different methods:
import re
import numpy as np
def strp_isostr(l):
    return list(map(datetime.fromisoformat, l))
def isostr_to_nparr(l):
    return np.array(l, dtype=np.datetime64)
def split_isostr(l):
    def splitter(s):
        tmp = s.split(' ')
        tmp = tmp[0].split('-') + [tmp[1]]
        tmp = tmp[:3] + tmp[3].split(':')
        tmp = tmp[:5] + tmp[5].split('.')
        return datetime(*map(int, tmp))
    return list(map(splitter, l))
def resplit_isostr(l):
    # raw strings avoid invalid-escape warnings for the regex pattern
    # return list(map(lambda s: datetime(*map(int, re.split(r'T|-|:|\.', s))), l))
    return [datetime(*map(int, re.split(r' |-|:|\.', s))) for s in l]
def full_stptime(l):
    # return list(map(lambda s: datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f'), l))
    return [datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f') for s in l]
If I run %timeit in the IPython console for these functions on my machine, I get
%timeit strp_isostr(datelist)
98.2 µs ± 766 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit isostr_to_nparr(datelist)
1.49 ms ± 13.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit split_isostr(datelist)
3.02 ms ± 236 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit resplit_isostr(datelist)
3.8 ms ± 256 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit full_stptime(datelist)
16.7 ms ± 780 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So we can conclude that the built-in datetime.fromisoformat is by far the fastest option for the 1000-element input. However, this assumes you want a list to work with. In case you need an np.array of datetime64 anyway, going straight to that seems like the best option.
third party option: ciso8601
If you're able to install additional packages, ciso8601 is worth a look:
import ciso8601
def ciso(l):
    return list(map(ciso8601.parse_datetime, l))
%timeit ciso(datelist)
138 µs ± 1.83 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
from datetime object to seconds since the epoch
Looking at the conversion from datetime object to POSIX timestamp, using the most obvious datetime.timestamp method seems to be the most efficient:
import time
def dt_ts(l):
    return list(map(datetime.timestamp, l))
def timetup(l):
    return list(map(time.mktime, map(datetime.timetuple, l)))
%timeit dt_ts(strp_isostr(datelist))
572 µs ± 4.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit timetup(strp_isostr(datelist))
1.44 ms ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
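Note that datetime.timestamp treats a naive datetime as local time, so the values produced for the naive objects above depend on the machine's timezone. A small sketch of a UTC variant follows, assuming the input strings are meant as UTC (an assumption not stated in the original); dt_ts_utc is a hypothetical name.

```python
from datetime import datetime, timezone

def dt_ts_utc(l):
    # attach UTC to each naive datetime before taking the POSIX timestamp
    return [dt.replace(tzinfo=timezone.utc).timestamp() for dt in l]

parsed = [datetime.fromisoformat('1970-01-02 00:00:00')]
print(dt_ts_utc(parsed))  # [86400.0]
```

The extra replace() call adds some overhead per element, so this trades a little speed for timezone-independent results.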
How to improve the efficiency of time.strptime in Python?
From the answer given to "A faster strptime?":
>>> timeit.timeit("time.strptime(\"2015-02-04 04:05:12\", \"%Y-%m-%d %H:%M:%S\")", setup="import time")
17.206257617290248
>>> timeit.timeit("datetime.datetime(*map(int, \"2015-02-04 04:05:12\".replace(\":\", \"-\").replace(\" \", \"-\").split(\"-\")))", setup="import datetime")
4.687687893159023
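The one-liner timed above can be wrapped in a small function for reuse. This is only a sketch: parse_by_split is a hypothetical name, and note that it returns a datetime, whereas time.strptime returns a struct_time.

```python
from datetime import datetime

def parse_by_split(s):
    # normalise ':' and ' ' to '-', then split into six integer fields
    parts = s.replace(':', '-').replace(' ', '-').split('-')
    return datetime(*map(int, parts))

print(parse_by_split('2015-02-04 04:05:12'))  # 2015-02-04 04:05:12
```

Like the other manual approaches, this does no validation of the separators, so it is only suitable for input known to be in the 'YYYY-MM-DD HH:MM:SS' layout.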
Is there a faster alternative to Python's strftime?
Since you have a rigid format you can just access directly the fields of the datetime object and use Python string formatting to construct the required string:
'{:02d}/{:02d}/{}'.format(now.month, now.day, now.year)
In Python 3 this is about 4 times faster than strftime(). It's also faster in Python 2, about 2-3 times as fast.
Faster again in Python 3 is the "old" style string interpolation:
'%02d/%02d/%d' % (now.month, now.day, now.year)
This is about 5 times faster than strftime(), but I've found it to be slower in Python 2.
Another option, but only 1.5 times faster, is to use time.strftime() instead of datetime.strftime():
time.strftime('%m/%d/%Y', now.timetuple())
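On Python 3.6+, the same field-by-field formatting can also be written as an f-string. The sketch below is an addition rather than part of the original answer; it makes no timing claim and only checks that the result matches strftime() for a fixed example value.

```python
from datetime import datetime

now = datetime(2015, 2, 4, 4, 5, 12)  # fixed value so the output is reproducible

s_fstring = f'{now.month:02d}/{now.day:02d}/{now.year}'
s_strftime = now.strftime('%m/%d/%Y')

print(s_fstring)                # 02/04/2015
print(s_fstring == s_strftime)  # True
```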
Finally, how are you constructing the datetime object to begin with? If you are converting strings to datetime (with strptime() for example), it might be faster to convert the incoming string version to the outgoing one using string slicing.
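The slicing idea can be sketched as follows, assuming the incoming strings are always 'YYYY-MM-DD' and the desired output is 'MM/DD/YYYY'. Both formats are assumptions for illustration, and iso_to_us is a hypothetical name.

```python
def iso_to_us(s):
    # 'YYYY-MM-DD' -> 'MM/DD/YYYY' purely by slicing; no validation is done
    return s[5:7] + '/' + s[8:10] + '/' + s[:4]

print(iso_to_us('2015-02-04'))  # 02/04/2015
```

No datetime object is created at all, which is where the saving would come from; the price is that malformed input passes through silently.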
Speeding up datetime.strptime
Yes, there are faster methods to parse a date than datetime.strptime(), if you forgo a lot of flexibility and validation. strptime() allows both numbers with and without zero-padding, and it only matches strings that use the right separators, whilst your 'ugly' version doesn't.
You should always use the timeit module for time trials; it is far more accurate than cProfile here.
Indeed, your 'ugly' approach is twice as fast as strptime():
>>> from datetime import date, datetime
>>> import timeit
>>> def ugly(input_date):
...     a = 1000 * int(input_date[0])
...     b = 100 * int(input_date[1])
...     c = 10 * int(input_date[2])
...     d = 1 * int(input_date[3])
...     year = a+b+c+d
...     c = 10 * int(input_date[5])
...     d = 1 * int(input_date[6])
...     month = c+d
...     c = 10 * int(input_date[8])
...     d = 1 * int(input_date[9])
...     day = c+d
...     try:
...         my_date = date(year, month, day)
...     except ValueError:
...         my_date = None
...
>>> def strptime(input_date):
...     try:
...         my_date = datetime.strptime(input_date, "%Y-%m-%d").date()
...     except ValueError:
...         my_date = None
...
>>> timeit.timeit('f("2014-07-08")', 'from __main__ import ugly as f')
4.21576189994812
>>> timeit.timeit('f("2014-07-08")', 'from __main__ import strptime as f')
9.873773097991943
Your approach can be improved upon though; you could use slicing:
>>> def slicing(input_date):
...     try:
...         year = int(input_date[:4])
...         month = int(input_date[5:7])
...         day = int(input_date[8:])
...         my_date = date(year, month, day)
...     except ValueError:
...         my_date = None
...
>>> timeit.timeit('f("2014-07-08")', 'from __main__ import slicing as f')
1.7224829196929932
Now it is almost 6 times faster. I also moved the int() calls into the try-except to handle invalid input when converting strings to integers.
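Since the slicing version ignores the separator characters, some of the lost validation can be recovered with a cheap length and separator check. The sketch below is an assumption about how one might do that, not part of the original answer; slicing_strict is a hypothetical name, and it is still less strict than strptime() because int() tolerates signs and surrounding whitespace.

```python
from datetime import date

def slicing_strict(input_date):
    # reject anything that is not 'YYYY-MM-DD' shaped before converting
    if len(input_date) != 10 or input_date[4] != '-' or input_date[7] != '-':
        return None
    try:
        return date(int(input_date[:4]), int(input_date[5:7]), int(input_date[8:]))
    except ValueError:
        return None

print(slicing_strict('2014-07-08'))  # 2014-07-08
print(slicing_strict('2014x07x08'))  # None
```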
You could also use str.split() to get the parts, but that makes it slightly slower again:
>>> def split(input_date):
...     try:
...         my_date = date(*map(int, input_date.split('-')))
...     except ValueError:
...         my_date = None
...
>>> timeit.timeit('f("2014-07-08")', 'from __main__ import split as f')
2.294667959213257
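To repeat this kind of measurement in a script rather than at the interactive prompt, a small harness around timeit.repeat can be used. This is a sketch only: best_time, with_strptime and with_slicing are hypothetical names, and the absolute numbers will differ from machine to machine.

```python
import timeit
from datetime import datetime

def with_strptime(s):
    return datetime.strptime(s, '%Y-%m-%d')

def with_slicing(s):
    return datetime(int(s[:4]), int(s[5:7]), int(s[8:10]))

def best_time(func, arg, number=10000, repeat=3):
    # take the minimum over several repeats to reduce noise
    return min(timeit.repeat(lambda: func(arg), number=number, repeat=repeat))

for f in (with_strptime, with_slicing):
    print(f.__name__, best_time(f, '2014-07-08'))
```

Taking the minimum of several repeats is the usual way to read timeit results, since higher values mostly reflect interference from other processes.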
Convert list of datestrings to datetime very slow with Python strptime
Indexing / slicing seems to be faster than the regex used by @NPE:
In [47]: def with_indexing(dstr):
   ....:     return datetime.datetime(*map(int, [dstr[:4], dstr[5:7], dstr[8:10],
   ....:                                         dstr[11:13], dstr[14:16], dstr[17:]]))
In [48]: p = re.compile('[-T:]')
In [49]: def with_regex(dt_str):
   ....:     return datetime.datetime(*map(int, p.split(dt_str)))
In [50]: %timeit with_regex(dstr)
100000 loops, best of 3: 3.84 us per loop
In [51]: %timeit with_indexing(dstr)
100000 loops, best of 3: 2.98 us per loop
I think if you use a file parser like numpy.genfromtxt, the converters argument and a fast string parsing method, you can read and parse a whole file in less than half a second.
I used the following function to create an example file with about 25000 rows, ISO date strings as index and 10 data columns:
import numpy as np
import pandas as pd
def create_data():
    # create dates
    dates = pd.date_range('2010-01-01T00:30', '2013-01-04T23:30', freq='H')
    # convert to iso
    iso_dates = dates.map(lambda x: x.strftime('%Y-%m-%dT%H:%M:%S'))
    # create data
    data = pd.DataFrame(np.random.random((iso_dates.size, 10)) * 100,
                        index=iso_dates)
    # write to file
    data.to_csv('dates.csv', header=False)
Then I used the following code to parse the file:
In [54]: %timeit a = np.genfromtxt('dates.csv', delimiter=',',
converters={0:with_regex})
1 loops, best of 3: 430 ms per loop
In [55]: %timeit a = np.genfromtxt('dates.csv', delimiter=',',
converters={0:with_indexing})
1 loops, best of 3: 391 ms per loop
pandas (based on numpy) has a C-based file parser which is even faster:
In [56]: %timeit df = pd.read_csv('dates.csv', header=None, index_col=0,
parse_dates=True, date_parser=with_indexing)
10 loops, best of 3: 167 ms per loop
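If neither numpy nor pandas is available, the same converter idea can be applied with the standard library's csv module. The sketch below swaps in csv.reader for genfromtxt/read_csv and uses two made-up rows in place of dates.csv; no timing is claimed for it.

```python
import csv
import io
from datetime import datetime

def with_indexing(dstr):
    # same slicing idea as above, for 'YYYY-MM-DDTHH:MM:SS'
    return datetime(*map(int, [dstr[:4], dstr[5:7], dstr[8:10],
                               dstr[11:13], dstr[14:16], dstr[17:]]))

# two made-up rows standing in for dates.csv
sample = io.StringIO('2010-01-01T00:30:00,1.5,2.5\n'
                     '2010-01-01T01:30:00,3.5,4.5\n')

# first column goes through the converter, the rest become floats
rows = [(with_indexing(r[0]), [float(x) for x in r[1:]])
        for r in csv.reader(sample)]
print(rows[0])
```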
Python datetime.strptime() Eating lots of CPU Time
If those are fixed width formats then there is no need to parse the line - you can use slicing and a dictionary lookup to get the fields directly.
month_abbreviations = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4,
'May': 5, 'Jun': 6, 'Jul': 7, 'Aug': 8,
'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
year = int(line[7:11])
month = month_abbreviations[line[3:6]]
day = int(line[0:2])
hour = int(line[12:14])
minute = int(line[15:17])
second = int(line[18:20])
new_entry['time'] = datetime.datetime(year, month, day, hour, minute, second)
Testing in the manner shown by Glenn Maynard shows this to be about 3 times faster.
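Wrapped into a function, the slicing above might look like the sketch below. The sample line layout ('DD/Mon/YYYY:HH:MM:SS') is an assumption chosen to match the slice offsets, and parse_fixed is a hypothetical name.

```python
from datetime import datetime

MONTHS = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4,
          'May': 5, 'Jun': 6, 'Jul': 7, 'Aug': 8,
          'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}

def parse_fixed(line):
    # offsets match the slices above; the sample layout is an assumption
    return datetime(int(line[7:11]), MONTHS[line[3:6]], int(line[0:2]),
                    int(line[12:14]), int(line[15:17]), int(line[18:20]))

print(parse_fixed('15/Aug/2016:07:50:12'))  # 2016-08-15 07:50:12
```

An unknown month abbreviation raises KeyError here rather than ValueError, so callers that catch parsing errors need to handle both.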
python datetime strptime format
If you get an ISO 8601 string like "2016-08-15T07:50:12", the easiest way, I feel, is to use dateutil to convert it.
import dateutil.parser
yourdate = dateutil.parser.parse(datestring)
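For this particular string, the standard library alone is enough on Python 3.7+, since datetime.fromisoformat accepts the 'T' separator. A minimal sketch using the same variable names:

```python
from datetime import datetime

datestring = '2016-08-15T07:50:12'
yourdate = datetime.fromisoformat(datestring)
print(yourdate)  # 2016-08-15 07:50:12
```

dateutil remains the more forgiving choice when the input format varies.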