datetime to string with series in pandas
There is no .str accessor for datetimes, and .astype(str) gives you no control over the output format. Instead, use .dt.strftime:
>>> series = pd.Series(['20010101', '20010331'])
>>> dates = pd.to_datetime(series, format='%Y%m%d')
>>> dates.dt.strftime('%Y-%m-%d')
0 2001-01-01
1 2001-03-31
dtype: object
See the docs on customizing date string formats here: strftime() and strptime() Behavior.
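As a quick illustration of other format codes, here is a minimal sketch that formats the same two dates in a few different ways. The variable names are illustrative only; the directives used (%d, %m, %Y, %H, %M, %j) are standard strftime codes.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['20010101', '20010331']), format='%Y%m%d')

# Day-first with slashes
day_first = dates.dt.strftime('%d/%m/%Y')
# ISO-like timestamp including the (midnight) time part
with_time = dates.dt.strftime('%Y-%m-%dT%H:%M')
# Year plus zero-padded day of the year
day_of_year = dates.dt.strftime('%Y-%j')

print(day_first.tolist())    # ['01/01/2001', '31/03/2001']
print(with_time.tolist())    # ['2001-01-01T00:00', '2001-03-31T00:00']
print(day_of_year.tolist())  # ['2001-001', '2001-090']
```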
For old pandas versions (<0.17.0), one can instead call .apply with the Python standard library's datetime.strftime:
>>> dates.apply(lambda x: x.strftime('%Y-%m-%d'))
0 2001-01-01
1 2001-03-31
dtype: object
Converting a datetime column to a string column
If you're using version 0.17.0 or higher, you can do this with .dt.strftime, which is vectorised:
all_data['Order Day new'] = all_data['Order Day new'].dt.strftime('%Y-%m-%d')
If your pandas version is older than 0.17.0, you have to call apply and pass each value to strftime:
In [111]:
import datetime as dt
import pandas as pd
all_data = pd.DataFrame({'Order Day new':[dt.datetime(2014,5,9), dt.datetime(2012,6,19)]})
print(all_data)
all_data.info()
Order Day new
0 2014-05-09
1 2012-06-19
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 1 columns):
Order Day new 2 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 32.0 bytes
In [108]:
all_data['Order Day new'] = all_data['Order Day new'].apply(lambda x: dt.datetime.strftime(x, '%Y-%m-%d'))
all_data
Out[108]:
Order Day new
0 2014-05-09
1 2012-06-19
In [109]:
all_data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 1 columns):
Order Day new 2 non-null object
dtypes: object(1)
memory usage: 32.0+ bytes
You can't call strftime on the column directly, because it doesn't accept a Series as a parameter; hence the error.
pandas date to string
To extract the value, you can try:
ss = series_of_dates.astype(str).tail(1).reset_index().loc[0, 'date']
Using loc will give you the contents just fine. Note that this assumes reset_index() produces a column named date.
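If only the last value is needed, a shorter route is to pick the element first and format it afterwards. The following is a minimal sketch on made-up data (the series contents and the name 'date' are assumptions for illustration); it relies on the fact that indexing a datetime Series returns a pandas Timestamp, which has its own strftime method.

```python
import pandas as pd

# Hypothetical data: a datetime Series named 'date'
series_of_dates = pd.Series(pd.to_datetime(['2020-01-01', '2020-01-02']), name='date')

# Take the last element (a pandas Timestamp) and format it directly
last_as_str = series_of_dates.iloc[-1].strftime('%Y-%m-%d')

print(last_as_str)  # 2020-01-02
```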
pandas timestamp series to string?
Consider the dataframe df:
df = pd.DataFrame(dict(timestamp=pd.to_datetime(['2000-01-01'])))
df
timestamp
0 2000-01-01
Use the datetime accessor dt to access the strftime method. You can pass a format string to strftime and it will return a formatted string. When used with the dt accessor you will get a series of strings.
df.timestamp.dt.strftime('%Y-%m-%d')
0 2000-01-01
Name: timestamp, dtype: object
Visit strftime.org for a handy set of format strings.
Get datetime format from string python
In pandas, this is achieved by pandas._libs.tslibs.parsing.guess_datetime_format. Note that this is a private module, so its location may change between versions:
from pandas._libs.tslibs.parsing import guess_datetime_format
guess_datetime_format('2021-01-01')
# '%Y-%m-%d'
Because day and month can be ambiguous, you can pass dayfirst=True to prefer the day-first reading:
guess_datetime_format('2021-01-01', dayfirst=True)
# '%Y-%d-%m'
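One way the guessed format might be used is to parse a column with it and then write the dates back out in the same layout. The sketch below assumes that recent pandas versions also expose the function publicly under pandas.tseries.api and falls back to the private path shown above otherwise; the sample data is made up.

```python
import pandas as pd

try:
    # Assumed public location in recent pandas versions
    from pandas.tseries.api import guess_datetime_format
except ImportError:
    # Private location used above (older pandas)
    from pandas._libs.tslibs.parsing import guess_datetime_format

raw = pd.Series(['2021-01-01', '2021-03-31'])

# Guess the format from the first value, then reuse it for parsing and formatting
fmt = guess_datetime_format(raw.iloc[0])
parsed = pd.to_datetime(raw, format=fmt)
round_trip = parsed.dt.strftime(fmt)

print(fmt)                  # %Y-%m-%d
print(round_trip.tolist())  # ['2021-01-01', '2021-03-31']
```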
Faster conversion of Pandas datetime column to string
If it is correct to assume that a large number of records will have the same date (which seems likely for a dataset with 10M records), we can leverage that and improve efficiency by not converting the same date to string over and over.
For example, here's what it looks like on per-second data from 2021-01-01 to 2021-02-01 (which is about 2.7M records):
df = pd.DataFrame({'dt': pd.date_range('2021-01-01', '2021-02-01', freq='1s')})
Here it is with strftime applied to the whole column:
%%time
df['dt_str'] = df['dt'].dt.strftime('%d.%m.%Y')
Output:
CPU times: user 8.07 s, sys: 63.9 ms, total: 8.14 s
Wall time: 8.14 s
And here it is with map applied to de-duplicated values:
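%%time
# Truncate to whole days (astype('datetime64[D]') is not supported by pandas 2.x)
days = df['dt'].dt.normalize()
dts = days.drop_duplicates()
# Lookup table: day -> formatted string (raw values avoid index re-alignment)
m = pd.Series(dts.dt.strftime('%d.%m.%Y').to_numpy(), index=dts.to_numpy())
df['dt_str'] = days.map(m)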
Output:
CPU times: user 207 ms, sys: 32 ms, total: 239 ms
Wall time: 240 ms
It's about 30x faster. Of course, the speedup depends on the number of unique date values -- the higher the number, the less we gain by using this method.
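The de-duplication idea can be wrapped in a small helper and checked against the plain strftime result. This is only a sketch: the function name strftime_by_unique_day is hypothetical, and it assumes the format contains date parts only, since every timestamp is truncated to its day before the lookup.

```python
import pandas as pd

def strftime_by_unique_day(col, fmt):
    """Format a datetime Series by formatting each distinct day only once."""
    days = col.dt.normalize()              # truncate each timestamp to midnight
    unique_days = days.drop_duplicates()   # one row per distinct day
    # Lookup table: day -> formatted string
    lookup = pd.Series(unique_days.dt.strftime(fmt).to_numpy(),
                       index=unique_days.to_numpy())
    return days.map(lookup)

# Small per-minute sample so the comparison runs quickly
df = pd.DataFrame({'dt': pd.date_range('2021-01-01', '2021-01-03', freq='1min')})

fast = strftime_by_unique_day(df['dt'], '%d.%m.%Y')
slow = df['dt'].dt.strftime('%d.%m.%Y')

print((fast == slow).all())  # True
```

With a format that includes hours, minutes or seconds, the helper would return the midnight value for every row, so the plain .dt.strftime call is the right tool in that case.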