Convert Excel style date with pandas
OK I think the easiest thing is to construct a TimedeltaIndex
from the floats and add this to the scalar datetime for 1900,1,1
:
In [85]:
import datetime as dt
import pandas as pd
df = pd.DataFrame({'date':[42580.3333333333, 10023]})
df
Out[85]:
date
0 42580.333333
1 10023.000000
In [86]:
df['real_date'] = pd.TimedeltaIndex(df['date'], unit='d') + dt.datetime(1900,1,1)
df
Out[86]:
date real_date
0 42580.333333 2016-07-31 07:59:59.971200
1 10023.000000 1927-06-12 00:00:00.000000
OK it seems that excel is a bit weird with it's dates thanks @ayhan:
In [89]:
df['real_date'] = pd.TimedeltaIndex(df['date'], unit='d') + dt.datetime(1899, 12, 30)
df
Out[89]:
date real_date
0 42580.333333 2016-07-29 07:59:59.971200
1 10023.000000 1927-06-10 00:00:00.000000
See related: How to convert a python datetime.datetime to excel serial date number
Converting to Excel Date format (within Excel file) using python and pandas from another date format from html table
You need to use pd.ExcelWriter
to create a writer
object, so that you can change to Date format WITHIN Excel; however, this problem has a couple of different aspects to it:
- You have non-date values in your date column, including "Legend:", "Cash rate decreased", "Cash Rate increased", and "Cash rate unchanged".
- As mentioned in the comments, you must pass
format='%d %b %Y'
topd.to_datetime()
as that is the Date format you are converting FROM. - You must pass
errors='coerce'
in order to returnNaT
for those that don't meet the specified format - For the
pd.to_datetime()
line of code, you must add.dt.date
at the end, because we use adate_format
parameter and not adatetime_format
parameter in creating thewriter
object later on. However, you could also excludedt.date
and change the format of thedatetime_format
parameter. - Then, do
table = table.dropna()
to drop rows with any columns withNaT
- Pandas does not change the Date format WITHIN Excel. If you want to do that, then you should use
openpyxl
and create awriter
object and pass thedate_format
. In case someone says this, you CANNOT simply do:pd.to_datetime(table['Effective Date'], format='%d %b %Y', errors='coerce').dt.strftime('%m/%d/%y')
or.dt.strftime('%d/%m/%y')
, because that creates a "General" date format in EXCEL. - Output is ugly if you do not widen your columns, so I've included code for that as well. Please note that I am on a USA locale, so passing
d/m/yyyy
creates a "Custom" format in Excel.
NOTE: In my code, I have to pass m/d/yyyy
in order for a "Date" format to appear in EXCEL. You can simply change to date_format='d/m/yyyy'
since my computer has a different locale than you (USA) that Excel utilizes for "Date" format.
Source + More on this topic:
import pandas as pd
import html5lib
import datetime
import locale
import pytz
import lxml as lx
import openpyxl as oxl
url = "https://www.rba.gov.au/statistics/cash-rate/"
tables = pd.read_html(url)
table = tables[0]
table['Effective Date'] = pd.to_datetime(table['Effective Date'], format='%d %b %Y', errors='coerce').dt.date
table = table.dropna()
table.to_excel('rates.xlsx')
writer = pd.ExcelWriter("rates.xlsx",
engine='xlsxwriter',
date_format='m/d/yyyy')
# Convert the dataframe to an XlsxWriter Excel object.
table.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects in order to set the column
# widths, to make the dates clearer.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
worksheet.set_column('B:E', 20)
# Close the Pandas Excel writer and output the Excel file.
writer.save()
How to convert a column with Excel Serial Dates and regular dates to a pandas datetime?
- All the dates can't be parsed in the same manner
- Load the dataframe
- Cast the
dates
column as astr
if it's not already. - Use Boolean Indexing to select different date types
- Assuming regular dates contain a
/
- Assuming Excel serial dates do not contain a
/
- Assuming regular dates contain a
- Fix each dataframe separately based on its datetime type
- Concat the dataframes back together.
import pandas as pd
from datetime import datetime
# load data
df = pd.DataFrame({'dates': ['09/01/2020', '05/15/1985', '06/07/2013', '33233', '26299', '29428']})
# display(df)
dates
0 09/01/2020
1 05/15/1985
2 06/07/2013
3 33233
4 26299
5 29428
# set the column type as a str if it isn't already
df.dates = df.dates.astype('str')
# create a date mask based on the string containing a /
date_mask = df.dates.str.contains('/')
# split the dates out for excel
df_excel = df[~date_mask].copy()
# split the regular dates out
df_reg = df[date_mask].copy()
# convert reg dates to datetime
df_reg.dates = pd.to_datetime(df_reg.dates)
# convert excel dates to datetime; the column needs to be cast as ints
df_excel.dates = pd.TimedeltaIndex(df_excel.dates.astype(int), unit='d') + datetime(1900, 1, 1)
# combine the dataframes
df = pd.concat([df_reg, df_excel])
display(df)
dates
0 2020-09-01
1 1985-05-15
2 2013-06-07
3 1990-12-28
4 1972-01-03
5 1980-07-28
Convert date from excel in number format to date format python
from datetime import datetime
excel_date = 42139
dt = datetime.fromordinal(datetime(1900, 1, 1).toordinal() + excel_date - 2)
tt = dt.timetuple()
print(dt)
print(tt)
As mentioned by J.F. Sebastian, this answer only works for any date after 1900/03/01
EDIT: (in answer to @R.K)
If your excel_date
is a float number, use this code:
from datetime import datetime
def floatHourToTime(fh):
hours, hourSeconds = divmod(fh, 1)
minutes, seconds = divmod(hourSeconds * 60, 1)
return (
int(hours),
int(minutes),
int(seconds * 60),
)
excel_date = 42139.23213
dt = datetime.fromordinal(datetime(1900, 1, 1).toordinal() + int(excel_date) - 2)
hour, minute, second = floatHourToTime(excel_date % 1)
dt = dt.replace(hour=hour, minute=minute, second=second)
print(dt)
assert str(dt) == "2015-05-15 00:13:55"
Related Topics
Efficiently Using Multiple Numpy Slices for Random Image Cropping
Python Analog of PHP's Natsort Function (Sort a List Using a "Natural Order" Algorithm)
Comparing Two Lists Using the Greater Than or Less Than Operator
How to Create a Custom String Representation for a Class Object
What's the Bad Magic Number Error
Convert Array of Indices to One-Hot Encoded Array in Numpy
Finding All Possible Permutations of a Given String in Python
Plotting with Seaborn Using the Matplotlib Object-Oriented Interface
How to Upgrade All Python Packages with Pip
Nested Arguments Not Compiling
Different Behaviour for List._Iadd_ and List._Add_
Using Python Requests with JavaScript Pages
How to Open a File Using the Open with Statement
How to Detach Matplotlib Plots So That the Computation Can Continue
Dictionary Creation with Fromkeys and Mutable Objects. a Surprise