Convert Excel Style Date with Pandas

Convert Excel style date with pandas

OK I think the easiest thing is to construct a TimedeltaIndex from the floats and add this to the scalar datetime for 1900,1,1:

In [85]:
import datetime as dt
import pandas as pd
df = pd.DataFrame({'date':[42580.3333333333, 10023]})
df

Out[85]:
date
0 42580.333333
1 10023.000000

In [86]:
df['real_date'] = pd.TimedeltaIndex(df['date'], unit='d') + dt.datetime(1900,1,1)
df

Out[86]:
date real_date
0 42580.333333 2016-07-31 07:59:59.971200
1 10023.000000 1927-06-12 00:00:00.000000

OK it seems that excel is a bit weird with it's dates thanks @ayhan:

In [89]:
df['real_date'] = pd.TimedeltaIndex(df['date'], unit='d') + dt.datetime(1899, 12, 30)
df

Out[89]:
date real_date
0 42580.333333 2016-07-29 07:59:59.971200
1 10023.000000 1927-06-10 00:00:00.000000

See related: How to convert a python datetime.datetime to excel serial date number

Converting to Excel Date format (within Excel file) using python and pandas from another date format from html table

You need to use pd.ExcelWriter to create a writer object, so that you can change to Date format WITHIN Excel; however, this problem has a couple of different aspects to it:

  1. You have non-date values in your date column, including "Legend:", "Cash rate decreased", "Cash Rate increased", and "Cash rate unchanged".
  2. As mentioned in the comments, you must pass format='%d %b %Y' to pd.to_datetime() as that is the Date format you are converting FROM.
  3. You must pass errors='coerce' in order to return NaT for those that don't meet the specified format
  4. For the pd.to_datetime() line of code, you must add .dt.date at the end, because we use a date_format parameter and not a datetime_format parameter in creating the writer object later on. However, you could also exclude dt.date and change the format of the datetime_format parameter.
  5. Then, do table = table.dropna() to drop rows with any columns with NaT
  6. Pandas does not change the Date format WITHIN Excel. If you want to do that, then you should use openpyxl and create a writer object and pass the date_format. In case someone says this, you CANNOT simply do: pd.to_datetime(table['Effective Date'], format='%d %b %Y', errors='coerce').dt.strftime('%m/%d/%y') or .dt.strftime('%d/%m/%y'), because that creates a "General" date format in EXCEL.
  7. Output is ugly if you do not widen your columns, so I've included code for that as well. Please note that I am on a USA locale, so passing d/m/yyyy creates a "Custom" format in Excel.

NOTE: In my code, I have to pass m/d/yyyy in order for a "Date" format to appear in EXCEL. You can simply change to date_format='d/m/yyyy' since my computer has a different locale than you (USA) that Excel utilizes for "Date" format.

Source + More on this topic:



import pandas as pd
import html5lib
import datetime
import locale
import pytz
import lxml as lx
import openpyxl as oxl

url = "https://www.rba.gov.au/statistics/cash-rate/"

tables = pd.read_html(url)

table = tables[0]

table['Effective Date'] = pd.to_datetime(table['Effective Date'], format='%d %b %Y', errors='coerce').dt.date
table = table.dropna()
table.to_excel('rates.xlsx')

writer = pd.ExcelWriter("rates.xlsx",
engine='xlsxwriter',
date_format='m/d/yyyy')

# Convert the dataframe to an XlsxWriter Excel object.
table.to_excel(writer, sheet_name='Sheet1')

# Get the xlsxwriter workbook and worksheet objects in order to set the column
# widths, to make the dates clearer.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
worksheet.set_column('B:E', 20)

# Close the Pandas Excel writer and output the Excel file.
writer.save()

Sample Image

How to convert a column with Excel Serial Dates and regular dates to a pandas datetime?

  • All the dates can't be parsed in the same manner
  • Load the dataframe
  • Cast the dates column as a str if it's not already.
  • Use Boolean Indexing to select different date types
    • Assuming regular dates contain a /
    • Assuming Excel serial dates do not contain a /
  • Fix each dataframe separately based on its datetime type
  • Concat the dataframes back together.
import pandas as pd
from datetime import datetime

# load data
df = pd.DataFrame({'dates': ['09/01/2020', '05/15/1985', '06/07/2013', '33233', '26299', '29428']})

# display(df)

dates
0 09/01/2020
1 05/15/1985
2 06/07/2013
3 33233
4 26299
5 29428

# set the column type as a str if it isn't already
df.dates = df.dates.astype('str')

# create a date mask based on the string containing a /
date_mask = df.dates.str.contains('/')

# split the dates out for excel
df_excel = df[~date_mask].copy()

# split the regular dates out
df_reg = df[date_mask].copy()

# convert reg dates to datetime
df_reg.dates = pd.to_datetime(df_reg.dates)

# convert excel dates to datetime; the column needs to be cast as ints
df_excel.dates = pd.TimedeltaIndex(df_excel.dates.astype(int), unit='d') + datetime(1900, 1, 1)

# combine the dataframes
df = pd.concat([df_reg, df_excel])

display(df)

       dates
0 2020-09-01
1 1985-05-15
2 2013-06-07
3 1990-12-28
4 1972-01-03
5 1980-07-28

Convert date from excel in number format to date format python

from datetime import datetime
excel_date = 42139
dt = datetime.fromordinal(datetime(1900, 1, 1).toordinal() + excel_date - 2)
tt = dt.timetuple()
print(dt)
print(tt)

As mentioned by J.F. Sebastian, this answer only works for any date after 1900/03/01

EDIT: (in answer to @R.K)

If your excel_date is a float number, use this code:

from datetime import datetime

def floatHourToTime(fh):
hours, hourSeconds = divmod(fh, 1)
minutes, seconds = divmod(hourSeconds * 60, 1)
return (
int(hours),
int(minutes),
int(seconds * 60),
)

excel_date = 42139.23213
dt = datetime.fromordinal(datetime(1900, 1, 1).toordinal() + int(excel_date) - 2)
hour, minute, second = floatHourToTime(excel_date % 1)
dt = dt.replace(hour=hour, minute=minute, second=second)

print(dt)
assert str(dt) == "2015-05-15 00:13:55"


Related Topics



Leave a reply



Submit