How to Read a Date in Excel Format in Python

How do I read a date in Excel format in Python?

You can use xlrd.

From its documentation, you can read that dates are always stored as numbers; however, you can use xldate_as_tuple to convert it to a python date.

Note: the version on the PyPI seems more up-to-date than the one available on xlrd's website.

Read date from excel as string Python

I don't think you were supposed to cast the date as a string. Note I made a couple of other changes:

  • used for loops instead of while loops
  • please don't use single letter var names (like i, and j)
  • xldate_as_datetime makes more sense than xldate_as_tuple

note if you want the date displayed in a specific format try strftime

import xlrd

workbook = xlrd.open_workbook(file)
worksheet = workbook.sheet_by_name('Sheet1')

num_rows = worksheet.nrows
num_cols = worksheet.ncols
values = []

for row in range(1, num_rows):
row_values = []
for col in range(num_cols):
if col == 3:
datetime_ = xldate.xldate_as_datetime(worksheet.cell_value(row, col), workbook.datemode)
row_values.append(datetime_)
else:
row_values.append(worksheet.cell_value(row, col))
values.append(row_values)

Read date from Excel file and get a delta with today's date

You can use pd.Timestamp.today and then extract the amount of days from the timedelta which it returns with dt.days:

# Example dataframe
df = pd.DataFrame({'Date':['20-6-2019', '21-6-2019', '25-6-2019']})
df['Date'] = pd.to_datetime(df['Date'])

Date
0 2019-06-20
1 2019-06-21
2 2019-06-25

(df['Date'] - pd.Timestamp.today()).dt.days

0 3
1 4
2 8
Name: Date, dtype: int64

Note in this case 3 days for 20-6-2019 is correct, since there are 3 full days between date now and the 20th of june.

Converting to Excel "Date" format (within Excel file) using python and pandas from another date format from html table

You need to use pd.ExcelWriter to create a writer object, so that you can change to Date format WITHIN Excel; however, this problem has a couple of different aspects to it:

  1. You have non-date values in your date column, including "Legend:", "Cash rate decreased", "Cash Rate increased", and "Cash rate unchanged".
  2. As mentioned in the comments, you must pass format='%d %b %Y' to pd.to_datetime() as that is the Date format you are converting FROM.
  3. You must pass errors='coerce' in order to return NaT for those that don't meet the specified format
  4. For the pd.to_datetime() line of code, you must add .dt.date at the end, because we use a date_format parameter and not a datetime_format parameter in creating the writer object later on. However, you could also exclude dt.date and change the format of the datetime_format parameter.
  5. Then, do table = table.dropna() to drop rows with any columns with NaT
  6. Pandas does not change the Date format WITHIN Excel. If you want to do that, then you should use openpyxl and create a writer object and pass the date_format. In case someone says this, you CANNOT simply do: pd.to_datetime(table['Effective Date'], format='%d %b %Y', errors='coerce').dt.strftime('%m/%d/%y') or .dt.strftime('%d/%m/%y'), because that creates a "General" date format in EXCEL.
  7. Output is ugly if you do not widen your columns, so I've included code for that as well. Please note that I am on a USA locale, so passing d/m/yyyy creates a "Custom" format in Excel.

NOTE: In my code, I have to pass m/d/yyyy in order for a "Date" format to appear in EXCEL. You can simply change to date_format='d/m/yyyy' since my computer has a different locale than you (USA) that Excel utilizes for "Date" format.

Source + More on this topic:



import pandas as pd
import html5lib
import datetime
import locale
import pytz
import lxml as lx
import openpyxl as oxl

url = "https://www.rba.gov.au/statistics/cash-rate/"

tables = pd.read_html(url)

table = tables[0]

table['Effective Date'] = pd.to_datetime(table['Effective Date'], format='%d %b %Y', errors='coerce').dt.date
table = table.dropna()
table.to_excel('rates.xlsx')

writer = pd.ExcelWriter("rates.xlsx",
engine='xlsxwriter',
date_format='m/d/yyyy')

# Convert the dataframe to an XlsxWriter Excel object.
table.to_excel(writer, sheet_name='Sheet1')

# Get the xlsxwriter workbook and worksheet objects in order to set the column
# widths, to make the dates clearer.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
worksheet.set_column('B:E', 20)

# Close the Pandas Excel writer and output the Excel file.
writer.save()

enter image description here

Properly parsing excel format for dates

Unfortunately, stovfl's solution was not actually generalized to all xlsx formats. After much searching through Microsoft documentation I was finally able to find this page that documents some of the rules of excel number_format.

Important things to note:

  • mm and m refer to minutes ONLY if the most immediate preceding code is a hh or h (hours) or if the most immediate following code is ss or s (seconds), otherwise mm and m refer to months.
  • Most characters that are not codes must be preceded with a backslash
  • Characters surrounded by quotation marks are interpreted literally (not as codes)
  • There's a big list of characters that display without any escaping or quotation marks
  • There is the concept of sections, separated by a semicolon. For the purpose of this solution I chose to ignore sections (because if the sections are actually used the resulting output will not look like a date).
  • Certain codes in excel have no equivalent code in strftime. For example, mmmmm displays the first letter of the month. For my solution I chose to replace these codes with similar strftime codes (for mmmmm I chose %b, which displays an abbreviation of the month). I noted these codes in comments

Anyways, I just built a function that given an excel number_format date string, returns the python strftime equivalent. I hope this can help someone looking for a way to get "What you see is what you get" from excel to python.

EXCEL_CODES = {
'yyyy': '%Y',
'yy': '%y',
'dddd': '%A',
'ddd': '%a',
'dd': '%d',
'd': '%-d',
# Different from excel as there is no J-D in strftime
'mmmmmm': '%b',
'mmmm': '%B',
'mmm': '%b',
'hh': '%H',
'h': '%-H',
'ss': '%S',
's': '%-S',
# Possibly different from excel as there is no am/pm in strftime
'am/pm': '%p',
# Different from excel as there is no A/P or a/p in strftime
'a/p': '%p',
}

EXCEL_MINUTE_CODES = {
'mm': '%M',
'm': '%-M',
}
EXCEL_MONTH_CODES = {
'mm': '%m',
'm': '%-m',
}

EXCEL_MISC_CHARS = [
'$',
'+',
'(',
':',
'^',
'\'',
'{',
'<',
'=',
'-',
'/',
')',
'!',
'&',
'~',
'}',
'>',
' ',
]

EXCEL_ESCAPE_CHAR = '\\'
EXCEL_SECTION_DIVIDER = ';'

def convert_excel_date_format_string(excel_date):
'''
Created using documentation here:
https://support.office.com/en-us/article/review-guidelines-for-customizing-a-number-format-c0a1d1fa-d3f4-4018-96b7-9c9354dd99f5

'''
# The python date string that is being built
python_date = ''
# The excel code currently being parsed
excel_code = ''
prev_code = ''
# If the previous character was the escape character
char_escaped = False
# If we are in a quotation block (surrounded by "")
quotation_block = False
# Variables used for checking if a code should be a minute or a month
checking_minute_or_month = False
minute_or_month_buffer = ''

for c in excel_date:
ec = excel_code.lower()
# The previous character was an escape, the next character should be added normally
if char_escaped:
if checking_minute_or_month:
minute_or_month_buffer += c
else:
python_date += c
char_escaped = False
continue
# Inside a quotation block
if quotation_block:
if c == '"':
# Quotation block should now end
quotation_block = False
elif checking_minute_or_month:
minute_or_month_buffer += c
else:
python_date += c
continue
# The start of a quotation block
if c == '"':
quotation_block = True
continue
if c == EXCEL_SECTION_DIVIDER:
# We ignore excel sections for datetimes
break

is_escape_char = c == EXCEL_ESCAPE_CHAR
# The am/pm and a/p code add some complications, need to make sure we are not that code
is_misc_char = c in EXCEL_MISC_CHARS and (c != '/' or (ec != 'am' and ec != 'a'))
# Code is finished, check if it is a proper code
if (is_escape_char or is_misc_char) and ec:
# Checking if the previous code should have been minute or month
if checking_minute_or_month:
if ec == 'ss' or ec == 's':
# It should be a minute!
minute_or_month_buffer = EXCEL_MINUTE_CODES[prev_code] + minute_or_month_buffer
else:
# It should be a months!
minute_or_month_buffer = EXCEL_MONTH_CODES[prev_code] + minute_or_month_buffer
python_date += minute_or_month_buffer
checking_minute_or_month = False
minute_or_month_buffer = ''

if ec in EXCEL_CODES:
python_date += EXCEL_CODES[ec]
# Handle months/minutes differently
elif ec in EXCEL_MINUTE_CODES:
# If preceded by hours, we know this is referring to minutes
if prev_code == 'h' or prev_code == 'hh':
python_date += EXCEL_MINUTE_CODES[ec]
else:
# Have to check if the next code is ss or s
checking_minute_or_month = True
minute_or_month_buffer = ''
else:
# Have to abandon this attempt to convert because the code is not recognized
return None
prev_code = ec
excel_code = ''
if is_escape_char:
char_escaped = True
elif is_misc_char:
# Add the misc char
if checking_minute_or_month:
minute_or_month_buffer += c
else:
python_date += c
else:
# Just add to the code
excel_code += c

# Complete, check if there is still a buffer
if checking_minute_or_month:
# We know it's a month because there were no more codes after
minute_or_month_buffer = EXCEL_MONTH_CODES[prev_code] + minute_or_month_buffer
python_date += minute_or_month_buffer
if excel_code:
ec = excel_code.lower()
if ec in EXCEL_CODES:
python_date += EXCEL_CODES[ec]
elif ec in EXCEL_MINUTE_CODES:
if prev_code == 'h' or prev_code == 'hh':
python_date += EXCEL_MINUTE_CODES[ec]
else:
python_date += EXCEL_MONTH_CODES[ec]
else:
return None
return python_date

Tested with python 3.6.7 using openpyxl 3.0.1

How to convert date format when reading from Excel - Python

import datetime
df = pd.DataFrame({'data': ["11/14/2015 00:00:00", "11/14/2015 00:10:00", "11/14/2015 00:20:00"]})
df["data"].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %H:%M:%S').strftime('%b-%y'))

EDIT

If you'd like to work with df.columns you could use map function:

df.columns = list(map(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %H:%M:%S').strftime('%b-%y'), df1.columns))

You need list if you are using python 3.x because it's iterator by default.

Convert date from excel in number format to date format python

from datetime import datetime
excel_date = 42139
dt = datetime.fromordinal(datetime(1900, 1, 1).toordinal() + excel_date - 2)
tt = dt.timetuple()
print(dt)
print(tt)

As mentioned by J.F. Sebastian, this answer only works for any date after 1900/03/01

EDIT: (in answer to @R.K)

If your excel_date is a float number, use this code:

from datetime import datetime

def floatHourToTime(fh):
hours, hourSeconds = divmod(fh, 1)
minutes, seconds = divmod(hourSeconds * 60, 1)
return (
int(hours),
int(minutes),
int(seconds * 60),
)

excel_date = 42139.23213
dt = datetime.fromordinal(datetime(1900, 1, 1).toordinal() + int(excel_date) - 2)
hour, minute, second = floatHourToTime(excel_date % 1)
dt = dt.replace(hour=hour, minute=minute, second=second)

print(dt)
assert str(dt) == "2015-05-15 00:13:55"

how to read date in a specific format in openpyxl

If you are on Windows, try

value = datetime.date.strftime(sh['A1'].value, "%#d-%b-%y")

If on Linux, try

value = datetime.date.strftime(sh['A1'].value, "%-d-%b-%y")

Output:

'1-Jan-22'


Related Topics



Leave a reply



Submit