Pandas Cannot Open an Excel (.xlsx) File

I am unable to open an Excel file with pandas.

Have you tried passing the engine="openpyxl" argument to read_excel?

P.S. This is where I found the answer: https://stackoverflow.com/a/50815107
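As a rough illustration of passing the engine explicitly, here is a minimal sketch that picks an engine name from the file extension. The helper name pick_engine and the ENGINES table are hypothetical, not part of pandas; the engine names themselves are ones that read_excel accepts, and each requires the matching package to be installed.

```python
from pathlib import Path

# Extension -> engine name accepted by pandas.read_excel.
# This table is an illustrative assumption, not something pandas exposes.
ENGINES = {
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xls": "xlrd",
    ".xlsb": "pyxlsb",
    ".ods": "odf",
}


def pick_engine(path):
    """Return the engine name for a path, or None for unknown extensions."""
    return ENGINES.get(Path(path).suffix.lower())
```

The result can then be passed along, for example pd.read_excel(path, engine=pick_engine(path)). With engine=None, pandas falls back to its own detection. Note that this trusts the extension, which is not always accurate.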

Python: Pandas read_excel cannot open .xls file, xlrd not supported

Your file is not a .xls, I insist! :)

With pandas 1.1.5 and xlrd 2.0.1:

Rename Presentaciones.xls to Presentaciones.xlsx.

import pandas as pd
# Use openpyxl.
df = pd.read_excel(r'X:...\Presentaciones.xlsx', engine='openpyxl')
print(df)

Enjoy! :)

More info

How do I know that your file is a fake .xls and a very real .xlsx?
Because openpyxl does not work with genuine .xls files, yet it opens yours once the file is renamed.

import pandas as pd
df = pd.read_excel(r'X:...\test.xls', engine='openpyxl')
# Raises:
# InvalidFileException: openpyxl does not support the old .xls file format,
# please use xlrd to read this file, or convert it to the more recent .xlsx file format.

And simply renaming a genuine test.xls to test.xlsx does not work either!

import pandas as pd
df = pd.read_excel(r'X:...\test.xlsx', engine='openpyxl')
# Raises:
# OSError: File contains no valid workbook part
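Another way to check what a file really is, without relying on its extension, is to look at its first bytes. The sketch below is only an illustration and uses the standard library alone; the function name sniff_excel_format is hypothetical. It relies on the fact that .xlsx files are ZIP archives and legacy .xls files are OLE2 compound documents, each with a well-known signature.

```python
# Leading bytes ("magic numbers") of the two container formats.
ZIP_MAGIC = b"PK\x03\x04"                         # ZIP archive (.xlsx and friends)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound file (legacy .xls)


def sniff_excel_format(path):
    """Guess the container format from the first bytes, ignoring the extension."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(ZIP_MAGIC):
        # ZIP-based: likely .xlsx, although other ZIP formats look the same here.
        return "xlsx"
    if head.startswith(OLE2_MAGIC):
        # OLE2-based: likely a real .xls.
        return "xls"
    # Neither signature: possibly CSV or HTML saved under an Excel name.
    return None
```

A result of "xlsx" suggests trying engine='openpyxl', and "xls" suggests engine='xlrd'. The check only identifies the container, so it cannot prove that the file is a valid workbook.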

History

Beware: workbooks in the newer XML-based formats can carry more than data (macro-enabled ones normally use .xlsm), and the extension can lie, so be careful with files you did not create!

The reason xlrd (not pandas itself) stopped supporting .xlsx files in version 2.0 is that parsing them was a security hazard and no one was maintaining that part of the code. pandas still reads .xlsx files through openpyxl.

PANDAS & glob - Excel file format cannot be determined, you must specify an engine manually

Found it. When an Excel file is open, for example in MS Excel, a hidden temporary file is created in the same directory:

~$datasheet.xlsx

So when I run the code to read all the files from the folder, glob also picks up this temporary file and I get the error:

Excel file format cannot be determined, you must specify an engine manually.

When all files are closed and there are no hidden temporary ~$filename.xlsx files in the same directory, the code works perfectly.
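If closing every workbook first is not practical, the temporary files can also be filtered out before reading. This is a minimal sketch using only the standard library; the function name real_excel_files and the default pattern are assumptions made for illustration.

```python
from pathlib import Path


def real_excel_files(folder, pattern="*.xlsx"):
    """Return workbook paths in `folder`, skipping Excel's '~$' temporary files."""
    return sorted(
        p for p in Path(folder).glob(pattern)
        if not p.name.startswith("~$")
    )
```

Each returned path can then be handed to pd.read_excel(path, engine='openpyxl'), so an open workbook no longer breaks the loop.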


