I am unable to open the Excel file with pandas.
Have you tried passing the engine="openpyxl" argument?
P.S. This is where I found the answer: https://stackoverflow.com/a/50815107
Python: Pandas read_excel cannot open .xls file, xlrd not supported
Your file is not a .xls, I insist! :)
With pandas 1.1.5 and xlrd 2.1.0:
Rename Presentaciones.xls to Presentaciones.xlsx.
import pandas as pd
# Use openpyxl.
df = pd.read_excel(r'X:...\Presentaciones.xlsx', engine='openpyxl')
print(df)
Enjoy! :)
More info
How do I know that your file is a fake .xls and a very real .xlsx?
Because openpyxl doesn't work with xls files.
import pandas as pd
df = pd.read_excel(r'X:...\test.xls', engine='openpyxl')
# ERROR:
# InvalidFileException: openpyxl does not support the old .xls file format,
# please use xlrd to read this file, or convert it to the more recent .xlsx file format.
And simply renaming test.xls to test.xlsx does not work either!
import pandas as pd
df = pd.read_excel(r'X:...\test.xlsx', engine='openpyxl')
# ERROR:
# OSError: File contains no valid workbook part
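If you would rather not guess from the error messages, you can look at the first bytes of the file instead of its extension. The sketch below is only a heuristic: a ZIP signature means "some ZIP container" (which .xlsx is) and an OLE2 signature means "some legacy Office container" (which .xls is), so neither proves the file is a valid workbook. The function names sniff_excel_format and engine_for are made up for this example.

```python
# Signatures of the two container formats Excel workbooks use.
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx is a ZIP archive
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # .xls is an OLE2 compound file


def sniff_excel_format(path):
    """Guess 'xlsx' or 'xls' from the file header; return None if unknown."""
    with open(path, "rb") as fh:
        header = fh.read(8)
    if header.startswith(ZIP_MAGIC):
        return "xlsx"
    if header.startswith(OLE2_MAGIC):
        return "xls"
    return None


def engine_for(path):
    """Map the sniffed format to a pandas engine name (None if unknown)."""
    return {"xlsx": "openpyxl", "xls": "xlrd"}.get(sniff_excel_format(path))
```

You could then call pd.read_excel(path, engine=engine_for(path)) and handle the None case yourself, for example by reporting that the file is probably not an Excel workbook at all.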
History
Beware, the .xlsx extension (detected by pandas) means there may be scripts in this file. Sometimes the extension can lie, so be careful!
The reason pandas stopped reading xlsx files through xlrd is that xlrd 2.0 dropped support for them: those files were considered a security hazard and no one was maintaining that part of the code.
PANDAS & glob - Excel file format cannot be determined, you must specify an engine manually
Found it. When an Excel file is open, for example in MS Excel, a hidden temporary file is created in the same directory:
~$datasheet.xlsx
So, when I run the code to read all the files from the folder, it gives me the error:
Excel file format cannot be determined, you must specify an engine manually.
When all files are closed and there are no hidden temporary files like ~$filename.xlsx in the same directory, the code works perfectly.
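If closing every workbook first is not practical, another option is to skip the temporary files when collecting paths. This is a minimal sketch that assumes the only unwanted files are the ones whose names start with ~$; the helper name list_excel_files and the folder name "data" are made up for this example.

```python
from pathlib import Path


def list_excel_files(folder, pattern="*.xlsx"):
    """Return workbook paths in `folder`, skipping Excel's '~$' temp files."""
    return sorted(
        p for p in Path(folder).glob(pattern)
        if not p.name.startswith("~$")
    )


# Usage (assumes pandas and openpyxl are installed):
# import pandas as pd
# frames = [pd.read_excel(p, engine="openpyxl") for p in list_excel_files("data")]
```

Filtering on the file name keeps the loop working even while one of the workbooks is still open in Excel.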