python xlrd unsupported format, or corrupt file.
You say:
The file doesn't seem to be corrupted or of a different format.
However as the error message says, the first 8 bytes of the file are '<table r'
... that is definitely not Excel .xls
format. Open it with a text editor (e.g. Notepad) that won't take any notice of the (incorrect) .xls
extension and see for yourself.
Error: Unsupported format, or corrupt file: Expected BOF record
The error message relates to the BOF (Beginning of File) record of an XLS file. However, the example shows that you are trying to read an XLSX file.
There are 2 possible reasons for this:
- Your version of xlrd is old and doesn't support reading xlsx files.
- The XLSX file is encrypted and thus stored in the OLE Compound Document format, rather than a zip format, making it appear to xlrd as an older format XLS file.
Double check that you are in fact using a recent version of xlrd. Opening a new XLSX file with data in just one cell should verify that.
However, I would guess the you are encountering the second condition and that the file is encrypted since you state above that you are already using xlrd version 0.9.2.
XLSX files are encrypted if you explicitly apply a workbook password but also if you password protect some of the worksheet elements. As such it is possible to have an encrypted XLSX file even if you don't need a password to open it.
Update: See @BStew's, third, more probable, answer, that the file is open by Excel.
Python: Unsupported format, or corrupt file
As mentioned by Slybot, this is not a real xls file.
If you inspect the contents in a plain text editor, or a hex editor, the header starts:
<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
which confirms this is an xml document, and not an Office 2007 zipped xlsx office document.
Your next steps depend on whether you have Excel installed on the machine that will be running this code or not, and if not, what other libraries you have access to and are willing to pay for - Slybot has mentioned aspose
for example.
If you are running this on a Windows machine with Excel installed, you have the free and capable option of automating the operation of opening Excel and saving as xlsx. This is by using Win32com module, described in this answer:
Attempting to Parse an XLS (XML) File Using Python
The XML solutionAlternatively, save your Excel styled XML as xlsx with Workbook.SaveAs method using
win32com
(only for Windows users) and read in with pandas.read_excel skipping appropriate rows.
You could read in the raw XML and digest it. The relevant nodes are:
<ss:Workbook>
<ss:Worksheet ss:Name="Holdings">
<ss:Table>
<ss:Row>
<ss:Cell ss:StyleID="Left">
<ss:Data ss:Type="String">iShares MSCI World SRI UCITS ETF</ss:Data>
The Third-party library solutionI am not familiar with any libraries which provide this functionality, and can't advise on this option.
Related Topics
How to Make a Tkinter Label Background Transparent
Check Json Data Is None in Python
How to Get Maximum Length of Each Column in the Data Frame Using Pandas Python
How to Map True/False to 1/0 in a Pandas Dataframe
How to Check Whether a Number Is Divisible by Another Number
How to Delete Tkinter Widgets from a Window
Convert Number Strings With Commas in Pandas Dataframe to Float
How to Restart a Program Based on User Input
Discord Bot Messaging a User With a Specific User Id
Finding Out Who Got the Highest Mark Among the Students
Importing Modules from Parent Folder
Python - How to Pad the Output of a MySQL Table
How to Check List Containing Nan