Python Xlrd Unsupported Format, or Corrupt File.

python xlrd unsupported format, or corrupt file.

You say:

The file doesn't seem to be corrupted or of a different format.

However as the error message says, the first 8 bytes of the file are '<table r' ... that is definitely not Excel .xls format. Open it with a text editor (e.g. Notepad) that won't take any notice of the (incorrect) .xls extension and see for yourself.

Error: Unsupported format, or corrupt file: Expected BOF record

The error message relates to the BOF (Beginning of File) record of an XLS file. However, the example shows that you are trying to read an XLSX file.

There are 2 possible reasons for this:

  1. Your version of xlrd is old and doesn't support reading xlsx files.
  2. The XLSX file is encrypted and thus stored in the OLE Compound Document format, rather than a zip format, making it appear to xlrd as an older format XLS file.

Double check that you are in fact using a recent version of xlrd. Opening a new XLSX file with data in just one cell should verify that.

However, I would guess the you are encountering the second condition and that the file is encrypted since you state above that you are already using xlrd version 0.9.2.

XLSX files are encrypted if you explicitly apply a workbook password but also if you password protect some of the worksheet elements. As such it is possible to have an encrypted XLSX file even if you don't need a password to open it.

Update: See @BStew's, third, more probable, answer, that the file is open by Excel.

Python: Unsupported format, or corrupt file

As mentioned by Slybot, this is not a real xls file.

If you inspect the contents in a plain text editor, or a hex editor, the header starts:

<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">

which confirms this is an xml document, and not an Office 2007 zipped xlsx office document.

Your next steps depend on whether you have Excel installed on the machine that will be running this code or not, and if not, what other libraries you have access to and are willing to pay for - Slybot has mentioned aspose for example.

The easiest solution - Excel

If you are running this on a Windows machine with Excel installed, you have the free and capable option of automating the operation of opening Excel and saving as xlsx. This is by using Win32com module, described in this answer:

Attempting to Parse an XLS (XML) File Using Python

Alternatively, save your Excel styled XML as xlsx with Workbook.SaveAs method using win32com (only for Windows users) and read in with pandas.read_excel skipping appropriate rows.

The XML solution

You could read in the raw XML and digest it. The relevant nodes are:

<ss:Workbook>  
<ss:Worksheet ss:Name="Holdings">
<ss:Table>
<ss:Row>
<ss:Cell ss:StyleID="Left">
<ss:Data ss:Type="String">iShares MSCI World SRI UCITS ETF</ss:Data>
The Third-party library solution

I am not familiar with any libraries which provide this functionality, and can't advise on this option.



Related Topics



Leave a reply



Submit