Xlsx Compressed by Rubyzip Not Readable by Excel

There are a number of constraints that the OOXML format imposes on the use of Zip in order for the packages to be conformant. For example, the only compression method permitted in the package is DEFLATE.

You might want to check the specification for OPC packages (which .xlsx files are) in Annex C of the standard (available as a Zip download), and then ensure that the rubyzip library is not doing anything that is not permitted, such as using the IMPLODE compression method.
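A quick way to see which compression methods an archive actually uses is to read its central directory. The following is a minimal sketch in Python using only the standard `zipfile` module; the function name `non_conformant_entries` and the `ALLOWED` set are illustrative, and treating stored (uncompressed) entries as acceptable alongside DEFLATE is an assumption you should check against the annex.

```python
import zipfile

# Assumption: DEFLATE is the method the text above requires. STORED (no
# compression) is also treated as acceptable here; pass a different
# `allowed` set if your reading of the specification is stricter.
ALLOWED = {zipfile.ZIP_DEFLATED, zipfile.ZIP_STORED}


def non_conformant_entries(path, allowed=ALLOWED):
    """Return (name, compress_type) for every entry whose method is not allowed."""
    with zipfile.ZipFile(path) as zf:
        # infolist() only reads the central directory, so this also works
        # for methods that Python itself cannot decompress.
        return [
            (info.filename, info.compress_type)
            for info in zf.infolist()
            if info.compress_type not in allowed
        ]
```

An empty list means every entry uses an allowed method; anything else names the entries worth investigating in the library that produced the file.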

Which zipping library should I use to properly assemble a valid XLSX file in Objective-C?

I found the answer after doing some research and corresponding one-to-one with Objective-Zip's developer, flyingdolphinstudio.

First of all, Objective-Zip uses DEFLATE as the default compression method. I also confirmed this with the developer, who told me that passing ZipCompressionLevelDefault, ZipCompressionLevelFastest or ZipCompressionLevelBest as the compressionLevel: argument guarantees DEFLATE compression.

So the problem comes from the mode: argument, which is ZipFileModeAppend in my case. It seems that MiniZip has no method for deleting files inside a zip file, so instead of overwriting the existing file I was adding a second entry with the same name. To make this clearer, here is what my xl/worksheets folder looked like after zipping it with Objective-Zip:
[Image: worksheets folder]

So, the only way to create a valid XLSX container is to create the zip file from scratch, by adding all the files and also keeping the directory/file structure intact.
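The same symptom (appending adds a second entry instead of replacing the first) can be detected and repaired outside Objective-C. Below is a hedged sketch in Python with the standard `zipfile` module. The helper names `duplicate_names` and `rebuild_from_scratch` are made up for illustration, and keeping the last copy of each duplicated entry is an assumption about which version you want.

```python
import zipfile
from collections import Counter


def duplicate_names(path):
    """Return the entry names that appear more than once in the archive."""
    with zipfile.ZipFile(path) as zf:
        counts = Counter(info.filename for info in zf.infolist())
    return sorted(name for name, count in counts.items() if count > 1)


def rebuild_from_scratch(src, dst):
    """Write a fresh archive with one DEFLATE entry per name.

    Assumption: when a name is duplicated, the last appended copy is the
    one that should survive.
    """
    with zipfile.ZipFile(src) as zin:
        latest = {}
        for info in zin.infolist():
            latest[info.filename] = info  # later entries replace earlier ones
        with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
            for name, info in latest.items():
                # Reading via the ZipInfo object selects that exact copy.
                zout.writestr(name, zin.read(info))
```

Entry paths are copied unchanged, so the directory/file structure inside the container stays intact.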

I hope this experience helps somebody out.

Parsing XLS and XLSX (MS Excel) files with Ruby?

Just found roo, which might do the job. It works for my requirements: reading a basic spreadsheet.

How to properly assemble a valid xlsx file from its internal sub-components?

In answer to your questions:

  1. XLSX is just a collection of XML files in a zip container. There is no other magic.
  2. If you decompress/unzip a valid XLSX file and then recompress/zip it and you can't read the resulting output, then the problem is generally with the files being rezipped or, less likely, with the zipping software. The main thing to check is that the directory structure was maintained in the zip file.

Example of the contents of an xlsx file:

unzip -l example.xlsx
Archive:  example.xlsx
  Length     Date   Time    Name
 --------    ----   ----    ----
      769  10-15-14 09:23   xl/worksheets/sheet1.xml
      550  10-15-14 09:22   xl/workbook.xml
      201  10-15-14 09:22   xl/sharedStrings.xml
      ...

I regularly unzip XLSX files, make minor changes for testing and re-zip them without any issue.

Update: The important thing is to avoid zipping the parent directory. Here is an example using the zip system utility on Linux or OS X:

# Unzip an xlsx file into a directory.
unzip example.xlsx -d newdir

# Make some valid changes to the files.
cd newdir/
vi xl/worksheets/sheet1.xml

# Rezip the files *FROM* the unzipped directory.
# Note: you could also re-zip to the original file if required.
find . -type f | xargs zip ../newfile.xlsx

# Check the file looks okay.
cd ..
unzip -l newfile.xlsx
xdg-open newfile.xlsx
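
Where the zip utility is not available, the same "zip from inside the directory" idea can be sketched with Python's standard `zipfile` module. The function name `zip_directory_contents` is hypothetical; the point is only that each entry name is made relative to the unzipped directory, so the parent directory never appears in the archive.

```python
import os
import zipfile


def zip_directory_contents(src_dir, dst_path):
    """Zip the files under src_dir with paths relative to src_dir.

    dst_path should live outside src_dir, otherwise the archive would
    try to include itself.
    """
    with zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Relative path with forward slashes, e.g. "xl/workbook.xml",
                # never "newdir/xl/workbook.xml".
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                zf.write(full, arcname)
```

For the example above this would be called as `zip_directory_contents("newdir", "newfile.xlsx")` from the directory that contains newdir.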

rubyzip file order

I have traced this to an issue in the rubyzip library, whereby the entries array was not being sorted prior to being written to the central directory, but unzip was dependent on this order.

Fixed, and sent a pull request upstream.
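
The answer does not say which ordering unzip relied on. As a rough diagnostic, and assuming the relevant order is the physical order of the entries in the file, the following Python sketch (standard `zipfile` module, hypothetical function name) checks whether the central directory lists entries in the same order as their local headers appear.

```python
import zipfile


def central_directory_in_file_order(path):
    """True if central directory entries are listed in ascending header offset.

    Assumption: "sorted" in the text above means sorted by the position of
    each entry's local header in the file.
    """
    with zipfile.ZipFile(path) as zf:
        # infolist() returns entries in central directory order.
        offsets = [info.header_offset for info in zf.infolist()]
    return offsets == sorted(offsets)
```

A `False` result on an archive written by an affected rubyzip version would be consistent with the issue described, though it does not prove that ordering is the cause.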


