IOError: Can't read data (Can't open directory) - Missing gzip compression filter

I had the same issue; I simply ran "conda update anaconda" and the problem was gone.
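
If updating does not help, you can check which compression filters your h5py/HDF5 build can actually decode. A minimal sketch using h5py's low-level h5z module (the filter list here is just the common ones):

import h5py

# FILTER_DEFLATE is the gzip filter; FILTER_LZF is h5py's own LZF filter.
for name, code in [('gzip', h5py.h5z.FILTER_DEFLATE),
                   ('lzf', h5py.h5z.FILTER_LZF),
                   ('szip', h5py.h5z.FILTER_SZIP)]:
    print(name, 'available:', h5py.h5z.filter_avail(code))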

HDF5 viewer for LZF compressed arrays

While h5py ships with LZF, HDF5 itself is not generally distributed or compiled with LZF support.
Instead, you can use gzip, which is included in every HDF5 build, so gzip-compressed files can be opened on any system:

dset1 = f.create_dataset(r'/path/to/arrays/array_1', data=data,
                         compression='gzip')

HDFView can open arrays compressed with gzip.

Additionally, gzip supports compression_opts, which sets the compression level (an integer between 0 and 9):

dset1 = f.create_dataset(r'/path/to/arrays/array_1', data=data,
                         compression='gzip', compression_opts=9)
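
Reading the data back needs no special handling -- gzip decompression is transparent on any system with a standard HDF5 build. A minimal read-back sketch (the file name 'arrays.h5' is an assumption):

import h5py

# Gzip-compressed datasets decompress transparently on read.
with h5py.File('arrays.h5', 'r') as f:
    arr = f[r'/path/to/arrays/array_1'][()]
print(arr.shape)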

Why does the HDF5 file size increase dramatically when I segment the data into 30 smaller dataframes with 30 different keys?

I created some simple tests and discovered some interesting behavior.

  1. First, I created some data to mimic your description and saw an 11x
    increase in file size going from 1 DF to 30 DFs. So, clearly something's going on... (You will have to provide some code that replicates the 40x increase.)
  2. Next, using the same dataframes as above, I created 2 uncompressed files -- I did not include the compression parameters complib='blosc', complevel=9. As expected, the uncompressed files were larger, but the increase from 1 DF to 30 DFs was much lower (only a 65% increase). A sketch of the write logic follows this list.
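
A minimal sketch of the comparison, assuming random float data and pandas' to_hdf (PyTables installed); the shapes, file names, and even 30-way split are stand-ins for the real dataframes:

import os
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(30_000, 10))

# Case 1: everything under a single key, blosc-compressed.
df.to_hdf('one_key.h5', key='df_all', mode='w',
          complib='blosc', complevel=9)

# Case 2: the same rows split evenly across 30 keys in one file.
rows_per_chunk = len(df) // 30
for i in range(30):
    chunk = df.iloc[i * rows_per_chunk:(i + 1) * rows_per_chunk]
    chunk.to_hdf('thirty_keys.h5', key=f'df_{i}',
                 mode='w' if i == 0 else 'a',
                 complib='blosc', complevel=9)

# Drop complib/complevel above to reproduce the uncompressed comparison.
print(os.path.getsize('one_key.h5'), os.path.getsize('thirty_keys.h5'))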

Pandas Results


