Compressed Output Differs from Go to Ruby Implementation

The deflate algorithm defined in RFC 1951 (used by the zlib format of RFC 1950 and the gzip format of RFC 1952) leaves room for implementation choices, so different compressors can produce different output for the same input. Every conforming output still decompresses to the same data. This freedom allows trading compression time against compression ratio, and it is what makes programs like zopfli possible, which compress better than the original zlib library at the cost of significantly longer compression time.

Go uses its own implementation of the deflate algorithm, written in Go, while Ruby uses the zlib library. That is why your examples produce different compressed output for the same input. If you take the output of either the Go or the Ruby program and decompress it (with Ruby, Go, or any other standard-conforming implementation), you will get exactly the same data back.

Why do gzip of Java and Go get different results?

From RFC 1952, the gzip file header is structured as:

+---+---+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
+---+---+---+---+---+---+---+---+---+---+

Looking at the output you've provided, we have:

                          | Java    | Go
ID1                       | 31      | 31
ID2                       | 139     | 139
CM (compression method)   | 8       | 8
FLG (flags)               | 0       | 0
MTIME (modification time) | 0 0 0 0 | 0 9 110 136
XFL (extra flags)         | 0       | 0
OS (operating system)     | 0       | 255

So we can see that Go is setting the modification time field of the header, and setting the operating system to 255 (unknown) rather than 0 (FAT file system). In other respects they indicate that the file is compressed in the same way.

In general these sorts of differences are harmless. If you want to determine whether two compressed files hold the same content, compare their decompressed contents rather than the compressed bytes.
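The field-by-field comparison above can be sketched in Ruby. This is only an illustration: `parse_gzip_header` is a hypothetical helper name, and the byte values are copied from the table rather than produced by running Java or Go.

```ruby
# Hypothetical helper: decode the fixed 10-byte gzip header (RFC 1952).
# "C4" reads ID1, ID2, CM, FLG; "V" reads MTIME as a 32-bit little-endian
# integer; "C2" reads XFL and OS.
def parse_gzip_header(bytes)
  id1, id2, cm, flg, mtime, xfl, os = bytes.byteslice(0, 10).unpack("C4VC2")
  { id1: id1, id2: id2, cm: cm, flg: flg, mtime: mtime, xfl: xfl, os: os }
end

# Header bytes taken from the comparison table.
java_header = [31, 139, 8, 0, 0, 0, 0, 0, 0, 0].pack("C*")
go_header   = [31, 139, 8, 0, 0, 9, 110, 136, 0, 255].pack("C*")

java = parse_gzip_header(java_header)
go   = parse_gzip_header(go_header)

# List only the fields whose values differ between the two headers.
differing = java.keys.reject { |field| java[field] == go[field] }
puts differing.inspect # prints [:mtime, :os]
```

Only the header metadata differs here; the compression method and flags match, which is consistent with the explanation above.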

Compress Gzip string in Ruby

Different compressors, different versions of the same compressor, or the same version of the same compressor with different settings, can and often will produce different output for the same input, even if they all use the same compressed data format (e.g. deflate). The only thing guaranteed is that when you decompress, you get exactly the same thing back you started with. In fact, that's really all you need guaranteed. Why do you want exactly the same compressed stream?
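A minimal Ruby sketch of the "same compressor, different settings" case, using two zlib compression levels as a stand-in for two different implementations. The helper name `compress_at` and the sample input are made up for illustration.

```ruby
require "zlib"

# Hypothetical helper: compress to the zlib format (RFC 1950) at a given level.
def compress_at(data, level)
  Zlib::Deflate.deflate(data, level)
end

data = "the quick brown fox jumps over the lazy dog " * 200

fast = compress_at(data, Zlib::BEST_SPEED)       # level 1
best = compress_at(data, Zlib::BEST_COMPRESSION) # level 9

puts "level 1: #{fast.bytesize} bytes, level 9: #{best.bytesize} bytes"
puts "same compressed bytes? #{fast == best}"

# Whatever the compressed bytes look like, both must inflate to the input.
puts "both round-trip? #{Zlib::Inflate.inflate(fast) == data && Zlib::Inflate.inflate(best) == data}"
```

The round-trip check is the only comparison that is meaningful across compressors.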

As noted by Ron Warholic, you would not even want to reproduce the compressed output of .NET's broken deflate implementation prior to .NET 4.5. Since .NET 2.0 used its own unique, broken deflate implementation, you cannot duplicate its output with Ruby, which uses zlib.

Also as noted by Ron Warholic, Ruby and .NET 4.5 or later both use zlib, and so should both produce the same compressed output when the same compression level is selected. That is not assured forever, though, since a new version of zlib may produce different output, and one of Ruby or .NET might update to it while the other does not. Also, as noted below, you do not have direct control over the compression level with .NET's classes.

If it's not possible to get it to the exact original, what would be
the most standardized compression, by which I mean general and that
would be able to be decompressed in the same way that the original
was?

Any correct implementation of lossless compression and decompression will have this property. You will always get back to the exact original, regardless of how the compressed data may differ. There is no "most standardized compression".

Your Zlib::Inflate.new(-Zlib::MAX_WBITS) is expecting a raw deflate stream, with no header or trailer. So you would need to produce that on the C# side.

It is not clear from the .NET documentation whether the DeflateStream class compresses to the deflate format or the zlib format (where the latter is the deflate format with a zlib wrapper, consisting of two prefix bytes and four postfix bytes for data integrity checking). If it compresses to the deflate format, then it will be compatible with your Zlib::Inflate.new(-Zlib::MAX_WBITS). If it compresses to the zlib format, then it would be compatible with Zlib::Inflate.new(Zlib::MAX_WBITS) (i.e. without the minus sign). Or you can delete the first two bytes and last four bytes to get back to a deflate stream.
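The relationship between the two formats can be checked from the Ruby side alone. The sketch below is an illustration with hypothetical helper names (`raw_deflate`, `raw_inflate`, `strip_zlib_wrapper`); it assumes the zlib stream was written without a preset dictionary, since otherwise the header is longer than two bytes.

```ruby
require "zlib"

# Raw deflate (RFC 1951): negative window bits tell zlib to omit the wrapper.
def raw_deflate(data)
  deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
  deflater.deflate(data, Zlib::FINISH)
ensure
  deflater&.close
end

# Inflate a raw deflate stream (no header, no trailer).
def raw_inflate(bytes)
  inflater = Zlib::Inflate.new(-Zlib::MAX_WBITS)
  inflater.inflate(bytes)
ensure
  inflater&.close
end

# Drop the 2-byte zlib header and the 4-byte Adler-32 trailer.
# Assumption: no preset dictionary was used when compressing.
def strip_zlib_wrapper(zlib_bytes)
  zlib_bytes.byteslice(2, zlib_bytes.bytesize - 6)
end

data = "hello, deflate " * 50
zlib_bytes = Zlib::Deflate.deflate(data) # zlib format (RFC 1950)

puts raw_inflate(raw_deflate(data)) == data              # raw in, raw out
puts raw_inflate(strip_zlib_wrapper(zlib_bytes)) == data # wrapper removed by hand
puts Zlib::Inflate.inflate(zlib_bytes) == data           # wrapper handled by zlib
```

Whichever format the C# side turns out to emit, one of these three paths should match it.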

The DeflateStream class in .NET is a little odd in that its CompressionLevel is an enum with only three options, instead of the ten levels provided by zlib (0..9). The three options are Optimal, Fastest, and NoCompression. The last must be 0, the first is probably 9, and the middle one might be 1 or 3. In any case, there is no option for the default compression level! That level (6) is a very good balance of compression vs. time.

You might want to consider using DotNetZip instead. It provides a complete interface to zlib, so that you can specify exactly what you want to do, and know what will happen.

how do I use zlib to compress in C and uncompress in golang

Comparing compressed data tells you nothing. Different compressors, or different versions of the same compressor, or the same version used with different settings, can all give different compressed output for the same input. What actually matters for a lossless compressor is whether you can decompress to the original data.

The problem with your first example is that it is not complete. (The second example is complete and correct.) The first example ends in the middle of a deflate block. There is an error in your usage of zlib, either in managing the resulting data or not properly requesting the completion of the compression.
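The question used zlib from C, but the symptom can be reproduced in Ruby as a rough sketch: a stream that was never finished looks like a truncated copy of a complete one. Here truncation is simulated by cutting a complete stream in half, and `complete_zlib_stream?` is a hypothetical helper name.

```ruby
require "zlib"

# Hypothetical helper: feed the bytes to an inflater and report whether
# zlib reached the end-of-stream marker.
def complete_zlib_stream?(bytes)
  inflater = Zlib::Inflate.new
  inflater.inflate(bytes)
  inflater.finished?
ensure
  inflater&.close
end

data = "some payload that should round-trip " * 100
complete = Zlib::Deflate.deflate(data)

# Simulate a compressor that stopped before finishing the stream.
truncated = complete.byteslice(0, complete.bytesize / 2)

puts complete_zlib_stream?(complete)
puts complete_zlib_stream?(truncated)
```

On the compressing side, the usual cause is that deflate was never called with the finish flush mode, or that the final bytes it produced were not written out.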

Node HMAC results differ from both Ruby and Java

Looks like the issue was a conflict in requirements. They wanted a \n to separate parameters and to have it included on the last pair as well. But they also wanted all white space trimmed.

If the trim was done at the end, it was removing that last \n. The trim needed to be done while building the pairs, while leaving \n on all pairs, including the last one.
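The ordering issue can be illustrated with a small Ruby sketch. The `key=value` layout, the parameter names, and both helper names are assumptions made for the example; the original requirements are not shown in the answer.

```ruby
# Trim each key and value while building the pairs, and keep "\n" on every
# pair, including the last one. The "key=value" layout is an assumption.
def canonical_string(params)
  params.map { |key, value| "#{key.to_s.strip}=#{value.to_s.strip}\n" }.join
end

# The problematic variant: trimming once at the end removes the final "\n".
def canonical_string_trimmed_last(params)
  canonical_string(params).strip
end

params = { "user" => " alice ", "amount" => "10 " }

puts canonical_string(params).inspect
puts canonical_string_trimmed_last(params).inspect
```

The two strings differ by a single trailing newline, which is enough to produce a completely different HMAC.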

Why using unix-compress and go compress/lzw produce different files, not readable by the other decoder?

A .Z file does not contain only LZW-compressed data. It also starts with a 3-byte header, which the Go LZW code does not generate because that package is meant to compress data, not to produce .Z files.
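As a hedged illustration of that header, the Ruby sketch below decodes the three bytes using the layout commonly documented for compress(1): two magic bytes (0x1F, 0x9D) followed by a flags byte whose low five bits give the maximum code width and whose high bit marks block mode. The helper name `parse_z_header` is hypothetical, and the layout should be checked against your own files before relying on it.

```ruby
COMPRESS_MAGIC = [0x1F, 0x9D].freeze

# Hypothetical helper: decode the 3-byte .Z header, or return nil when the
# input is too short or does not start with the compress(1) magic bytes.
def parse_z_header(bytes)
  return nil if bytes.bytesize < 3

  id1, id2, flags = bytes.byteslice(0, 3).unpack("C3")
  return nil unless [id1, id2] == COMPRESS_MAGIC

  { max_bits: flags & 0x1F, block_mode: (flags & 0x80) != 0 }
end

sample = [0x1F, 0x9D, 0x90].pack("C*") # hand-built example header
puts parse_z_header(sample).inspect
```

A decoder that expects a .Z file will reject bare LZW output for lack of this header, and a bare LZW decoder will misread the header bytes as compressed data.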

Ruby zlib deflate and inflate not working as intended

Is your hoge.txt file in the current directory when you run your program? The error saying that the file cannot be found is the first thing I would resolve before assuming that compression and decompression are not working. I think you will find that the "compressed" file cannot be read because it is empty.

After you resolve your program's inability to find hoge.txt, you will still have problems with your output. Deflate does not create a gzip file; it only does the compression. A gzip file contains more than just the compressed data: it also has a header and a trailer. I would recommend that you use the GzipWriter class instead of the Deflate class.

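require "zlib"

file_name = "hoge.txt"
# Open in binary mode so the compressed bytes are written unmodified.
compressed_file = File.open(file_name + ".gz", "wb")

zd = Zlib::GzipWriter.new(compressed_file)

zd << File.binread(file_name)
# Closing the GzipWriter writes the gzip trailer and also closes the file.
zd.close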
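To see the difference between the two outputs without touching the file system, here is a small in-memory sketch. It assumes Ruby 2.4 or later for `Zlib.gzip` and `Zlib.gunzip`; `leading_bytes` is a hypothetical helper.

```ruby
require "zlib"

# Hypothetical helper: return the first few bytes of a string as integers.
def leading_bytes(bytes, count = 2)
  bytes.byteslice(0, count).unpack("C*")
end

data = "hoge " * 100

gzip_bytes = Zlib.gzip(data)             # gzip format: header, deflate data, trailer
zlib_bytes = Zlib::Deflate.deflate(data) # zlib format, which gzip tools do not read

puts leading_bytes(gzip_bytes).inspect # gzip magic number: 31, 139
puts leading_bytes(zlib_bytes).inspect
puts Zlib.gunzip(gzip_bytes) == data
```

Tools such as gunzip look for the 31, 139 magic number, which is why output written by Zlib::Deflate is not accepted as a .gz file.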

Ruby: uncompress zlib-wrapped deflate data

The documentation indicates that the Ruby inflate class will decompress the output of compress2(), which is in the zlib format. I just tried it, and it works fine. Your compressed data may not be making it over to Ruby intact.


