How to Decompress Gzip String in Ruby

How to decompress Gzip string in ruby?

The above method didn't work for me.

I kept getting an "incorrect header check" (Zlib::DataError) error. Apparently it assumes you have a header by default, which may not always be the case.

The workaround that I implemented was:

require 'zlib'
require 'stringio'
gz = Zlib::GzipReader.new(StringIO.new(resp.body.to_s))
uncompressed_string = gz.read
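
To make the header issue concrete, here is a minimal sketch; the sample string and variable names are illustrative and not from the question. It assumes Ruby 2.4 or later for Zlib.gzip. Gzip data makes plain Zlib::Inflate fail, while GzipReader reads it. Adding 32 to the window bits asks zlib to auto-detect a zlib or gzip header.

```ruby
require 'zlib'
require 'stringio'

original = "hello gzip " * 10    # illustrative sample data
gzipped  = Zlib.gzip(original)   # gzip format (Ruby 2.4+)

# Plain Zlib::Inflate expects a zlib header, so gzip input raises Zlib::DataError.
inflate_failed = begin
  Zlib::Inflate.inflate(gzipped)
  false
rescue Zlib::DataError
  true
end

# GzipReader understands the gzip header and trailer.
via_reader = Zlib::GzipReader.new(StringIO.new(gzipped)).read

# MAX_WBITS + 32 lets zlib auto-detect a zlib or gzip header.
via_autodetect = Zlib::Inflate.new(Zlib::MAX_WBITS + 32).inflate(gzipped)
```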

Compress Gzip string in Ruby

Different compressors, different versions of the same compressor, or the same version of the same compressor with different settings, can and often will produce different output for the same input, even if they all use the same compressed data format (e.g. deflate). The only thing guaranteed is that when you decompress, you get exactly the same thing back you started with. In fact, that's really all you need guaranteed. Why do you want exactly the same compressed stream?

As noted by Ron Warholic, you wouldn't even want to reproduce the compressed output of .NET's broken deflate implementation prior to .NET 4.5. Since .NET 2.0 used its own unique, broken deflate implementation, you cannot duplicate it with Ruby, which uses zlib.

Also as noted by Ron Warholic, Ruby and .NET 4.5 or later both use zlib, and so should both produce the same compressed output with the same compression level selected. That is not assured forever, though, since a new version of zlib may produce different output, and one of Ruby or .NET might update to it while the other does not. Also, as noted below, you do not have direct control over the compression level with .NET's classes.

"If it's not possible to get it to the exact original, what would be the most standardized compression, by which I mean general and that would be able to be decompressed in the same way that the original was?"

Any correct implementation of lossless compression and decompression will have this property. You will always get back to the exact original, regardless of how the compressed data may differ. There is no "most standardized compression".
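
As a small illustration of that guarantee, the sketch below compresses the same input at two zlib levels. The sample text and variable names are made up for the example. The compressed bytes may or may not match, but both streams inflate back to the original.

```ruby
require 'zlib'

text = ("The quick brown fox jumps over the lazy dog. " * 50) + "tail"

fast = Zlib::Deflate.deflate(text, Zlib::BEST_SPEED)        # level 1
best = Zlib::Deflate.deflate(text, Zlib::BEST_COMPRESSION)  # level 9

# The compressed streams are allowed to differ between levels.
puts "same compressed bytes? #{fast == best}"

# Both streams decompress to exactly the original input.
restored_fast = Zlib::Inflate.inflate(fast)
restored_best = Zlib::Inflate.inflate(best)
```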

Your Zlib::Inflate.new(-Zlib::MAX_WBITS) is expecting a raw deflate stream, with no header or trailer. So you would need to produce that on the C# side.

It is not clear from the .NET documentation whether the DeflateStream class compresses to the deflate format or the zlib format (where the latter is the deflate format with a zlib wrapper, consisting of two prefix bytes and four postfix bytes for data integrity checking). If it compresses to the deflate format, then it will be compatible with your Zlib::Inflate.new(-Zlib::MAX_WBITS). If it compresses to the zlib format, then it would be compatible with Zlib::Inflate.new(Zlib::MAX_WBITS) (i.e. without the minus sign). Or you can delete the first two bytes and last four bytes to get back to a deflate stream.
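
The difference between the two formats can be checked from the Ruby side. The following is a sketch with made-up sample data, assuming no preset dictionary is used (so the zlib header is exactly two bytes). It produces both a zlib-wrapped and a raw deflate stream, then shows that stripping the wrapper leaves something a raw inflater accepts.

```ruby
require 'zlib'

data = "raw deflate versus zlib wrapper " * 8   # illustrative sample data

# zlib format: 2-byte header + deflate data + 4-byte Adler-32 trailer
zlib_stream = Zlib::Deflate.deflate(data)

# Negative window bits select raw deflate: no header, no trailer.
deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
raw_stream = deflater.deflate(data, Zlib::FINISH)
deflater.close

from_zlib = Zlib::Inflate.new(Zlib::MAX_WBITS).inflate(zlib_stream)
from_raw  = Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(raw_stream)

# Dropping the first two and last four bytes of a zlib stream leaves raw deflate.
stripped      = zlib_stream[2...-4]
from_stripped = Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(stripped)
```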

The DeflateStream class in .NET is a little odd in that its CompressionLevel is an enum with only three options, instead of the ten levels provided by zlib (0..9). The three options are Optimal, Fastest, and NoCompression. The last must be 0, the first is probably 9, and the middle one might be 1 or 3. In any case, there is no option for the default compression level! That level (6) is a very good balance of compression vs. time.

You might want to consider using DotNetZip instead. It provides a complete interface to zlib, so that you can specify exactly what you want to do, and know what will happen.

Ruby zlib Library Very Slow to Decompress gzip File

I'm not sure what's going on there (I reproduced the slowness only with a highly compressed gzip file), but decompressing all at once is faster, something like this:

require 'zlib'
require 'stringio'

def decompress(io, int_size = 3)
  array = Array.new(262144)
  i = 0
  io.rewind
  gz = Zlib::GzipReader.new(io)
  dec = gz.read                  # inflate everything in one call
  seq = StringIO.new(dec, "rb")
  until seq.eof?
    buffer = seq.read(int_size)
    # combine int_size bytes into one big-endian integer
    array[i] = buffer.unpack('C*').inject { |r, n| r << 8 | n }
    i += 1
  end
  array
end

Faster still would be to use map instead of a loop:

def decompress(io, int_size = 3)
  io.rewind
  gz = Zlib::GzipReader.new(io)
  dec = gz.read
  dec.unpack('C*').each_slice(int_size).map { |t| t.inject { |r, n| r << 8 | n } }
end
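
For reference, a usage sketch of the map version follows. The method is repeated so the example runs on its own, and the sample values and the way they are packed are assumptions for illustration, since the original file format is not shown. It treats the data as consecutive 3-byte big-endian integers.

```ruby
require 'zlib'
require 'stringio'

# Same approach as the map version above, repeated so the sketch is self-contained.
def decompress(io, int_size = 3)
  io.rewind
  dec = Zlib::GzipReader.new(io).read
  dec.unpack('C*').each_slice(int_size).map { |bytes| bytes.inject { |r, n| r << 8 | n } }
end

values = [1, 258, 65_536, 16_777_215]   # illustrative values, each fits in 3 bytes

# Pack each value as 32-bit big-endian and drop the leading byte to get 3 bytes.
packed = values.map { |v| [v].pack('N')[1, 3] }.join

io = StringIO.new(Zlib.gzip(packed))    # Zlib.gzip needs Ruby 2.4+
decoded = decompress(io)
```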

How to decompress in node data that was compressed in ruby with gzip?

I am not a Ruby dev, so I will write the Ruby part in more or less pseudo code.

Ruby code (run online at https://repl.it/BoRD/0)

require 'json'
require 'zlib'

car = {:make => "bmw", :year => "2003"}

car_str = car.to_json

puts "car_str", car_str

car_byte = Zlib::Deflate.deflate(car_str)
# If you try to `puts car_byte`, it will crash with the following error:
# "\x9C" from ASCII-8BIT to UTF-8
#(repl):14:in `puts'
#(repl):14:in `puts'
#(repl):14:in `initialize'

car_str_dec = Zlib::Inflate.inflate(car_byte)

puts "car_str_dec", car_str_dec
# You can check that the decoded message is the same as the source.

# somehow send `car_byte`, the encoded bytes to RabbitMQ.

Node code

var zlib = require('zlib');

// somehow get the message from RabbitMQ.
var data = '...';

zlib.inflate(data, function (err, buffer) {
  if (err) {
    // Handle the error.
  } else {
    // If source didn't have any encoding,
    // no need to specify the encoding.
    console.log(buffer.toString());
  }
});

I also suggest sticking with the async functions in Node rather than their sync alternatives.

Ruby parsing gzip binary string

while c
  io = StringIO.new(c)
  gz = Zlib::GzipReader.new(io)
  gz.each do |l|
    puts l
  end
  c = gz.unused # take unprocessed portion of the string as the next archive
end

See ruby-doc.
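
A variant of the same idea, along the lines of the example in the Zlib::GzipReader documentation, keeps a single StringIO and rewinds it by the length of the unused bytes instead of building a new string each time. The sketch below uses made-up sample data and variable names, and assumes the input contains only gzip members with no trailing garbage.

```ruby
require 'zlib'
require 'stringio'

# Two gzip members concatenated, as `cat a.gz b.gz` would produce (sample data is illustrative).
c = Zlib.gzip("first\n") + Zlib.gzip("second\n")

io = StringIO.new(c)
lines = []
until io.eof?
  gz = Zlib::GzipReader.new(io)
  gz.each_line { |l| lines << l }
  unused = gz.unused                  # bytes buffered past the end of this member, or nil
  gz.finish                           # release the reader but leave io open
  io.pos -= unused.length if unused   # rewind io to the start of the next member
end
```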

Create in-memory only gzip

You're actually gzipping normal.to_s (which is something like "#<File:0x007f53c9b55b48>") in the following code.

# Files
normal = File.new('chunk0.nbt')

# Try to create gzip in program
make_gzip normal

You should read the content of the file, and make_gzip on the content:

make_gzip normal.read

As I commented, make_gzip can be updated:

def self.make_gzip(data)
  gz = Zlib::GzipWriter.new(StringIO.new)
  gz << data
  gz.close.string
end
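
A quick way to check the result is to read it back in memory. In this sketch, make_gzip is written as a plain top-level method so the example is self-contained, and the content string stands in for the real file contents. The output starts with the gzip magic bytes and decompresses to the input.

```ruby
require 'zlib'
require 'stringio'

# Top-level copy of make_gzip so the sketch runs on its own.
def make_gzip(data)
  gz = Zlib::GzipWriter.new(StringIO.new)
  gz << data
  gz.close.string   # close returns the underlying StringIO
end

content = "stand-in for the bytes read from the file"   # illustrative data
gzipped = make_gzip(content)

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
starts_with_magic = gzipped.bytes.first(2) == [0x1f, 0x8b]

# Decompress in memory to confirm the round trip.
round_trip = Zlib::GzipReader.new(StringIO.new(gzipped)).read
```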

native ruby methods for compressing/encrypt strings?

From http://ruby-doc.org/stdlib/libdoc/zlib/rdoc/classes/Zlib.html

# aka compress
def deflate(string, level)
  z = Zlib::Deflate.new(level)
  dst = z.deflate(string, Zlib::FINISH)
  z.close
  dst
end

# aka decompress
def inflate(string)
  zstream = Zlib::Inflate.new
  buf = zstream.inflate(string)
  zstream.finish
  zstream.close
  buf
end

Encryption from http://snippets.dzone.com/posts/show/991

require 'openssl'
require 'digest/sha2'

# OpenSSL::Cipher::Cipher is deprecated; use OpenSSL::Cipher directly
c = OpenSSL::Cipher.new("aes-256-cbc")
c.encrypt
# your pass is what is used to encrypt/decrypt;
# aes-256 needs a 32-byte key, which a raw SHA-256 digest provides
c.key = key = Digest::SHA256.digest("yourpass")
c.iv = iv = c.random_iv
e = c.update("crypt this")
e << c.final
puts "encrypted: #{e}\n"
c = OpenSSL::Cipher.new("aes-256-cbc")
c.decrypt
c.key = key
c.iv = iv
d = c.update(e)
d << c.final
puts "decrypted: #{d}\n"

Zlib in Ruby to uncompress .gz

Zlib::GzipReader works like most IO-like classes do in Ruby. You have an open call, and when you pass a block to it, the block will receive the IO-like object. Think of it as a convenient way of doing something with a file or resource for the duration of the block.

But that means that in your example gz is an IO-like object, and not actually the contents of the gzip file, as you expect. You still need to read from it to get to that. The simplest fix would then be:

g.write(gz.read)

Note that this will read the entire uncompressed contents of the gzip file into memory.

If all you're really doing is copying from one file to another, you can use the more efficient IO.copy_stream method. Your example might then look like:

Zlib::GzipReader.open('PRIDE_Exp_Complete_Ac_1015.xml.gz') do |input_stream|
  File.open("PRIDE_Exp_Complete_Ac_1015.xml", "w") do |output_stream|
    IO.copy_stream(input_stream, output_stream)
  end
end

Behind the scenes, this will try to use the sendfile syscall, which is available in some specific situations on Linux. Otherwise, it will do the copying in fast C code, 16 KB at a time. I learned this from the Ruby 1.9.1 source code.
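
To try the copy_stream approach without the original file, here is a self-contained sketch that first writes a small gzip file into a temporary directory. The file names and sample XML are hypothetical. The output file is opened in binary mode so the bytes are copied unchanged.

```ruby
require 'zlib'
require 'tmpdir'

sample = "<root>hello</root>\n" * 1000   # illustrative payload
copied = nil

Dir.mktmpdir do |dir|
  gz_path  = File.join(dir, 'sample.xml.gz')   # hypothetical file names
  out_path = File.join(dir, 'sample.xml')

  # Create a gzip file to decompress.
  Zlib::GzipWriter.open(gz_path) { |gz| gz.write(sample) }

  # Stream the decompressed data to disk without holding it all in memory.
  Zlib::GzipReader.open(gz_path) do |input_stream|
    File.open(out_path, 'wb') do |output_stream|
      IO.copy_stream(input_stream, output_stream)
    end
  end

  copied = File.binread(out_path)
end
```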


