Fast Concatenation of Multiple Gzip Files

Can multiple .gz files be combined such that they extract into a single file?

Surprisingly, this is actually possible.

The GNU zip (gzip) man page states: "Multiple compressed files can be concatenated. In this case, gunzip will extract all members at once."

Example:

You can build the combined gzip file like this:

echo 1 > 1.txt ; echo 2 > 2.txt; echo 3 > 3.txt;
gzip 1.txt; gzip 2.txt; gzip 3.txt;
cat 1.txt.gz 2.txt.gz 3.txt.gz > all.gz

Then extract it:

gunzip -c all.gz > all.txt

The contents of all.txt should now be:

1
2
3

Which is the same as:

cat 1.txt 2.txt 3.txt

And - as you requested - "gunzip will extract all members at once".
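The same behaviour can be checked from Python. This is only a minimal sketch using the standard gzip module, with in-memory byte strings standing in for 1.txt, 2.txt and 3.txt (the variable names are illustrative):

```python
import gzip

# Compress three small inputs separately, mirroring 1.txt, 2.txt and 3.txt.
parts = [b"1\n", b"2\n", b"3\n"]
members = [gzip.compress(p) for p in parts]

# Byte-level concatenation, the equivalent of `cat 1.txt.gz 2.txt.gz 3.txt.gz`.
all_gz = b"".join(members)

# gzip.decompress reads every member in turn and joins the output.
all_txt = gzip.decompress(all_gz)
print(all_txt.decode(), end="")
```

The result should match what cat 1.txt 2.txt 3.txt prints.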

How to concat two or more gzip files/streams

Look at RFC 1951 (the DEFLATE format) and RFC 1952 (the gzip file format).

The format is simply a sequence of members, each composed of three parts: a header, the data, and a trailer. The data part is itself a series of chunks (called blocks in RFC 1951), each with its own header and data part.
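To make the member layout concrete, here is a rough sketch that compresses a small payload with Python's gzip module and picks apart the fixed 10-byte header and the 8-byte trailer. It deliberately ignores the optional header fields and the chunks inside the data part, and the variable names are illustrative:

```python
import gzip
import struct
import zlib

payload = b"hello gzip member layout\n"
member = gzip.compress(payload)

# Fixed 10-byte header: magic (2 bytes), compression method, flags,
# modification time (4 bytes), extra flags, operating system.
magic, method, flags, mtime, xfl, os_code = struct.unpack("<HBBIBB", member[:10])

# 8-byte trailer: CRC32 of the uncompressed data, then its length mod 2**32.
crc32, isize = struct.unpack("<II", member[-8:])

print(hex(magic), method)  # magic bytes 1f 8b read little-endian, method 8 = DEFLATE
print(crc32 == zlib.crc32(payload), isize == len(payload))
```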

To simulate the effect of gzipping the concatenation of two or more files, you have to adjust the headers (there is a last-chunk flag, for instance) and the trailer correctly, and copy the data parts.

There is one problem: the trailer contains a CRC32 of the uncompressed data, and I'm not sure whether it is easy to compute when you only know the CRCs of the parts.

Edit: the comments in the gzjoin.c file you found imply that, while it is possible to compute the CRC32 without decompressing the data, there are other things which need the decompression.
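For the CRC part, zlib's C API has a crc32_combine function, but Python's zlib module does not expose it. The following is a pure-Python sketch modeled on the GF(2) matrix approach used by older zlib versions. The helper names are my own and it is meant as an illustration, not a tuned implementation:

```python
import zlib

def gf2_matrix_times(mat, vec):
    """Multiply a 32x32 GF(2) matrix (list of column ints) by a bit vector."""
    total = 0
    i = 0
    while vec:
        if vec & 1:
            total ^= mat[i]
        vec >>= 1
        i += 1
    return total

def gf2_matrix_square(mat):
    """Return mat * mat over GF(2)."""
    return [gf2_matrix_times(mat, col) for col in mat]

def crc32_combine(crc1, crc2, len2):
    """CRC32 of A+B, given crc32(A), crc32(B) and len(B) in bytes."""
    if len2 <= 0:
        return crc1
    # Operator that advances a CRC over one zero bit.
    odd = [0xEDB88320] + [1 << n for n in range(31)]
    even = gf2_matrix_square(odd)   # two zero bits
    odd = gf2_matrix_square(even)   # four zero bits
    while True:
        even = gf2_matrix_square(odd)   # first pass: one zero byte
        if len2 & 1:
            crc1 = gf2_matrix_times(even, crc1)
        len2 >>= 1
        if len2 == 0:
            break
        odd = gf2_matrix_square(even)
        if len2 & 1:
            crc1 = gf2_matrix_times(odd, crc1)
        len2 >>= 1
        if len2 == 0:
            break
    return crc1 ^ crc2

a, b = b"first part, ", b"second part"
combined_crc = crc32_combine(zlib.crc32(a), zlib.crc32(b), len(b))
print(combined_crc == zlib.crc32(a + b))
```

Note that this needs the uncompressed length of the second part, which each gzip member stores in its trailer.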

Concatenate gzipped files with Python, on Windows

Just keep writing to the same file.

import shutil

# Binary mode matters on Windows; text mode would corrupt the gzip data.
with open(..., 'wb') as wfp:
    for fn in filenames:
        with open(fn, 'rb') as rfp:
            shutil.copyfileobj(rfp, wfp)
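A self-contained variant of the same idea, with a read-back check, might look like the sketch below. The function name concat_gzip_files, the temporary directory and the sample contents are made up for the demonstration:

```python
import gzip
import os
import shutil
import tempfile

def concat_gzip_files(filenames, out_path):
    """Append the raw bytes of each .gz file to out_path, in order."""
    with open(out_path, "wb") as wfp:
        for fn in filenames:
            with open(fn, "rb") as rfp:
                shutil.copyfileobj(rfp, wfp)

# Create two small gzip files to work with.
workdir = tempfile.mkdtemp()
names = []
for i, text in enumerate([b"alpha\n", b"beta\n"]):
    path = os.path.join(workdir, f"part{i}.gz")
    with gzip.open(path, "wb") as f:
        f.write(text)
    names.append(path)

combined = os.path.join(workdir, "combined.gz")
concat_gzip_files(names, combined)

# Reading the combined file back yields both members' contents in order.
with gzip.open(combined, "rb") as f:
    result = f.read()
print(result)
shutil.rmtree(workdir)
```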

Is there a GZIP merger that merges two GZIP files without decompressing them?

"Of course, cat a.gz b.gz > c.gz doesn't work."

Actually, it works just fine. I just tested it. It's even documented (sort of) in the gzip man page.

Multiple compressed files can be concatenated. In this case, gunzip will extract all members at once. For example:

gzip -c file1 > foo.gz
gzip -c file2 >> foo.gz

Then

gunzip -c foo

is equivalent to

cat file1 file2
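The append form from the man page has a rough Python counterpart: opening a gzip file in append mode writes a new member after the existing ones. A small sketch, assuming a temporary foo.gz and made-up file contents:

```python
import gzip
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
foo = os.path.join(workdir, "foo.gz")

with gzip.open(foo, "wb") as f:   # like: gzip -c file1 > foo.gz
    f.write(b"contents of file1\n")
with gzip.open(foo, "ab") as f:   # like: gzip -c file2 >> foo.gz
    f.write(b"contents of file2\n")

# Like: gunzip -c foo
with gzip.open(foo, "rb") as f:
    combined = f.read()
print(combined.decode(), end="")
shutil.rmtree(workdir)
```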

