Can Multiple .gz Files Be Combined Such That They Extract into a Single File?

Can multiple .gz files be combined such that they extract into a single file?

Surprisingly, this is actually possible.

The GNU gzip man page states: "Multiple compressed files can be concatenated. In this case, gunzip will extract all members at once."

Example:

You can build the combined .gz file like this:

echo 1 > 1.txt ; echo 2 > 2.txt; echo 3 > 3.txt;
gzip 1.txt; gzip 2.txt; gzip 3.txt;
cat 1.txt.gz 2.txt.gz 3.txt.gz > all.gz

Then extract it:

gunzip -c all.gz > all.txt

The contents of all.txt should now be:

1
2
3

Which is the same as:

cat 1.txt 2.txt 3.txt

And - as you requested - "gunzip will extract all members at once".
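
For completeness, Python's gzip module behaves the same way: it decompresses every member of a concatenated file. A minimal sketch, assuming the all.gz built above is in the current directory:

import gzip

# Reading a multi-member gzip file: the gzip module decompresses all members,
# so this prints 1, 2, 3 (one value per line), just like gunzip -c all.gz.
with gzip.open("all.gz", "rt") as f:
    print(f.read(), end="")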

Is there a GZIP merger that merges two GZIP files without decompressing them?

Of course, cat a.gz b.gz > c.gz doesn't work.

Actually, it works just fine. I just tested it. It's even documented (sort of) in the gzip man page.

Multiple compressed files can be concatenated. In this case, gunzip will extract all members at once. For example:

gzip -c file1 > foo.gz
gzip -c file2 >> foo.gz

Then

gunzip -c foo

is equivalent to

cat file1 file2
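
The same byte-level merge can be done programmatically without decompressing anything; a minimal sketch in Python (file names a.gz, b.gz, c.gz match the question):

import shutil

# Merging gzip files is plain byte concatenation, the equivalent of
# `cat a.gz b.gz > c.gz`; nothing is decompressed or recompressed.
with open("c.gz", "wb") as out:
    for name in ("a.gz", "b.gz"):
        with open(name, "rb") as part:
            shutil.copyfileobj(part, out)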

Combining a large number of files into one file using the terminal

You first need to decompress the files before you can remove the first 15 lines:

for i in neutral_*.msOut.gz
do
zcat $i | head -15 > neutral.msOut
break
done

for i in neutral_*.msOut.gz
do
zcat $i | sed -e 1,15d >> neutral.msOut
done
  • the first loop extracts the first 15 lines from a single file so that they appear exactly once in the result file; since any of the files would do, it can be simplified to a single zcat | head on a known file name. If you do not want that header in the produced file, just remove this loop
  • the second loop appends everything except the first 15 lines of each file
  • this does not depend on a particular version of tail (see the remark in the deleted answer of @kabanus noting that tail has no -q option on OS X)
  • you may want to gzip neutral.msOut after the two loops
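
A Python sketch of the same approach, assuming the file names and the 15-line header from the question:

import glob
import gzip

HEADER_LINES = 15
files = sorted(glob.glob("neutral_*.msOut.gz"))

with open("neutral.msOut", "w") as out:
    for index, name in enumerate(files):
        with gzip.open(name, "rt") as f:
            lines = f.readlines()
        if index == 0:
            # keep the header once, from the first file only
            out.writelines(lines[:HEADER_LINES])
        # always append the data lines
        out.writelines(lines[HEADER_LINES:])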

How to decompress .gz file containing multiple files

What you're asking doesn't quite make sense. Do you perhaps actually have a .tar.gz or .tgz file?

Per the Gzip home page (https://www.gzip.org/):

gzip is a single-file/stream lossless data compression utility, where
the resulting compressed file generally has the suffix .gz.

A Gzip file is simply a compressed stream of bytes. In terms of files, decompressing a .gz file always leads to a single file. The format makes no provision for breaking the contents of the resulting file into multiple files or for assigning names to those files.

There are other file formats that do store multiple files. Tar is the usual one used for this in the same circles where Gzip is used. Tar files usually have a .tar extension, but when a Tar file is created it is often immediately compressed with Gzip, so you often find files in the wild with the extension .tar.gz. This means that one or more files were collected into a single Tar file, and that file was then compressed using Gzip. This scenario is so common that a single extension, .tgz, is often used as a shortcut for .tar.gz. Also, Tar itself can do Gzip compression, so to create a .tgz file from a directory of files, you can do this:

tar -czf archive.tgz somedirectoryname
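
If you prefer to do that from Python, the standard tarfile module can create the same kind of archive; a minimal sketch using the names from the command above:

import tarfile

# "w:gz" writes a gzip-compressed tar, i.e. a .tgz / .tar.gz
with tarfile.open("archive.tgz", "w:gz") as tar:
    tar.add("somedirectoryname")  # adds the directory and everything under it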

If you in fact have a .tar.gz or .tgz file, then the way to decompress and expand that file into multiple files is to first decompress with Gzip and then extract the individual files with Tar. Tar can do the Gzip decompression itself, so all you need to do to decompress a .tgz file is:

tar -xzf archive.tgz

This will produce whatever files and directory structure was used to create the .tgz file.
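
A minimal Python equivalent of that extraction, again with the tarfile module and the example name archive.tgz:

import tarfile

# "r:gz" reads a gzip-compressed tar; extractall() recreates the files and
# directory structure in the current working directory.
with tarfile.open("archive.tgz", "r:gz") as tar:
    tar.extractall()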

If you really have just a .gz file and you're expecting it to expand naturally into multiple files, then I'm not sure what you have. A .gz file simply isn't able to preserve on its own any notion of multiple files. My guess is that what you have are individually Gzip-compressed log files.

Python unzip multiple .gz files

First, get the list of all files.

files = ['/path/to/foo.txt.gz.001', '/path/to/foo.txt.gz.002', '/path/to/foo.txt.gz.003']

Then iterate over each file and append to a result file.

with open('./result.gz', 'ab') as result:  # append in binary mode
    for f in files:
        with open(f, 'rb') as tmpf:  # open in binary mode also
            result.write(tmpf.read())

Then extract it. The reassembled result.gz is a gzip stream rather than a zip archive, so use the gzip module (not zipfile) for that step. You could use the tempfile module if you want to avoid keeping the intermediate file around.
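
A minimal sketch of that extraction step with the gzip module (the output name foo.txt is just an assumption):

import gzip
import shutil

# Decompress the reassembled gzip stream to a plain file on disk.
with gzip.open("./result.gz", "rb") as compressed, open("foo.txt", "wb") as out:
    shutil.copyfileobj(compressed, out)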

How to combine .tab.gz files into one .gz file without repeating the column header?

Using awk:

$ awk 'FNR>1||NR==1' <(gunzip -c a.tab.gz) <(gunzip -c b.tab.gz) | gzip > c.tab.gz

Output inside c.tab.gz:

col1 col2 col3
1 2 3
1 4 6

Edit: Another awk:

$ zcat [ab].tab.gz | awk 'NR==1{h=$0;print}$0!=h' | gzip > c.tab.gz

which keeps the first record (the header of the first file) and then excludes every record identical to it. That strips the repeated headers, but it would also drop any data line that happens to be identical to the header.
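
A Python alternative that keeps only the first header, using the file names from the awk example:

import gzip

inputs = ["a.tab.gz", "b.tab.gz"]

with gzip.open("c.tab.gz", "wt") as out:
    for index, name in enumerate(inputs):
        with gzip.open(name, "rt") as f:
            header = f.readline()
            if index == 0:
                out.write(header)  # write the header only once
            out.writelines(f)      # copy the remaining data lines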

gzip or deflate files can it be merged together? for api

gzip only supports a single file internally. It's not a container format, just a straight deflation. There's no provision to maintain a directory of where one file ends and another starts, let alone the file metadata (filename, creation date, size, etc.).

You can use .tar to glom multiple files together, and then gzip the .tar file. Or use zip to wrap multiple files (of any type) into a single .zip file. Since you say you'd be zipping .gz'd files, I'd suggest just using zip as a container, with compression disabled. The few bytes you'd save by recompressing the .gz files won't be worth the CPU overhead to do the extra compression run.
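
A minimal sketch of that suggestion with Python's zipfile module: the already-compressed .gz files are stored (ZIP_STORED) rather than recompressed. The file names are placeholders:

import zipfile

members = ["a.gz", "b.gz"]

with zipfile.ZipFile("bundle.zip", "w", compression=zipfile.ZIP_STORED) as archive:
    for name in members:
        archive.write(name)  # stored as-is, no extra compression pass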


