Can multiple .gz files be combined such that they extract into a single file?
Surprisingly, this is actually possible.
The GNU zip man page states: multiple compressed files can be concatenated. In this case, gunzip will extract all members at once.
Example:
You can build the combined gzip file like this:
echo 1 > 1.txt ; echo 2 > 2.txt; echo 3 > 3.txt;
gzip 1.txt; gzip 2.txt; gzip 3.txt;
cat 1.txt.gz 2.txt.gz 3.txt.gz > all.gz
Then extract it:
gunzip -c all.gz > all.txt
The contents of all.txt should now be:
1
2
3
Which is the same as:
cat 1.txt 2.txt 3.txt
And - as you requested - "gunzip will extract all members at once".
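The same behaviour can be checked from Python. This is a minimal sketch that uses the standard gzip module in place of the command-line tools; the payloads mirror the three files above and everything stays in memory.

```python
import gzip

# Compress three small payloads independently, as `gzip 1.txt` etc. would.
members = [gzip.compress(f"{n}\n".encode()) for n in (1, 2, 3)]

# Byte-level concatenation, the equivalent of `cat 1.txt.gz 2.txt.gz 3.txt.gz > all.gz`.
all_gz = b"".join(members)

# gzip.decompress reads every member in turn and joins the results.
combined = gzip.decompress(all_gz)
print(combined.decode(), end="")
```

No member is decompressed or recompressed while joining; only the final read inflates the data.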
Is there a GZIP merger that merges two GZIP files without decompressing them?
Of course, cat a.gz b.gz > c.gz doesn't work.
Actually, it works just fine. I just tested it. It's even documented (sort of) in the gzip man page.
Multiple compressed files can be concatenated. In this case, gunzip
will extract all members at once. For example:
gzip -c file1 > foo.gz
gzip -c file2 >> foo.gz
Then
gunzip -c foo
is equivalent to
cat file1 file2
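The `>>` append shown in the man page excerpt has a rough Python counterpart: opening a .gz file in append mode adds a new member to the end of it. The sketch below assumes a throwaway temporary directory, and the helper name append_member is made up for illustration.

```python
import gzip
import os
import tempfile

def append_member(gz_path, data):
    # Mode "ab" appends a fresh gzip member to the end of the file
    # (creating the file if it does not exist yet).
    with gzip.open(gz_path, "ab") as fh:
        fh.write(data)

workdir = tempfile.mkdtemp()
foo = os.path.join(workdir, "foo.gz")
append_member(foo, b"contents of file1\n")
append_member(foo, b"contents of file2\n")

# Reading the file back yields both members, one after the other.
with gzip.open(foo, "rb") as fh:
    result = fh.read()
```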
Combining a large number of files into one file using the terminal
You first need to decompress each file before you can remove its first 15 lines:
for i in neutral_*.msOut.gz
do
zcat "$i" | head -n 15 > neutral.msOut
break
done
for i in neutral_*.msOut.gz
do
zcat "$i" | sed -e 1,15d >> neutral.msOut
done
- The first loop writes the first 15 lines to the result file once. If you know the name of one of the files, you can drop the loop and take the first 15 lines from that file directly. If you do not want the header in the produced file, remove that loop.
- The second loop appends everything except the first 15 lines of each file.
- This does not require a particular version of tail (see the remark in the deleted answer by @kabanus saying that tail does not have a -q option on OS X).
- You may need to gzip neutral.msOut after the two loops.
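The two loops above can also be sketched in Python, which avoids depending on zcat, head or sed. The function name merge_with_single_header and its header_lines parameter are illustrative choices, and the sketch assumes every input has the same 15-line header.

```python
import gzip
from itertools import islice

def merge_with_single_header(sources, destination, header_lines=15):
    """Write the header of the first file once, then the body of every file."""
    with open(destination, "w") as out:
        for index, path in enumerate(sources):
            with gzip.open(path, "rt") as src:
                # Consume the header lines; keep them only for the first file.
                header = list(islice(src, header_lines))
                if index == 0:
                    out.writelines(header)
                # The file iterator now points just past the header.
                out.writelines(src)
```

Calling it with sorted(glob.glob("neutral_*.msOut.gz")) and "neutral.msOut" would correspond to the shell version; the files are streamed line by line, so they do not have to fit in memory.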
How to decompress .gz file containing multiple files
What you're asking doesn't quite make sense. Do you perhaps actually have a .tar.gz or .tgz file?
Per the Gzip home page (https://www.gzip.org/):
gzip is a single-file/stream lossless data compression utility, where
the resulting compressed file generally has the suffix .gz.
A Gzip file is simply a compressed stream of bytes. In terms of files, decompressing a .gz file always leads to a single file. The format makes no provision for breaking the contents of the resulting file into multiple files or assigning names to those files.
There are other file formats that do store multiple files. Tar is the usual one that is used for this in the same circles where Gzip is used. Tar files usually have a .tar extension. But when a Tar is created, it is often immediately compressed with Gzip. So you often find files in the wild that have the extension .tar.gz. This means that one or more files were collected into a single Tar file, and then that file was compressed using Gzip. This scenario is so common, that a single extension, .tgz, is often used as a shortcut for .tar.gz. Also, Tar itself can do Gzip compression, so to create a .tgz file from a directory of files, you can do this:
tar -czf archive.tgz somedirectoryname
If you in fact have a .tar.gz or .tgz file, then the way to decompress and expand that file into multiple files is to first decompress with Gzip and then extract the individual files with Tar. Tar can do the Gzip decompression itself, so all you need to do to decompress a .tgz file is:
tar -xzf archive.tgz
This will produce whatever files and directory structure was used to create the .tgz file.
If you really have just a .gz file, then I'm not sure what you have, if you're expecting it to expand naturally into multiple files. A .gz file simply isn't able to preserve on its own any notion of multiple files. My guess is that what you have are Gzip compressed individual log files.
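For comparison, the tar-plus-gzip round trip described above can be sketched with Python's tarfile module. The archive is built in memory, and the member names a.log and b.log are invented for the example.

```python
import io
import tarfile

def make_tgz(files):
    """Build a gzip-compressed tar archive in memory from {name: bytes}."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()

def read_tgz(blob):
    """Return {name: bytes} for every regular file in the archive."""
    contents = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile():
                contents[member.name] = archive.extractfile(member).read()
    return contents

blob = make_tgz({"a.log": b"first\n", "b.log": b"second\n"})
restored = read_tgz(blob)
```

Unlike a bare .gz stream, the tar layer records each member's name and size, which is what lets the files be separated again.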
Python unzip multiple .gz files
First, get the list of all files.
files = ['/path/to/foo.txt.gz.001', '/path/to/foo.txt.gz.002', '/path/to/foo.txt.gz.003']
Then iterate over each file and append to a result file.
with open('./result.gz', 'ab') as result:  # append in binary mode
    for f in files:
        with open(f, 'rb') as tmpf:  # open in binary mode also
            result.write(tmpf.read())
Then extract it using the gzip module (zipfile only handles .zip archives, not .gz streams). You could use tempfile to avoid managing the temporary .gz file yourself.
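If the joined data fits in memory, the intermediate file can be skipped entirely. The sketch below assumes the numbered parts are consecutive pieces of one gzip stream (as produced by a tool like split) and that they are passed in the right order; decompress_parts is a hypothetical helper name.

```python
import gzip
import io

def decompress_parts(part_paths):
    """Join split .gz parts in order and return the decompressed bytes."""
    joined = io.BytesIO()
    for path in part_paths:
        with open(path, "rb") as part:
            joined.write(part.read())
    joined.seek(0)
    # GzipFile can read from any file-like object, not just a path.
    with gzip.GzipFile(fileobj=joined, mode="rb") as gz:
        return gz.read()
```

For large inputs, writing to a real file and streaming from gzip.open would be the safer choice.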
How to combine .tab.gz files into one gz file without repeating the column header?
Using awk:
$ awk 'FNR>1||NR==1' <(gunzip -c a.tab.gz) <(gunzip -c b.tab.gz) | gzip > c.tab.gz
Output inside c.tab.gz:
col1 col2 col3
1 2 3
1 4 6
Edit: Another awk:
$ zcat [ab].tab.gz | awk 'NR==1{h=$0;print}$0!=h' | gzip > c.tab.gz
This excludes every record that is identical to the first record of the first uncompressed file, which might cause problems if a data row happens to match the header.
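The difference between the two awk approaches is easier to see on a small example. This is a rough Python sketch of each strategy working on lists of lines rather than on real .tab.gz files; the function names and the sample rows are made up.

```python
def drop_repeated_headers_by_position(tables):
    """Roughly awk 'FNR>1||NR==1': keep line 1 of the first table only."""
    merged = []
    for index, lines in enumerate(tables):
        merged.extend(lines if index == 0 else lines[1:])
    return merged

def drop_repeated_headers_by_content(tables):
    """Roughly awk 'NR==1{h=$0;print}$0!=h': drop later lines equal to the header."""
    all_lines = [line for lines in tables for line in lines]
    if not all_lines:
        return []
    header = all_lines[0]
    return [header] + [line for line in all_lines[1:] if line != header]

a = ["col1 col2 col3", "1 2 3"]
# The second line of b is a data row that happens to equal the header.
b = ["col1 col2 col3", "col1 col2 col3", "1 4 6"]

by_position = drop_repeated_headers_by_position([a, b])
by_content = drop_repeated_headers_by_content([a, b])
```

Here by_position keeps the data row that looks like a header, while by_content silently drops it.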
Can gzip or deflate files be merged together? (for an API)
gzip only supports a single file internally. It's not a container format, just a straight deflation. There's no provision to maintain a directory of where one file ends and another starts, let alone the file metadata (filename, creation date, size, etc.).
You can use .tar to glom multiple files together, and then gzip the .tar file. Or use zip to wrap multiple files (of any type) into a single .zip file. Since you say you'd be zipping .gz'd files, I'd suggest just using zip as a container, with compression disabled. The few bytes you'd save by recompressing the .gz files won't be worth the CPU overhead to do the extra compression run.
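The zip-as-container suggestion can be sketched with Python's zipfile module, using ZIP_STORED so the already-compressed .gz payloads are not compressed a second time. The entry names and the helper bundle_stored are assumptions for illustration.

```python
import gzip
import io
import zipfile

def bundle_stored(named_blobs):
    """Pack already-compressed blobs into a zip archive without recompressing."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for name, blob in named_blobs.items():
            archive.writestr(name, blob)
    return buffer.getvalue()

payload = {
    "a.txt.gz": gzip.compress(b"alpha\n"),
    "b.txt.gz": gzip.compress(b"beta\n"),
}
bundle = bundle_stored(payload)

# Each entry keeps its own name and can be pulled out individually.
with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
    names = archive.namelist()
    first = gzip.decompress(archive.read("a.txt.gz"))
```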