Splitting Gzipped Logfiles Without Storing the Ungzipped Splits on Disk


You can use split's --filter option, as explained in the manual. For example:

zcat biglogfile.gz | split -l500000 --filter='gzip > $FILE.gz'

Edit: I'm not sure when the --filter option was introduced, but according to the comments it does not work in coreutils 8.4.
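As a sketch of the full round trip (the file name, chunk size, and chunk prefix here are illustrative, not from the answer above): split a gzipped file into gzipped chunks with no intermediate uncompressed file, then confirm that decompressing the chunks in order reproduces the original stream.

```shell
# Sketch: split a gzipped file into gzipped chunks without writing an
# uncompressed copy to disk. 'biglogfile.gz' and the sizes are illustrative.
seq 1 1200000 | gzip > biglogfile.gz

# split sets $FILE to each output name; the single quotes keep the outer
# shell from expanding it. -d gives numeric suffixes that sort correctly.
zcat biglogfile.gz | split -l500000 -d --filter='gzip > $FILE.gz' - chunk_

# Decompressing the chunks in order must reproduce the original data.
zcat chunk_*.gz | cmp - <(zcat biglogfile.gz) && echo "round trip OK"
```

This requires a split new enough to have --filter, and bash for the process substitution in the check.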

Download part of a gzipped text file

No, it is not possible. gzip is a stream format in which each byte depends on the data before it, so you need to decompress all of the data in a gzip file in order to get the uncompressed data at the end.
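One consequence worth illustrating (a sketch; the file names and byte counts are made up): the beginning of a gzip stream is recoverable from a partial download, because decompression runs front to back, but there is no analogous way to start in the middle or at the end.

```shell
# Sketch: a truncated .gz still yields its leading uncompressed data.
seq 1 100000 | gzip > whole.gz

# Simulate an interrupted download: keep only the first 10 KiB.
head -c 10240 whole.gz > partial.gz

# gzip emits what it can decode, then fails on the truncated tail,
# so we ignore its exit status and keep the recovered prefix.
gzip -dc partial.gz > prefix.txt 2>/dev/null || true

head -n 3 prefix.txt
```

The last line of the recovered prefix may be cut off mid-record, so a real pipeline should discard it.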

Split a large, compressed file into multiple outputs using AWK and BASH

This little Perl script does the job nicely:

  • keeping all destination files open for performance
  • doing elementary error handling
  • Edit: it now also pipes output through gzip on the fly

There is a bit of a kludge with $fh because print cannot take a hash entry directly as its filehandle; copying it into a lexical first (or writing print {$pipes{$id}} $line) works around that.

#!/usr/bin/perl
use strict;
use warnings;

my $suffix = ".txt.gz";

my %pipes;
while (my $line = <>) {
    # Split each line into the destination id and the rest.
    my ($id, $rest) = split /\t/, $line, 2;
    next unless defined $rest;    # skip lines without a tab

    # On first sight of an id, open a gzip pipe for it and keep it open.
    exists $pipes{$id}
        or open($pipes{$id}, '|-', "gzip -9 > '$id$suffix'")
        or die "can't open/create $id$suffix, or cannot spawn gzip: $!";

    # print needs a simple scalar (or a block) as its filehandle,
    # hence the copy into $fh.
    my $fh = $pipes{$id};
    print $fh $rest;
}

# Close the pipes so every gzip child flushes and finishes.
close $_ for values %pipes;

print STDERR "Created: " . join(', ', map { "$_$suffix" } keys %pipes) . "\n";

Use it like this:

zcat input.gz | ./myscript.pl
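Since the question asks for AWK and BASH, here is a rough awk sketch of the same idea (the tab-delimited layout and the .txt.gz suffix mirror the Perl script; like it, awk keeps one gzip pipe open per id until end of input):

```shell
# Sketch: split a gzipped, tab-delimited stream by its first field,
# writing one gzipped file per id, entirely in awk. \047 is a single quote.
printf 'a\tone\nb\ttwo\na\tthree\n' | gzip > input.gz   # illustrative input

zcat input.gz | awk -F'\t' '
    {
        out = "gzip -9 > \047" $1 ".txt.gz\047"
        sub(/^[^\t]*\t/, "")      # drop the id, as the Perl script does
        print | out               # awk reuses the already-open pipe per id
    }
'
```

Note that some awk implementations cap the number of simultaneously open pipes, so with very many distinct ids you may need to close() pipes as you go.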

