Recursively Listing The Contents of a Tar/Zip Archive

Here is a Perl script that lists all files, recursing into nested zip and tar files:

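#!/usr/bin/env perl

use strict;
use warnings;
use Archive::Extract;
use File::Spec;
use File::Temp;

# Current indentation (in spaces) of the printed tree.
my $indent = 0;

die qq|Usage: perl $0 <archive-file>\n| unless @ARGV == 1;

printf qq|%s\n|, $ARGV[0];
$indent += 2;
recursive_extract( shift );

exit 0;

sub recursive_extract {
    my ($file) = @_;

    # Temporary directory, removed automatically when $tmpdir goes out of scope.
    my $tmpdir = File::Temp->newdir;

    my $ae = Archive::Extract->new(
        archive => $file,
    );

    $ae->extract( to => $tmpdir->dirname )
        or die qq|Cannot extract $file: | . $ae->error . qq|\n|;

    for my $f ( @{ $ae->files } ) {
        printf qq|%s%s\n|, q| | x $indent, $f;

        # Nested archives are recognized by extension only.
        if ( $f =~ m/\.(?:zip|tar)\z/ ) {
            $indent += 2;
            # Entries in files() are relative to the extraction directory.
            recursive_extract( File::Spec->catfile( $tmpdir->dirname, $f ) );
        }
    }

    $indent -= 2;
}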

Some drawbacks: it doesn't cache files it has already processed, so identical nested archives are extracted and read again. It also recognizes nested archives only by their extension, not by their content. Anyone who needs to can improve on both points.
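As a rough sketch of the content-based check mentioned above (written in shell rather than Perl, and not part of the original script), the hypothetical helper archive_kind below looks at magic bytes instead of the file name. It assumes a non-empty zip file (which starts with PK\x03\x04) and a tar file in ustar or GNU format (which carries "ustar" at byte offset 257); very old tar formats without that marker would be reported as "other".

```shell
#!/usr/bin/env bash
# archive_kind FILE: print "zip", "tar" or "other" based on the file's content.
# archive_kind is a hypothetical helper name used for illustration only.
archive_kind() {
    local file=$1 magic ustar
    # Non-empty zip files start with the four bytes 50 4b 03 04 ("PK\x03\x04").
    magic=$(head -c 4 "$file" | od -An -tx1 | tr -d ' \n')
    # ustar and GNU tar headers hold the string "ustar" at byte offset 257.
    ustar=$(dd if="$file" bs=1 skip=257 count=5 2>/dev/null | tr -d '\0')
    if [ "$magic" = "504b0304" ]; then
        echo zip
    elif [ "$ustar" = "ustar" ]; then
        echo tar
    else
        echo other
    fi
}
```

Calling archive_kind on each extracted file could replace the extension test; the file(1) utility is another way to get the same information.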

Assuming the Perl script is saved as script.pl, pass the zip file as its argument:

perl script.pl myzip.zip

And in my test it yields something like:

myzip.zip
f1
f2
f3
f4
mytar.tar
f5
f6
f7
f8
testtar.tar
f11
f12
f13
f14
testtar.tar
f11
f12
f13
f14
testzip.zip
fd
fd2

Listing the contents of zip files within a tar file

Starting with tar, you can get it to write to stdout with the -O command line option.

Next comes the unzip part of the problem. The standard Linux Info-ZIP versions of zip/unzip don't support reading an archive from stdin. The funzip command, which is part of Info-ZIP, can read from stdin, but it only supports decompression. It has no option to list the contents.

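Luckily, Java jar files are really just zip files with a well-defined structure, and the jar command can read from stdin and list the contents of its input.

So, assuming you have a tar file called my.tar that only contains zip files, something like this should list the names of the files in the embedded zip files. Note that tar -O writes all extracted members back to back, so if the tar holds several zip files it is safer to name one member at a time on the tar command line.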

tar xOf my.tar | jar -t

Listing the content of a tar file or a directory only down to some level

tar tvf scripts.tar | awk -F/ '{if (NF<4) print }'

drwx------ glens/glens 0 2010-03-17 10:44 scripts/
-rwxr--r-- glens/www-data 1051 2009-07-27 10:42 scripts/my2cnf.pl
-rwxr--r-- glens/www-data 359 2009-08-14 00:01 scripts/pastebin.sh
-rwxr--r-- glens/www-data 566 2009-07-27 10:42 scripts/critic.pl
-rwxr-xr-x glens/glens 981 2009-12-16 09:39 scripts/wiki_sys.pl
-rwxr-xr-x glens/glens 3072 2009-07-28 10:25 scripts/blacklist_update.pl
-rwxr--r-- glens/www-data 18418 2009-07-27 10:42 scripts/sysinfo.pl

Note that the threshold is 3 plus the number of levels you want, because the owner/group column of the verbose listing (glens/glens) contributes one more /. If you drop the v and list names only,

tar tf scripts.tar | awk -F/ '{if (NF<3) print }'

scripts/
scripts/my2cnf.pl
scripts/pastebin.sh
scripts/critic.pl
scripts/wiki_sys.pl
scripts/blacklist_update.pl
scripts/sysinfo.pl

the threshold is only 2 plus the number of levels.

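The same awk filter also works on a directory tree if you feed it full paths, for example from find. The output of ls -R is less suitable, because it prints bare file names under per-directory headers rather than one full path per line.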
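The awk filter can be wrapped so that the depth is a parameter instead of a hand-computed field count. This is only a sketch: tar_list_depth is a hypothetical helper name, it works on the plain tar tf listing, and it counts a directory entry such as scripts/sub/ as depth 2 (the raw NF test above would treat it as one level deeper because of the trailing /). Member names that start with ./ would count as one level deeper.

```shell
#!/usr/bin/env bash
# tar_list_depth ARCHIVE DEPTH: print member names at most DEPTH levels deep.
# tar_list_depth is a hypothetical helper, not a tar feature.
tar_list_depth() {
    local archive=$1 depth=$2
    tar -tf "$archive" | awk -F/ -v depth="$depth" '{
        n = NF
        # Directory entries end in "/", which leaves an empty last field.
        if ($NF == "") n--
        if (n <= depth) print
    }'
}
```

For example, tar_list_depth scripts.tar 2 would print scripts/ together with the files and directories directly inside it.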

How can I list the files in a zip archive without decompressing it?

Perreal's answer is right, but I recommend installing atool (look for it in your distribution's package manager). Then, for any kind of archive file (bzip2, gzip, tar, ...), you have just one command to remember:

als archive_name

How to list the folders/files of a file.tar.gz file inside a file.tar

You can chain the two commands with && so that the listing runs only if the extraction succeeds:

tar -xf abc.tar "abc.tar.gz" && tar -ztvf abc.tar.gz

Explanation:

For listing of files we use

If file is of type tar.gz:

tar -ztvf file.tar.gz

If file is of type tar:

tar -tvf file.tar

If file is of type tar.bz2:

tar -jtvf file.tar.bz2

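You can also restrict any of the listing commands above to members matching a pattern. GNU tar treats the pattern as a glob only when --wildcards is given, while bsdtar matches patterns by default. With GNU tar, for example:

tar -tvf file.tar.bz2 --wildcards '*.txt'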

For extracting files we use

tar -xf file.tar

In these commands,

  • t: List the contents of an archive.
  • v: Verbosely list files processed (display detailed information).
  • z: Filter the archive through gzip, needed to read a compressed
    .tar.gz file.
  • j: Filter the archive through bzip2, needed to read a compressed
    .tar.bz2 file.
  • f filename: Use the archive file called filename.
  • x: Extract all files from the given tar; when file names are given,
    extract only the matching files.

Finding a file within recursive directory of zip files

You can skip find for single-level searches of .zip files (or for recursive ones, in bash 4 with globstar) and use a for loop instead:

for i in *.zip; do grep -iq "mylostfile" < <( unzip -l "$i" ) && echo "$i"; done

For recursive searching in bash 4:

shopt -s globstar
for i in **/*.zip; do grep -iq "mylostfile" < <( unzip -l "$i" ) && echo "$i"; done

tar and recursive archiving

Try the following:

arch="archive.tar.gz"
while read -r -d $'\0' dir
do
    (cd "$dir" && find . -maxdepth 1 -iregex '.*\.docx?' -print0 | tar --null -czf "$arch" -T - --remove-files)
    #alternatively
    #(cd "$dir" && shopt -s nocaseglob nullglob && tar --no-recursion -czf "$arch" *.doc *.docx --remove-files)
done < <(find . \( -ipath '*/test/*' -o -ipath '*/notest/*' \) -iregex '.*\.docx?' -printf '%h\0' | sort -zu)

Some comments:

  • the two -ipath alternatives are grouped with the construction \( -ipath '*/test/*' -o -ipath '*/notest/*' \)
  • the regex .*\.docx? must match the whole path, and the x? means zero or one x
  • tar can read the list of files from stdin with -T -
  • null-terminated filenames are used throughout (this helps if paths contain spaces)
  • --null tells tar to expect such null-terminated filenames
  • (cd ... && ...) runs in a subshell, so there is no need to cd back
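To try the null-terminated pipeline on a single directory without deleting anything, a reduced sketch could look like the following. archive_docs is a hypothetical helper name; it leaves out --remove-files, uses -iname instead of -iregex, and assumes the archive path is absolute, because the pipeline runs after cd.

```shell
#!/usr/bin/env bash
# archive_docs DIR ARCHIVE: pack the .doc/.docx files found directly in DIR
# into ARCHIVE (an absolute path), leaving the originals in place.
# archive_docs is a hypothetical helper name used for illustration.
archive_docs() {
    local dir=$1 archive=$2
    (
        # The subshell keeps the cd from affecting the caller.
        cd "$dir" &&
        find . -maxdepth 1 -type f \( -iname '*.doc' -o -iname '*.docx' \) -print0 |
            tar --null -czf "$archive" -T -
    )
}
```

Once the resulting archive has been checked with tar -tzf, --remove-files can be added back as in the loop above.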
