Recursively listing the contents of a tar/zip archive
Here is a Perl script that lists all files, recursing into nested zip and tar files:
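#!/usr/bin/env perl
use strict;
use warnings;
use Archive::Extract;
use File::Spec;
use File::Temp;

my $indent = 0;

die qq|Usage: perl $0 <zip-file>\n| unless @ARGV == 1;
printf qq|%s\n|, $ARGV[0];
$indent += 2;
recursive_extract( shift );
exit 0;

# Extract $file into a temporary directory, print its members, and recurse
# into any member whose name ends in .zip or .tar.
sub recursive_extract {
    my ($file) = @_;
    my $tmpdir = File::Temp->newdir;    # removed when $tmpdir goes out of scope
    my $ae = Archive::Extract->new( archive => $file )
        or die qq|Cannot open archive $file\n|;
    $ae->extract( to => $tmpdir->dirname )
        or die qq|Cannot extract $file: | . $ae->error . qq|\n|;
    for my $f ( @{ $ae->files } ) {
        printf qq|%s%s\n|, q| | x $indent, $f;
        if ( $f =~ m/\.(?:zip|tar)\z/ ) {
            $indent += 2;
            # Member names are relative to the extraction directory.
            recursive_extract( File::Spec->catfile( $tmpdir->dirname, $f ) );
        }
    }
    $indent -= 2;
}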
Some drawbacks: it doesn't cache already processed files, so identical archives are extracted and read again. It also recognizes archives only by their extension, not by their content. Anyone who needs to can improve on both points.
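As an illustration of the second drawback, content-based detection can be sketched in shell by checking magic bytes instead of the file name. This is a minimal sketch, not part of the Perl script: the function name archive_type is hypothetical, and it only recognizes zip files that start with a local file header and tar files that carry the ustar magic.

```shell
#!/bin/sh
# Hypothetical helper: guess the archive type from file content, not its name.
# Prints "zip", "tar", or "unknown".
archive_type() {
    # A zip file normally starts with the local file header signature "PK\003\004".
    sig=$(dd if="$1" bs=1 count=4 2>/dev/null | od -An -tx1 | tr -d ' \n')
    if [ "$sig" = "504b0304" ]; then
        echo zip
        return
    fi
    # ustar-style tar archives carry the string "ustar" at byte offset 257
    # (hex 75 73 74 61 72).
    magic=$(dd if="$1" bs=1 skip=257 count=5 2>/dev/null | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "7573746172" ]; then
        echo tar
    else
        echo unknown
    fi
}

# Usage: archive_type some-file
```

Very old tar formats without the ustar magic, and empty zip files, would be reported as unknown by this sketch.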
Assuming the Perl script above is named script.pl, pass the zip file as its argument:
perl script.pl myzip.zip
And in my test it yields something like:
myzip.zip
  f1
  f2
  f3
  f4
  mytar.tar
    f5
    f6
    f7
    f8
    testtar.tar
      f11
      f12
      f13
      f14
  testtar.tar
    f11
    f12
    f13
    f14
  testzip.zip
    fd
    fd2
Listing the contents of zip files within a tar file
Starting with tar, you can get it to write to stdout with the -O command line option.
Next, the unzip part of the problem. The standard Linux Info-ZIP versions of zip/unzip don't support reading input from stdin. The funzip command, which is part of Info-ZIP, can read from stdin, but it only supports decompression; it has no option to list the contents.
Luckily, Java jar files are really just zip files with a well-defined structure, and the jar command can read from stdin and list the contents of the input file.
So, assuming you have a tar file called my.tar that only contains zip files, something like this should list the names of the files in the embedded zip files:
tar xOf my.tar | jar -t
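The pipeline above sends every member to a single jar process as one concatenated stream. If the tar file holds several zip files, or a mix of zip and other files, a per-member loop keeps them apart. The following is a minimal sketch under assumptions: the function name list_embedded_zips is hypothetical, member names are assumed to contain no newlines, and the lister defaults to jar -t (which needs a JDK) but can be replaced by any command that reads an archive from stdin.

```shell
#!/bin/sh
# Hypothetical helper: stream each *.zip member of a tar file into a lister
# command, one member at a time.
# Usage: list_embedded_zips my.tar [lister command...]
list_embedded_zips() {
    tarfile=$1
    shift
    # Default lister is "jar -t" (assumes a JDK is installed).
    [ "$#" -gt 0 ] || set -- jar -t
    # Assumes member names contain no newlines.
    tar -tf "$tarfile" | grep -i '\.zip$' | while IFS= read -r member; do
        printf '== %s\n' "$member"
        # -O writes the member to stdout; </dev/null keeps tar away from the name list.
        tar -xOf "$tarfile" "$member" </dev/null | "$@"
    done
}
```

Each member is read from the tar file separately, so this trades speed for clarity on large archives.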
Listing the content of a tar file or a directory only down to some level
tar tvf scripts.tar | awk -F/ '{if (NF<4) print }'
drwx------ glens/glens 0 2010-03-17 10:44 scripts/
-rwxr--r-- glens/www-data 1051 2009-07-27 10:42 scripts/my2cnf.pl
-rwxr--r-- glens/www-data 359 2009-08-14 00:01 scripts/pastebin.sh
-rwxr--r-- glens/www-data 566 2009-07-27 10:42 scripts/critic.pl
-rwxr-xr-x glens/glens 981 2009-12-16 09:39 scripts/wiki_sys.pl
-rwxr-xr-x glens/glens 3072 2009-07-28 10:25 scripts/blacklist_update.pl
-rwxr--r-- glens/www-data 18418 2009-07-27 10:42 scripts/sysinfo.pl
Note that the threshold is 3 plus however many levels you want, because the owner/group column (glens/glens) of the verbose listing contains an extra /. If you list names only with
tar tf scripts.tar | awk -F/ '{if (NF<3) print }'
scripts/
scripts/my2cnf.pl
scripts/pastebin.sh
scripts/critic.pl
scripts/wiki_sys.pl
scripts/blacklist_update.pl
scripts/sysinfo.pl
the threshold is only 2 plus the number of levels.
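For a directory, you can pipe the output of find, which prints one full path per line, to the same awk script and get the same effect. The output of ls -R is less suitable, because it prints bare file names under per-directory headers.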
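The threshold arithmetic above can be hidden in a small function that counts path components directly. This is a sketch with a hypothetical name, tar_ls_depth; it assumes member names are stored without a leading ./ (otherwise the . counts as one level) and uses tar -tf so the owner/group slash never enters the count.

```shell
#!/bin/sh
# Hypothetical helper: list members of a tar file down to a given depth.
# Usage: tar_ls_depth scripts.tar 2
tar_ls_depth() {
    tar -tf "$1" | awk -F/ -v max="$2" '{
        n = NF
        if ($NF == "") n--      # directories end in "/", leaving an empty last field
        if (n <= max) print
    }'
}
```

With a depth of 1 only the top-level entries are printed; with 2, their direct children as well.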
How can I list the files in a zip archive without decompressing it?
Perreal's answer is right, but I recommend installing atool (look for it in your distribution's package manager). Then, for any kind of archive file (bzip2, gzip, tar, ...), you have just one command to remember:
als archive_name
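Where installing atool is not an option, the same "one command to remember" idea can be approximated with a small dispatcher over the stock tools. This is only a sketch: the function name list_archive is hypothetical, it decides by file extension alone, and it assumes tar (and unzip for .zip files) is installed.

```shell
#!/bin/sh
# Hypothetical dispatcher: pick a listing command from the file extension.
# Usage: list_archive archive_name
list_archive() {
    case $1 in
        *.tar.gz|*.tgz)   tar -ztvf "$1" ;;
        *.tar.bz2|*.tbz2) tar -jtvf "$1" ;;
        *.tar)            tar -tvf "$1" ;;
        *.zip)            unzip -l "$1" ;;
        *)
            echo "list_archive: unsupported archive type: $1" >&2
            return 1
            ;;
    esac
}
```

atool covers far more formats; this sketch only handles the ones discussed on this page.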
How to list the folders/files of a file.tar.gz file inside a file.tar
You can chain the two commands with &&, so that the listing runs only if the extraction succeeds:
tar -xf abc.tar "abc.tar.gz" && tar -ztvf abc.tar.gz
Explanation:
For listing of files we use
If file is of type tar.gz:
tar -ztvf file.tar.gz
If file is of type tar:
tar -tvf file.tar
If file is of type tar.bz2:
tar -jtvf file.tar.bz2
You can also restrict any of the above listings to matching members. GNU tar needs --wildcards to treat the argument as a pattern, e.g.:
tar -jtvf file.tar.bz2 --wildcards '*.txt'
For extracting files we use
tar -xf file.tar
In these commands,
- t: List the contents of an archive.
- v: Verbosely list files processed (display detailed information).
- z: Filter the archive through gzip, to read a gzip-compressed (.gz) tar file.
- j: Filter the archive through bzip2, to read a .bz2 tar file.
- f filename: Use the archive file called filename.
- x: Extract all files from the given tar; when passed a filename, extract only the matching files.
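The options above can also be combined so that the inner archive never touches the disk: -O sends the extracted member to stdout, and a second tar reads it from stdin via -f -. The function name list_inner_targz below is hypothetical, and the sketch assumes the inner member is gzip-compressed.

```shell
#!/bin/sh
# Hypothetical helper: list a .tar.gz member of a tar file without extracting it to disk.
# Usage: list_inner_targz abc.tar abc.tar.gz
list_inner_targz() {
    # First tar: extract the named member to stdout (-O).
    # Second tar: read the gzip-compressed archive from stdin (-f -) and list it.
    tar -xOf "$1" "$2" | tar -ztvf -
}
```

Unlike the && form, this leaves no extracted abc.tar.gz behind in the current directory.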
Finding a file within recursive directory of zip files
You can do without find for single-level searches of .zip files (or recursive ones in bash 4 with globstar) by using a for loop:
for i in *.zip; do grep -iq "mylostfile" < <( unzip -l "$i" ) && echo "$i"; done
For recursive searching in bash 4:
shopt -s globstar
for i in **/*.zip; do grep -iq "mylostfile" < <( unzip -l "$i" ) && echo "$i"; done
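For shells without globstar or process substitution, the same search can be sketched with find and a plain pipe. The function name find_in_zips is hypothetical; the sketch assumes unzip is installed and, like the loops above, matches against the whole unzip -l output, which includes the header line naming the archive itself.

```shell
#!/bin/sh
# Hypothetical helper: print every .zip file under a directory whose listing
# mentions the given name (case-insensitive, fixed string).
# Usage: find_in_zips mylostfile [directory]
find_in_zips() {
    needle=$1
    dir=${2:-.}
    find "$dir" -type f -name '*.zip' -exec sh -c '
        needle=$1
        shift
        for z in "$@"; do
            if unzip -l "$z" 2>/dev/null | grep -iqF -- "$needle"; then
                printf "%s\n" "$z"
            fi
        done
    ' sh "$needle" {} +
}
```

Passing the file names as arguments to the inner sh keeps paths with spaces intact.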
tar and recursive archiving
Try the following:
arch="archive.tar.gz"
while read -r -d $'\0' dir
do
    (cd "$dir" && find . -maxdepth 1 -iregex '.*\.docx?' -print0 | tar --null -czf "$arch" -T - --remove-files)
    #alternatively
    #(cd "$dir" && shopt -s nocaseglob nullglob && tar --no-recursion -czf "$arch" *.doc *.docx --remove-files)
done < <(find . \( -ipath '*/test/*' -o -ipath '*/notest/*' \) -iregex '.*\.docx?' -printf '%h\0' | sort -zu)
Some comments:
- alternative -ipath patterns are combined with the construction \( -ipath '*/test/*' -o -ipath '*/notest/*' \)
- the regex .*\.docx? must match the whole path, and the x? means zero or one x
- tar can read the list of files from stdin with -T -
- using null-terminated filenames helps if paths contain spaces
- the --null option instructs tar to expect such null-terminated filenames
- (cd ... &&) runs in a subshell, so there is no need to cd back
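The core of the loop body, null-terminated names piped into tar from a subshell, can be tried in isolation. The sketch below is a simplified variant under assumptions: the function name archive_docs is hypothetical, it uses -iname instead of -iregex, and it leaves out --remove-files so nothing is deleted while experimenting.

```shell
#!/bin/sh
# Hypothetical helper: archive the .doc/.docx files of one directory
# (non-recursive) into an archive created inside that directory.
# Usage: archive_docs some/dir [archive-name]
archive_docs() (
    # The function body is a subshell, so the cd does not affect the caller.
    arch=${2:-archive.tar.gz}
    cd "$1" || exit 1
    # -print0 and --null keep names with spaces (or newlines) intact;
    # "-T -" makes tar read the list of files from stdin.
    find . -maxdepth 1 -type f \( -iname '*.doc' -o -iname '*.docx' \) -print0 |
        tar --null -czf "$arch" -T -
)
```

Adding --remove-files to the tar call would delete the originals after archiving, as in the loop above.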