How to extract a filename.tar.gz file

If running file filename.tar.gz prints "POSIX tar archive",
the archive is a plain tar, not a gzip archive.

Unpack a plain tar without the z option, which is only for gzipped (compressed) archives:

mv filename.tar.gz filename.tar # optional
tar xvf filename.tar

Or try a generic unpacker such as unp (https://packages.qa.debian.org/u/unp.html), a script for unpacking a wide variety of archive formats.

To determine the file type:

$ file ~/Downloads/filename.tbz2
/User/Name/Downloads/filename.tbz2: bzip2 compressed data, block size = 400k
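
The same check can be approximated in code by looking at the first bytes of the file instead of trusting its extension. The following is a minimal sketch, not a replacement for file; the function name sniff_archive_type is made up for this example, and it only knows the gzip, bzip2 and ustar-style tar signatures.

```python
def sniff_archive_type(path):
    """Return a rough label for the file at `path` based on its leading bytes."""
    with open(path, "rb") as fh:
        head = fh.read(512)  # one tar header block is 512 bytes

    if head[:2] == b"\x1f\x8b":      # gzip magic number
        return "gzip"
    if head[:3] == b"BZh":           # bzip2 magic number
        return "bzip2"
    # POSIX (ustar) and GNU tar headers carry "ustar" at offset 257.
    if head[257:262] == b"ustar":
        return "tar"
    return "unknown"
```

A file named filename.tar.gz for which this returns "tar" is the case described above: unpack it with tar xvf, without the z.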

Extract from tar.gz by file name

I'll split the reply into two parts.

Is it (programmatically) possible to extract a file by its filename

Yes, it is possible to extract a file by its filename:

tar xzf tarfile.tar.gz filename

without the overhead of decompressing other files?

In order to extract a file from a compressed tar file, the tar program has to find the file you want. If that is the first file in the tarfile, it only has to uncompress that one. If the file isn't the first in the tarfile, tar needs to scan through the tarfile until it finds the file you want. To do that it MUST uncompress the preceding files in the tarfile. That doesn't mean it has to extract them to disk or buffer them in memory. It streams the decompression, so the memory overhead isn't significant.
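
To illustrate the scan-until-found behaviour, here is a minimal Python sketch using the standard tarfile module. The function name extract_member and its parameters are invented for this example. It walks the member headers in archive order, stops at the first match, and writes only that member to disk; the earlier members are decompressed on the way but never written anywhere.

```python
import shutil
import tarfile


def extract_member(archive_path, member_name, output_path):
    """Copy the first regular file named `member_name` to `output_path`.

    Returns True if the member was found, False otherwise.
    """
    # "r:*" lets tarfile detect the compression (gzip, bzip2, xz or none).
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:  # headers are read one at a time, in archive order
            if member.name == member_name and member.isfile():
                with tar.extractfile(member) as source, \
                        open(output_path, "wb") as output:
                    shutil.copyfileobj(source, output)
                return True  # stop scanning once the file is found
    return False
```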

How do I extract only the file of a .tar.gz member?

This code has worked for me:

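import os
import shutil
import tarfile

fname = "archive.tar.gz"  # placeholder: path to your archive

# "r|*" opens the archive as a forward-only stream with transparent compression.
with tarfile.open(fname, "r|*") as tar:
    counter = 0

    for member in tar:
        counter += 1
        if counter % 1000 == 0:
            tar.members = []  # free RAM... yes, we have to do this manually

        if not member.isfile():
            continue

        filename = os.path.basename(member.name)
        if filename != "myfile":  # do your check
            continue

        # extractfile() yields exactly this member's bytes; the third argument
        # of copyfileobj() is a buffer size, not a byte limit.
        with tar.extractfile(member) as source, open("output.file", "wb") as output:
            shutil.copyfileobj(source, output)

        break  # got our file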

But your problem might not be the extraction at all; your file may not be a .tar.gz but just a .gz file.

Edit: Also, you're getting the error on the with line because Python is trying to call the __enter__ method of the member object (which does not exist).

How to extract a .tar.gz file on UNIX

You must either run the command from the directory your file exists in, or provide a relative or absolute path to the file. Let's do the latter:

cd /home/jsmith
mkdir cw
cd cw
tar zxvf /home/jsmith/Downloads/fileNameHere.tgz

How to extract a number of tar.gz files to a directory?


import glob, os, re, tarfile

# Setup main paths.
tarfile_rootdir = r'D:\SPRING2019\Tarfiles'
extract_rootdir = r'D:\SPRING2019\Test'

# Process the files.
re_pattern = re.compile(r'\A(\w+)-\d+[a-zA-Z]0{0,5}(\d+)')

for tar_file in glob.iglob(os.path.join(tarfile_rootdir, '*.tgz')):

    # Get the parts from the base tgz filename using regular expressions.
    part = re.findall(re_pattern, os.path.basename(tar_file))[0]

    # Build the extraction path from each part.
    extract_path = os.path.join(extract_rootdir, *part)

    # Extract all files from the tarfile.
    with tarfile.open(tar_file, 'r:gz') as r:
        r.extractall(extract_path)

This code is similar to the
answer
to your last question. Because the directory structure
is uncertain, I will use the following structure as an
example.

TGZ files in D:\SPRING2019\Tarfiles:

DZB1216-500058L002001.tgz
DZB1216-500058L003001.tgz

Extract directory structure in D:\SPRING2019\Test:

DZB1216
    2001
    3001

The .tgz file paths are retrieved with glob.

From example filename: DZB1216-500058L002001.tgz,
the regular expression will capture 2 groups:

  • \A is an anchor at the start of the string. This is not a group.
  • (\w+) to match DZB1216. This is the 1st group.
  • -\d+[a-zA-Z]0{0,5} matches up to the next group. This is not a group.
  • (\d+) to match 2001. This is the 2nd group.

The extraction path is joined using the values of
extract_rootdir, DZB1216, and 2001.
This results in D:\SPRING2019\Test\DZB1216\2001
as the extraction path.

tarfile then extracts
everything from the .tgz file.
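
The mapping from filename to extraction path can be checked on its own, without any archives on disk. This is a small sketch under two assumptions: the helper name extraction_path is made up for the example, and ntpath (the Windows flavour of os.path) is used so that the backslash-separated result is the same on any operating system.

```python
import ntpath
import re

RE_PATTERN = re.compile(r'\A(\w+)-\d+[a-zA-Z]0{0,5}(\d+)')


def extraction_path(extract_rootdir, tgz_name):
    """Build the extraction directory for one .tgz base filename."""
    match = RE_PATTERN.match(tgz_name)
    if match is None:
        raise ValueError(f"unexpected filename: {tgz_name}")
    # match.groups() holds the two captured parts, e.g. ('DZB1216', '2001').
    return ntpath.join(extract_rootdir, *match.groups())


print(extraction_path(r'D:\SPRING2019\Test', 'DZB1216-500058L002001.tgz'))
# prints D:\SPRING2019\Test\DZB1216\2001
```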

untar filename.tar.gz to directory filename

With Bash and GNU tar:

file=tar123.tar.gz
dir=/myunzip/${file%.tar.gz}
mkdir -p "$dir"
tar -C "$dir" -xzf "$file"
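
For comparison, roughly the same steps in Python: strip the .tar.gz suffix, create the directory, and extract into it. This is only a sketch; the function name untar_to_named_dir and the parent_dir parameter are invented here, and it assumes the archive name really ends in .tar.gz.

```python
import os
import tarfile


def untar_to_named_dir(archive_path, parent_dir):
    """Extract archive_path into parent_dir/<name without .tar.gz>."""
    suffix = ".tar.gz"
    base = os.path.basename(archive_path)
    if not base.endswith(suffix):
        raise ValueError(f"not a {suffix} file: {base}")

    # Same idea as ${file%.tar.gz} in the shell version.
    target = os.path.join(parent_dir, base[:-len(suffix)])
    os.makedirs(target, exist_ok=True)  # like mkdir -p

    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target)          # like tar -C "$dir" -xzf "$file"
    return target
```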

Extract files contained in archive.tar.gz to new directory named archive

Update: since GNU tar 1.28 you can
use --one-top-level, see https://www.gnu.org/software/tar/manual/tar.html#index-one_002dtop_002dlevel_002c-summary

Older versions need to script this. You can specify the directory into which the archive is extracted with the tar -C option.

The script below creates the target directories if they are missing. If a directory already exists the script still works, because mkdir --parents does nothing in that case.

tar -xvzf archive.tar.gz -C archive_dir

e.g.

for a in *.tar.gz
do
    a_dir=${a%.tar.gz}
    mkdir --parents "$a_dir"
    tar -xvzf "$a" -C "$a_dir"
done

Extract filename and extension in Bash

First, get file name without the path:

filename=$(basename -- "$fullfile")
extension="${filename##*.}"
filename="${filename%.*}"

Alternatively, you can focus on the last '/' of the path instead of the '.' which should work even if you have unpredictable file extensions:

filename="${fullfile##*/}"

You may want to check the documentation:

  • On the web, in section "3.5.3 Shell Parameter Expansion" of the Bash Reference Manual
  • In the bash manpage, in the section called "Parameter Expansion"
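
For readers who want to try the same split outside the shell, the sketch below mirrors the two expansions in Python. The function name split_name is made up for the example. Like the Bash version, it splits at the last '.', so a double extension such as .tar.gz yields gz as the extension and keeps .tar in the name. Unlike the Bash version, it returns an empty extension when the name contains no dot.

```python
import os


def split_name(fullfile):
    """Return (filename_without_extension, extension) for a path."""
    filename = os.path.basename(fullfile)            # like basename -- "$fullfile"
    stem, dot, extension = filename.rpartition(".")  # split at the last '.'
    if not dot:
        return filename, ""                          # no extension at all
    return stem, extension


print(split_name("/home/jsmith/Downloads/archive.tar.gz"))
# prints ('archive.tar', 'gz')
```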

