How to extract filename.tar.gz file
If file filename.tar.gz gives the message "POSIX tar archive", the archive is a plain tar, not a gzip archive.
Unpack it without the z option, which is only for gzipped (compressed) archives:
mv filename.tar.gz filename.tar # optional
tar xvf filename.tar
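If you would rather not guess the compression at all, Python's standard tarfile module can detect it for you: mode "r:*" opens uncompressed, gzip, bzip2 and xz archives alike. A minimal sketch, assuming Python 3; extract_any is a hypothetical helper name:

```python
import tarfile

def extract_any(archive_path, dest="."):
    """Extract a tar archive into dest, whatever its compression.

    "r:*" lets tarfile detect the compression itself, so a plain tar that is
    misleadingly named .tar.gz opens just as well as a real .tar.gz.
    Returns the member names found in the archive.
    """
    with tarfile.open(archive_path, "r:*") as tar:
        names = tar.getnames()
        tar.extractall(path=dest)
    return names
```

For example, extract_any("filename.tar.gz") unpacks into the current directory without needing to know whether the file is really compressed.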
Or try a generic unpacker such as unp (https://packages.qa.debian.org/u/unp.html), a script for unpacking a wide variety of archive formats.
To determine the file type:
$ file ~/Downloads/filename.tbz2
/User/Name/Downloads/filename.tbz2: bzip2 compressed data, block size = 400k
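A similar check can be done from Python by reading the first bytes of the file. This is a rough sketch: sniff_compression is a hypothetical helper, and unlike file it recognizes only a handful of formats (gzip, bzip2, xz and uncompressed tar):

```python
def sniff_compression(path):
    """Guess an archive's compression from its leading "magic" bytes,
    roughly like `file` does, but for a few common formats only."""
    with open(path, "rb") as f:
        head = f.read(512)
    if head.startswith(b"\x1f\x8b"):           # gzip magic number
        return "gzip"
    if head.startswith(b"BZh"):                # bzip2 magic number
        return "bzip2"
    if head.startswith(b"\xfd7zXZ\x00"):       # xz magic number
        return "xz"
    if head[257:262] == b"ustar":              # ustar/GNU tar header magic at offset 257
        return "tar (uncompressed)"
    return "unknown"
```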
Extract from tar.gz by file name
I'll split the reply into two parts
Is it (programmatically) possible to extract a file by its filename
Yes, it is possible to extract a file by its filename:
tar xzf tarfile.tar.gz filename
without the overhead of decompressing other files?
In order to extract a file from a compressed tar file, the tar program has to find the file you want. If that is the first file in the tarfile, then it only has to decompress that. If the file isn't the first in the tarfile, the tar program needs to scan through the tarfile until it finds the file you want. To do that it MUST decompress the preceding files in the tarfile. That doesn't mean it has to extract them to disk or buffer them in memory: it streams the decompression, so the memory overhead isn't significant. (GNU tar by default keeps reading to the end of the archive even after a match, because the same name can occur more than once; the --occurrence option tells it to stop after the first match.)
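A rough Python counterpart of the tar xzf command above, using the standard tarfile module in stream mode ("r|*"), where the archive is read strictly front to back. extract_one is a hypothetical helper; it stops at the first matching name, so it assumes the name occurs only once in the archive:

```python
import shutil
import tarfile

def extract_one(archive_path, member_name, out_path):
    """Copy a single member out of a (possibly compressed) tar archive.

    Members before the match are decompressed on the fly but never written
    to disk; once the match is copied the loop returns, so later members
    are never decompressed. Returns True if the member was found.
    """
    with tarfile.open(archive_path, "r|*") as tar:
        for member in tar:
            if member.isfile() and member.name == member_name:
                with tar.extractfile(member) as src, open(out_path, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                return True
    return False
```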
How do I extract only the file of a .tar.gz member?
This code has worked for me:
import os
import shutil
import tarfile

fname = "archive.tar.gz"  # placeholder: path to your archive

with tarfile.open(fname, "r|*") as tar:  # stream mode, any compression
    counter = 0
    for member in tar:
        counter += 1
        if counter % 1000 == 0:
            tar.members = []  # free RAM... yes, we have to do this manually
        if not member.isfile():
            continue
        filename = os.path.basename(member.name)
        if filename != "myfile":  # do your check
            continue
        # extractfile() returns a reader limited to this member's data
        with tar.extractfile(member) as source, open("output.file", "wb") as output:
            shutil.copyfileobj(source, output)
        break  # got our file
Your problem might not be the extraction, though: your file may not be a .tar.gz at all, but just a .gz file.
Edit: Also, you're getting the error on the with line because Python is trying to call the __enter__ method of the member object (which does not exist).
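A hedged sketch of handling both cases in Python: try the file as a tar archive first, and fall back to plain gzip decompression if it is not one. unpack_gz_or_tgz is a hypothetical helper, and the default output name output.file is borrowed from the snippet above; a plain .gz holds a single stream, so the caller chooses the name to write it to:

```python
import gzip
import os
import shutil
import tarfile

def unpack_gz_or_tgz(path, dest_dir, out_name="output.file"):
    """Extract a .tar.gz into dest_dir; if the file turns out to be a plain
    gzip-compressed file rather than a tar archive, decompress it instead.
    Returns "tar" or "gz" depending on which case applied."""
    os.makedirs(dest_dir, exist_ok=True)
    if tarfile.is_tarfile(path):
        with tarfile.open(path, "r:*") as tar:
            tar.extractall(path=dest_dir)
        return "tar"
    # Not a tar archive: treat it as a single gzip-compressed file.
    # gzip raises an error while reading if it is not gzip data either.
    with gzip.open(path, "rb") as src, \
            open(os.path.join(dest_dir, out_name), "wb") as dst:
        shutil.copyfileobj(src, dst)
    return "gz"
```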
How to extract a .tar.gz file on UNIX
You must either run the command from the directory your file exists in, or provide a relative or absolute path to the file. Let's do the latter:
cd /home/jsmith
mkdir cw
cd cw
tar zxvf /home/jsmith/Downloads/fileNameHere.tgz
How to extract a number of tar.gz files to a directory?
import glob, os, re, tarfile

# Set up the main paths.
tarfile_rootdir = r'D:\SPRING2019\Tarfiles'
extract_rootdir = r'D:\SPRING2019\Test'

# Process the files.
re_pattern = re.compile(r'\A(\w+)-\d+[a-zA-Z]0{0,5}(\d+)')

for tar_file in glob.iglob(os.path.join(tarfile_rootdir, '*.tgz')):
    # Get the parts from the base tgz filename using the regular expression.
    part = re_pattern.findall(os.path.basename(tar_file))[0]

    # Build the extraction path from each part.
    extract_path = os.path.join(extract_rootdir, *part)

    # Extract all files from the tarfile.
    with tarfile.open(tar_file, 'r:gz') as r:
        r.extractall(extract_path)
This code is similar to the answer to your last question. Because the directory structure is uncertain, I will use the following structure as an example.
TGZ files in D:\SPRING2019\Tarfiles:
    DZB1216-500058L002001.tgz
    DZB1216-500058L003001.tgz
Extracted directory structure in D:\SPRING2019\Test:
    DZB1216
        2001
        3001
The .tgz file paths are retrieved with glob.
From the example filename DZB1216-500058L002001.tgz, the regular expression captures 2 groups:
- \A is an anchor at the start of the string. This is not a group.
- (\w+) matches DZB1216. This is the 1st group.
- -\d+[a-zA-Z]0{0,5} matches up to the next group (here -500058L00). This is not a group.
- (\d+) matches 2001. This is the 2nd group.
The extraction path is joined from the values of extract_rootdir, DZB1216 and 2001, which results in D:\SPRING2019\Test\DZB1216\2001 as the extraction path.
tarfile then extracts all files from the .tgz file into that path.
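To check the two captured groups without touching any files, the pattern can be run on the example names alone. A small sketch; extraction_path is a hypothetical helper, and ntpath (the Windows flavour of os.path in the standard library) is used so the D:\ paths come out the same on any operating system:

```python
import ntpath
import re

# Same pattern as in the code above.
re_pattern = re.compile(r'\A(\w+)-\d+[a-zA-Z]0{0,5}(\d+)')

def extraction_path(tgz_name, root=r'D:\SPRING2019\Test'):
    """Build the Windows extraction path from the two captured groups."""
    part = re_pattern.findall(tgz_name)[0]   # e.g. ('DZB1216', '2001')
    return ntpath.join(root, *part)          # joins with backslashes on any OS

for name in ["DZB1216-500058L002001.tgz", "DZB1216-500058L003001.tgz"]:
    print(extraction_path(name))
# D:\SPRING2019\Test\DZB1216\2001
# D:\SPRING2019\Test\DZB1216\3001
```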
untar filename.tar.gz to directory filename
With Bash and GNU tar:
file=tar123.tar.gz
dir=/myunzip/${file%.tar.gz}
mkdir -p "$dir"
tar -C "$dir" -xzf "$file"
Extract files contained in archive.tar.gz to new directory named archive
Update: since GNU tar 1.28, use --one-top-level; see https://www.gnu.org/software/tar/manual/tar.html#index-one_002dtop_002dlevel_002c-summary
Older versions need to script this. You can specify the directory the archive is extracted into with tar's -C option:
tar -xvzf archive.tar.gz -C archive_dir
The script below does this for every archive and creates each directory first. If a directory already exists, the script still works: mkdir --parents does not complain about existing directories. For example:
for a in *.tar.gz
do
    a_dir=${a%.tar.gz}
    mkdir --parents "$a_dir"
    tar -xvzf "$a" -C "$a_dir"
done
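For comparison, the same loop in Python using only the standard library. A sketch assuming gzip-compressed archives; extract_each is a hypothetical name. os.makedirs(..., exist_ok=True) plays the role of mkdir --parents, and extractall(path=...) the role of -C:

```python
import glob
import os
import tarfile

def extract_each(directory=".", suffix=".tar.gz"):
    """Extract every *.tar.gz in directory into a sibling directory named
    after the archive minus its suffix. Returns the directories created."""
    created = []
    pattern = os.path.join(glob.escape(directory), "*" + suffix)
    for archive in sorted(glob.glob(pattern)):
        a_dir = archive[: -len(suffix)]       # like ${a%.tar.gz}
        os.makedirs(a_dir, exist_ok=True)     # like mkdir --parents
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=a_dir)        # like tar -xzf "$a" -C "$a_dir"
        created.append(a_dir)
    return created
```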
Extract filename and extension in Bash
First, get file name without the path:
filename=$(basename -- "$fullfile")
extension="${filename##*.}"
filename="${filename%.*}"
Alternatively, you can focus on the last '/' of the path instead of the '.', which should work even if you have unpredictable file extensions:
filename="${fullfile##*/}"
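To see exactly what these expansions return, including the edge cases, here is a small Python model of them. A sketch; split_path is a hypothetical helper, and str.rsplit is used instead of os.path.splitext because splitext treats a leading dot differently (it returns ('.bashrc', '') for a dotfile, whereas Bash gives an empty name and the extension "bashrc"):

```python
def split_path(fullfile):
    """Model the Bash expansions above:
    ${fullfile##*/}  -> everything after the last '/'
    ${filename%.*}   -> everything before the last '.'
    ${filename##*.}  -> everything after the last '.'
    When the pattern does not match, Bash leaves the string unchanged,
    and rsplit behaves the same way.
    """
    filename = fullfile.rsplit("/", 1)[-1]
    stem = filename.rsplit(".", 1)[0]
    extension = filename.rsplit(".", 1)[-1]
    return filename, stem, extension

print(split_path("/home/u/archive.tar.gz"))  # ('archive.tar.gz', 'archive.tar', 'gz')
print(split_path("README"))                  # ('README', 'README', 'README')
```

Note the second case: with no dot in the name, ${filename##*.} returns the whole name rather than an empty extension.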
You may want to check the documentation:
- On the web, in section "3.5.3 Shell Parameter Expansion" of the Bash Reference Manual
- In the bash manpage at section called "Parameter Expansion"