How to determine a tar archive's format
You can use file under Linux to look at the fingerprint of the uncompressed archive:
$ touch foo # create test file
$ tar --format=posix -cf posix.tar foo # create test posix archive
$ tar --format=gnu -cf gnu.tar foo # create test gnu archive
$ file posix.tar gnu.tar
posix.tar: POSIX tar archive
gnu.tar: POSIX tar archive (GNU)
If the archive is compressed, decompress it first, because file won't peer beyond the compression layer:
$ touch foo # create test file
$ tar --format=posix -czf posix.tar.gz foo # create test gzip posix archive
$ tar --format=gnu -czf gnu.tar.gz foo # create test gzip gnu archive
$ file posix.tar.gz gnu.tar.gz # show output when compressed
posix.tar.gz: gzip compressed data
gnu.tar.gz: gzip compressed data
$ gunzip posix.tar.gz # decompress to posix.tar
$ gunzip gnu.tar.gz # decompress to gnu.tar
$ file posix.tar gnu.tar # show output after decompression
posix.tar: POSIX tar archive
gnu.tar: POSIX tar archive (GNU)
Or, check the compressed archives without saving the decompressed file by piping the output directly to file's standard input:
$ gunzip --stdout posix.tar.gz | file -
/dev/stdin: POSIX tar archive
$ gunzip --stdout gnu.tar.gz | file -
/dev/stdin: POSIX tar archive (GNU)
The GNU format is based on an early draft of the POSIX format, which is why file reports the GNU archive as both POSIX and GNU.
For the nitty-gritty details, the format is described in the GNU tar manual.
How to check whether a file is in tar format?
Check the magic bytes at offset 257. If they match "ustar" including the null terminator, the file is probably a tar.
See: http://www.gnu.org/software/tar/manual/html_node/Standard.html
/* tar Header Block, from POSIX 1003.1-1990. */
/* POSIX header. */
struct posix_header
{ /* byte offset */
char name[100]; /* 0 */
char mode[8]; /* 100 */
char uid[8]; /* 108 */
char gid[8]; /* 116 */
char size[12]; /* 124 */
char mtime[12]; /* 136 */
char chksum[8]; /* 148 */
char typeflag; /* 156 */
char linkname[100]; /* 157 */
char magic[6]; /* 257 */
char version[2]; /* 263 */
char uname[32]; /* 265 */
char gname[32]; /* 297 */
char devmajor[8]; /* 329 */
char devminor[8]; /* 337 */
char prefix[155]; /* 345 */
/* 500 */
};
#define TMAGIC "ustar" /* ustar and a null */
#define TMAGLEN 6
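As a rough illustration of the check described above, here is a minimal Python sketch. The function name looks_like_tar is hypothetical. It inspects only the magic field, so it will miss pre-POSIX (V7) archives, which carry no magic at all, and a match still only means the file is probably a tar.

```python
TMAGIC_OFFSET = 257  # byte offset of the magic field in the header block


def looks_like_tar(path):
    """Return True if the file carries a "ustar" magic at offset 257.

    Accepts the POSIX magic ("ustar" + NUL) and the first six bytes of
    the old GNU magic ("ustar" + space).
    """
    with open(path, "rb") as f:
        f.seek(TMAGIC_OFFSET)
        magic = f.read(6)  # TMAGLEN bytes; shorter if the file is too small
    return magic in (b"ustar\x00", b"ustar ")
```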
TAR file format issue
In my opinion none of your examples is the correct one, at least not for the POSIX format.
As you can read here:
/* tar Header Block, from POSIX 1003.1-1990. */
/* POSIX header */
struct posix_header { /* byte offset */
char name[100]; /* 0 */
char mode[8]; /* 100 */
char uid[8]; /* 108 */
char gid[8]; /* 116 */
char size[12]; /* 124 */
char mtime[12]; /* 136 */
char chksum[8]; /* 148 */
char typeflag; /* 156 */
char linkname[100]; /* 157 */
char magic[6]; /* 257 */
char version[2]; /* 263 */
char uname[32]; /* 265 */
char gname[32]; /* 297 */
char devmajor[8]; /* 329 */
char devminor[8]; /* 337 */
char prefix[155]; /* 345 */
};
#define TMAGIC "ustar" /* ustar and a null */
#define TMAGLEN 6
#define TVERSION "00" /* 00 and no null */
#define TVERSLEN 2
The format of your first example (Scenario 1) seems to match the old GNU header format:
/* OLDGNU_MAGIC uses both magic and version fields, which are contiguous.
Found in an archive, it indicates an old GNU header format, which will be
hopefully become obsolescent. With OLDGNU_MAGIC, uname and gname are
valid, though the header is not truly POSIX conforming */
#define OLDGNU_MAGIC "ustar  " /* 7 chars and a null */
In both your second and third examples (Scenario 2 and Scenario 3), the version field is set to an unexpected value (according to the above documentation, the correct value should be 00 in ASCII, or 0x30 0x30 in hex), so this field is most likely ignored.
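To make the distinction concrete, here is a small Python sketch that reads the magic and version fields from a 512-byte header block and reports which of the cases above it matches. The function name describe_magic and the returned strings are made up for illustration; the offsets come from the struct quoted above.

```python
def describe_magic(block):
    """Describe the magic/version fields of a 512-byte tar header block."""
    magic = block[257:263]    # char magic[6]
    version = block[263:265]  # char version[2]
    if magic + version == b"ustar  \x00":
        # OLDGNU_MAGIC spans both fields: "ustar", two spaces, NUL
        return "old GNU header"
    if magic == b"ustar\x00":
        if version == b"00":
            return "POSIX header"
        return "POSIX magic with unexpected version %r" % version
    return "unrecognized"
```

To try it, read the first 512 bytes of an uncompressed archive in binary mode and pass them to describe_magic.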
How to check if a Unix .tar.gz file is a valid file without uncompressing?
What about just getting a listing of the tarball and throwing away the output, rather than decompressing the file?
tar -tzf my_tar.tar.gz >/dev/null
Edited as per comment. Thanks zrajm!
Edited as per comment. Thanks, Frozen Flame! This test in no way implies integrity of the data. Because tar was designed as a tape archival utility, most implementations will allow multiple copies of the same file in one archive.
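If you need the same kind of check from a script, a rough Python equivalent using the standard tarfile module might look like the sketch below. The function name tar_is_readable is hypothetical. It reads and discards every regular member's data, so it forces the whole compressed stream through the decompressor, but like the tar listing it only shows that the archive can be parsed end to end.

```python
import tarfile
import zlib


def tar_is_readable(path):
    """Return True if every member of the archive can be listed and read."""
    try:
        with tarfile.open(path, "r:*") as tf:  # "r:*" auto-detects compression
            for member in tf:
                if member.isfile():
                    f = tf.extractfile(member)
                    while f.read(1 << 20):     # read and discard, 1 MiB at a time
                        pass
        return True
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        # Not a tar, or a truncated or corrupt stream
        return False
```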
How to check if file is tar file in Bash shell?
The file command can determine the file type:
file my.tar
If it is a tar file, it will output:
my.tar: POSIX tar archive (GNU)
Then you can use grep to check whether the output contains "tar archive":
file my.tar | grep -q 'tar archive' && echo "I'm tar" || echo "I'm not tar"
In case the file does not exist, the output of file will be (with exit code 0):
do-not-exist.txt: cannot open `do-not-exist.txt' (No such file or directory).
You could use a case statement to handle several types of files.
How to determine if data is valid tar file without a file?
Say your uploaded data is contained in the bytes object data.
from tarfile import TarFile, TarError
from io import BytesIO  # tar data is binary, so use BytesIO rather than StringIO
bio = BytesIO(data)
try:
    tf = TarFile(fileobj=bio)  # raises a TarError subclass if the header is invalid
    # process the file....
except TarError:
    print("Not a tar file")
There are additional complexities such as handling different tar file formats and compression. More info is available in the tarfile documentation.
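For the compression part, one option is to let tarfile detect it. The sketch below assumes the upload is a bytes object and uses tarfile.open with mode "r:*", which tries plain tar as well as gzip, bzip2 and xz. The helper name is_tar_data is hypothetical.

```python
import io
import tarfile


def is_tar_data(data):
    """Return True if the bytes in `data` open as a (possibly compressed) tar."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:*"):
            return True
    except (tarfile.TarError, EOFError):
        # ReadError (a TarError) covers "not a tar"; EOFError covers
        # a compressed stream that is cut off before the first header
        return False
```

This only confirms that the first header can be read; it does not walk the rest of the archive.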
Why does GNU tar --format=pax produce ustar archives?
pax Interchange Format:
A pax archive tape or file produced in the -x pax format shall contain
a series of blocks. The physical layout of the archive shall be
identical to the ustar format described in ustar Interchange Format.
"ustar" followed by 1 zero/NUL byte is the value of the magic field indicating the type of the archive:
The magic field is the specification that this archive was output in this archive format. If this field contains ustar (the five characters from the ISO/IEC 646:1991 standard IRV shown followed by NUL), …
Of course, that only applies to a conforming pax utility, but I'd expect GNU tar to create pax format archives in the same way as a conforming pax implementation.
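This is easy to observe. The sketch below uses Python's tarfile module rather than GNU tar (an assumption made only to keep the example self-contained) to write a pax format archive in memory and print the magic and version fields of its first header block.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tf:
    tf.addfile(tarfile.TarInfo("foo"))  # empty test member named "foo"

first_block = buf.getvalue()[:512]
magic = first_block[257:263]    # char magic[6]
version = first_block[263:265]  # char version[2]
print(magic, version)           # the ustar magic and "00" are expected here
```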
Read .tar entries in a specific order (C#, SharpLibZip)
You would need to write your own tar decoder. It is up to you to say if you would consider this to be "easy" or not. The tar format is pretty simple.
You would need to first scan through the tar file to find all the headers, saving the file name and the offset and length of the file data for each. Then you could seek back and forth to the offset of any file to read its contents.
This would be much more difficult if the tar file were compressed, e.g. if it were a .tar.gz file, as opposed to a .tar file.
The tar format is documented here.
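The indexing idea can be sketched briefly in Python (not C#, and not SharpZipLib). Instead of a hand-written header parser, this version leans on the standard tarfile module, whose TarInfo objects expose offset_data and size. The names build_index and read_entry are hypothetical, and the sketch assumes an uncompressed .tar without sparse members.

```python
import tarfile


def build_index(path):
    """Map each regular file's name to the (offset, size) of its data."""
    index = {}
    with tarfile.open(path, "r:") as tf:  # "r:" accepts uncompressed tar only
        for member in tf:
            if member.isfile():
                # If a name occurs more than once, the last copy wins
                index[member.name] = (member.offset_data, member.size)
    return index


def read_entry(path, index, name):
    """Seek straight to one entry's data and read it, in any order."""
    offset, size = index[name]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

Once the index is built with a single pass over the headers, entries can be read in whatever order you need without rescanning the archive.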
Update:
In a comment, the OP revealed that it is actually a .tar.bz2 file. As noted, that requires additional work to be able to randomly access entries. In addition to building an index to the tar contents, the entire .bz2 file needs to be read to build an index to the compression entry points, which do not correspond to where files start in the tar archive. Then, to access a file, you would first go to the closest bzip2 entry point that precedes the start of that file's data, and decompress from there until you arrive at and then read out that data.
It would be easier to simply rearchive and recompress the files into the zip format, which is designed to randomly access and extract individual entries.
tar: Unrecognized archive format error when trying to unpack flower_photos.tgz, TF tutorials on OSX
Apparently, the new instructions on the TensorFlow website run without issues.
I just tried the instructions posted in "How to Retrain Inception's Final Layer for New Categories":
curl -O http://download.tensorflow.org/example_images/flower_photos.tgz
tar xzf flower_photos.tgz
It worked without any problems.
How to extract filename.tar.gz file
If file filename.tar.gz gives the message "POSIX tar archive", the archive is a plain tar, not a gzip archive.
Unpack it without the z option, which is only for gzip-compressed archives:
mv filename.tar.gz filename.tar # optional
tar xvf filename.tar
Or try a generic unpacker like unp (https://packages.qa.debian.org/u/unp.html), a script for unpacking a wide variety of archive formats.
To determine the file type:
$ file ~/Downloads/filename.tbz2
/User/Name/Downloads/filename.tbz2: bzip2 compressed data, block size = 400k