Reading tar file contents without untarring it, in a Python script
You can use getmembers():
>>> import tarfile
>>> tar = tarfile.open("test.tar")
>>> tar.getmembers()
After that, you can use extractfile() to extract the members as file objects. Just an example:
import os
import tarfile
os.chdir("/tmp/foo")
with tarfile.open("test.tar") as tar:
    for member in tar.getmembers():
        f = tar.extractfile(member)
        if f is None:  # directories and other non-file members have no content
            continue
        content = f.read()  # bytes in Python 3, so count byte patterns
        print("%s has %d newlines" % (member.name, content.count(b"\n")))
        print("%s has %d spaces" % (member.name, content.count(b" ")))
        print("%s has %d bytes" % (member.name, len(content)))
With the file object f in the above example, you can use read(), readlines(), etc.
Read *.tar.gz file in python without extracting
When you call tarfile.open,
tarfile.open('arhivename.tar.gz', encoding='utf-8')
the encoding parameter controls the encoding of the filenames, not the encoding of the file contents. It doesn't make sense for the encoding parameter to control the encoding of the file contents, because different files inside the tar file can be encoded differently. So, a tar file really just contains binary data.
You can decode this data by wrapping the file with the UTF-8 stream reader from the codecs module:
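import codecs
import tarfile
tar = tarfile.open('arhivename.tar.gz', encoding='utf-8')
utf8reader = codecs.getreader('utf-8')
for member in tar.getmembers():
    f = tar.extractfile(member)
    if f is not None:  # skip directories and other non-file members
        fp = utf8reader(f)  # fp.read() now returns decoded text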
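As an alternative to the codecs stream reader, the binary file object can also be wrapped in io.TextIOWrapper from the standard library. The following is a minimal sketch under that assumption; the helper name iter_text_lines is made up for illustration, and it assumes the member really is UTF-8 text.

```python
import io
import tarfile


def iter_text_lines(tar, member, encoding="utf-8"):
    """Yield decoded lines from one regular-file member of an open TarFile.

    `member` may be a member name or a TarInfo object.
    """
    raw = tar.extractfile(member)  # binary file object, or None for non-files
    if raw is None:
        return
    # TextIOWrapper decodes incrementally, so the member is read line by line
    # instead of being loaded into memory in one piece.
    with io.TextIOWrapper(raw, encoding=encoding) as text:
        for line in text:
            yield line.rstrip("\n")
```

Because the encoding is passed per member, files with different encodings inside the same archive can each be decoded with their own setting.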
Read .gz files inside .tar files without extracting
You need to use tar.extractfile(member) instead of tarfile.extractfile(member). tarfile is the module, and doesn't know about the tar file you opened. tar is the TarFile object, which references the .tar file you opened.
To do it right, use next() instead of getmembers() or getnames(), so that you don't have to read the entire tar file twice:
import gzip
import sys
import tarfile
with tarfile.open(sys.argv[1]) as tar:
    # the := operator needs Python 3.8+
    while ent := tar.next():
        if ent.name.endswith(".gz"):
            print(gzip.GzipFile(fileobj=tar.extractfile(ent)).read())
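The same single-pass idea can be sketched as a reusable generator. Iterating over a TarFile object directly yields members one at a time, much like calling next() in a loop. The function name iter_gz_members is hypothetical, and the sketch assumes each .gz member is small enough to decompress in memory.

```python
import gzip
import tarfile


def iter_gz_members(tar):
    """Yield (name, decompressed bytes) for each regular *.gz member.

    `tar` is an already opened tarfile.TarFile.
    """
    for member in tar:  # members are read lazily, one header at a time
        if member.isfile() and member.name.endswith(".gz"):
            compressed = tar.extractfile(member).read()
            yield member.name, gzip.decompress(compressed)
```

A caller would open the outer archive with tarfile.open(...) and loop over iter_gz_members(tar), handling each payload as it arrives.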
How do I list contents of a tar file without extracting it in python?
You can use TarFile.getnames() like this:
#!/usr/bin/env python3
import tarfile
tarf = tarfile.open('foo.tar.gz', 'r:gz')
print(tarf.getnames())
http://docs.python.org/3.3/library/tarfile.html#tarfile.TarFile.getnames
And if you want mtime values, you can use getmembers():
print([(member.name, member.mtime) for member in tarf.getmembers()])
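Since mtime is stored as a Unix timestamp, a listing is often easier to read with the times converted. Here is a small sketch that returns name, size, and a UTC timestamp per member; describe_members is an illustrative name, not part of the tarfile module.

```python
from datetime import datetime, timezone


def describe_members(tar):
    """Return (name, size in bytes, UTC mtime as ISO 8601) for each member.

    `tar` is an already opened tarfile.TarFile.
    """
    rows = []
    for member in tar.getmembers():
        # TarInfo.mtime is seconds since the Unix epoch
        mtime = datetime.fromtimestamp(member.mtime, tz=timezone.utc)
        rows.append((member.name, member.size, mtime.isoformat()))
    return rows
```

With the tarf object from the snippet above, describe_members(tarf) would return one tuple per archive entry.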
Python read file within tar archive
Try this:
import tarfile
tar = tarfile.open("docs.tar.gz")
f = tar.extractfile("docs.json")
# do something like f.read()
# since your file is json, you'll probably want to do this:
import json
json.loads(f.read())
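A slightly more defensive version of the same idea might look like the sketch below. It assumes you would rather get None than an exception when the member is missing or is not a regular file; load_json_member is a made-up helper name.

```python
import json
import tarfile


def load_json_member(archive_path, member_name):
    """Parse one JSON member of a tar archive without extracting it to disk.

    Returns None if the member is absent or is not a regular file.
    """
    with tarfile.open(archive_path) as tar:  # compression is auto-detected
        try:
            f = tar.extractfile(member_name)  # KeyError if the name is absent
        except KeyError:
            return None
        if f is None:  # member exists but has no file content
            return None
        return json.load(f)  # json.load accepts a binary file object
```

For the archive above, the call would be load_json_member("docs.tar.gz", "docs.json").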
Read .tar.gz file in Python
The docs tell us that None is returned by extractfile() if the member is not a regular file or link.
One possible solution is to skip over the None results:
import tarfile
tar = tarfile.open("filename.tar.gz", "r:gz")
for member in tar.getmembers():
    f = tar.extractfile(member)
    if f is not None:
        content = f.read()
Reading file from concatenated (tar) file directly without untarring the tar file
The tarfile module gives you access to tarballs. It won't be random access, but you can read out any files you need and put them in a temporary directory, or just store them in strings.
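The "store them in strings" option can be sketched roughly as follows. The function name read_members_as_text and its parameters are assumptions for illustration, and the sketch assumes the wanted members are text in a known encoding.

```python
import tarfile


def read_members_as_text(archive_path, wanted, encoding="utf-8"):
    """Return {name: decoded text} for the requested regular-file members.

    Names in `wanted` that are not in the archive are simply left out.
    """
    wanted = set(wanted)
    found = {}
    with tarfile.open(archive_path) as tar:
        for member in tar:  # one sequential pass over the archive
            if member.name in wanted and member.isfile():
                data = tar.extractfile(member).read()
                found[member.name] = data.decode(encoding)
    return found
```

Everything stays in memory here; for large members, writing to a temporary directory as mentioned above may be the better fit.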
Extracting compressed gz file from tar archive in python
You can use gzip.decompress:
import tarfile, os, gzip
import sys
tar = tarfile.open("arXiv_src_9107_001a.tar")
n = 0
for member in tar.getmembers():
    # Skip directory labeled at the top
    if n == 0:
        n = 1
        continue
    f = tar.extractfile(member)
    print(member)
    content = f.read()
    expanded = gzip.decompress(content)
    # do whatever with expanded here
tar.close()