Split tar.bz2 file and extract each individually
I don't think it's easily possible. A .tar.bz2
is a single compressed stream; unlike a zip file, it has no index
that would allow skipping to the start of a particular file within the archive. You can split the file using the split
utility, then cat
the parts together and extract them (you can pipe the concatenation into tar via stdin to avoid re-creating the joined file on disk). The first fragment can be extracted separately (except for the last file in it, which will probably be truncated), but later fragments are not usable without the ones that come before them.
Is it possible to split a huge text file (based on number of lines) while unpacking a .tar.gz archive, if I cannot extract that file as a whole?
To extract a file from f.tar.gz
and split it into files, each with no more than 1 million lines, use:
tar Oxzf f.tar.gz | split -l1000000
The above names the output files by split's default scheme (xaa, xab, …). If you prefer the output files to be named prefix.nn, where nn is a sequence number, then use:
tar Oxzf f.tar.gz | split -dl1000000 - prefix.
Under this approach:
- The original file is never written to disk: tar reads from the .tar.gz file and pipes its contents to split, which divides it into pieces before writing the pieces to disk.
- The .tar.gz file is read only once.
- split, through its many options, has a great deal of flexibility.
Explanation
For the tar command:

- O tells tar to send the output to stdout. This way we can pipe it to split without ever having to save the extracted file on disk.
- x tells tar to extract the file (as opposed to, say, creating an archive).
- z tells tar that the archive is in gzip format. On modern tars, this is optional.
- f tells tar to use, as input, the file name specified.
For the split command:

- -l tells split to split files limited by number of lines (as opposed to, say, bytes).
- -d tells split to use numeric suffixes for the output files.
- - tells split to get its input from stdin.
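The behaviour of these options can be checked on a small stream. This sketch assumes GNU split; the two-line limit and the chunk. prefix are made up for illustration:

```shell
# Split a five-line stream into numbered two-line pieces:
# -d gives numeric suffixes (chunk.00, chunk.01, ...),
# -l 2 limits each piece to two lines,
# and "-" makes split read from stdin.
printf 'a\nb\nc\nd\ne\n' | split -dl 2 - chunk.

cat chunk.01   # the second piece holds lines 3 and 4
```

With five input lines this produces chunk.00, chunk.01, and a final chunk.02 holding the one leftover line, mirroring what happens at scale with -l1000000.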
compress multiple files into a bz2 file in python
This is what tarballs are for. The tar
format packs the files together, then you compress the result. Python makes it easy to do both at once with the tarfile
module, where passing a "mode" of 'w:bz2'
opens a new tar file for write with seamless bz2
compression. Super-simple example:
import tarfile

with tarfile.open('mytar.tar.bz2', 'w:bz2') as tar:
    for file in mylistoffiles:
        tar.add(file)
If you don't need much control over the operation, shutil.make_archive
might be a possible alternative, which would simplify the code for compressing a whole directory tree to:
shutil.make_archive('mytar', 'bztar', directory_to_compress)