Can I split a compressed SQL file using the Linux split command? If not, is there another way to do it?
You might take a look at this question: split-files-using-tar-gz-zip-or-bzip2
I assume the reason you want to split it is to move it? And that you know you probably won't be able to import a small slice of the file into a database?
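As a rough illustration of what byte-wise splitting amounts to (the same idea as running split -b and later joining the pieces with cat), here is a minimal Python sketch. The helper names split_file and join_files and the ".part" suffix are made up for this example. The pieces are arbitrary slices of the compressed stream, so they have to be joined back in order before the file can be decompressed or imported.

```python
import shutil


def split_file(path, chunk_size, prefix=None):
    """Write consecutive chunk_size-byte pieces of `path` to numbered part files.

    Returns the list of part file names in order.
    """
    prefix = prefix or path + ".part"  # hypothetical naming scheme
    parts = []
    index = 0
    with open(path, "rb") as src:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            part_name = "%s%03d" % (prefix, index)
            with open(part_name, "wb") as dst:
                dst.write(chunk)
            parts.append(part_name)
            index += 1
    return parts


def join_files(parts, dest):
    """Concatenate the part files, in the given order, into `dest`."""
    with open(dest, "wb") as out:
        for name in parts:
            with open(name, "rb") as src:
                shutil.copyfileobj(src, out)
```

The parts are only useful as a set: move them, join them on the other side, then decompress the joined file as usual.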
Compress multiple files into a bz2 file in Python
This is what tarballs are for. The tar format packs the files together, then you compress the result. Python makes it easy to do both at once with the tarfile module, where passing a "mode" of 'w:bz2' opens a new tar file for write with seamless bz2 compression. Super-simple example:
import tarfile
with tarfile.open('mytar.tar.bz2', 'w:bz2') as tar:
    for file in mylistoffiles:
        tar.add(file)
If you don't need much control over the operation, shutil.make_archive might be a possible alternative, which would simplify the code for compressing a whole directory tree to:
import shutil
shutil.make_archive('mytar', 'bztar', directory_to_compress)
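To check that the archive really contains what you expect, it can be read back with the same module. The sketch below is only an illustration: the function names make_bz2_tarball and list_tarball are invented here, and storing each file under its base name via arcname is a choice made for the example, not something the answer above prescribes.

```python
import os
import tarfile


def make_bz2_tarball(archive_path, file_paths):
    """Pack the given files into a bz2-compressed tar archive."""
    with tarfile.open(archive_path, "w:bz2") as tar:
        for path in file_paths:
            # Store only the file name, not the full path (example choice).
            tar.add(path, arcname=os.path.basename(path))


def list_tarball(archive_path):
    """Return the member names stored in a bz2-compressed tar archive."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        return tar.getnames()
```

Calling list_tarball on the archive written by make_bz2_tarball should give back the base names of the files that were added.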
How to decompress BZIP (not BZIP2) with Apache Commons
The original Bzip reportedly used a patented algorithm (arithmetic coding), so Bzip2 was created using algorithms and techniques that were not patented (it uses Huffman coding instead).
That might be the reason why it's no longer in widespread use and open source libraries ignore it.
There's some C code for decompressing Bzip files shown here (gist.github.com mirror).
You might want to read and rewrite that in Java.
How to protect myself from a gzip or bzip2 bomb?
I guess the answer is: there is no easy, ready-made solution. Here is what I use now:
class SafeUncompressor(object):
    """Small proxy class that enables external file object
    support for uncompressed, bzip2 and gzip files. Works transparently, and
    supports a maximum size to avoid zipbombs.
    The wrapped file object must be seekable and opened in binary mode.
    """
    blocksize = 16 * 1024

    class FileTooLarge(Exception):
        pass

    def __init__(self, fileobj, maxsize=10*1024*1024):
        self.fileobj = fileobj
        self.name = getattr(self.fileobj, "name", None)
        self.maxsize = maxsize
        self.init()

    def init(self):
        import bz2
        import gzip
        self.pos = 0
        self.fileobj.seek(0)
        self.buf = b""
        self.format = "plain"
        # Sniff the first two bytes to detect the compression format.
        magic = self.fileobj.read(2)
        if magic == b'\037\213':
            self.format = "gzip"
            self.gzipobj = gzip.GzipFile(fileobj=self.fileobj, mode='r')
        elif magic == b'BZ':
            raise IOError("bzip2 support in SafeUncompressor disabled, as self.bz2obj.decompress is not safe")
            # Unreachable on purpose: kept for when bzip2 support is re-enabled.
            self.format = "bz2"
            self.bz2obj = bz2.BZ2Decompressor()
        self.fileobj.seek(0)

    def read(self, size):
        b = [self.buf]
        x = len(self.buf)
        while x < size:
            if self.format == 'gzip':
                data = self.gzipobj.read(self.blocksize)
                if not data:
                    break
            elif self.format == 'bz2':
                raw = self.fileobj.read(self.blocksize)
                if not raw:
                    break
                # this can already bomb here, to some extent.
                # so disable bzip support until resolved.
                # Also monitor http://stackoverflow.com/questions/13622706/how-to-protect-myself-from-a-gzip-or-bzip2-bomb for ideas
                data = self.bz2obj.decompress(raw)
            else:
                data = self.fileobj.read(self.blocksize)
                if not data:
                    break
            b.append(data)
            x += len(data)
            # Abort as soon as the total uncompressed size exceeds the limit.
            if self.pos + x > self.maxsize:
                self.buf = b""
                self.pos = 0
                raise SafeUncompressor.FileTooLarge("Compressed file too large")
        self.buf = b"".join(b)
        buf = self.buf[:size]
        self.buf = self.buf[size:]
        self.pos += len(buf)
        return buf

    def seek(self, pos, whence=0):
        if whence != 0:
            raise IOError("SafeUncompressor only supports whence=0")
        # Seeking backwards means restarting and reading forward again.
        if pos < self.pos:
            self.init()
        self.read(pos - self.pos)

    def tell(self):
        return self.pos
It does not work well for bzip2, so that part of the code is disabled. The reason is that a single call to bz2.BZ2Decompressor.decompress can already produce an excessively large chunk of data.
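On Python 3.5 and later, bz2.BZ2Decompressor.decompress accepts a max_length argument that caps how much data one call returns, which may be enough to close this gap. The following is a minimal sketch under that assumption, not a drop-in replacement for the class above: the function name bounded_bz2_decompress and the TooLarge exception are hypothetical, and it only handles a single bzip2 stream.

```python
import bz2


class TooLarge(Exception):
    """Raised when the decompressed data would exceed the allowed size."""


def bounded_bz2_decompress(fileobj, maxsize=10 * 1024 * 1024, blocksize=16 * 1024):
    """Decompress one bzip2 stream from fileobj, refusing to exceed maxsize bytes."""
    decomp = bz2.BZ2Decompressor()
    pieces = []
    total = 0
    while not decomp.eof:
        if decomp.needs_input:
            raw = fileobj.read(blocksize)
            if not raw:
                break  # input ended before the stream did (truncated file)
        else:
            # The decompressor still holds buffered output; feed it nothing new.
            raw = b""
        # Ask for at most one byte more than the remaining budget, so that
        # going over the limit is detectable without allocating much more.
        chunk = decomp.decompress(raw, max_length=maxsize - total + 1)
        total += len(chunk)
        if total > maxsize:
            raise TooLarge("decompressed data exceeds %d bytes" % maxsize)
        pieces.append(chunk)
    return b"".join(pieces)
```

Because each call returns at most max_length bytes, memory use stays close to maxsize even for a highly compressible input; data after the first stream (for example in a multi-stream file) is ignored by this sketch.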