Get file names inside a ZIP file on an FTP server without downloading the whole archive
You can implement a file-like object that reads data from FTP instead of from a local file, and pass that object to the ZipFile constructor in place of a (local) file name.
A trivial implementation might look like this:
    import zipfile
    from ftplib import FTP
    from ssl import SSLSocket

    class FtpFile:
        """Minimal read-only, seekable file-like object backed by an FTP file."""

        def __init__(self, ftp, name):
            self.ftp = ftp
            self.name = name
            self.size = ftp.size(name)
            self.pos = 0

        def seek(self, offset, whence=0):
            if whence == 0:    # from the start of the file
                self.pos = offset
            elif whence == 1:  # relative to the current position
                self.pos += offset
            elif whence == 2:  # relative to the end of the file
                self.pos = self.size + offset
            return self.pos

        def tell(self):
            return self.pos

        def read(self, size=None):
            if size is None or size < 0:
                size = self.size - self.pos
            data = b""
            # Based on FTP.retrbinary
            # (but allows stopping after certain number of bytes read)
            # An alternative implementation is at
            # https://stackoverflow.com/q/58819210/850848#58819362
            self.ftp.voidcmd('TYPE I')
            cmd = "RETR {}".format(self.name)
            # The second argument makes the transfer start at self.pos (REST)
            conn = self.ftp.transfercmd(cmd, self.pos)
            try:
                while len(data) < size:
                    buf = conn.recv(min(size - len(data), 8192))
                    if not buf:
                        break
                    data += buf
                # shutdown ssl layer (can be removed if not using TLS/SSL)
                if isinstance(conn, SSLSocket):
                    conn.unwrap()
            finally:
                conn.close()
            try:
                # The server usually reports an error for the aborted transfer
                self.ftp.voidresp()
            except Exception:
                pass
            self.pos += len(data)
            return data
And then you can use it like:
    ftp = FTP(host, user, passwd)
    ftp.cwd(path)
    ftpfile = FtpFile(ftp, "archive.zip")
    archive = zipfile.ZipFile(ftpfile)
    print(archive.namelist())
The above implementation is trivial and inefficient. It starts several (at least three) downloads of small chunks of data just to retrieve the list of contained files. It can be optimized by reading and caching larger chunks, but it should give you the idea.
In particular, you can make use of the fact that you are only going to read the listing. The listing (the central directory) is located at the end of a ZIP archive, so you can download the last 10 KB or so of data up front and serve all read calls from that cache.
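The caching idea could be sketched as follows. This is a minimal illustration rather than part of the original answer: `TailCachedFile` and `names_from_tail` are hypothetical names, and the sketch assumes that the archive has no trailing ZIP comment and that the cached tail covers the whole listing. A read that falls before the cached region raises an error instead of going back to the server.

```python
import io
import zipfile


class TailCachedFile:
    """Seekable read-only view of a remote file whose last bytes are in memory.

    total_size is the full size of the remote file; tail holds its final
    len(tail) bytes. (Hypothetical helper, for illustration only.)
    """

    def __init__(self, total_size, tail):
        self.size = total_size
        self.tail = tail
        self.tail_start = total_size - len(tail)
        self.pos = 0

    def seekable(self):
        return True

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self.pos = offset
        elif whence == io.SEEK_CUR:
            self.pos += offset
        elif whence == io.SEEK_END:
            self.pos = self.size + offset
        return self.pos

    def tell(self):
        return self.pos

    def read(self, size=-1):
        if size is None or size < 0:
            size = self.size - self.pos
        if self.pos < self.tail_start:
            # Not cached: a fuller version would download this range instead.
            raise OSError("read before the cached tail; cache a larger tail")
        start = self.pos - self.tail_start
        data = self.tail[start:start + size]
        self.pos += len(data)
        return data


def names_from_tail(total_size, tail):
    """List the entries of an archive given only its size and its tail."""
    with zipfile.ZipFile(TailCachedFile(total_size, tail)) as archive:
        return archive.namelist()
```

With the FtpFile class above, the tail could be fetched in a single transfer via `ftpfile.seek(-10 * 1024, 2)` followed by `ftpfile.read()` (assuming the archive is larger than 10 KB), and then passed to `names_from_tail` together with `ftpfile.size`.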
Knowing that, you can use a small hack. As the listing is at the end of the archive, you can download only the end of the archive. While the downloaded ZIP will be broken, it can still be listed. This way, you won't need the FtpFile class. You can even download the listing to memory (BytesIO in Python 3, StringIO in Python 2).
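    from io import BytesIO

    # BytesIO, because zipfile needs a binary buffer in Python 3
    zipdata = BytesIO()
    name = "archive.zip"
    size = ftp.size(name)
    # Start the transfer 10 KB before the end (or at 0 for smaller archives)
    ftp.retrbinary("RETR " + name, zipdata.write, rest=max(0, size - 10 * 1024))
    archive = zipfile.ZipFile(zipdata)
    print(archive.namelist())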
If you get a BadZipFile exception because 10 KB is too small to contain the whole listing, you can retry the code with a larger chunk.
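The retry could be wrapped in a loop along these lines. This is only a sketch under stated assumptions: `list_names_with_retry`, `make_ftp_fetcher`, and the `fetch_tail` callable are hypothetical names, and the growth factor of 4 is arbitrary. `ValueError` is caught in addition to `BadZipFile` because some Python versions fail with a negative seek rather than a `BadZipFile` when the listing is cut off.

```python
import io
import zipfile


def list_names_with_retry(fetch_tail, size, chunk=10 * 1024):
    """Return the entry names of a ZIP, fetching only as much tail as needed.

    fetch_tail(n) is a hypothetical callable returning the last n bytes of
    the archive; size is the full size of the archive in bytes.
    """
    while True:
        chunk = min(chunk, size)
        try:
            with zipfile.ZipFile(io.BytesIO(fetch_tail(chunk))) as archive:
                return archive.namelist()
        except (zipfile.BadZipFile, ValueError):
            if chunk >= size:
                raise  # the whole file was fetched, so it really is invalid
            chunk *= 4  # the tail was too short for the listing: grow it


def make_ftp_fetcher(ftp, name, size):
    """Build a fetch_tail callable on top of an ftplib.FTP connection."""
    def fetch_tail(n):
        buf = io.BytesIO()
        # rest=... makes the server start the transfer at that byte offset
        ftp.retrbinary("RETR " + name, buf.write, rest=size - n)
        return buf.getvalue()
    return fetch_tail
```

With an open connection, this would be called as `list_names_with_retry(make_ftp_fetcher(ftp, name, size), size)`, where `size` comes from `ftp.size(name)`.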
Get ZIP first entry name from remote FTP Server without downloading the zip using Java 8+
I did the following; as far as I can see, there is no other answer.
Example input ("ftp-folder/input.txt"):
    // FTPClient is presumably org.apache.commons.net.ftp.FTPClient, and ZipArchiveInputStream
    // is org.apache.commons.compress.archivers.zip.ZipArchiveInputStream
    public String getZipFirstEntryName(final String remotePath) {
        this.log.info("ENTERING getZipFirstEntry, remotePath={} ", remotePath);
        /* Setup FTP connection */
        final FTPClient ftpClient = this.setupFtpConnection();
        try {
            ftpClient.changeWorkingDirectory(remotePath.split("/")[0]); /* ftp-folder */
        } catch (final IOException e) {
            e.printStackTrace();
        }
        try (final ZipArchiveInputStream zip = new ZipArchiveInputStream(ftpClient.retrieveFileStream(remotePath.split("/")[1]))) { /* input.txt */
            this.log.info("EXITING getZipFirstEntry, remotePath={} ", remotePath);
            return zip.getNextEntry().getName();
        } catch (final IOException e) {
            e.printStackTrace();
        }
        return null; /* reached only if reading the stream failed */
    }
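For comparison, the same idea can be sketched in Python. The first entry's name sits in the local file header at the very start of the archive, so only the leading bytes are needed. `first_entry_name` is a hypothetical helper, not taken from any answer above; it assumes the archive starts directly with a local file header and that `head` is long enough to contain the whole name.

```python
import struct

LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"
LOCAL_HEADER_SIZE = 30  # fixed-size part of a local file header


def first_entry_name(head):
    """Return the name of the first entry, given the first bytes of a ZIP."""
    if len(head) < LOCAL_HEADER_SIZE:
        raise ValueError("need at least 30 bytes")
    (signature, _version, flags, _method, _time, _date,
     _crc, _compressed, _uncompressed, name_len, _extra_len) = struct.unpack(
        "<4s5H3I2H", head[:LOCAL_HEADER_SIZE])
    if signature != LOCAL_HEADER_SIGNATURE:
        raise ValueError("data does not start with a local file header")
    raw = head[LOCAL_HEADER_SIZE:LOCAL_HEADER_SIZE + name_len]
    if len(raw) < name_len:
        raise ValueError("name is truncated; pass more leading bytes")
    # General purpose flag bit 11 marks UTF-8 names; otherwise CP437 applies.
    return raw.decode("utf-8" if flags & 0x800 else "cp437")
```

The leading bytes could come from the FtpFile class shown earlier, for example `first_entry_name(FtpFile(ftp, "archive.zip").read(512))`; the 512 here is an arbitrary guess at an upper bound for the header plus the name.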
Reading file from a ZIP archive on FTP server without downloading to local system
The zipfile module accepts file-like objects for both the archive and the individual files, so you can extract the CSV file without writing the archive to disk. And as read_csv also accepts a file-like object, everything should work fine (provided you have enough available memory):
    ...
    flo = BytesIO()
    ftp.retrbinary('RETR /ParentZipFolder.zip', flo.write)
    flo.seek(0)  # rewind so that ZipFile reads from the start of the buffer
    with ZipFile(flo) as archive:
        with archive.open('foo/fee/bar.csv') as fd:
            df = pd.read_csv(fd)  # add relevant options here, including encoding if it matters
How to unzip a file on a remote FTP server without downloading it?
In short, no (at least not without SSH)
Extrapolating from Bobby's excellent answer here (https://superuser.com/questions/479661/how-to-unzip-files-via-an-ftp-connection):
"It is not possible to unzip files remotely. FTP stands for "File Transfer Protocol", which was designed to transfer and partly manage files on the remote end, but not to execute commands. To unpack an archive you'd have to execute a program like tar, bzip2 or similar, but that's not possible via a FTP connection.
You need another session which allows you to execute commands, like SSH. Or you unpack the archive on your machine and transfer the contents via FTP, which will be considerable slower if you have a large number of small files because of the overhead of FTP."
Hope this helps.
Archive files in ZIP on FTP server before ftp_get() the ZIP file using PHP
There's no API in the FTP protocol to ZIP files on a server.
So unless you have another access interface (like an SSH shell access), you cannot do this.