Get Files Names Inside a Zip File on FTP Server Without Downloading Whole Archive

You can implement a file-like object that reads data from FTP instead of a local file, and pass it to the ZipFile constructor instead of a (local) file name.

A trivial implementation can look like this:

from ftplib import FTP
from ssl import SSLSocket

class FtpFile:

    def __init__(self, ftp, name):
        self.ftp = ftp
        self.name = name
        self.size = ftp.size(name)
        self.pos = 0

    def seek(self, offset, whence=0):
        if whence == 0:
            self.pos = offset
        elif whence == 1:
            self.pos += offset
        elif whence == 2:
            self.pos = self.size + offset
        return self.pos

    def tell(self):
        return self.pos

    def read(self, size=None):
        if size is None:
            size = self.size - self.pos
        data = b""

        # Based on FTP.retrbinary
        # (but allows stopping after certain number of bytes read)
        # An alternative implementation is at
        # https://stackoverflow.com/q/58819210/850848#58819362
        self.ftp.voidcmd('TYPE I')
        cmd = "RETR {}".format(self.name)
        # Start the transfer at the current position (REST offset)
        conn = self.ftp.transfercmd(cmd, self.pos)
        try:
            while len(data) < size:
                buf = conn.recv(min(size - len(data), 8192))
                if not buf:
                    break
                data += buf
            # shutdown ssl layer (can be removed if not using TLS/SSL)
            if isinstance(conn, SSLSocket):
                conn.unwrap()
        finally:
            conn.close()
        try:
            self.ftp.voidresp()
        except Exception:
            # The server may report an error, as the transfer was cut short
            pass
        self.pos += len(data)
        return data

And then you can use it like:

import zipfile

ftp = FTP(host, user, passwd)
ftp.cwd(path)

ftpfile = FtpFile(ftp, "archive.zip")
zip = zipfile.ZipFile(ftpfile)
print(zip.namelist())

The above implementation is rather trivial and inefficient. It starts numerous (three at minimum) downloads of small chunks of data to retrieve a list of contained files. It can be optimized by reading and caching larger chunks. But it should give you the idea.


In particular, you can make use of the fact that you are going to read the listing only. The listing is located at the end of a ZIP archive. So you can download just the last (about) 10 KB of data at the start, and you will be able to fulfill all read calls out of that cache.


Knowing that, you can actually use a small hack. Because the listing is at the end of the archive, you can download only the end of the archive. The downloaded ZIP will be broken, but it can still be listed. This way, you won't need the FtpFile class. You can even download the listing to memory (BytesIO).

from io import BytesIO
import zipfile

zipstring = BytesIO()
name = "archive.zip"
size = ftp.size(name)
# Download only the last (about) 10 KB of the archive
ftp.retrbinary("RETR " + name, zipstring.write, rest = max(0, size - 10*1024))

zip = zipfile.ZipFile(zipstring)

print(zip.namelist())

If you get a BadZipFile exception because 10 KB is too small to contain the whole listing, you can retry the code with a larger chunk.
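
The retry could look roughly like the sketch below. The names list_zip_tail and fetch_tail are made up for illustration; fetch_tail(offset) stands for whatever downloads the file from the given offset to its end (with ftplib, a retrbinary call with rest=offset). Catching ValueError in addition to BadZipFile is an assumption based on how in-memory streams react to a negative seek; it may not be needed on every Python version.

```python
import io
import zipfile


def list_zip_tail(fetch_tail, size, chunk=10 * 1024):
    """Return the names in a remote ZIP, downloading only as much of its end as needed.

    fetch_tail(offset) is a hypothetical callable returning the bytes of the
    remote file from `offset` to its end; `size` is the size of the remote file.
    """
    while True:
        offset = max(0, size - chunk)
        tail = io.BytesIO(fetch_tail(offset))
        try:
            with zipfile.ZipFile(tail) as archive:
                return archive.namelist()
        except (zipfile.BadZipFile, ValueError):
            # Depending on the Python version, a cut-off listing may surface
            # as BadZipFile or as a ValueError from a negative seek.
            if offset == 0:
                raise  # the whole file was downloaded, so it really is broken
            chunk *= 2  # retry with a chunk twice as large
```

Doubling the chunk keeps the number of retries small even for archives with very long listings.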

Get ZIP first entry name from remote FTP Server without downloading the zip using Java 8+

I did the following; as far as I can see, there is no other answer.


Example input ("ftp-folder/input.txt"):

public String getZipFirstEntryName(final String remotePath) {
    this.log.info("ENTERING getZipFirstEntry, remotePath={} ", remotePath);

    /* Setup FTP connection */
    final FTPClient ftpClient = this.setupFtpConnection();

    try {
        ftpClient.changeWorkingDirectory(remotePath.split("/")[0]); /* ftp-folder */
    } catch (final IOException e) {
        e.printStackTrace();
    }

    try (final ZipArchiveInputStream zip = new ZipArchiveInputStream(ftpClient.retrieveFileStream(remotePath.split("/")[1]))) { /* input.txt */

        this.log.info("EXITING getZipFirstEntry, remotePath={} ", remotePath);
        return zip.getNextEntry().getName();

    } catch (final IOException e) {
        e.printStackTrace();
    }

    /* Reached only if reading the archive failed */
    return null;
}

Reading file from a ZIP archive on FTP server without downloading to local system

The zipfile module accepts file-like objects for both the archive and the individual files, so you can extract the CSV file without writing the archive to disk. And as read_csv also accepts a file-like object, all should work fine (provided you have enough available memory):

...
flo = BytesIO()
ftp.retrbinary('RETR /ParentZipFolder.zip', flo.write)
flo.seek(0)
with ZipFile(flo) as archive:
    with archive.open('foo/fee/bar.csv') as fd:
        df = pd.read_csv(fd)  # add relevant options here, including the encoding if it matters

How to unzip a file on remote FTP without downloading it?

In short, no (at least not without SSH).

Extrapolating from Bobby's excellent answer here (https://superuser.com/questions/479661/how-to-unzip-files-via-an-ftp-connection):
"It is not possible to unzip files remotely. FTP stands for "File Transfer Protocol", which was designed to transfer and partly manage files on the remote end, but not to execute commands. To unpack an archive you'd have to execute a program like tar, bzip2 or similar, but that's not possible via a FTP connection.

You need another session which allows you to execute commands, like SSH. Or you unpack the archive on your machine and transfer the contents via FTP, which will be considerable slower if you have a large number of small files because of the overhead of FTP."

Hope this helps.

Archive files in ZIP on FTP server before ftp_get() the ZIP file using PHP

There's no API in the FTP protocol to ZIP files on a server.


So unless you have another access interface (like an SSH shell access), you cannot do this.


