Reading File Opened with Python Paramiko SFTPClient.open Method Is Slow

Reading file opened with Python Paramiko SFTPClient.open method is slow

Calling SFTPFile.prefetch should increase the read speed:

ncfile = sftp_client.open('mynetCDFfile')
ncfile.prefetch()
b_ncfile = ncfile.read()
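The prefetch approach can be wrapped in a small helper. This is a minimal sketch, assuming `sftp_client` is an already connected `paramiko.SFTPClient`; the name `read_with_prefetch` is made up for illustration and is not part of Paramiko. Using the file as a context manager closes the remote handle even if the read fails.

```python
def read_with_prefetch(sftp_client, remote_path):
    """Return the whole remote file as bytes, prefetching to speed up the read.

    Hypothetical helper: `sftp_client` is assumed to be a connected
    paramiko.SFTPClient (or any object with a compatible open() method).
    """
    # The file object returned by open() is a context manager,
    # so it is closed when the block exits.
    with sftp_client.open(remote_path, "rb") as remote_file:
        # Ask Paramiko to start requesting the file contents in the background.
        remote_file.prefetch()
        return remote_file.read()
```

Usage would look like `b_ncfile = read_with_prefetch(sftp_client, 'mynetCDFfile')`.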

Another option is enabling read buffering, using bufsize parameter of SFTPClient.open:

ncfile = sftp_client.open('mynetCDFfile', bufsize=32768)
b_ncfile = ncfile.read()

(32768 is the value of SFTPFile.MAX_REQUEST_SIZE.)

Similarly for writes/uploads:

Writing to a file on SFTP server opened using Paramiko/pysftp "open" method is slow.


Yet another option is to explicitly specify the amount of data to read (it makes BufferedFile.read take a more efficient code path):

ncfile = sftp_client.open('mynetCDFfile')
b_ncfile = ncfile.read(ncfile.stat().st_size)
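Which of these options helps most depends on the server and the network, so it can be worth timing them against the same file. The following is only a rough sketch: `time_read_strategies` is a hypothetical helper, `sftp_client` is assumed to be a connected `paramiko.SFTPClient`, and a single run per strategy gives noisy numbers.

```python
import time


def time_read_strategies(sftp_client, remote_path):
    """Time each read strategy once; return {label: (seconds, bytes_read)}."""

    def plain(f):
        return f.read()

    def with_prefetch(f):
        f.prefetch()
        return f.read()

    def explicit_size(f):
        return f.read(f.stat().st_size)

    # (label, extra keyword arguments for open(), function that reads the file)
    strategies = [
        ("plain", {}, plain),
        ("prefetch", {}, with_prefetch),
        ("bufsize=32768", {"bufsize": 32768}, plain),
        ("explicit size", {}, explicit_size),
    ]
    results = {}
    for label, open_kwargs, read_func in strategies:
        start = time.perf_counter()
        with sftp_client.open(remote_path, "rb", **open_kwargs) as f:
            data = read_func(f)
        results[label] = (time.perf_counter() - start, len(data))
    return results
```

Repeating the measurement a few times and comparing the medians gives a fairer picture than a single pass.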

If none of that works, you can download the whole file to memory instead:

Use pdfplumber and Paramiko to read a PDF file from an SFTP server


Obligatory warning: Do not use AutoAddPolicy this way. You lose protection against MITM attacks by doing so. For a correct solution, see Paramiko "Unknown Server".

Open a remote file using paramiko in python slow

Your problem is likely to be caused by the file being a remote object. You've opened it on the server and are requesting one line at a time - because it's not local, each request takes much longer than if the file was sitting on your hard drive. The best alternative is probably to copy the file down to a local location first, using Paramiko's SFTP get.

Once you've done that, you can open the local copy with Python's built-in open function and read it line by line.
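The download-then-read approach could be sketched as follows. It assumes `sftp_client` is a connected `paramiko.SFTPClient` and that the remote file is UTF-8 text; `process_remote_lines` and `handle_line` are illustrative names, not Paramiko API.

```python
import os
import tempfile


def process_remote_lines(sftp_client, remote_path, handle_line):
    """Download remote_path to a temporary local file, then read it line by line."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "download.tmp")
        # One bulk transfer instead of many small requests while iterating.
        sftp_client.get(remote_path, local_path)
        # Assumption: the file is UTF-8 encoded text.
        with open(local_path, "r", encoding="utf-8") as local_file:
            for line in local_file:
                handle_line(line.rstrip("\n"))
```

The temporary directory is removed when the function returns, so nothing is left on disk afterwards.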

Reading large Parquet file from SFTP with Pyspark is slow

By passing the buffer_size parameter to pyarrow.parquet.read_table, the run time went from 51 to 21 minutes :)

import pyarrow.parquet as pq
df = pq.read_table("SERVER_LOCATION/FILE.parquet", filesystem=fs, buffer_size=32768)

Thanks @Martin Prikryl for your help ;)

Use pdfplumber and Paramiko to read a PDF file from an SFTP server

Paramiko SFTPClient.open returns a file-like object.

To use a file-like object with pdfplumber, it seems that you can use the load function:

pdf = pdfplumber.load(fl)

You will also want to read this:

Reading file opened with Python Paramiko SFTPClient.open method is slow


As the Paramiko file-like object seems to work suboptimally when combined with the pdfplumber.load function, as a workaround, you can download the file to memory instead:

from io import BytesIO

flo = BytesIO()
sftp.getfo(fullpath, flo)
flo.seek(0)
pdf = pdfplumber.load(flo)

See How to use Paramiko getfo to download file from SFTP server to memory to process it

Read a file from server with SSH using Python

Paramiko's SFTPClient class allows you to get a file-like object to read data from a remote file in a Pythonic way.

Assuming you have an open SSHClient:

sftp_client = ssh_client.open_sftp()
remote_file = sftp_client.open('remote_filename')
try:
    for line in remote_file:
        pass  # process line
finally:
    remote_file.close()
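The same loop can be written as a generator that closes the file automatically and optionally prefetches, which tends to help because line-by-line reads otherwise wait on the network repeatedly. This is a sketch with a made-up helper name, assuming `sftp_client` is a connected `paramiko.SFTPClient`.

```python
def iter_remote_lines(sftp_client, remote_path, prefetch=True):
    """Yield the lines of a remote file, closing it when iteration ends."""
    with sftp_client.open(remote_path) as remote_file:
        if prefetch:
            # Start fetching the contents in the background before iterating.
            remote_file.prefetch()
        for line in remote_file:
            yield line
```

A caller would then write `for line in iter_remote_lines(sftp_client, 'remote_filename'): ...` without managing the file handle directly.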

SFTP to S3 AWS Lambda using Python Paramiko is extremely slow

My solution was to use Paramiko's readv(), which reads a list of chunks and saves time because it doesn't use seek. I also added multithreading on top of that to download several chunks at once, and then used multipart upload. readv alone sped the transfer up to 2-3 MB/s, with peaks of 10 MB/s; the multiple threads ran at the same speeds but processed different parts of the file simultaneously. This allowed a 1 GB file to be read in less than 6 minutes, whereas the original code would have managed only 200 MB in a 15-minute timeframe. I'll also add that prefetch and the other fixes mentioned in the comments were not used, as readv uses prefetch on its own, and prefetch doesn't help with large files.
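The answer does not include its code, so the following is only a sketch of the chunked readv() part, not the author's implementation. `plan_chunks` and `read_chunks` are hypothetical helpers, `remote_file` is assumed to be an open Paramiko `SFTPFile`, and the 8 MiB default chunk size is an arbitrary choice. Threading and the S3 multipart upload are left out.

```python
def plan_chunks(file_size, chunk_size):
    """Split [0, file_size) into (offset, length) pairs of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (offset, min(chunk_size, file_size - offset))
        for offset in range(0, file_size, chunk_size)
    ]


def read_chunks(remote_file, file_size, chunk_size=8 * 1024 * 1024):
    """Yield (offset, data) for each chunk, fetched with a single readv() call."""
    chunks = plan_chunks(file_size, chunk_size)
    # readv() takes a list of (offset, length) pairs and returns the data
    # for each pair in the same order.
    for (offset, _length), data in zip(chunks, remote_file.readv(chunks)):
        yield offset, data
```

Each `(offset, data)` pair could then be handed to an uploader as one part, or the chunk list could be divided among several worker threads as the answer describes.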

Reading .csv file to memory from SFTP server using Python Paramiko

Assuming you are using Paramiko SFTP library, use SFTPClient.open method:

with sftp.open(path) as f:
    f.prefetch()
    df = pd.read_csv(f)

For the purpose of the prefetch, see Reading file opened with Python Paramiko SFTPClient.open method is slow.
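If reading through the file-like object is still slow, a variant is to download the file into memory with getfo first and parse the in-memory copy. The sketch below uses the standard-library csv module instead of pandas so that it has no third-party dependencies; `read_remote_csv_rows` is a hypothetical helper, `sftp_client` is assumed to be a connected `paramiko.SFTPClient`, and UTF-8 is an assumed encoding.

```python
import csv
import io


def read_remote_csv_rows(sftp_client, remote_path, encoding="utf-8"):
    """Download a remote CSV into memory and return its rows as lists of strings."""
    buffer = io.BytesIO()
    # getfo() copies the whole remote file into the given file-like object.
    sftp_client.getfo(remote_path, buffer)
    buffer.seek(0)
    # newline="" lets the csv module handle line endings itself.
    text = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    return list(csv.reader(text))
```

With pandas, the same rewound `BytesIO` buffer can be passed to `pd.read_csv` instead of going through `csv.reader`.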


