Python FTP Get the Most Recent File by Date

Python FTP get the most recent file by date

With NLST, as shown in Martin Prikryl's answer,
you can use the built-in sorted function with each file's MDTM reply as the sort key:

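from ftplib import FTP

ftp = FTP(host="127.0.0.1", user="u", passwd="p")
ftp.cwd("/data")
# Sort names by their MDTM reply ("213 YYYYMMDDHHMMSS"); the last one is the newest
file_name = sorted(ftp.nlst(), key=lambda x: ftp.voidcmd(f"MDTM {x}"))[-1]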
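
One caveat with the one-liner above: NLST may also return directories, and many servers reject MDTM for those with a 5xx reply, which would abort the sort. A minimal sketch of a more defensive variant follows. The helper name `newest_by_mdtm` is made up for illustration, and the code assumes the server answers MDTM with the usual `213 YYYYMMDDHHMMSS` reply.

```python
from ftplib import error_perm


def newest_by_mdtm(ftp, names):
    """Return the name with the greatest MDTM timestamp, or None.

    `ftp` is expected to behave like ftplib.FTP (only sendcmd is used).
    Names the server refuses to MDTM (typically directories) are skipped.
    """
    latest_name, latest_stamp = None, None
    for name in names:
        try:
            reply = ftp.sendcmd("MDTM " + name)
        except error_perm:
            continue  # permanent 5xx error, e.g. "550 not a plain file"
        stamp = reply[4:].strip()  # drop the leading "213 " reply code
        # Fixed-width timestamps compare correctly as plain strings
        if latest_stamp is None or stamp > latest_stamp:
            latest_name, latest_stamp = name, stamp
    return latest_name
```

With a live connection it could be called as `newest_by_mdtm(ftp, ftp.nlst())`.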

Downloading the most recent file from FTP with Python

Once you have the list of filenames you can simply sort on filename, since the naming convention is S01375T-YYYY-MM-DD-hh-mm.csv this will naturally sort into date/time order. Note that if the S01375T- part varies you could sort on the name split at a fixed position or at the first -.

If this were not the case, you could use the datetime.datetime.strptime method to parse the filenames into datetime instances.
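
As a rough sketch of the strptime approach, assuming names follow the `S01375T-YYYY-MM-DD-hh-mm.csv` pattern from the question and that the prefix itself contains no `-` (both helper names are illustrative only):

```python
from datetime import datetime


def timestamp_from_name(name):
    """Parse the date/time embedded in 'PREFIX-YYYY-MM-DD-hh-mm.csv'."""
    # Drop everything up to the first '-' and the file extension
    stamp = name.split("-", 1)[1].rsplit(".", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%d-%H-%M")


def newest_by_name(names):
    """Return the name whose embedded timestamp is the latest."""
    return max(names, key=timestamp_from_name)
```

Because the key ignores the prefix, this also works when the `S01375T-` part varies between files.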

Of course, if you wished to really simplify things, you could use the PyFileSystem FTPFS and its various methods, which let you treat the FTP system as if it were a slow local file system.

Python FTP server download Latest File with specific keywords in filename

Resolved with the following code:

import ftplib

ftp = ftplib.FTP('test.rebex.net', 'demo', 'password')
ftp.retrlines('LIST')

ftp.cwd("pub")
ftp.cwd("example")
ftp.retrlines('LIST')

# Keep only the names that contain the keyword
names = ftp.nlst()
final_names = [name for name in names if 'client' in name]

latest_time = None
latest_name = None

for name in final_names:
    # MDTM replies with "213 YYYYMMDDHHMMSS", which compares correctly as a string
    mdtm = ftp.sendcmd("MDTM " + name)
    if (latest_time is None) or (mdtm > latest_time):
        latest_name = name
        latest_time = mdtm

print(latest_name)
# Download the newest matching file and close it when done
with open(latest_name, 'wb') as file:
    ftp.retrbinary('RETR ' + latest_name, file.write)

python how to read latest file in ftp directory

I don't think this question has anything to do with Python specifically: once you know the name of the latest file, you fetch it the same way you would with any other FTP client:

latest_time = None
latest_name = None

for name in names:
    time = ftp.sendcmd("MDTM " + name)
    if (latest_time is None) or (time > latest_time):
        latest_name = name
        latest_time = time

with open("myfile.xlsx", "wb") as f:
    ftp.retrbinary(f"RETR {latest_name}", f.write)

As for reading the resulting file into a pandas DataFrame, that's a separate question, but now that you have the file you can do it as you normally would.

Python get recent files FTP

Looking at the documentation for Python's ftplib, it looks like each line of output from retrlines() has the file name as its last "column":

-rw-r--r--   1 ftp-usr  pdmaint     5305 Mar 20 09:48 INDEX

So a simple split and getting the last field should work. It will however only work if there are no white-space characters in the file/folder name.

name = line.split()[-1]
print(name) # Should be "INDEX"

You might want to employ a more sophisticated parsing if you want to handle names with white-spaces in them.
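
One possible refinement, assuming the common nine-column *nix layout shown above (other servers format LIST output differently): limit the number of splits so that everything after the eighth column is kept as the name. The function name is illustrative.

```python
def name_from_list_line(line):
    """Extract the file name from a *nix-style LIST line.

    Splitting at most 8 times leaves the ninth token holding the whole
    name, including any embedded spaces.
    """
    tokens = line.split(maxsplit=8)
    if len(tokens) < 9:
        raise ValueError("unexpected LIST format: " + line)
    return tokens[8]
```

Names that begin with whitespace still cannot be recovered this way, since the leading spaces are consumed as column separators.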

Get the latest FTP folder name in Python

If your FTP server supports the MLSD command, the solution is easy:

  • If you want to base the decision on a modification timestamp:

    entries = list(ftp.mlsd())
    # Only interested in directories
    entries = [entry for entry in entries if entry[1]["type"] == "dir"]
    # Sort by timestamp
    entries.sort(key = lambda entry: entry[1]['modify'], reverse = True)
    # Pick the first one
    latest_name = entries[0][0]
    print(latest_name)

  • If you want to use a file name:

    # Sort by filename
    entries.sort(key = lambda entry: entry[0], reverse = True)
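
The selection logic above can also be written as a small function that works on any iterable of `(name, facts)` pairs, which is what `FTP.mlsd()` yields. This is only a sketch: the function name is hypothetical, and it returns `None` for an empty listing instead of failing on `entries[0]`.

```python
def newest_mlsd_entry(entries, entry_type="dir"):
    """Pick the newest name from (name, facts) pairs as yielded by FTP.mlsd().

    Only entries of the requested type are considered, so the '.' and '..'
    pseudo-entries (reported as 'cdir' and 'pdir') are ignored, as are
    entries that carry no 'modify' fact.
    """
    candidates = [
        (facts["modify"], name)
        for name, facts in entries
        if facts.get("type", "").lower() == entry_type and "modify" in facts
    ]
    if not candidates:
        return None
    # 'modify' is YYYYMMDDHHMMSS[.sss], so string order matches time order
    return max(candidates)[1]
```

With a live connection it could be called as `newest_mlsd_entry(ftp.mlsd())`, or with `entry_type="file"` to pick the newest file instead of the newest directory.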

If you need to rely on the obsolete LIST command, you have to parse the proprietary listing it returns.

A common *nix listing is like:

drw-r--r-- 1 user group           4096 Mar 26  2018 folder1-20180326
drw-r--r-- 1 user group 4096 Jun 18 11:21 folder2-20180618
-rw-r--r-- 1 user group 4467 Mar 27 2018 file-20180327.zip
-rw-r--r-- 1 user group 124529 Jun 18 15:31 file-20180618.zip

With a listing like this, this code will do:

  • If you want to base the decision on a modification timestamp:

    from dateutil import parser

    lines = []
    ftp.dir("", lines.append)

    latest_time = None
    latest_name = None

    for line in lines:
        # At most 8 splits, so a name containing spaces stays whole in tokens[8]
        tokens = line.split(maxsplit=8)
        # Only interested in directories
        if tokens[0][0] == "d":
            time_str = tokens[5] + " " + tokens[6] + " " + tokens[7]
            time = parser.parse(time_str)
            if (latest_time is None) or (time > latest_time):
                latest_name = tokens[8]
                latest_time = time

    print(latest_name)

  • If you want to use a file name:

    lines = []
    ftp.dir("", lines.append)

    latest_name = None

    for line in lines:
        # At most 8 splits, so a name containing spaces stays whole in tokens[8]
        tokens = line.split(maxsplit=8)
        # Only interested in directories
        if tokens[0][0] == "d":
            name = tokens[8]
            if (latest_name is None) or (name > latest_name):
                latest_name = name

    print(latest_name)

Some FTP servers may return . and .. entries in LIST results. You may need to filter those.
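
A small sketch of such a filter, assuming the same nine-column *nix listing as above (the function name is made up for this example):

```python
def drop_dot_lines(lines):
    """Drop LIST lines whose name column is '.' or '..'."""
    kept = []
    for line in lines:
        tokens = line.split(maxsplit=8)
        # Lines that do not have nine columns are passed through untouched
        if len(tokens) == 9 and tokens[8] in (".", ".."):
            continue
        kept.append(line)
    return kept
```

Applying it to `lines` right after the `ftp.dir(...)` call keeps the rest of the loop unchanged.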


Partially based on: Python FTP get the most recent file by date.


If the folder does not contain any files, only subfolders, there are other easier options.

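  • If you want to base the decision on a modification timestamp and the server supports the non-standard -t switch, you can use:

    # -t mirrors "ls -t", which sorts newest first, so the first entry is the latest;
    # the switch is non-standard, so check the order your server actually returns
    lines = ftp.nlst("-t")
    latest_name = lines[0]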

    See How to get files in FTP folder sorted by modification time

  • If you want to use a file name:

    lines = ftp.nlst()
    latest_name = max(lines)

How to get FTP file's modify time using Python ftplib

MLST or MDTM

While you can retrieve a timestamp of an individual file over FTP with MLST or MDTM commands, neither is supported by ftplib.

Of course, you can issue the MLST or MDTM command yourself using FTP.voidcmd.

For details, refer to RFC 3659, particularly the:

  • 3. File Modification Time (MDTM)
  • 7. Listings for Machine Processing (MLST and MLSD)

A simple example for MDTM:

from ftplib import FTP
from dateutil import parser

# ... (connection to FTP)

timestamp = ftp.voidcmd("MDTM /remote/path/file.txt")[4:].strip()

time = parser.parse(timestamp)

print(time)
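
If you would rather avoid the dateutil dependency, the reply can also be parsed with the standard library. The following is a sketch that assumes the `213 YYYYMMDDHHMMSS[.sss]` reply format described in RFC 3659; the helper name is hypothetical.

```python
from datetime import datetime, timezone


def parse_mdtm_reply(reply):
    """Convert a '213 YYYYMMDDHHMMSS[.sss]' reply into an aware UTC datetime."""
    code, _, stamp = reply.strip().partition(" ")
    if code != "213":
        raise ValueError("unexpected MDTM reply: " + reply)
    # The fractional part is optional, so pick the matching format
    fmt = "%Y%m%d%H%M%S.%f" if "." in stamp else "%Y%m%d%H%M%S"
    return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
```

It could be used as `parse_mdtm_reply(ftp.voidcmd("MDTM /remote/path/file.txt"))`.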


MLSD

The only command explicitly supported by the ftplib library that can return a standardized file timestamp is MLSD, via the FTP.mlsd method. Its use makes sense only if you want to retrieve timestamps for multiple files, though.

  • Retrieve a complete directory listing using MLSD
  • Search the returned collection for the file(s) you want
  • Retrieve modify fact
  • Parse it according to the specification, YYYYMMDDHHMMSS[.sss]

For details, refer to RFC 3659 again, particularly the:

  • 7.5.3. The modify Fact section
  • 2.3. Times section

from ftplib import FTP
from dateutil import parser

# ... (connection to FTP)

files = ftp.mlsd("/remote/path")

for file in files:
    name = file[0]
    timestamp = file[1]['modify']
    time = parser.parse(timestamp)
    print(name + ' - ' + str(time))

Note that times returned by MLST, MLSD and MDTM are in UTC (unless the server is broken). So you may need to correct them for your local timezone.

Again, refer to RFC 3659 2.3. Times section:

Time values are always represented in UTC (GMT), and in the Gregorian
calendar regardless of what calendar may have been in use at the date
and time indicated at the location of the server-PI.
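
A minimal sketch of that correction using only the standard library, assuming the server really reports UTC as the RFC requires (the function name is illustrative):

```python
from datetime import datetime, timezone


def modify_fact_to_local(modify):
    """Turn an MLSD 'modify' fact (UTC) into a datetime in the local zone."""
    fmt = "%Y%m%d%H%M%S.%f" if "." in modify else "%Y%m%d%H%M%S"
    utc_time = datetime.strptime(modify, fmt).replace(tzinfo=timezone.utc)
    # astimezone() with no argument converts to the system's local time zone
    return utc_time.astimezone()
```

The result is timezone-aware, so it still compares correctly against other aware datetimes regardless of the zone it is displayed in.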



LIST

If the FTP server does not support any of MLST, MLSD and MDTM, all you can do is use the obsolete LIST command. That involves parsing the proprietary listing it returns.

A common *nix listing is like:

-rw-r--r-- 1 user group           4467 Mar 27  2018 file1.zip
-rw-r--r-- 1 user group 124529 Jun 18 15:31 file2.zip

With a listing like this, this code will do:

from ftplib import FTP
from dateutil import parser

# ... (connection to FTP)

lines = []
ftp.dir("/remote/path", lines.append)

for line in lines:
    # At most 8 splits, so a name containing spaces stays whole in tokens[8]
    tokens = line.split(maxsplit=8)
    name = tokens[8]
    time_str = tokens[5] + " " + tokens[6] + " " + tokens[7]
    time = parser.parse(time_str)
    print(name + ' - ' + str(time))


Finding the latest file

See also Python FTP get the most recent file by date.


