How to get the latest file of an S3 bucket using Boto3?
Something strange is happening with your sorting method.
Here's some code that uses an S3 Resource to retrieve the latest modified object:
import boto3

s3_resource = boto3.resource('s3')
objects = list(s3_resource.Bucket('my-bucket').objects.filter(Prefix='my-folder/'))
objects.sort(key=lambda o: o.last_modified)
latest_object = objects[-1]  # newest object is last after sorting
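As a minimal self-contained sketch of the same idea (the bucket name and prefix are placeholders), the selection of the newest object can be factored into a small helper so the empty-bucket case is handled too:

```python
def latest_object(object_summaries):
    """Return the summary with the greatest last_modified, or None if there are none."""
    return max(object_summaries, key=lambda o: o.last_modified, default=None)

if __name__ == "__main__":
    import boto3  # requires AWS credentials configured in the environment

    s3_resource = boto3.resource("s3")
    bucket = s3_resource.Bucket("my-bucket")  # hypothetical bucket name
    latest = latest_object(bucket.objects.filter(Prefix="my-folder/"))
    if latest is not None:
        print(latest.key, latest.last_modified)
```

Using `max` with a key avoids building and sorting a full list when you only need the single newest object.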
Boto script to download latest file from s3 bucket
You could list all of the files in the bucket and find the one with the most recent last_modified attribute.
>>> import boto
>>> c = boto.connect_s3()
>>> bucket = c.lookup('mybucketname')
>>> l = [(k.last_modified, k) for k in bucket]
>>> key_to_download = sorted(l)[-1][1]
Note, however, that this would be quite inefficient if you had lots of files in the bucket. In that case, you might want to consider using a database to keep track of the files and dates to make querying more efficient.
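The legacy boto snippet above can be expressed with the current boto3 client as well; here is a hedged sketch (the bucket name is taken from the snippet above, and `list_objects_v2` returns at most 1000 objects per call, so a paginator would be needed for larger buckets):

```python
def newest_key(contents):
    """Given the 'Contents' list from a list_objects_v2 response, return the
    key of the most recently modified object, or None for an empty bucket."""
    if not contents:
        return None
    return max(contents, key=lambda o: o["LastModified"])["Key"]

if __name__ == "__main__":
    import boto3  # requires AWS credentials configured in the environment

    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket="mybucketname")
    print(newest_key(resp.get("Contents", [])))
```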
How to download latest n items from AWS S3 bucket using boto3?
If your application uploads files periodically, you could try this:
import datetime

import boto3

last_n_days = 250
s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucket', Prefix='processed')
# LastModified is timezone-aware, so compare against an aware datetime
date_limit = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(last_n_days)
for page in pages:
    for obj in page.get('Contents', []):
        if obj['LastModified'] >= date_limit and obj['Key'][-1] != '/':
            s3.download_file('bucket', obj['Key'], obj['Key'].split('/')[-1])
With the script above, all files modified in the last 250 days will be downloaded. If your application uploads, say, four files per day, this should do the trick.
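If you literally want the latest n items rather than everything newer than a cutoff date, one approach is to collect the listing first and sort it by LastModified. A minimal sketch, assuming the same placeholder bucket and prefix as above:

```python
def newest_n_keys(contents, n):
    """Return the keys of the n most recently modified objects, newest first,
    skipping folder placeholder keys that end in '/'."""
    files = [o for o in contents if not o["Key"].endswith("/")]
    files.sort(key=lambda o: o["LastModified"], reverse=True)
    return [o["Key"] for o in files[:n]]

if __name__ == "__main__":
    import boto3  # requires AWS credentials configured in the environment

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    contents = []
    for page in paginator.paginate(Bucket="bucket", Prefix="processed"):
        contents.extend(page.get("Contents", []))
    for key in newest_n_keys(contents, 4):
        s3.download_file("bucket", key, key.split("/")[-1])
```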
How can I get only the latest file/files created/modified on S3 location through python
You are trying to use the boto library, which is rather obsolete and no longer maintained; the number of issues with this library keeps growing. Better to use the currently developed boto3.
First, let us define parameters of our search:
>>> bucket_name = "bucket_of_m"
>>> prefix = "region/cz/"
Import boto3 and create s3 representing the S3 resource:
>>> import boto3
>>> s3 = boto3.resource("s3")
Get the bucket:
>>> bucket = s3.Bucket(name=bucket_name)
Define filter for objects with given prefix:
>>> res = bucket.objects.filter(Prefix=prefix)
and iterate over it:
>>> for obj in res:
...     print(obj.key)
...     print(obj.size)
...     print(obj.last_modified)
obj is an ObjectSummary (not an Object itself), but it holds enough to learn something about the object.
You can get Object from it and use it as you need:
>>> o = obj.Object()
There are not so many options for filtering, but prefix is available.
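Putting the steps above together, here is a hedged sketch that downloads each object under the prefix via its full Object, using the bucket name and prefix defined earlier; the filename mapping is a hypothetical helper, not part of the boto3 API:

```python
def local_name(key):
    """Map an S3 key like 'region/cz/report.csv' to a bare local filename."""
    return key.rsplit("/", 1)[-1]

if __name__ == "__main__":
    import boto3  # requires AWS credentials configured in the environment

    s3 = boto3.resource("s3")
    bucket = s3.Bucket(name="bucket_of_m")
    for obj in bucket.objects.filter(Prefix="region/cz/"):
        if not obj.key.endswith("/"):  # skip folder placeholder keys
            obj.Object().download_file(local_name(obj.key))
```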