How to Download the Latest File of an S3 Bucket Using Boto3

How to get the latest file of an S3 bucket using Boto3?

Something strange is happening with your sorting method.

Here's some code that uses an S3 Resource to retrieve the latest modified object:

import boto3

s3_resource = boto3.resource('s3')

# List every object under the prefix, then sort oldest-to-newest by modification time.
objects = list(s3_resource.Bucket('my-bucket').objects.filter(Prefix='my-folder/'))
objects.sort(key=lambda o: o.last_modified)

# The last element is the most recently modified object.
print(objects[-1].key)
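
If you also want to download that latest object rather than just print its key, a minimal follow-up sketch (saving under the key's base name is an assumption):

# ObjectSummary exposes .Object(), whose download_file saves to a local path.
latest = objects[-1]
latest.Object().download_file(latest.key.split('/')[-1])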

Boto script to download latest file from s3 bucket

You could list all of the files in the bucket and find the one with the most recent last_modified attribute.

>>> import boto
>>> c = boto.connect_s3()
>>> bucket = c.lookup('mybucketname')
>>> l = [(k.last_modified, k) for k in bucket]
>>> key_to_download = sorted(l, key=lambda x: x[0])[-1][1]
>>> key_to_download.get_contents_to_filename('myfile')

Note, however, that this would be quite inefficient if you had lots of files in the bucket. In that case, you might want to consider using a database to keep track of the files and dates to make querying more efficient; see the sketch below.
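
As a rough sketch of that idea, assuming you can hook into your upload path, you could record each upload in a small SQLite table and ask it for the newest key (the table name and schema here are hypothetical):

import sqlite3

conn = sqlite3.connect('uploads.db')
# Hypothetical tracking table: one row per uploaded object.
# Store ISO-8601 timestamps so lexicographic order matches chronological order.
conn.execute('CREATE TABLE IF NOT EXISTS uploads (key TEXT, last_modified TEXT)')

def record_upload(key, last_modified):
    conn.execute('INSERT INTO uploads VALUES (?, ?)', (key, last_modified))
    conn.commit()

# Finding the newest key is now a local query instead of a full bucket listing.
row = conn.execute(
    'SELECT key FROM uploads ORDER BY last_modified DESC LIMIT 1'
).fetchone()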

How to download latest n items from AWS S3 bucket using boto3?

If your application uploads files periodically, you could try this:

import boto3
import datetime

last_n_days = 250
s3 = boto3.client('s3')

paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucket', Prefix='processed')

# LastModified is timezone-aware (UTC), so compare against an aware datetime.
date_limit = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=last_n_days)

for page in pages:
    for obj in page.get('Contents', []):
        # Skip "folder" placeholder keys and anything older than the cutoff.
        if obj['LastModified'] >= date_limit and not obj['Key'].endswith('/'):
            s3.download_file('bucket', obj['Key'], obj['Key'].split('/')[-1])

With the script above, every file modified in the last 250 days will be downloaded. If your application uploads around four files per day, that should do the trick.
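
If you need the latest n objects specifically, rather than everything in a date window, a minimal variation on the same paginator (reusing the placeholder bucket and prefix from above; n is a hypothetical count):

import boto3

n = 4  # hypothetical: how many of the newest objects to keep
s3 = boto3.client('s3')

paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucket', Prefix='processed')

# Gather every listed object, then keep only the n most recently modified.
all_objects = [obj for page in pages for obj in page.get('Contents', [])]
for obj in sorted(all_objects, key=lambda o: o['LastModified'])[-n:]:
    if not obj['Key'].endswith('/'):
        s3.download_file('bucket', obj['Key'], obj['Key'].split('/')[-1])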

How can I get only the latest file(s) created/modified in an S3 location through Python

You are trying to use the boto library, which is obsolete and no longer maintained; its list of open issues keeps growing.

Better to use the actively developed boto3.

First, let us define parameters of our search:

>>> bucket_name = "bucket_of_m"
>>> prefix = "region/cz/"

Import boto3 and create s3 representing the S3 resource:

>>> import boto3
>>> s3 = boto3.resource("s3")

Get the bucket:

>>> bucket = s3.Bucket(name=bucket_name)
>>> bucket
s3.Bucket(name='bucket_of_m')

Define a filter for objects with the given prefix:

>>> res = bucket.objects.filter(Prefix=prefix)
>>> res
s3.Bucket.objectsCollection(s3.Bucket(name='bucket_of_m'), s3.ObjectSummary)

and iterate over it:

>>> for obj in res:
...     print(obj.key)
...     print(obj.size)
...     print(obj.last_modified)
...

Each obj is an ObjectSummary (not an Object itself), but it holds enough to learn something about the object:

>>> obj
s3.ObjectSummary(bucket_name='bucket_of_m', key='region/cz/Ostrava/Nadrazni.txt')
>>> type(obj)
<class 'boto3.resources.factory.s3.ObjectSummary'>

You can get an Object from it and use it as you need:

>>> o = obj.Object()
>>> o
s3.Object(bucket_name='bucket_of_m', key='region/cz/rodos/fusion/AdvancedDataFusion.xml')

There are not many options for filtering, but Prefix is available.
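
Putting the pieces together, one way to pick just the newest object under the prefix and download it (the local filename choice is an assumption):

>>> # max() raises ValueError if nothing matches the prefix
>>> latest = max(bucket.objects.filter(Prefix=prefix), key=lambda o: o.last_modified)
>>> latest.Object().download_file(latest.key.split('/')[-1])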


