How to get the latest file of an S3 bucket using Boto3?
Something strange is happening with your sorting method.
Here's some code that uses an S3 Resource to retrieve the most recently modified object:
import boto3
s3_resource = boto3.resource('s3')
objects = list(s3_resource.Bucket('my-bucket').objects.filter(Prefix='my-folder/'))
objects.sort(key=lambda o: o.last_modified)
print(objects[-1].key)
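Sorting the whole list works, but only the newest entry is needed, so a single pass with max() is enough. The following is a minimal sketch under that assumption. The helper names latest_object and latest_key_in_bucket are made up for illustration, and the demo objects are stand-ins with hypothetical keys and dates so that the selection logic can be tried without AWS credentials.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def latest_object(summaries):
    """Return the most recently modified item, or None if there are none."""
    # max() scans the items once instead of sorting the whole list.
    return max(summaries, key=lambda o: o.last_modified, default=None)


def latest_key_in_bucket(bucket_name, prefix=""):
    """Return the newest key under a prefix (needs boto3 and AWS credentials)."""
    import boto3  # imported here so latest_object() is usable without boto3

    bucket = boto3.resource("s3").Bucket(bucket_name)
    newest = latest_object(bucket.objects.filter(Prefix=prefix))
    return newest.key if newest is not None else None


# Offline demonstration with stand-in objects (hypothetical keys and dates).
demo = [
    SimpleNamespace(key="my-folder/a.csv",
                    last_modified=datetime(2023, 1, 1, tzinfo=timezone.utc)),
    SimpleNamespace(key="my-folder/b.csv",
                    last_modified=datetime(2023, 3, 1, tzinfo=timezone.utc)),
    SimpleNamespace(key="my-folder/c.csv",
                    last_modified=datetime(2023, 2, 1, tzinfo=timezone.utc)),
]
print(latest_object(demo).key)
```

Against a real bucket, the call would look like latest_key_in_bucket('my-bucket', 'my-folder/'). Returning None for an empty prefix avoids the IndexError that objects[-1] raises on an empty list.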
Boto script to download latest file from s3 bucket
You could list all of the files in the bucket and find the one with the most recent last_modified attribute.
>>> import boto
>>> c = boto.connect_s3()
>>> bucket = c.lookup('mybucketname')
>>> # max() replaces sorted(..., cmp=...), which only exists in Python 2
>>> key_to_download = max(bucket, key=lambda k: k.last_modified)
>>> key_to_download.get_contents_to_filename('myfile')
Note, however, that this would be quite inefficient if you had lots of files in the bucket. In that case, you might want to consider using a database to keep track of the files and dates to make querying more efficient.
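One way to picture the database idea is a small table that is updated whenever a file is uploaded, so that finding the newest key becomes an indexed query rather than a full bucket listing. The sketch below uses sqlite3 from the standard library purely as an example. The table name s3_files, the helper names, and the sample keys are all assumptions made for illustration, and the timestamps are assumed to be ISO 8601 UTC strings in one consistent format so that they sort chronologically as text.

```python
import sqlite3


def open_index(path=":memory:"):
    """Create (if needed) a table mapping S3 keys to modification times."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS s3_files ("
        "key TEXT PRIMARY KEY, last_modified TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_s3_files_modified "
        "ON s3_files (last_modified)"
    )
    return conn


def record_upload(conn, key, last_modified_iso):
    """Record or update one file; call this right after each upload."""
    conn.execute(
        "INSERT OR REPLACE INTO s3_files (key, last_modified) VALUES (?, ?)",
        (key, last_modified_iso),
    )
    conn.commit()


def latest_key(conn):
    """Return the most recently modified key, or None if nothing is recorded."""
    row = conn.execute(
        "SELECT key FROM s3_files ORDER BY last_modified DESC LIMIT 1"
    ).fetchone()
    return row[0] if row else None


# Demonstration with an in-memory database and hypothetical keys.
conn = open_index()
record_upload(conn, "reports/2023-01.csv", "2023-01-31T10:00:00Z")
record_upload(conn, "reports/2023-02.csv", "2023-02-28T10:00:00Z")
print(latest_key(conn))
```

The trade-off is that the table is only as accurate as the code that maintains it: files uploaded by other tools would not show up unless they are recorded too.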
How to download latest n items from AWS S3 bucket using boto3?
If your application uploads files periodically, you could try this:
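import datetime
import boto3
last_n_days = 250
s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucket', Prefix='processed')
# LastModified is timezone-aware (UTC), so the cutoff must be timezone-aware too
date_limit = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=last_n_days)
for page in pages:
    # A page of an empty listing has no 'Contents' entry
    for obj in page.get('Contents', []):
        # Skip "folder" placeholder keys, which end with '/'
        if obj['LastModified'] >= date_limit and not obj['Key'].endswith('/'):
            s3.download_file('bucket', obj['Key'], obj['Key'].split('/')[-1])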
The script above downloads every file modified in the last last_n_days days (250 here). If your application uploads a fixed number of files per day, say 4, pick last_n_days so that the window covers the n files you need.
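If exactly n files are wanted regardless of how old they are, a date window is only an approximation. A sketch of an alternative, assuming the listing fits comfortably in one pass, is to keep the n newest entries with heapq.nlargest while paging. The helper names latest_n_keys and download_latest_n are hypothetical, and the demo pages are hand-written dictionaries that mimic the shape of list_objects_v2 responses.

```python
import heapq
from datetime import datetime, timezone


def latest_n_keys(pages, n):
    """Return the keys of the n most recently modified objects, newest first."""
    objects = (
        obj
        for page in pages
        for obj in page.get("Contents", [])  # empty listings lack "Contents"
        if not obj["Key"].endswith("/")      # skip folder placeholder keys
    )
    newest = heapq.nlargest(n, objects, key=lambda o: o["LastModified"])
    return [obj["Key"] for obj in newest]


def download_latest_n(bucket, prefix, n):
    """Download the n newest objects (needs boto3 and AWS credentials)."""
    import boto3  # imported here so latest_n_keys() is usable without boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket, Prefix=prefix)
    for key in latest_n_keys(pages, n):
        s3.download_file(bucket, key, key.split("/")[-1])


def _utc(day):
    return datetime(2023, 1, day, tzinfo=timezone.utc)


# Offline demonstration with hand-written pages (hypothetical keys and dates).
fake_pages = [
    {"Contents": [{"Key": "processed/", "LastModified": _utc(1)},
                  {"Key": "processed/a.csv", "LastModified": _utc(2)}]},
    {"Contents": [{"Key": "processed/b.csv", "LastModified": _utc(4)},
                  {"Key": "processed/c.csv", "LastModified": _utc(3)}]},
    {},
]
print(latest_n_keys(fake_pages, 2))
```

Note that S3 returns keys in lexicographic order, not by date, so every page still has to be listed; the heap only avoids holding and sorting the full listing in memory.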
How can I get only the latest file/files created/modified on S3 location through python
You are trying to use the boto library, which is rather obsolete and not maintained. The number of issues with this library is growing.
Better to use the currently developed boto3.
First, let us define parameters of our search:
>>> bucket_name = "bucket_of_m"
>>> prefix = "region/cz/"
Import boto3 and create s3, representing the S3 resource:
>>> import boto3
>>> s3 = boto3.resource("s3")
Get the bucket:
>>> bucket = s3.Bucket(name=bucket_name)
>>> bucket
s3.Bucket(name='bucket_of_m')
Define filter for objects with given prefix:
>>> res = bucket.objects.filter(Prefix=prefix)
>>> res
s3.Bucket.objectsCollection(s3.Bucket(name='bucket_of_m'), s3.ObjectSummary)
and iterate over it:
>>> for obj in res:
...     print(obj.key)
...     print(obj.size)
...     print(obj.last_modified)
...
Each obj is an ObjectSummary (not the Object itself), but it holds enough to learn something about it:
>>> obj
s3.ObjectSummary(bucket_name='bucket_of_m', key=u'region/cz/Ostrava/Nadrazni.txt')
>>> type(obj)
boto3.resources.factory.s3.ObjectSummary
You can get Object from it and use it as you need:
>>> o = obj.Object()
>>> o
s3.Object(bucket_name='bucket_of_m', key=u'region/cz/rodos/fusion/AdvancedDataFusion.xml')
There are not many options for filtering, but a prefix is available.
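Since the prefix is the only filter applied on the S3 side, anything else, such as a modification date or a file extension, has to be checked in Python after listing. Here is a minimal sketch of that idea. The function name modified_since is made up for illustration, and the demo reuses the two keys shown above with hypothetical dates.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def modified_since(summaries, cutoff, suffix=""):
    """Return items modified at or after cutoff whose key ends with suffix,
    newest first."""
    matches = [
        o for o in summaries
        if o.last_modified >= cutoff and o.key.endswith(suffix)
    ]
    return sorted(matches, key=lambda o: o.last_modified, reverse=True)


# Offline demonstration with stand-in objects (the dates are hypothetical).
demo = [
    SimpleNamespace(key="region/cz/Ostrava/Nadrazni.txt",
                    last_modified=datetime(2023, 5, 1, tzinfo=timezone.utc)),
    SimpleNamespace(key="region/cz/rodos/fusion/AdvancedDataFusion.xml",
                    last_modified=datetime(2023, 6, 1, tzinfo=timezone.utc)),
]
cutoff = datetime(2023, 5, 15, tzinfo=timezone.utc)
for o in modified_since(demo, cutoff):
    print(o.key)
```

With a real bucket, the first argument would be bucket.objects.filter(Prefix=prefix). The cutoff must be timezone-aware, because last_modified values returned by boto3 are.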