Listing Contents of a Bucket with Boto3

Listing contents of a bucket with boto3

One way to see the contents would be:

# my_bucket is a boto3 Bucket resource, e.g. boto3.resource('s3').Bucket('my_bucket_name')
for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object)

Python boto, list contents of specific dir in bucket

For boto3

import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_bucket_name')

for object_summary in my_bucket.objects.filter(Prefix="dir_name/"):
    print(object_summary.key)

How to list files from an S3 bucket folder using Python

You can't indicate a prefix/folder in the Bucket constructor. Instead, use the client-level API and call list_objects_v2, something like this:

import boto3

client = boto3.client('s3')

response = client.list_objects_v2(
    Bucket='my_bucket',
    Prefix='data/')

for content in response.get('Contents', []):
    print(content['Key'])

Note that this will yield at most 1000 S3 objects. You can use a paginator if needed.

Listing objects in S3 with suffix using boto3

You can check if they end with .csv:

import boto3

def get_latest_file_movement(**kwargs):
    # sort key: last-modified timestamp of each object
    get_last_modified = lambda obj: obj['LastModified'].timestamp()

    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my-bucket', Prefix='prefix')['Contents']

    # newest key that ends with .csv
    last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True)
                  if obj['Key'].endswith('.csv')][0]

    return last_added
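
Note that list_objects_v2 returns at most 1,000 keys per call, so the snippet above only looks at the first page of results. If the prefix can hold more keys than that, the same idea works on top of a paginator; a minimal sketch, reusing the placeholder bucket and prefix names above:

import boto3

def latest_csv_key(bucket='my-bucket', prefix='prefix'):
    # walk every result page so the search is not capped at 1,000 keys
    paginator = boto3.client('s3').get_paginator('list_objects_v2')
    csv_objs = [obj
                for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
                for obj in page.get('Contents', [])
                if obj['Key'].endswith('.csv')]
    # newest object wins; returns None if no .csv keys exist
    return max(csv_objs, key=lambda obj: obj['LastModified'])['Key'] if csv_objs else None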

List directory contents of an S3 bucket using Python and Boto3?

All these other responses leave things to be desired. Using

client.list_objects()

limits you to at most 1,000 results. The rest of the answers are either wrong or too complex.

Dealing with the continuation token yourself is a terrible idea. Just use a paginator, which handles that logic for you.

The solution you want is:

import boto3

client = boto3.client('s3')
keys = [obj['Key']
        for page in client.get_paginator('list_objects_v2').paginate(Bucket='my_bucket')
        for obj in page['Contents']]
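
One caveat: a result page has no 'Contents' key when the bucket (or prefix) is empty, so the comprehension above raises a KeyError in that case. A slightly more defensive variant, as a sketch:

keys = [obj['Key']
        for page in client.get_paginator('list_objects_v2').paginate(Bucket='my_bucket')
        for obj in page.get('Contents', [])]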

Listing S3 buckets using boto3 and Python

Your bucket name is madl-temp and your prefix is maxValue, but in your boto3 call you have them the other way around. It should be:

s3 = boto3.client('s3')
object_listing = s3.list_objects_v2(Bucket='madl-temp',
                                    Prefix='maxValue/')

To get the number of files you can do:

len(object_listing['Contents']) - 1

where the -1 accounts for the zero-byte placeholder object maxValue/ itself (it appears as a key of its own if the "folder" was created in the console; drop the -1 if there is no such placeholder). Keep in mind that list_objects_v2 returns at most 1,000 keys per call.
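
If the prefix may hold more than 1,000 keys, counting through a paginator avoids the cap; a minimal sketch using the same bucket and prefix names:

import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
count = sum(1
            for page in paginator.paginate(Bucket='madl-temp', Prefix='maxValue/')
            for obj in page.get('Contents', [])
            if not obj['Key'].endswith('/'))  # skip zero-byte "folder" placeholders
print(count)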

Retrieving subfolder names in an S3 bucket with boto3

S3 is an object store; it doesn't have a real directory structure, and the "/" in key names is essentially cosmetic. One reason people want a directory structure is so they can maintain/prune/add a tree within the application. For S3, you treat such a structure as a kind of index or search tag.

To manipulate objects in S3, you need boto3.client or boto3.resource. For example, to list all objects:

import boto3

s3 = boto3.client("s3")
all_objects = s3.list_objects(Bucket='bucket-name')

http://boto3.readthedocs.org/en/latest/reference/services/s3.html#S3.Client.list_objects

In fact, if the S3 object keys use the '/' separator, the more recent version of list_objects (list_objects_v2) allows you to limit the response to keys that begin with a specified prefix.

To limit the results to items under certain sub-folders:

import boto3

s3 = boto3.client("s3")
response = s3.list_objects_v2(
    Bucket=BUCKET,
    Prefix='DIR1/DIR2',
    MaxKeys=100)

Documentation
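
Since the question asks for subfolder names specifically: passing Delimiter='/' makes S3 group keys under each common prefix and return those prefixes in CommonPrefixes instead of listing every object. A minimal sketch, with the bucket name and the first-level/ prefix as placeholders:

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator('list_objects_v2')

# "subfolders" directly under first-level/ come back as CommonPrefixes entries
subfolders = [cp['Prefix']
              for page in paginator.paginate(Bucket='bucket-name', Prefix='first-level/', Delimiter='/')
              for cp in page.get('CommonPrefixes', [])]
print(subfolders)  # e.g. ['first-level/1456753904534/']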

Another option is to use Python's os.path functions to extract the folder prefix from a key. The problem is that this still requires listing objects from directories you don't care about.

import os

s3_key = 'first-level/1456753904534/part-00014'
filename = os.path.basename(s3_key)    # 'part-00014'
foldername = os.path.dirname(s3_key)   # 'first-level/1456753904534'

# if the keys use a non-conventional delimiter such as '#'
s3_key = 'first-level#1456753904534#part-00014'
filename = s3_key.split("#")[-1]       # 'part-00014'

A reminder about boto3: boto3.resource is a nice high-level API. There are pros and cons to using boto3.client vs boto3.resource. If you develop an internal shared library, using boto3.resource gives you a black-box layer over the resources it uses.
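
To make the client-vs-resource distinction concrete, here is a minimal sketch of the same listing done both ways (the bucket name is a placeholder):

import boto3

# client: mirrors the raw API and returns plain dicts
client = boto3.client('s3')
keys_from_client = [obj['Key']
                    for obj in client.list_objects_v2(Bucket='bucket-name').get('Contents', [])]

# resource: higher-level objects that hide the request/response details
bucket = boto3.resource('s3').Bucket('bucket-name')
keys_from_resource = [obj.key for obj in bucket.objects.all()]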


