Boto3 to Download All Files from an S3 Bucket

Boto3 to download all files from an S3 bucket

When working with buckets that have 1000+ objects, it's necessary to implement a solution that uses the NextContinuationToken to page through sequential sets of, at most, 1000 keys. This solution first compiles a list of objects, then iteratively creates the required directories and downloads the existing objects.

import boto3
import os

s3_client = boto3.client('s3')

def download_dir(prefix, local, bucket, client=s3_client):
    """
    params:
    - prefix: pattern to match in s3
    - local: local path to folder in which to place files
    - bucket: s3 bucket with target contents
    - client: initialized s3 client object
    """
    keys = []
    dirs = []
    next_token = ''
    base_kwargs = {
        'Bucket': bucket,
        'Prefix': prefix,
    }
    while next_token is not None:
        kwargs = base_kwargs.copy()
        if next_token != '':
            kwargs.update({'ContinuationToken': next_token})
        results = client.list_objects_v2(**kwargs)
        contents = results.get('Contents', [])
        for i in contents:
            k = i.get('Key')
            if k[-1] != '/':
                keys.append(k)
            else:
                dirs.append(k)
        next_token = results.get('NextContinuationToken')
    for d in dirs:
        dest_pathname = os.path.join(local, d)
        if not os.path.exists(os.path.dirname(dest_pathname)):
            os.makedirs(os.path.dirname(dest_pathname))
    for k in keys:
        dest_pathname = os.path.join(local, k)
        if not os.path.exists(os.path.dirname(dest_pathname)):
            os.makedirs(os.path.dirname(dest_pathname))
        client.download_file(bucket, k, dest_pathname)

How to download everything in that folder using boto3

Marcin's answer is correct, but files with the same name in different paths would be overwritten.
You can avoid that by replicating the folder structure of the S3 bucket locally.

import boto3
import os
from pathlib import Path

s3 = boto3.resource('s3')

bucket = s3.Bucket('bucket')

key = 'product/myproject/2021-02-15/'
objs = list(bucket.objects.filter(Prefix=key))

for obj in objs:
    # print(obj.key)

    # remove the file name from the object key
    obj_path = os.path.dirname(obj.key)

    # create nested directory structure
    Path(obj_path).mkdir(parents=True, exist_ok=True)

    # save file with full path locally
    bucket.download_file(obj.key, obj.key)

Download a folder from S3 using Boto3

Quick and dirty, but it works:

import boto3
import os

def downloadDirectoryFroms3(bucketName, remoteDirectoryName):
    s3_resource = boto3.resource('s3')
    bucket = s3_resource.Bucket(bucketName)
    for obj in bucket.objects.filter(Prefix=remoteDirectoryName):
        if not os.path.exists(os.path.dirname(obj.key)):
            os.makedirs(os.path.dirname(obj.key))
        bucket.download_file(obj.key, obj.key)  # save to same path

Assuming you want to download the directory foo/bar from S3, the for loop will iterate over all the files whose path starts with Prefix=foo/bar.
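For example, to pull everything under foo/bar into the same relative paths locally (the bucket name below is just a placeholder):

downloadDirectoryFroms3('my-bucket', 'foo/bar/')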

Python 3 + boto3 + s3: download all files in a folder

You can use list_objects_v2 and pass the prefix to get only the keys inside your S3 "folder". Then use a for loop to go through all these keys and download them. Use conditions if you need to filter them further.
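As a minimal sketch of that approach, using a list_objects_v2 paginator (the bucket name, prefix, and local directory below are placeholders, not values from the question):

import os
import boto3

s3 = boto3.client('s3')

def download_prefix(bucket, prefix, local_dir):
    # Page through all keys under the prefix (list_objects_v2 returns at most 1000 per call)
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            key = obj['Key']
            if key.endswith('/'):
                continue  # skip "folder" placeholder keys
            dest = os.path.join(local_dir, key)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            s3.download_file(bucket, key, dest)

download_prefix('my-bucket', 'my/folder/', 'downloads')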

Download Entire Content of a subfolder in an S3 bucket

I think your best bet would be the awscli:

aws s3 cp --recursive s3://mybucket/your_folder_named_a path/to/your/destination

From the docs:

--recursive (boolean) Command is performed on all files or objects under the specified directory or prefix.

EDIT:

To do this with boto3, try this:

import os
import errno
import boto3

client = boto3.client('s3')

def assert_dir_exists(path):
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise

def download_dir(bucket, path, target):
    # Handle missing / at end of prefix
    if not path.endswith('/'):
        path += '/'

    paginator = client.get_paginator('list_objects_v2')
    for result in paginator.paginate(Bucket=bucket, Prefix=path):
        # Download each file individually
        for key in result['Contents']:
            # Calculate relative path
            rel_path = key['Key'][len(path):]
            # Skip paths ending in /
            if not key['Key'].endswith('/'):
                local_file_path = os.path.join(target, rel_path)
                # Make sure directories exist
                local_file_dir = os.path.dirname(local_file_path)
                assert_dir_exists(local_file_dir)
                client.download_file(bucket, key['Key'], local_file_path)

download_dir('your_bucket', 'your_folder', 'destination')

Download multiple files from S3 bucket using boto3

To read the CSV file you can use the csv library (see: https://docs.python.org/fr/3.6/library/csv.html).
Example:

import csv

with open('file.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

To push files to the new bucket, you can use the copy method (see: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.copy)

Example:

import boto3

s3 = boto3.resource('s3')
source = {
    'Bucket': 'BUCKET-NAME',
    'Key': 'mykey'
}
bucket = s3.Bucket('SECOND_BUCKET-NAME')
bucket.copy(source, 'mykey')  # second argument is the destination key, not the bucket name

How to download specific folder content from an AWS S3 bucket using Python

Here is an example of how to do that with MinIO (Amazon S3 compatible) using Python:

from minio import Minio

client = Minio(
    "localhost:port",
    access_key="access_key",
    secret_key="secret_key",
    secure=False,
)
objects = client.list_objects("index", prefix="public/")
for obj in objects:
    ...  # <Do something ....>
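To actually save each listed object locally rather than just iterating over it, a minimal sketch (reusing the client above; recursive=True is an assumption so nested keys under the prefix are included, and "downloads" is a placeholder target directory) could use fget_object:

import os

objects = client.list_objects("index", prefix="public/", recursive=True)
for obj in objects:
    local_path = os.path.join("downloads", obj.object_name)
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    # fget_object downloads the object and saves it to the given file path
    client.fget_object("index", obj.object_name, local_path)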

How to download the latest n items from an AWS S3 bucket using boto3?

If your application uploads files periodically, you could try this:

import boto3
import datetime

last_n_days = 250
s3 = boto3.client('s3')

paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucket', Prefix='processed')
# Use an aware datetime so it can be compared with S3's timezone-aware LastModified
date_limit = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=last_n_days)
for page in pages:
    for obj in page['Contents']:
        if obj['LastModified'] >= date_limit and obj['Key'][-1] != '/':
            s3.download_file('bucket', obj['Key'], obj['Key'].split('/')[-1])

With the script above, all files modified in the last 250 days will be downloaded. If your application uploads four files per day, this could do the trick.


