Move files between folders on Amazon S3 using boto3 ("The specified bucket does not exist" error)
The CopySource parameter is defined as:
The string form is {bucket}/{key} or {bucket}/{key}?versionId={versionId} if you want to copy a specific version. You can also provide this value as a dictionary. The dictionary format is recommended over the string format because it is more explicit. The dictionary format is: {'Bucket': 'bucket', 'Key': 'key', 'VersionId': 'id'}. Note that the VersionId key is optional and may be omitted.
Therefore this line:
CopySource='my-folder/my.txt')
should include the bucket name at the start:
CopySource='dev-files/my-folder/my.txt')
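As a sketch (using the question's dev-files bucket; the destination key in the comment is hypothetical), the two equivalent forms look like this:

```python
# String form: "{bucket}/{key}"
copy_source_str = 'dev-files/my-folder/my.txt'

# Dictionary form (recommended by the docs; unambiguous)
copy_source = {'Bucket': 'dev-files', 'Key': 'my-folder/my.txt'}

# Either value can be passed as CopySource, e.g.:
#   s3_client.copy_object(CopySource=copy_source,
#                         Bucket='dev-files', Key='my-folder/my-copy.txt')
print(copy_source_str)  # dev-files/my-folder/my.txt
```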
How to copy files and folders from one S3 bucket to another using Python boto3
S3 has no concept of folders or directories; it stores objects in a flat namespace.
For example, the UI may show two files named file1.txt and file2.txt inside test_folder, but the bucket actually contains two objects whose keys are
"test_folder/file1.txt" and "test_folder/file2.txt".
Every object is stored under its full key like this.
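A quick pure-Python illustration (no AWS calls; the key strings are the ones from the example above): the "folder" is nothing more than a shared key prefix.

```python
# What the console shows as a folder is just a prefix shared by full keys
keys = ['test_folder/file1.txt', 'test_folder/file2.txt']

folder = {k.rsplit('/', 1)[0] for k in keys}    # derive the "folder" name
filenames = [k.rsplit('/', 1)[-1] for k in keys]

print(folder)     # {'test_folder'}
print(filenames)  # ['file1.txt', 'file2.txt']
```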
You can use the snippet below to copy each key to another bucket.
import boto3

s3_client = boto3.client('s3')
resp = s3_client.list_objects_v2(Bucket='mybucket')  # returns at most 1000 keys per call
keys = [obj['Key'] for obj in resp.get('Contents', [])]

s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket('otherbucket')
for key in keys:
    copy_source = {
        'Bucket': 'mybucket',
        'Key': key
    }
    bucket.copy(copy_source, key)  # keep the same key in the destination bucket
If your source bucket contains many keys and this is a one-time activity, I suggest you check out this link.
If this needs to happen for every insert event on your bucket, copying the new object to another bucket, you can check out this approach.
How do I move/copy files in S3 using boto3 asynchronously?
I use the following script. You can copy it into a Python file and run it from the command line. I have a PC with 8 cores, so it's faster than my little EC2 instance with 1 vCPU.
It uses the multiprocessing library, so you'll want to read up on that if you aren't familiar with it; it's relatively straightforward. There's a batch delete that I've commented out because you really don't want to accidentally delete the wrong directory. You can use whatever method you like to list the keys or iterate through the objects, but this works for me.
from multiprocessing import Pool
from itertools import repeat
import boto3
import os
import math

s3sc = boto3.client('s3')
s3sr = boto3.resource('s3')
num_proc = os.cpu_count()

def get_list_of_keys_from_prefix(bucket, prefix):
    """gets list of keys for given bucket and prefix"""
    keys_list = []
    paginator = s3sr.meta.client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter='/'):
        keys = [content['Key'] for content in page.get('Contents', [])]
        keys_list.extend(keys)
    # drop the prefix itself if it appears as a zero-byte "folder" object
    if prefix in keys_list:
        keys_list.remove(prefix)
    return keys_list

def batch_delete_s3(keys_list, bucket):
    # delete_objects accepts at most 1000 keys per request
    total_keys = len(keys_list)
    chunk_size = 1000
    num_batches = math.ceil(total_keys / chunk_size)
    for b in range(num_batches):
        batch_to_delete = [{'Key': k} for k in keys_list[chunk_size * b:chunk_size * (b + 1)]]
        s3sc.delete_objects(Bucket=bucket, Delete={'Objects': batch_to_delete, 'Quiet': True})

def copy_s3_to_s3(from_bucket, from_key, to_bucket, to_key):
    copy_source = {'Bucket': from_bucket, 'Key': from_key}
    s3sr.meta.client.copy(copy_source, to_bucket, to_key)

def upload_multiprocess(from_bucket, keys_list_from, to_bucket, keys_list_to, num_proc=4):
    with Pool(num_proc) as pool:
        r = pool.starmap(copy_s3_to_s3,
                         zip(repeat(from_bucket), keys_list_from, repeat(to_bucket), keys_list_to),
                         chunksize=15)
        pool.close()
        pool.join()
    return r

if __name__ == '__main__':
    __spec__ = None  # workaround so multiprocessing runs from some IDEs (e.g. Spyder)
    from_bucket = 'from-bucket'
    from_prefix = 'from/prefix/'
    to_bucket = 'to-bucket'
    to_prefix = 'to/prefix/'

    keys_list_from = get_list_of_keys_from_prefix(from_bucket, from_prefix)
    keys_list_to = [to_prefix + k.rsplit('/', 1)[-1] for k in keys_list_from]
    rs = upload_multiprocess(from_bucket, keys_list_from, to_bucket, keys_list_to, num_proc=num_proc)
    # batch_delete_s3(keys_list_from, from_bucket)  # deletes the source keys; uncomment with care