Automatically Sync Two Amazon S3 Buckets, Besides S3Cmd

s3cmd sync: move two buckets into a single bucket

Thanks to @fviard from over at GitHub for answering my question. Copied here is the answer I received:

By default, sync doesn't delete files at the destination that are not in the source. It can tell you that in the summary, but it will not do it. Check that you have the following configuration:

delete_after = False
delete_after_fetch = False
delete_removed = False

and that you don't use an option like "--delete-removed" on the command line.
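
If you want to double-check, those settings live in s3cmd's configuration file (by default ~/.s3cfg), so something like this should show their current values:

grep -E '^delete_(after|after_fetch|removed)' ~/.s3cfg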

By the way, you don't need to do things in separate commands. Without "--skip-existing", you can do something like: sync s3://source1/ s3://source2/ s3://source3/ s3://mydestination/
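
Spelled out as a full command line (the bucket names are placeholders), this merges all three sources into the destination in a single run:

s3cmd sync s3://source1/ s3://source2/ s3://source3/ s3://mydestination/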

TL;DR: it only deletes files if you configured it to do so in the config; otherwise the warning message can be ignored.

S3 moving files between buckets on different accounts?

You don't have to open permissions to everyone. Use the bucket policies below on the source and destination buckets to copy from a bucket in one account to another using an IAM user.

  • Bucket to copy from: SourceBucket

  • Bucket to copy to: DestinationBucket

  • Source AWS account ID: XXXX-XXXX-XXXX

  • Source IAM user: src-iam-user

The policies below mean that the IAM user XXXX-XXXX-XXXX:src-iam-user has s3:ListBucket and s3:GetObject privileges on SourceBucket and its objects, and s3:ListBucket and s3:PutObject privileges on DestinationBucket and its objects.

On the SourceBucket the policy should be like:

{
  "Id": "Policy1357935677554",
  "Statement": [{
    "Sid": "Stmt1357935647218",
    "Action": ["s3:ListBucket"],
    "Effect": "Allow",
    "Resource": "arn:aws:s3:::SourceBucket",
    "Principal": {"AWS": "arn:aws:iam::XXXXXXXXXXXX:user/src-iam-user"}
  }, {
    "Sid": "Stmt1357935676138",
    "Action": ["s3:GetObject"],
    "Effect": "Allow",
    "Resource": "arn:aws:s3:::SourceBucket/*",
    "Principal": {"AWS": "arn:aws:iam::XXXXXXXXXXXX:user/src-iam-user"}
  }]
}

On the DestinationBucket the policy should be:

{
  "Id": "Policy1357935677555",
  "Statement": [{
    "Sid": "Stmt1357935647218",
    "Action": ["s3:ListBucket"],
    "Effect": "Allow",
    "Resource": "arn:aws:s3:::DestinationBucket",
    "Principal": {"AWS": "arn:aws:iam::XXXXXXXXXXXX:user/src-iam-user"}
  }, {
    "Sid": "Stmt1357935676138",
    "Action": ["s3:PutObject"],
    "Effect": "Allow",
    "Resource": "arn:aws:s3:::DestinationBucket/*",
    "Principal": {"AWS": "arn:aws:iam::XXXXXXXXXXXX:user/src-iam-user"}
  }]
}
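
If you would rather apply these policies from the command line than from the console, the AWS CLI's s3api subcommand can attach them; each bucket owner runs the command for their own bucket (the policy file names are placeholders):

aws s3api put-bucket-policy --bucket SourceBucket --policy file://source-policy.json
aws s3api put-bucket-policy --bucket DestinationBucket --policy file://destination-policy.json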

The command to run is then:

s3cmd cp s3://SourceBucket/File1 s3://DestinationBucket/File1
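
If you need to move an entire prefix rather than a single object, s3cmd's recursive flag should work here as well (the prefix is a placeholder):

s3cmd cp --recursive s3://SourceBucket/some/prefix/ s3://DestinationBucket/some/prefix/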

Synchronizing S3 Folders/Buckets

CloudBerry Explorer comes with a PowerShell command-line interface, and you can learn here how to use it to do a sync.

How can I backup or sync an Amazon S3 bucket?

I prefer to back up locally using sync, where only changes are updated. That is not a perfect backup solution, but you can implement periodic updates later as you need:

s3cmd sync --delete-removed s3://your-bucket-name/ /path/to/myfolder/
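
For the periodic updates mentioned above, a minimal sketch as a cron entry (the schedule, paths, and log file are assumptions):

# Mirror the bucket every night at 03:00 and keep a log of each run
0 3 * * * s3cmd sync --delete-removed s3://your-bucket-name/ /path/to/myfolder/ >> /var/log/s3-backup.log 2>&1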

If you have never used s3cmd, install and configure it using:

pip install s3cmd
s3cmd --configure

There are also S3 backup services for around $5/month, but I would also check Amazon Glacier, which lets you store a single archive of up to roughly 40 TB (40,000 GB) if you use multipart upload.

http://docs.aws.amazon.com/amazonglacier/latest/dev/uploading-archive-mpu.html#qfacts

Remember, if your S3 account is compromised, you risk losing all of your data, since you would then sync an empty folder or malformed files. So you had better write a script that archives your backup a few times, e.g. by detecting the start of the week.
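
A minimal sketch of that idea, assuming the local mirror from the sync command above and a placeholder archive directory:

#!/bin/sh
# Once a week (Monday), keep a dated copy of the local mirror so that a bad
# sync cannot silently wipe out the only backup. Paths are placeholders.
if [ "$(date +%u)" -eq 1 ]; then
    cp -a /path/to/myfolder "/path/to/archives/backup-$(date +%Y-%m-%d)"
fi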

Update 01/17/2016:

The Python-based AWS CLI is very mature now.

Please use: https://github.com/aws/aws-cli

Example: aws s3 sync s3://mybucket .

Best way to move files between S3 buckets?

Update

As pointed out by alberge (+1), nowadays the excellent AWS Command Line Interface provides the most versatile approach for interacting with (almost) all things AWS. It now covers most services' APIs and also features higher-level S3 commands for dealing with your use case specifically; see the AWS CLI reference for S3:

  • sync - Syncs directories and S3 prefixes. Your use case is covered by Example 2 (more fine-grained usage with --exclude, --include, and prefix handling is also available; a sketch of these flags follows the example below):

    The following sync command syncs objects under a specified prefix and bucket to objects under another specified prefix and bucket by copying s3 objects. [...]

    aws s3 sync s3://from_my_bucket s3://to_my_other_bucket
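
    As a hedged sketch of the finer-grained flags mentioned above (the patterns are illustrative assumptions, not from the reference):

    aws s3 sync s3://from_my_bucket s3://to_my_other_bucket --exclude "*" --include "*.log"

    Filters are applied in the order given, so the trailing --include re-admits only the matching keys.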

For completeness, I'll mention that the lower-level S3 commands are also still available via the s3api subcommand, which would allow you to directly translate any SDK-based solution to the AWS CLI before eventually adopting its higher-level functionality.


Initial Answer

Moving files between S3 buckets can be achieved by means of the PUT Object - Copy API (followed by DELETE Object):

This implementation of the PUT operation creates a copy of an object that is already stored in Amazon S3. A PUT copy operation is the same as performing a GET and then a PUT. Adding the request header, x-amz-copy-source, makes the PUT operation copy the source object into the destination bucket. (Source)

There are respective samples available for all existing AWS SDKs; see Copying Objects in a Single Operation. Naturally, a scripting-based solution would be the obvious first choice here, so Copy an Object Using the AWS SDK for Ruby might be a good starting point; if you prefer Python instead, the same can of course be achieved via boto as well, see method copy_key() within boto's S3 API documentation.

PUT Object - Copy only copies files, so you'll still need to explicitly delete the source file via DELETE Object after a successful copy operation, but that will be just another few lines once the overall script handling the bucket and file names is in place (there are respective examples as well, see e.g. Deleting One Object Per Request).
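
Translated to the AWS CLI's s3api subcommand mentioned above, a minimal sketch of the copy-then-delete move could look like this (bucket and key names are placeholders):

# Copy the object into the destination bucket, then remove it from the source
aws s3api copy-object --copy-source SourceBucket/File1 --bucket DestinationBucket --key File1
aws s3api delete-object --bucket SourceBucket --key File1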

Exclude folders for s3cmd sync

You should indeed use the --exclude option. If you want to sync every file at the root but not the folders, you should try:

s3cmd --exclude="/*/*" sync local/ s3://s3bucket

Keep in mind that a folder doesn't really exist on S3. What looks like a file named file inside a folder named folder is really just an object named folder/file! So you just have to exclude those objects with the pattern /*/*.
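
If you want to see what would be transferred before running the sync for real, s3cmd's dry-run flag should help (same local path and bucket as above):

s3cmd --dry-run --exclude="/*/*" sync local/ s3://s3bucket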


