How to Use Boto to Stream a File Out of Amazon S3 to Rackspace Cloudfiles

How can I use boto to stream a file out of Amazon S3 to Rackspace Cloudfiles?

The Key object in boto, which represents an object in S3, can be used like an iterator, so you should be able to do something like this:

>>> import boto
>>> c = boto.connect_s3()
>>> bucket = c.lookup('garnaat_pub')
>>> key = bucket.lookup('Scan1.jpg')
>>> for chunk in key:
...     output_stream.write(chunk)  # output_stream: any writable file-like object

Or, as in the case of your example, you could do:

>>> shutil.copyfileobj(key, rsObject.stream())
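For a self-contained version of the same idea, here is a minimal boto 2 sketch. The bucket and key names are illustrative, and a local file stands in for the Cloud Files destination, since rsObject.stream() comes from the asker's code and isn't defined here:

import shutil
import boto

conn = boto.connect_s3()
bucket = conn.get_bucket('garnaat_pub')
key = bucket.get_key('Scan1.jpg')

# Key exposes read(), so copyfileobj streams it in chunks rather than
# loading the whole object into memory.
with open('Scan1.jpg', 'wb') as destination:
    shutil.copyfileobj(key, destination)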

Can Rackspace Cloud Files be accessed using S3 (AWS) APIs?

The S3 plugin for Swift is not deployed as part of Rackspace Cloud Files (most production deployments of OpenStack don't enable it by default). However, if you want more flexibility in your app, you can use a cross-cloud toolkit such as libcloud (Python), fog (Ruby), jclouds (Java), or pkgcloud (Node.js). That gives you a single, simpler abstraction and lets you support multiple providers within your application.
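As a rough illustration of the cross-cloud approach, here is a hedged libcloud sketch that streams an object from S3 into Cloud Files; the credentials, bucket, container, and object names are all placeholders:

from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver

# Placeholder credentials and names
s3 = get_driver(Provider.S3)('AWS_ACCESS_KEY', 'AWS_SECRET_KEY')
cf = get_driver(Provider.CLOUDFILES)('RACKSPACE_USER', 'RACKSPACE_API_KEY', region='dfw')

# Stream the object out of S3 ...
src = s3.get_object('example-bucket', 'Scan1.jpg')
stream = s3.download_object_as_stream(src, chunk_size=1024 * 1024)

# ... and into a Cloud Files container, without buffering it all in memory
dest_container = cf.get_container('example-container')
cf.upload_object_via_stream(stream, dest_container, 'Scan1.jpg')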

Stream huge zip files on S3 using Lambda and boto3

Depending on your exact needs, you can use smart-open to handle the reading of the zip file. If you can fit the CSV data in RAM in your Lambda, it's fairly straightforward to call it directly:

import csv
import zipfile
from io import TextIOWrapper, BytesIO

from smart_open import smart_open

def lambda_handler(event, context):
    # Simple test: just calculate the sum of the first column of a CSV file in a zip file
    total_sum, row_count = 0, 0
    # Use smart_open to handle the byte-range requests for us
    with smart_open("s3://example-bucket/many_csvs.zip", "rb") as f:
        # Wrap that in a zip file handler
        zip = zipfile.ZipFile(f)
        # Open a specific CSV file in the zip file
        zf = zip.open("data_101.csv")
        # Read all of the data into memory, and prepare a text IO wrapper to read it row by row
        text = TextIOWrapper(BytesIO(zf.read()))
        # And finally, use Python's csv library to parse the CSV format
        cr = csv.reader(text)
        # Skip the header row
        next(cr)
        # Just loop through each row and add the first column
        for row in cr:
            total_sum += int(row[0])
            row_count += 1

    # And output the results
    print(f"Sum of {row_count} rows for col 0: {total_sum}")

I tested this with a 1 GB zip file containing hundreds of CSV files. The CSV file I picked was around 12 MB uncompressed, or 100,000 rows, so it fit comfortably into RAM in the Lambda environment, even when limited to 128 MB of RAM.

If your CSV file can't be loaded into memory all at once like this, you'll need to take care to process it in sections, buffering the reads so you don't force smart-open to fetch lots of tiny chunks from S3 at a time.
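Here is a hedged sketch of that streaming variant: instead of reading the whole archive member into a BytesIO, it wraps the ZipExtFile directly, so the csv reader pulls data through smart-open's internal buffer as it goes. The bucket, key, and member names are placeholders:

import csv
import zipfile
from io import TextIOWrapper

from smart_open import smart_open

def sum_first_column(s3_uri="s3://example-bucket/many_csvs.zip", member="data_101.csv"):
    total_sum, row_count = 0, 0
    with smart_open(s3_uri, "rb") as f:
        with zipfile.ZipFile(f) as archive:
            # zip.open() returns a file-like object that decompresses on the fly,
            # so the CSV rows are parsed without holding the whole file in memory
            with archive.open(member) as raw:
                reader = csv.reader(TextIOWrapper(raw, encoding="utf-8"))
                next(reader)  # skip the header row
                for row in reader:
                    total_sum += int(row[0])
                    row_count += 1
    return total_sum, row_count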

Is there a Amazon S3 Bucket Policy equivalent in the Rackspace Cloudfiles?

As of today there isn't. I literally just finished talking with the techs there about this, thinking I had simply missed it in the docs, and found out that they have no IP restrictions for protecting the contents of containers. I don't know how they overlooked that one!

Read a file line by line from S3 using boto?

It appears that boto's Key object has a read() method that can do this. Here's some code that works for me:

>>> import boto.s3
>>> from boto.s3.key import Key
>>> conn = boto.s3.connect_to_region('ap-southeast-2')
>>> bucket = conn.get_bucket('bucket-name')
>>> k = Key(bucket)
>>> k.key = 'filename.txt'
>>> k.open()
>>> k.read(10)
'This text '

The call to read(n) returns the next n bytes from the object.

Of course, this won't automatically return "the header line", but you could call it with a large enough number to return the header line at a minimum.
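If you actually want line-by-line access rather than fixed-size reads, one hedged approach with boto 2 (the bucket and key names below are placeholders) is to iterate over the Key, which yields raw byte chunks, and split those on newlines yourself:

import boto

conn = boto.connect_s3()
bucket = conn.get_bucket('bucket-name')
key = bucket.get_key('filename.txt')

buffered = b''
for chunk in key:                      # iterating a Key yields raw byte chunks
    buffered += chunk
    while b'\n' in buffered:
        line, buffered = buffered.split(b'\n', 1)
        print(line.decode('utf-8'))
if buffered:                           # the last line may not end with a newline
    print(buffered.decode('utf-8'))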

Create Rackspace Cloudfiles user via API

It's possible.

You would need to use the Rackspace Identity Service API. In particular, have a look at the Users resource API. If you want to restrict those users to a product like Cloud Files you'll need to use the Roles resource API.

It will also be helpful to read up on the Rackspace Role Based Access Control. You'll need to know the Permissions Matrix for RBAC if you're assigning roles to your users.
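As a rough, hedged sketch of what that looks like over HTTP: the paths and payloads below follow the Keystone v2.0-style calls documented for Rackspace Cloud Identity, but you should verify them against the current API reference, and the username, email, and API key are placeholders:

import requests

IDENTITY = "https://identity.api.rackspacecloud.com/v2.0"

# Authenticate as the account admin to get a token
auth = requests.post(f"{IDENTITY}/tokens", json={
    "auth": {"RAX-KSKEY:apiKeyCredentials": {
        "username": "account-admin",
        "apiKey": "ADMIN_API_KEY",
    }}
})
token = auth.json()["access"]["token"]["id"]

# Create a sub-user via the Users resource
resp = requests.post(f"{IDENTITY}/users",
                     headers={"X-Auth-Token": token},
                     json={"user": {"username": "cloudfiles-user",
                                    "email": "user@example.com",
                                    "enabled": True}})
print(resp.status_code, resp.json())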


