Read a File Line by Line from S3 Using Boto

Read a file line by line from S3 using boto?

It appears that boto's Key object has a read() method that can help with this. Here's some code that works for me:

>>> import boto
>>> from boto.s3.key import Key
>>> import boto.s3
>>> conn = boto.s3.connect_to_region('ap-southeast-2')
>>> bucket = conn.get_bucket('bucket-name')
>>> k = Key(bucket)
>>> k.key = 'filename.txt'
>>> k.open()
>>> k.read(10)
'This text '

The call to read(n) returns the next n bytes from the object.

Of course, this won't automatically return "the header line", but you could call it with a byte count large enough to guarantee the header line is included.
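If you actually need line-by-line access with boto 2, one approach (a minimal sketch, assuming a UTF-8 text object; bucket, key and region names are placeholders) is to buffer fixed-size read() calls and split on newlines:

import boto.s3

# Minimal sketch: stream an S3 key line by line with boto 2 by buffering
# fixed-size read() calls. Bucket, key and region names are placeholders.
conn = boto.s3.connect_to_region('ap-southeast-2')
bucket = conn.get_bucket('bucket-name')
k = bucket.get_key('filename.txt')
k.open()

buffer = b''
chunk = k.read(8192)
while chunk:
    buffer += chunk
    while b'\n' in buffer:
        line, buffer = buffer.split(b'\n', 1)
        print(line.decode('utf-8'))       # process each complete line here
    chunk = k.read(8192)
if buffer:
    print(buffer.decode('utf-8'))         # trailing data without a final newline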

Read file content from S3 bucket with boto3

boto3 offers a resource model that makes tasks like iterating through objects easier. Unfortunately, the StreamingBody returned by get() doesn't provide readline or readlines (although newer versions of botocore add an iter_lines() method).

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('test-bucket')
# Iterates through all the objects, doing the pagination for you. Each obj
# is an ObjectSummary, so it doesn't contain the body. You'll need to call
# get() to fetch the body itself.
for obj in bucket.objects.all():
    key = obj.key
    body = obj.get()['Body'].read()
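If you want lines rather than the whole body, one option (a minimal sketch, assuming UTF-8 text objects that fit in memory; the bucket name is a placeholder) is to read each body once and split it:

import boto3

# Minimal sketch: read each object once and split it into lines.
# Assumes UTF-8 text files that fit comfortably in memory.
s3 = boto3.resource('s3')
bucket = s3.Bucket('test-bucket')

for obj in bucket.objects.all():
    body = obj.get()['Body'].read().decode('utf-8')
    for line in body.splitlines():
        print(obj.key, line)   # process each line here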

read .txt file from s3 bucket not returning all file content

CloudWatch Logs for this Lambda function should be the definitive view of the printed logs.

Your code looks to be correct: StreamingBody's read() returns all of the data when you don't pass an amount (amt) parameter, so I don't think there's a problem with your code. It is receiving the entire file contents.

It looks like the truncated view you are seeing in the Lambda console may simply be a limitation of the console, in order to avoid showing an overwhelming number of lines of output.
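One way to confirm this (a minimal sketch, assuming a Lambda handler; the bucket and key names are placeholders) is to log the byte count rather than the full body, since a single number printed to CloudWatch Logs can't be truncated away:

import boto3

# Minimal sketch: log how many bytes were actually read instead of printing
# the full body. Bucket and key names here are placeholders.
def lambda_handler(event, context):
    s3 = boto3.client('s3')
    body = s3.get_object(Bucket='test-bucket', Key='file.txt')['Body'].read()
    print('read %d bytes from s3' % len(body))
    return len(body)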

Read a csv file from aws s3 using boto and pandas

Here is what I have done to successfully read a DataFrame from a CSV on S3.

import pandas as pd
import boto3

bucket = "yourbucket"
file_name = "your_file.csv"

s3 = boto3.client('s3')
# 's3' is the service name; this creates a client using your default credentials and config

obj = s3.get_object(Bucket=bucket, Key=file_name)
# fetch the object (key) from the bucket

initial_df = pd.read_csv(obj['Body'])  # 'Body' is a file-like StreamingBody that read_csv can consume
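For large files, a variation (a minimal sketch, assuming your pandas version supports chunked reads from a file-like object; bucket and file names are placeholders) is to pass chunksize so only part of the CSV is held in memory at a time:

import pandas as pd
import boto3

# Minimal sketch: read the CSV in chunks of 10,000 rows instead of all at once.
# 'yourbucket' and 'your_file.csv' are placeholders.
s3 = boto3.client('s3')
obj = s3.get_object(Bucket="yourbucket", Key="your_file.csv")

for chunk in pd.read_csv(obj['Body'], chunksize=10000):
    print(len(chunk))   # process each chunk (a DataFrame) here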

How to read Txt file from S3 Bucket using Python And Boto3

Your ids is the literal string "['i-041fb789f1554b7d5', 'i-0d0c876682eef71ae']", not a list. To parse it and convert it to a list, use the ast module:

import ast
# ...
InstancetobeStart = obj.get()['Body'].read().decode('utf-8')  # the raw file content, as a str
ids = ast.literal_eval(InstancetobeStart)                     # safely evaluate the string into a Python list
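For example, here is a small illustration of what literal_eval does with that content:

import ast

# The file body, as a string, looks like a Python list literal:
content = "['i-041fb789f1554b7d5', 'i-0d0c876682eef71ae']"
ids = ast.literal_eval(content)
print(ids[0])    # 'i-041fb789f1554b7d5'
print(len(ids))  # 2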

Reading part of a file in S3 using Boto

S3 supports GET requests using the 'Range' HTTP header which is what you're after.

To specify a Range request in boto, just add a header dictionary specifying the 'Range' key for the bytes you are interested in. Adapted from Mitchell Garnaat's response:

import boto

s3 = boto.connect_s3()
bucket = s3.lookup('mybucket')
key = bucket.lookup('mykey')
# Ask S3 for bytes 73 through 1024 (inclusive) of the object
your_bytes = key.get_contents_as_string(headers={'Range': 'bytes=73-1024'})
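The boto3 equivalent (a minimal sketch, with placeholder bucket and key names) passes the same header through the Range parameter of get_object:

import boto3

# Minimal sketch: fetch only bytes 73-1024 of the object with boto3.
# 'mybucket' and 'mykey' are placeholders.
s3 = boto3.client('s3')
resp = s3.get_object(Bucket='mybucket', Key='mykey', Range='bytes=73-1024')
your_bytes = resp['Body'].read()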

How to read .dat file from AWS S3 using mdfreader

The easiest method would be to use download_file() to download the file from Amazon S3 to /tmp/ on the local disk.

Then, you can use your existing code to process the file. This is definitely not a 'hack' -- it is a commonly used technique. It's certainly more reliable than streaming the file.

There is a limit on the amount of /tmp storage available, and AWS Lambda containers can be reused, so either delete the temporary file after use or use the same filename (eg /tmp/temp.dat) each time so that it overwrites the previous version.
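A minimal sketch of that pattern (bucket and key names are placeholders; your existing mdfreader code would go where indicated):

import os
import boto3

# Minimal sketch: download the object to /tmp, process it with existing
# code, then clean up. Bucket and key names are placeholders.
s3 = boto3.client('s3')
local_path = '/tmp/temp.dat'

s3.download_file('mybucket', 'data/measurement.dat', local_path)

# ... process local_path with your existing mdfreader code here ...

os.remove(local_path)   # free /tmp space so reused containers don't fill up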


