How to Parse a CSV File Located in an Amazon S3 Bucket

How to read and process large text/CSV files from an S3 bucket using C#?

Increase your Lambda timeout, which (currently) has a hard limit of 15 minutes.

If your CSV processing takes longer than 15 minutes, Lambda functions are not the right solution for your job - they are meant for quick processing.

What the right solution would be is out of scope here, but you could perhaps use spot EC2 instances, Step Functions, or containers on Fargate.

Related: to speed up your current process, make the S3 requests in parallel at the beginning and then process the results in one go, i.e. create all the tasks first and then await them together.
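The "create the tasks, then await them all at once" pattern looks the same in most languages with async support. A minimal sketch in Python, where `fetch_object` is a hypothetical stand-in for an async S3 GetObject call (the keys and payloads are made up for illustration):

```python
import asyncio

# Stand-in for an async S3 GetObject call (e.g. via aioboto3);
# the sleep simulates network latency.
async def fetch_object(key: str) -> str:
    await asyncio.sleep(0.01)
    return f"contents of {key}"

async def main() -> list[str]:
    keys = ["part-0.csv", "part-1.csv", "part-2.csv"]
    # Create all the tasks first so the requests run concurrently...
    tasks = [asyncio.create_task(fetch_object(k)) for k in keys]
    # ...then await them all at once.
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
```

With sequential awaits the total time is the sum of the request latencies; with this pattern it is roughly the slowest single request.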

Read and parse CSV file in S3 without downloading the entire file

You should just be able to use the createReadStream method and pipe it into fast-csv:

const s3Stream = s3.getObject(params).createReadStream();
require('fast-csv').fromStream(s3Stream)
  .on('data', (data) => {
    // do something with each parsed row here
  });
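The same idea works in Python: the Body returned by boto3's get_object is a streaming file-like object, so you can wrap it and feed it to csv.reader without downloading the whole file first. A sketch, using an in-memory buffer as a stand-in for the S3 response body (the bucket and key names in the comment are hypothetical):

```python
import csv
import io

def parse_csv_stream(body) -> list[list[str]]:
    """Parse CSV rows from a binary file-like stream,
    reading incrementally rather than all at once."""
    text = io.TextIOWrapper(body, encoding="utf-8")
    return [row for row in csv.reader(text)]

# With boto3 this would be something like:
#   body = s3.get_object(Bucket="my-bucket", Key="data.csv")["Body"]
# Here an in-memory stream stands in for the S3 response body.
body = io.BytesIO(b"id,name\n1,alice\n2,bob\n")
rows = parse_csv_stream(body)
```

csv.reader pulls lines from the wrapped stream as it goes, so memory use stays bounded even for large objects.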

Extract specific column from csv stored in S3

You can use the result returned from Amazon Athena via get_query_results().

If the data variable contains the JSON shown in your question, you can extract a list of the instances with:

rows = [row['Data'][1]['VarCharValue'].replace('"', '') for row in data]
print(rows)

The output is:

['instanceId', 'i-053090803', 'i-0724f62a', 'i-552', 'i-07f4e5', 'i-0eb453', 'i-062120', 'i-0121a04', 'i-0f213', 'i-0ee19d8', 'i-04ad3c29', 'i-7c6166', 'i-07bc579d', 'i-0b8bc7df5']

You can skip the column header by slicing: rows[1:]
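To make the extraction concrete: the JSON from the original question is not shown here, so the sample below is shaped like the Rows list that Athena's get_query_results() returns, with made-up first-column values. The [1] index picks the second column of each row, and replace() strips the embedded quotes:

```python
# Sample shaped like the Rows list from Athena's get_query_results();
# the first-column values are made up for illustration.
data = [
    {"Data": [{"VarCharValue": "2021-01-01"}, {"VarCharValue": '"instanceId"'}]},
    {"Data": [{"VarCharValue": "2021-01-02"}, {"VarCharValue": '"i-053090803"'}]},
    {"Data": [{"VarCharValue": "2021-01-03"}, {"VarCharValue": '"i-0724f62a"'}]},
]

# row["Data"][1] is the second column; replace() removes the quote characters.
rows = [row["Data"][1]["VarCharValue"].replace('"', "") for row in data]
instance_ids = rows[1:]  # skip the header row
```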

How do I read a csv file from aws s3 in aws lambda

https://docs.python.org/3/library/csv.html

According to the documentation, I think you are using the csv module incorrectly, so the reader is empty; that is why your code does not return anything.
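A common mistake is to hand csv.reader the raw bytes, or the whole decoded string, when it expects an iterable of lines. A sketch of the usual pattern in a Lambda handler, with an in-memory byte string standing in for the S3 object body (the bucket and key in the comment are hypothetical):

```python
import csv

# Stand-in for:
#   body = s3.get_object(Bucket="my-bucket", Key="data.csv")["Body"].read()
body = b"id,name\n1,alice\n2,bob\n"

# csv.reader expects an iterable of lines, so decode the bytes and
# split into lines before handing them over.
reader = csv.reader(body.decode("utf-8").splitlines())
rows = list(reader)
```

Passing the undecoded bytes raises an error, and passing one long string makes csv.reader iterate character by character instead of line by line, so decoding and splitting first is the key step.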
