Contents of a gzip file from AWS S3 in Python only returning null bytes
That's a tar.gz file, i.e. a tar archive that's been compressed with the gzip algorithm. If you just read it with gzip.GzipFile(), you still have a binary tar archive you need to interpret. Use the tarfile module to read it; tar archives, like zips, can contain multiple files, one of which is the .jsonl file you end up seeing.
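As a minimal sketch of the tarfile approach described above, the following reads every regular file out of an in-memory .tar.gz blob. The function name read_tar_gz_members is hypothetical, and it assumes the archive is small enough to hold in memory:

```python
import io
import tarfile


def read_tar_gz_members(data: bytes) -> dict:
    """Return {member name: file contents} for each regular file in a .tar.gz blob.

    Hypothetical helper for illustration; assumes the archive fits in memory.
    """
    members = {}
    # Mode "r:gz" makes tarfile gunzip the stream and parse the tar archive.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        for member in archive.getmembers():
            # Skip directories, links and other non-file entries.
            if member.isfile():
                members[member.name] = archive.extractfile(member).read()
    return members
```

With S3, data would typically be the bytes returned by reading the Body of a get_object response, and the .jsonl content could then be looked up by its member name.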
Read gzip file from s3 bucket
gzip.open expects a filename or an already opened file object, but you are passing it the downloaded data directly. Try using gzip.decompress instead:
filedata = fileobj['Body'].read()
uncompressed = gzip.decompress(filedata)
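To illustrate the difference between the two calls, here is a small sketch that decompresses the same bytes both ways: directly with gzip.decompress, and through gzip.open by first wrapping the bytes in a file object. The function names are hypothetical and the input stands in for the data read from S3:

```python
import gzip
import io


def decompress_bytes(filedata: bytes) -> bytes:
    # gzip.decompress works directly on a bytes object.
    return gzip.decompress(filedata)


def decompress_via_file_object(filedata: bytes) -> bytes:
    # gzip.open needs a filename or file object, so wrap the bytes in BytesIO.
    with gzip.open(io.BytesIO(filedata), "rb") as fh:
        return fh.read()
```

Both functions should return the same result; gzip.decompress is simply the shorter route when the whole payload is already in memory.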
How can I decode a .gz file from S3 using an AWS Lambda function?
You're correct: the object is gzip-compressed binary data, so you can't decode it directly as text. Decompress it first, with something like:
import io
import gzip
import json
import boto3
from urllib.parse import unquote_plus

def handler_name(event, context):
    s3client = boto3.client('s3')
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys in S3 event notifications are URL-encoded
        key = unquote_plus(record['s3']['object']['key'])
        response = s3client.get_object(Bucket=bucket, Key=key)
        content = response['Body'].read()
        # Wrap the raw bytes in a file object so GzipFile can decompress them
        with gzip.GzipFile(fileobj=io.BytesIO(content), mode='rb') as fh:
            yourJson = json.load(fh)
You can then use the yourJson variable to read the JSON.
compress .txt file on s3 location to .gz file
It would appear that you are writing an AWS Lambda function.
A simpler program flow would probably be:
- Download the file to /tmp/ using s3_client.download_file()
- Gzip the file
- Upload the file to S3 using s3_client.upload_file()
- Delete the files in /tmp/
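The steps above could be sketched roughly as follows. The function names, the tmp_dir parameter and the choice to append ".gz" to the original key are assumptions made for illustration; s3_client is expected to be a boto3 S3 client (for example, boto3.client('s3')):

```python
import gzip
import os
import shutil


def gzip_file(src_path, dst_path):
    # Stream the source file into a gzip-compressed copy.
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def compress_s3_object(s3_client, bucket, key, tmp_dir="/tmp"):
    """Download an object, gzip it, upload the result and clean up.

    Hypothetical helper: the new key is assumed to be the old key plus ".gz".
    """
    local_path = os.path.join(tmp_dir, os.path.basename(key))
    gz_path = local_path + ".gz"
    gz_key = key + ".gz"
    try:
        s3_client.download_file(bucket, key, local_path)
        gzip_file(local_path, gz_path)
        s3_client.upload_file(gz_path, bucket, gz_key)
    finally:
        # Always remove temporary files; /tmp space in Lambda is limited.
        for path in (local_path, gz_path):
            if os.path.exists(path):
                os.remove(path)
    return gz_key
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object, without touching a real bucket.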
Also, please note that the AWS Lambda function might be invoked with multiple objects being passed via the event. However, your code is currently only processing the first record with event['Records'][0]. The program should loop through these records like this:
for record in event['Records']:
    source_bucket = record['s3']['bucket']['name']
    file_key_name = record['s3']['object']['key']
    ...
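One way to package that loop is a small generator that yields each bucket and key pair. This is only a sketch: iter_s3_objects is a hypothetical name, and the URL-decoding step is added here because object keys in S3 event notifications arrive URL-encoded:

```python
from urllib.parse import unquote_plus


def iter_s3_objects(event):
    """Yield (bucket, key) for every record in an S3 event notification."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key
```

A handler could then iterate with "for source_bucket, file_key_name in iter_s3_objects(event):" and process each object in turn.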