Read excel file from S3 into Pandas DataFrame
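This is expected: obj is a dictionary, not a file. Have you tried passing its body to read_excel? If obj comes from boto3's get_object, the key is 'Body' (capital B), and read_excel needs a seekable buffer, so read the bytes into io.BytesIO first:
df = pd.read_excel(io.BytesIO(obj['Body'].read()), header=2)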
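To make the step above a little more concrete, here is a minimal sketch of pulling an S3 object into a seekable in-memory buffer. The helper name s3_object_to_buffer and the bucket/key values in the usage comment are made up for illustration, and it assumes the client behaves like a boto3 S3 client.

```python
import io


def s3_object_to_buffer(s3_client, bucket, key):
    """Download an S3 object and return its bytes in a seekable buffer."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # response["Body"] is a streaming object; read it fully so that
    # the result supports seek(), which Excel parsers need.
    return io.BytesIO(response["Body"].read())


# Usage (assumes boto3 and pandas are installed and credentials are set up;
# the bucket and key names are placeholders):
#   import boto3
#   import pandas as pd
#   buffer = s3_object_to_buffer(boto3.client("s3"), "my-bucket", "prefix/file.xlsx")
#   df = pd.read_excel(buffer, header=2)
```

Passing the client in as an argument keeps the helper easy to try out with a stub object before pointing it at a real bucket.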
How to save pandas profile report output as an HTML/JSON file to an AWS S3 location
After generating the profile report with:
profile = pandas_profiling.ProfileReport(
    df, title="Data Profile Report", minimal=True)
To write the .html file to S3, first write it to the local filesystem, then upload it from there to S3, and finally delete the local copy:
import os
import awswrangler
# write the report to a local .html file
profile.to_file('./file_name-profile.html')
# upload the local file to S3
awswrangler.s3.upload(local_file='./file_name-profile.html', path='s3://analytics-storage-bucket/processedData/file_name-profile.html')
# remove the local copy
os.remove('./file_name-profile.html')
This code works on EC2 and in AWS Glue jobs.
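One possible variation on the write, upload, delete sequence is to let a temporary directory handle the cleanup, so the local copy is removed even if the upload fails. This is only a sketch: upload_via_temp_file, write_fn and upload_fn are hypothetical names, not part of pandas_profiling or awswrangler.

```python
import os
import tempfile


def upload_via_temp_file(write_fn, upload_fn, filename):
    """Write a file into a temporary directory, upload it, then clean up.

    write_fn(local_path) creates the file; upload_fn(local_path) uploads it.
    Both are hypothetical callables supplied by the caller.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, filename)
        write_fn(local_path)
        upload_fn(local_path)
    # The directory and the file inside it are removed here,
    # even if write_fn or upload_fn raised an exception.
    return local_path


# Usage with the objects from the answer above (not executed here):
#   upload_via_temp_file(
#       profile.to_file,
#       lambda p: awswrangler.s3.upload(
#           local_file=p,
#           path='s3://analytics-storage-bucket/processedData/file_name-profile.html'),
#       'file_name-profile.html')
```

The trade-off is a little indirection in exchange for not having to remember the os.remove call.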
Upload data to S3 bucket without saving it to a disk
Save a text file:
import boto3
obj = 'some string'
bucket = 'my-bucket'
key = 'prefix/filename.txt'
# put_object uploads the string directly, no local file needed
boto3.client('s3').put_object(Body=obj, Bucket=bucket, Key=key)
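Save a CSV file from a pandas DataFrame:
import io
import boto3
df = my_dataframe  # placeholder for your existing DataFrame
bucket = 'my-bucket'
key = 'prefix/filename.csv'
# serialize the DataFrame into an in-memory text buffer instead of a file
csv_buffer = io.StringIO()
df.to_csv(csv_buffer)
boto3.client('s3').put_object(Body=csv_buffer.getvalue(), Bucket=bucket, Key=key)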
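If you do this in more than one place, the in-memory CSV upload can be wrapped in a small function. The sketch below is an assumption-laden illustration: upload_dataframe_as_csv is a made-up name, df is anything with a pandas-style to_csv method, and s3_client is expected to behave like boto3.client('s3'). It also encodes the text to UTF-8 explicitly and leaves out the index column, which may or may not be what you want.

```python
import io


def upload_dataframe_as_csv(df, s3_client, bucket, key):
    """Serialize df to CSV in memory and upload it with put_object."""
    csv_buffer = io.StringIO()
    # index=False leaves out the row index column; remove it to keep the index.
    df.to_csv(csv_buffer, index=False)
    # Encode explicitly so the uploaded bytes are unambiguous UTF-8.
    body = csv_buffer.getvalue().encode("utf-8")
    s3_client.put_object(Body=body, Bucket=bucket, Key=key)
    return len(body)  # number of bytes sent


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   upload_dataframe_as_csv(df, boto3.client("s3"), "my-bucket", "prefix/filename.csv")
```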
Files uploaded to s3 are missing content
You need to close the file first so that the data is written to the file system.
with open("textfile.txt", "w") as text_file:
    text_file.write(description)
# the with block has ended here, so close() was called and the data is on disk
upload_to_aws("textfile.txt", 'bucket-name', "test.txt")
You could also call flush() if you wanted to keep the file open for further writes, but that isn't needed here.
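For the flush() alternative, a small self-contained sketch may help show the effect. It writes to a throwaway temporary file rather than textfile.txt, and size_after_flush is a hypothetical helper name used only for this illustration.

```python
import os
import tempfile


def size_after_flush(text):
    """Write text to a temp file, flush, and report its size while still open."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as text_file:
            text_file.write(text)
            # Push Python's buffered data to the operating system
            # so that other readers of the file can see it.
            text_file.flush()
            size_while_open = os.path.getsize(path)
    finally:
        os.remove(path)  # clean up the throwaway file
    return size_while_open
```

After the flush, the size seen through the file system matches what was written, even though the file has not been closed yet; an upload started at that point would see the full content.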