I have an AWS Lambda function which creates a pandas DataFrame, and I need to write that DataFrame to an S3 bucket as a JSON file.

import datetime
import io

import pandas as pd
import boto3

# code to get the df

destination = "output_" + datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S') + '.json'

df.to_json(destination)  # this file should be written to the S3 bucket instead

2 Answers

The following code runs in AWS Lambda and uploads the JSON file to S3.

The Lambda execution role must have S3 write permission (s3:PutObject) on the target bucket.
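
For reference, a minimal sketch of a policy statement granting that permission; the bucket name is a placeholder to replace with your own:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-bucket-name/*"
        }
    ]
}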

import datetime
import io

import pandas as pd
import boto3

# code to get the df

destination = "output_" + datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S') + '.json'

# Serialize the DataFrame to an in-memory buffer instead of the local filesystem
json_buffer = io.StringIO()
df.to_json(json_buffer)

# Upload the buffer contents under the timestamped key
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my-bucket-name')
my_bucket.put_object(Key=destination, Body=json_buffer.getvalue())
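
Wired into a complete Lambda entry point, the same approach might look like the sketch below; the bucket name and the DataFrame contents are placeholders:

import datetime
import io

import boto3
import pandas as pd

def lambda_handler(event, context):
    # Placeholder for the real "code to get the df"
    df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

    destination = "output_" + datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S') + '.json'

    # Serialize to memory, then upload; nothing touches the local filesystem
    json_buffer = io.StringIO()
    df.to_json(json_buffer)

    s3 = boto3.resource('s3')
    s3.Bucket('my-bucket-name').put_object(Key=destination, Body=json_buffer.getvalue())

    return {"status": "ok", "key": destination}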



1 Comment

Nailed it. All other s3/json answers on StackOverflow would not reconcile with the df.to_json part, which was a must for my problem. Thank you.

You can use the following code as well:

import io

import boto3
from pyspark.sql.functions import lit

# Create a session using Boto3 (in Lambda, prefer the execution role's
# credentials over hard-coded keys)
session = boto3.Session(
    aws_access_key_id='<key ID>',
    aws_secret_access_key='<secret_key>'
)

# Create an S3 resource from the session
s3 = session.resource('s3')

json_buffer = io.StringIO()

# Create a Spark DataFrame (assumes an active SparkSession named `spark`)
# and convert it to pandas
df = spark.range(4).withColumn("organisation", lit("stackoverflow"))
df_p = df.toPandas()
df_p.to_json(json_buffer, orient='records')

# Create the S3 object and upload the buffer contents
s3_object = s3.Object('<bucket-name>', '<JSON file name>')
result = s3_object.put(Body=json_buffer.getvalue())
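
As a side note: if the s3fs package is available in the runtime, pandas can write straight to an S3 URL and the in-memory buffer can be skipped entirely. A sketch under that assumption, with a placeholder bucket and key:

import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# pandas hands s3:// paths to s3fs/fsspec, which picks up the ambient
# AWS credentials (e.g. the Lambda execution role)
df.to_json('s3://my-bucket-name/output.json', orient='records')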
