
I am trying to ingest S3 data (a CSV file) into RDS (MSSQL) through Lambda. Sample code:

s3 = boto3.client('s3')

def lambda_handler(event, context):
    if event:
        file_obj = event["Records"][0]
        bucketname = str(file_obj["s3"]["bucket"]["name"])
        csv_filename = unquote_plus(str(file_obj["s3"]["object"]["key"]))
        print("Filename: ", csv_filename)
        csv_fileObj = s3.get_object(Bucket=bucketname, Key=csv_filename)
        file_content = csv_fileObj["Body"].read().decode("utf-8").split()

I tried putting my CSV contents into a list, but it didn't work.

        results = []
        for row in csv.DictReader(file_content):
            results.append(row.values())
        print(results)
        print(file_content)
        return {
            'statusCode': 200,
            'body': json.dumps('S3 file processed')
        }

Is there any way I could convert "file_content" into a dataframe in Lambda? I have multiple columns to load.

Later I would follow this approach to load the data into RDS:

import pyodbc
import pandas as pd
# insert data from csv file into dataframe(df).
server = 'yourservername' 
database = 'AdventureWorks' 
username = 'username' 
password = 'yourpassword' 
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
cursor = cnxn.cursor()
# Insert Dataframe into SQL Server:
for index, row in df.iterrows():
    cursor.execute("INSERT INTO HumanResources.DepartmentTest (DepartmentID,Name,GroupName) values(?,?,?)", row.DepartmentID, row.Name, row.GroupName)
cnxn.commit()
cursor.close()

Can anyone suggest how to go about it?

  • Side-note: Your code is only processing the first record sent to the Lambda function (event["Records"][0]). It is possible that multiple event records can be sent to the Lambda function, so your code should loop through and process each Record. Commented Feb 3, 2022 at 10:47
  • What are the contents of the object in S3? Commented Feb 3, 2022 at 10:47
  • Hi @JohnRotenstein, it's a CSV file. The file size is 12 MB. Commented Feb 4, 2022 at 0:27
  • Why do you particularly want to use Dataframes? The AWS Lambda function can read the CSV file directly and generate the SQL commands. Commented Feb 4, 2022 at 0:32
  • I tried creating a list, but it didn't work, so I tried creating a dataframe instead. I have updated my question above. Could you please suggest anything else? Commented Feb 4, 2022 at 0:35
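
As a rough illustration of reading the CSV directly, here is a minimal sketch that uses only the standard library. It assumes the object body has already been decoded to text, as in the question; csv_text_to_rows and the sample data are hypothetical and only there for illustration. One thing it highlights is that str.split() breaks on every whitespace character, so any field containing a space gets cut into separate pseudo-rows, whereas csv.DictReader over io.StringIO keeps each field intact. Whether this is the cause of the problem in the question is an assumption.

```python
import csv
import io


def csv_text_to_rows(text):
    """Parse decoded CSV text into a list of dicts keyed by the header row."""
    # io.StringIO lets the csv module split the lines itself, so fields that
    # contain spaces stay intact (unlike splitting the text with str.split()).
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical sample data; the column names mirror the table in the question.
sample = "DepartmentID,Name,GroupName\n1,Engineering,Research and Development\n"
rows = csv_text_to_rows(sample)
print(rows[0]["GroupName"])  # Research and Development
```

Each dict can then be turned into a tuple of values for a parameterized INSERT.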

2 Answers

1

You can use io.BytesIO to load the raw bytes into memory and then use pandas' read_csv to turn them into a dataframe. Note that Python's SSL internals have a download limit that causes problems when reading more than 2 GB at once, which is why the code below reads the body in chunks.

import io
import pandas as pd

obj = s3.get_object(Bucket=bucketname, Key=csv_filename)
# Read the body in 1 GiB chunks to stay below the 2 GB limit in Python's SSL internals
chunks = obj["Body"].iter_chunks(chunk_size=1024**3)
data = io.BytesIO(b"".join(chunks))  # This keeps everything fully in memory
df = pd.read_csv(data)  # pass any additional read_csv args and kwargs here

2 Comments

Hi @simon, I tried your solution above, but I'm getting "[ERROR] MemoryError" on the line "data = io.BytesIO(b"".join(chunks))". My CSV file is only 12 MB.
It worked. I had to increase the Lambda function's memory.
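
Where raising the memory setting is not desirable, one possible alternative is to stream the object line by line instead of joining it into a single in-memory buffer. The following is only a sketch under assumptions: iter_csv_rows and batched are hypothetical helpers, and body is assumed to behave like the "Body" returned by s3.get_object, whose iter_lines() yields one bytes line at a time.

```python
import csv


def iter_csv_rows(body, encoding="utf-8"):
    """Yield parsed CSV rows from a streaming body, one row at a time.

    Assumption: `body` offers iter_lines(), as the "Body" of an
    s3.get_object response does, yielding bytes lines without line endings.
    """
    lines = (line.decode(encoding) for line in body.iter_lines())
    yield from csv.reader(lines)


def batched(rows, size):
    """Group an iterable of rows into lists of at most `size` items."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Intended use inside the handler (not executed here):
#   obj = s3.get_object(Bucket=bucketname, Key=csv_filename)
#   rows = iter_csv_rows(obj["Body"])
#   header = next(rows)
#   for batch in batched(rows, 1000):
#       cursor.executemany(insert_sql, batch)
```

Only one batch of rows is held in memory at a time, at the cost of giving up the dataframe. Quoted fields that contain line breaks would lose the embedded newline with this approach.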
1

It appears that your goal is to load the contents of a CSV file from Amazon S3 into SQL Server.

You could do this without using Dataframes:

  • Loop through the Event Records (multiple can be passed in)
  • For each object:
    • Download the object to /tmp/
    • Use Python's csv.reader to loop through the contents of the file
    • Generate INSERT statements to insert the data into the SQL Server table
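
The steps above can be sketched roughly as follows. This is an outline under stated assumptions, not a tested Lambda deployment: iter_s3_objects, build_insert, load_csv_file and process_event are hypothetical helper names, s3_client is assumed to be a boto3 S3 client, cnxn is assumed to be an open pyodbc connection, and the CSV is assumed to start with a header row whose names match the table's columns.

```python
import csv
import os
from urllib.parse import unquote_plus


def iter_s3_objects(event):
    """Yield (bucket, key) for every record in the S3 event, not just the first."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


def build_insert(table, columns):
    """Build a parameterized INSERT with one ? marker per column.

    Table and column names are placed directly in the SQL text, so they are
    assumed to come from a trusted source.
    """
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"


def load_csv_file(path, cursor, table):
    """Read a CSV file with a header row and insert its rows via the cursor."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        sql = build_insert(table, header)
        rows = [tuple(row) for row in reader]
    if rows:
        cursor.executemany(sql, rows)
    return len(rows)


def process_event(event, s3_client, cnxn, table, tmp_dir="/tmp"):
    """Download each object to tmp_dir and load it into the given table."""
    total = 0
    for bucket, key in iter_s3_objects(event):
        local_path = os.path.join(tmp_dir, os.path.basename(key))
        s3_client.download_file(bucket, key, local_path)
        cursor = cnxn.cursor()
        total += load_csv_file(local_path, cursor, table)
        cnxn.commit()
        cursor.close()
    return total
```

A handler could then call process_event(event, boto3.client("s3"), cnxn, "HumanResources.DepartmentTest") and return the row count. Passing the client and connection in as arguments keeps the helpers easy to exercise without AWS or a database.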

You might also consider using aws-data-wrangler: Pandas on AWS, which is available as a Lambda Layer.

3 Comments

Hi @John, yes, my goal is to load the contents of a CSV file from Amazon S3 into RDS MSSQL Server. I am unable to perform two of the steps mentioned above and am not sure how to do them. Could you please assist? This is new to me.
You are welcome to create a new Question, show your code and provide details of the problem you are experiencing.
The problem got fixed. I am able to load the S3 contents into RDS.
