Python - How to read CSV file retrieved from S3 bucket?

Question

There's a CSV file in an S3 bucket that I want to parse and turn into a dictionary in Python. Using Boto3, I called s3.get_object(Bucket=<bucket_name>, Key=<key>), which returns a dictionary that includes a "Body": StreamingBody() key-value pair that apparently contains the data I want.

In my Python file, I've added import csv. The examples I see online for reading a CSV file pass a file name, such as:

with open(<csv_file_name>, mode='r') as file:
    reader = csv.reader(file)

However, I'm not sure how to retrieve the CSV file name from the StreamingBody, if that's even possible. If not, is there a better way for me to read the CSV file in Python? Thanks!

Edit: I wanted to add that I'm doing this in AWS Lambda, and there are documented issues with using pandas in Lambda, which is why I want to use the csv library and not pandas.

You can read the CSV data with something like response['Body'].read() and then pass the result into csv.reader() (although you may have to decode it first and split it into lines).
– user8651755, Commented Oct 25, 2017 at 23:29

Aaron Bentley · Accepted Answer · 2017-10-26 00:47:20Z

21

csv.reader does not require a file. It can use anything that iterates through lines, including files and lists.

So you don't need a filename. Just pass the lines from response['Body'] directly into the reader. One way to do that is:

lines = response['Body'].read().splitlines(True)
reader = csv.reader(lines)
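
On Python 3, read() returns bytes while csv.reader expects text lines, so a decode step is usually needed before splitting. A minimal sketch, assuming the object is UTF-8 encoded and small enough to hold in memory; rows_from_body is a hypothetical helper name, and io.BytesIO stands in for the StreamingBody here since both expose read():

```python
import csv
import io


def rows_from_body(body, encoding="utf-8"):
    """Return all CSV rows from an object whose read() method yields bytes.

    `body` is expected to behave like response['Body'] from get_object.
    The encoding is an assumption; adjust it to match your file.
    """
    text = body.read().decode(encoding)
    # splitlines(True) keeps the line terminators, so the csv module can
    # still handle quoted fields that contain embedded newlines.
    lines = text.splitlines(True)
    return list(csv.reader(lines))


# Stand-in for response['Body'] so the sketch runs without AWS access.
fake_body = io.BytesIO(b"name,qty\napple,3\npear,5\n")
rows = rows_from_body(fake_body)
```

In a Lambda handler this would presumably be called as rows_from_body(response['Body']).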


3 Comments

shanesolo Over a year ago

You may also need to decode the byte stream, depending on your encoding: response['Body'].read().decode('utf-8').splitlines(True)

Aaron Bentley Over a year ago

This appears to vary between Python 2 and 3. The 2.7 docs say "The csv module doesn’t directly support reading and writing Unicode" and recommend feeding it UTF-8.

Hyder Tom Over a year ago

I see that the CSV is being used by Lambda. The problem is that I am finding it difficult to do any ETL transformations. How do I perform ETL jobs in Lambda when I cannot open the CSV as a file and can only read it? Is there a way to open it as a CSV and start working on it?
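
On the question in the comment above: the csv module only needs a text file-like object, not a file on disk, so one possible approach is to wrap the decoded text in io.StringIO and write the results to another in-memory buffer. A minimal sketch, assuming UTF-8 data that fits in memory; transform_csv and the upper-casing step are hypothetical placeholders for a real transformation:

```python
import csv
import io


def transform_csv(raw_bytes, encoding="utf-8"):
    """Read CSV bytes, apply a placeholder transformation, return CSV text."""
    # newline="" lets the csv module handle line endings itself.
    source = io.StringIO(raw_bytes.decode(encoding), newline="")
    target = io.StringIO(newline="")

    reader = csv.reader(source)
    writer = csv.writer(target, lineterminator="\n")
    for row in reader:
        # Placeholder transformation: upper-case every field.
        writer.writerow([field.upper() for field in row])
    return target.getvalue()


# raw_bytes would come from response['Body'].read() in a Lambda handler.
result = transform_csv(b"name,city\nann,oslo\n")
```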

Chirag Kalal · Accepted Answer · 2020-09-02 10:11:05Z

7

To retrieve and read a CSV file from an S3 bucket, you can use the following code:

import csv
import boto3
from django.conf import settings

bucket_name = "your-bucket-name"
file_name = "your-file-name-exists-in-that-bucket.csv"

s3 = boto3.resource('s3', aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)

bucket = s3.Bucket(bucket_name)

obj = bucket.Object(key=file_name)

response = obj.get()
lines = response['Body'].read().decode('utf-8').splitlines(True)

reader = csv.DictReader(lines)
for row in reader:
    # csv_header_key1 and csv_header_key2 stand for column names from your CSV header row
    print(row['csv_header_key1'], row['csv_header_key2'])
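
If the file is large, a possible variation is to decode the body incrementally instead of calling read() on the whole object. This is only a sketch, assuming UTF-8 data and relying on nothing more than the body having a read() method; iter_dict_rows is a hypothetical name, and io.BytesIO stands in for response['Body']:

```python
import codecs
import csv
import io


def iter_dict_rows(body, encoding="utf-8"):
    """Yield one dict per CSV row, keyed by the header row."""
    # codecs.getreader returns a StreamReader class; wrapping the byte stream
    # gives an iterator of decoded text lines, which csv.DictReader accepts.
    text_stream = codecs.getreader(encoding)(body)
    for row in csv.DictReader(text_stream):
        yield dict(row)


# Stand-in for response['Body'] so the example runs without AWS credentials.
fake_body = io.BytesIO(b"csv_header_key1,csv_header_key2\na,1\nb,2\n")
dict_rows = list(iter_dict_rows(fake_body))
```

Each yielded row is a plain dictionary, which matches the goal in the question of turning the CSV into dictionaries.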

