
There's a CSV file in an S3 bucket that I want to parse and turn into a dictionary in Python. Using Boto3, I called s3.get_object(Bucket=<bucket_name>, Key=<key>), which returns a dictionary that includes a "Body": StreamingBody() key-value pair that apparently contains the data I want.

In my Python file I've added import csv, but the examples I see online for reading a CSV file all pass a file name, such as:

with open(<csv_file_name>, mode='r') as file:
    reader = csv.reader(file)

However, I'm not sure how to retrieve the CSV file name from the StreamingBody, if that's even possible. If not, is there a better way for me to read the CSV file in Python? Thanks!

Edit: I should add that I'm doing this in AWS Lambda, and there are documented issues with using pandas in Lambda, which is why I want to use the csv library rather than pandas.

  • You can read the CSV data with something like response['Body'].read() and then pass the result into csv.reader() (although you may have to decode it first and split it into lines). Commented Oct 25, 2017 at 23:29
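A minimal sketch of the steps in that comment, runnable without AWS: io.BytesIO stands in for response['Body'] (an assumption for the demo; the only part of StreamingBody it relies on is .read() returning bytes), and the object is assumed to be UTF-8. It also shows why the decode step matters in Python 3.

```python
import csv
import io

# Stand-in for response['Body']: io.BytesIO also has .read() returning bytes,
# which is the only StreamingBody behaviour this demo relies on (assumption).
body = io.BytesIO(b"name,age\r\nalice,30\r\nbob,25\r\n")
raw = body.read()  # bytes

# Python 3's csv module needs str lines; feeding it bytes raises csv.Error.
try:
    list(csv.reader(raw.splitlines(True)))
    bytes_rejected = False
except csv.Error:
    bytes_rejected = True

# Decode (assuming UTF-8), split into lines keeping line endings, then parse.
lines = raw.decode("utf-8").splitlines(True)
rows = list(csv.reader(lines))
print(rows)  # [['name', 'age'], ['alice', '30'], ['bob', '25']]
```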

2 Answers


csv.reader does not require a file. It can use anything that iterates through lines, including files and lists.
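To illustrate, here is a small sketch that feeds csv.reader a plain list of strings (the sample data is made up). It also shows the advantage over splitting on commas yourself: the quoted field containing a comma is kept intact.

```python
import csv

# Any iterable of strings works as csv.reader input; here, a plain list.
lines = ["id,color\n", "1,red\n", '2,"dark, green"\n']
rows = list(csv.reader(lines))
print(rows)  # [['id', 'color'], ['1', 'red'], ['2', 'dark, green']]

# A generator is fine too, because csv.reader only iterates over its input.
gen_rows = list(csv.reader(line for line in lines))
```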

So you don't need a file name. Just pass the lines from response['Body'] directly into the reader. One way to do that is:

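# In Python 3, read() returns bytes, so decode before passing the lines to csv.reader
lines = response['Body'].read().decode('utf-8').splitlines(True)
reader = csv.reader(lines)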
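For a Lambda function, the same idea can be wrapped in a small helper. This is a sketch, not part of boto3: the name read_csv_rows and the UTF-8 default are illustrative choices, and a hypothetical stub client replaces boto3.client("s3") so the example runs without network access.

```python
import csv
import io


def read_csv_rows(s3_client, bucket, key, encoding="utf-8"):
    """Fetch an S3 object and return its rows as lists of strings.

    s3_client is expected to behave like boto3.client("s3"); the function
    name and the UTF-8 default are illustrative choices, not part of boto3.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # response["Body"] is a StreamingBody; read() returns the whole object as bytes.
    lines = response["Body"].read().decode(encoding).splitlines(True)
    return list(csv.reader(lines))


# Local check with a hypothetical stub in place of a real client.
class _StubS3Client:
    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(b"a,b\n1,2\n")}


rows = read_csv_rows(_StubS3Client(), "example-bucket", "example.csv")
print(rows)  # [['a', 'b'], ['1', '2']]
```

In a real Lambda function you would pass boto3.client("s3") instead of the stub; the bucket and key names above are placeholders. Reading the whole body with read() is the simplest approach, but it means the entire object has to fit in the function's memory.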

3 Comments

You may also need to decode the byte stream, depending on your encoding: response['Body'].read().decode('utf-8').splitlines(True)
This appears to vary between Python 2 and 3. The 2.7 docs say "The csv module doesn’t directly support reading and writing Unicode" and recommend feeding it UTF-8, whereas in Python 3 the csv module expects str, so the bytes have to be decoded first.
I see that the CSV is being read in Lambda. The problem is that I'm finding it difficult to do any ETL transformations. How do I perform ETL jobs in Lambda when I can only read the CSV data, not open it as a file? Is there a way to open it as a CSV and start working on it?
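One way to get file-like behaviour is to wrap the decoded text in io.StringIO, an in-memory text stream that csv.DictReader and csv.DictWriter accept just like an opened file. The sketch below assumes the text has already been read and decoded from response['Body']; the transformation (upper-casing a hypothetical name column) is only a placeholder for your own ETL logic.

```python
import csv
import io


def transform_csv_text(text):
    """Parse CSV text, apply an example transformation, and return new CSV text.

    Upper-casing the hypothetical 'name' column is only an illustration;
    replace it with your own ETL logic.
    """
    reader = csv.DictReader(io.StringIO(text))  # in-memory "file" for reading
    out = io.StringIO()                         # in-memory "file" for writing
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row["name"] = row["name"].upper()
        writer.writerow(row)
    return out.getvalue()


result = transform_csv_text("name,age\nalice,30\nbob,25\n")
print(repr(result))  # 'name,age\r\nALICE,30\r\nBOB,25\r\n'
```

The resulting string could then be written back with something like s3_client.put_object(Bucket=..., Key=..., Body=result.encode("utf-8")), where the bucket and key are yours to choose. Note that DictWriter uses "\r\n" line endings by default.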

To retrieve and read a CSV file from an S3 bucket, you can use the following code:

import csv
import boto3
from django.conf import settings

bucket_name = "your-bucket-name"
file_name = "your-file-name-exists-in-that-bucket.csv"

s3 = boto3.resource('s3', aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)

bucket = s3.Bucket(bucket_name)

obj = bucket.Object(key=file_name)

response = obj.get()
lines = response['Body'].read().decode('utf-8').splitlines(True)

reader = csv.DictReader(lines)
for row in reader:
    # csv_header_key1 and csv_header_key2 are column names defined in your CSV header row
    print(row['csv_header_key1'], row['csv_header_key2'])
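Since the original goal was to turn the CSV into a dictionary, here is a sketch that goes one step further than looping over DictReader: it builds a single dict keyed by one column. The column names (id, name, email) and the sample data are hypothetical, and io.BytesIO again stands in for response['Body'].

```python
import csv
import io

# Stand-in for response['Body'] (assumption: any object whose .read() returns bytes).
body = io.BytesIO(b"id,name,email\n1,alice,a@example.com\n2,bob,b@example.com\n")

lines = body.read().decode("utf-8").splitlines(True)
reader = csv.DictReader(lines)

# Each row is a dict keyed by the header; index the rows by the hypothetical 'id' column.
by_id = {row["id"]: row for row in reader}
print(by_id["2"]["name"])  # bob
```

If the key column is not unique, later rows overwrite earlier ones in this comprehension.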

