
I have an SNS notification set up that triggers a Lambda function when an .xlsx file is uploaded to an S3 bucket.

The Lambda function reads the .xlsx file into a pandas DataFrame:

import os 
import pandas as pd
import json
import xlrd
import boto3

def main(event, context):
    message = event['Records'][0]['Sns']['Message']
    parsed_message = json.loads(message)
    src_bucket = parsed_message['Records'][0]['s3']['bucket']['name']
    filepath = parsed_message['Records'][0]['s3']['object']['key']

    s3 = boto3.resource('s3')
    s3_client = boto3.client('s3')

    obj = s3_client.get_object(Bucket=src_bucket, Key=filepath)
    print(obj['Body'])

    df = pd.read_excel(obj, header=2)
    print(df.head(2))

I get the following error:

Invalid file path or buffer object type: <type 'dict'>: ValueError
Traceback (most recent call last):
  File "/var/task/handler.py", line 26, in main
    df = pd.read_excel(obj, header=2)
  File "/var/task/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "/var/task/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "/var/task/pandas/io/excel.py", line 307, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/var/task/pandas/io/excel.py", line 376, in __init__
    io, _, _, _ = get_filepath_or_buffer(self._io)
  File "/var/task/pandas/io/common.py", line 218, in get_filepath_or_buffer
    raise ValueError(msg.format(_type=type(filepath_or_buffer)))
ValueError: Invalid file path or buffer object type: <type 'dict'>

How can I resolve this?

4 Answers


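This is expected: get_object returns a dictionary of response metadata, not a file-like object, so pandas cannot read it. The file contents are in the dictionary's 'Body' entry. Have you tried this?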

df = pd.read_excel(obj['Body'], header=2)  # the key is 'Body', with a capital B
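A minimal sketch of the corrected flow, with the handler's two steps pulled out as plain functions so they can be tested without AWS. The helper names `bucket_and_key_from_sns_event` and `excel_buffer` are hypothetical. Two things here are additions not in the original code: the key is URL-decoded (S3 event notifications URL-encode object keys, so a space arrives as `+`), and the body is copied into `io.BytesIO`, on the assumption that the Excel reader needs a seekable file, which botocore's `StreamingBody` is not.

```python
import io
import json
from typing import Any, Mapping, Tuple
from urllib.parse import unquote_plus


# Hypothetical helper: not part of boto3 or pandas.
def bucket_and_key_from_sns_event(event: Mapping[str, Any]) -> Tuple[str, str]:
    """Unwrap an S3 event notification that was delivered through SNS."""
    # The SNS message body is the S3 event serialized as a JSON string.
    s3_event = json.loads(event["Records"][0]["Sns"]["Message"])
    s3_info = s3_event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    # Object keys in S3 event notifications are URL-encoded ('+' for space).
    key = unquote_plus(s3_info["object"]["key"])
    return bucket, key


# Hypothetical helper: not part of boto3 or pandas.
def excel_buffer(response: Mapping[str, Any]) -> io.BytesIO:
    """Turn an S3 get_object response dict into a seekable in-memory file."""
    # get_object returns a dict; the file contents are the stream under 'Body'.
    # Copy them into BytesIO because the stream itself cannot seek.
    return io.BytesIO(response["Body"].read())
```

Inside the handler this would be used as `obj = s3_client.get_object(Bucket=bucket, Key=key)` followed by `df = pd.read_excel(excel_buffer(obj), header=2)`.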

2 Comments

That was it: df = pd.read_excel(obj['Body'], header=2). Your post is missing the closing ] for 'Body'. Thank you for the help.
My pleasure :) P.S. I have added the ].

Try pd.read_excel(obj['Body'].read())
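Two details about this approach, shown in a small sketch: `read()` drains the stream, so a second call returns `b''`, and newer pandas releases deprecate passing raw bytes to `read_excel` in favour of wrapping them in `io.BytesIO`. Here `io.BytesIO` stands in for botocore's `StreamingBody` (an assumption so the example runs without AWS), and `read_body_once` is a hypothetical helper name.

```python
import io
from typing import BinaryIO


# Hypothetical helper: read the S3 body exactly once, keep a reusable copy.
def read_body_once(body: BinaryIO) -> io.BytesIO:
    data = body.read()  # consumes everything left in the stream
    return io.BytesIO(data)


# io.BytesIO stands in for botocore's StreamingBody here.
stream = io.BytesIO(b"xlsx-bytes")
buf = read_body_once(stream)
print(stream.read())   # b'' : the original stream is exhausted
print(buf.getvalue())  # b'xlsx-bytes'
```

With the real response, `pd.read_excel(read_body_once(obj['Body']), header=2)` passes pandas a file-like object instead of bare bytes.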


pandas now accepts an S3 URL (s3://bucket/key) as the file path, so it can read the Excel file directly from S3 without downloading it first. This requires the s3fs package to be installed.

See here for a CSV example - https://stackoverflow.com/a/51777553/52954
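A sketch of building that URL from the values the question's handler already extracts. The helper name `s3_url` is hypothetical, and decoding the key is an assumption carried over from how S3 event notifications URL-encode object keys; the `pd.read_excel(url, ...)` call in the comment assumes s3fs is installed and the Lambda role can read the bucket.

```python
from urllib.parse import unquote_plus


# Hypothetical helper: build an s3:// URL that pandas can open via s3fs.
def s3_url(bucket: str, encoded_key: str) -> str:
    # Keys from S3 event notifications are URL-encoded; decode before use.
    return f"s3://{bucket}/{unquote_plus(encoded_key)}"


url = s3_url("my-bucket", "reports/Q1+sales.xlsx")
print(url)  # s3://my-bucket/reports/Q1 sales.xlsx
# With s3fs installed, pandas could then read it directly:
#     df = pd.read_excel(url, header=2)
```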


If obj is a dictionary, you could try:

df = pd.DataFrame.from_dict(obj)

See the pandas DataFrame.from_dict documentation if you need to change the parameters.
