Read excel file from S3 into Pandas DataFrame

Question

I have an SNS notification set up that triggers a Lambda function when an .xlsx file is uploaded to an S3 bucket.

The Lambda function reads the .xlsx file into a Pandas DataFrame.

import os 
import pandas as pd
import json
import xlrd
import boto3

def main(event, context):
    message = event['Records'][0]['Sns']['Message']
    parsed_message = json.loads(message)
    src_bucket = parsed_message['Records'][0]['s3']['bucket']['name']
    filepath = parsed_message['Records'][0]['s3']['object']['key']

    s3 = boto3.resource('s3')
    s3_client = boto3.client('s3')

    obj = s3_client.get_object(Bucket=src_bucket, Key=filepath)
    print(obj['Body'])

    df = pd.read_excel(obj, header=2)
    print(df.head(2))

I get the following error:

Invalid file path or buffer object type: <type 'dict'>: ValueError
Traceback (most recent call last):
File "/var/task/handler.py", line 26, in main
df = pd.read_excel(obj, header=2)
File "/var/task/pandas/util/_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "/var/task/pandas/util/_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "/var/task/pandas/io/excel.py", line 307, in read_excel
io = ExcelFile(io, engine=engine)
File "/var/task/pandas/io/excel.py", line 376, in __init__
io, _, _, _ = get_filepath_or_buffer(self._io)
File "/var/task/pandas/io/common.py", line 218, in get_filepath_or_buffer
raise ValueError(msg.format(_type=type(filepath_or_buffer)))
ValueError: Invalid file path or buffer object type: <type 'dict'>

How can I resolve this?

tgrandje · Accepted Answer · 2024-07-12 15:21:33Z

8

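This is expected: obj is a dictionary (the full get_object response), not a file. The file contents sit under its 'Body' key, which is case-sensitive. Have you tried this?

df = pd.read_excel(obj['Body'], header=2)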
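To show where that lookup sits in the handler, here is a minimal sketch of the two steps before the pandas call. The names `extract_s3_location` and `fetch_body` are hypothetical, and the `unquote_plus` step is an addition that is not in the question's code: S3 URL-encodes object keys in event notifications, so a key containing spaces would otherwise not match.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_location(event):
    """Return (bucket, key) from an SNS-wrapped S3 event notification."""
    # The SNS message body is itself a JSON-encoded S3 event.
    s3_event = json.loads(event["Records"][0]["Sns"]["Message"])
    record = s3_event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Assumption beyond the original code: keys arrive URL-encoded,
    # e.g. "my file.xlsx" shows up as "my+file.xlsx".
    key = unquote_plus(record["object"]["key"])
    return bucket, key


def fetch_body(s3_client, bucket, key):
    """Return the file-like body rather than the whole response dict."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # The key is "Body" with a capital B; response["body"] raises KeyError.
    return response["Body"]
```

Inside the handler this would be used as `bucket, key = extract_s3_location(event)` followed by `body = fetch_body(boto3.client('s3'), bucket, key)`.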

edited Jul 12, 2024 at 15:21

tgrandje

answered Jan 14, 2019 at 16:36

Tarik Elkalai


2 Comments

Raj Over a year ago

That was it: df = pd.read_excel(obj['Body'], header=2). Your post is missing the closing ] for 'Body'. Thank you for the help.

Tarik Elkalai Over a year ago

My pleasure :) P.S: I have added the ]

Ritman Cronestar · Accepted Answer · 2021-12-09 21:56:44Z

5

Try reading the body into bytes first: pd.read_excel(obj['Body'].read())
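A slightly more defensive variant of the same idea, as a sketch: wrap the bytes in `io.BytesIO` so pandas gets a seekable file-like object. The streaming body returned by boto3 cannot be rewound, and recent pandas versions may warn when given raw bytes. The helper names `to_seekable_buffer` and `read_excel_from_s3` are hypothetical, and `header=2` is simply carried over from the question.

```python
import io


def to_seekable_buffer(body):
    """Read a streaming body fully and return a seekable in-memory buffer."""
    return io.BytesIO(body.read())


def read_excel_from_s3(bucket, key, header=2):
    """Download an .xlsx object and parse it into a DataFrame.

    Needs boto3, pandas and an Excel engine available at call time.
    """
    # Imported here so the helper above stays usable without these packages.
    import boto3
    import pandas as pd

    obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return pd.read_excel(to_seekable_buffer(obj["Body"]), header=header)
```

The whole file is held in memory, which is usually fine for spreadsheets but worth keeping in mind given Lambda memory limits.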

answered Dec 9, 2021 at 21:56

Ritman Cronestar


LiorH · Accepted Answer · 2020-02-26 13:24:46Z

2

Pandas now accepts an s3:// URL as a file path, so it can read the Excel file directly from S3 without a separate boto3 download step. This relies on the s3fs package being installed alongside pandas.

For a CSV example, see https://stackoverflow.com/a/51777553/52954
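For the Excel case, a minimal sketch of the URL approach could look like the following. It assumes s3fs is bundled with the Lambda deployment and that the function's role can read the object; `s3_url` and `read_excel_via_url` are hypothetical helper names, and `header=2` is taken from the question.

```python
def s3_url(bucket, key):
    """Build the s3:// URL form that pandas passes on to s3fs."""
    return f"s3://{bucket}/{key}"


def read_excel_via_url(bucket, key, header=2):
    """Read an .xlsx object through its s3:// URL (requires s3fs)."""
    # Imported lazily; s3fs must also be installed for s3:// paths to work.
    import pandas as pd

    return pd.read_excel(s3_url(bucket, key), header=header)
```

The bytes are still fetched over the network; the difference is only that pandas does the fetching instead of an explicit get_object call.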

answered Feb 26, 2020 at 13:24

LiorH


ycx · Accepted Answer · 2019-01-14 16:37:29Z

0

If obj is a dictionary, you could try

df = pd.DataFrame.from_dict(obj)

Documentation here if you need to change params.

answered Jan 14, 2019 at 16:37

ycx
