
I created a function that iterates over a folder of Excel files and builds a list of all the headers across all sheets. It works fine but is VERY slow. Do you have any ideas on how to improve it? Thanks!

import glob

import pandas as pd

# file directory
path = r'C:\Users\John\Excel_folder'
all_files = glob.glob(path + "/*.xlsx")

def get_columns(file):
    sheets = pd.ExcelFile(file).sheet_names
    for sheet in sheets:
        for i in pd.read_excel(file, sheet_name=sheet, nrows=0).columns:
            col.append(i)

col = []
for file in all_files:
    get_columns(file)

col

1 Answer

You can pass `sheet_name=None` to `read_excel` to read all sheets at once. It returns a dictionary of DataFrames, so at the end you can flatten the columns with a list comprehension.

def get_columns(file):
    return [c 
            for df in pd.read_excel(file, 
                                    sheet_name=None, 
                                    nrows=0).values() 
            for c in df.columns]

col = [c for file in all_files for c in get_columns(file)]

It should be faster because each file is opened once instead of once per sheet.
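To make the flattening pattern concrete without needing any Excel files on disk, here is a minimal sketch where plain lists of column names stand in for the DataFrames that `pd.read_excel(..., sheet_name=None)` would return (the sheet names and columns below are made up for illustration):

```python
# sheet_name=None returns a dict mapping sheet names to DataFrames;
# here plain lists of column names play the role of the DataFrames.
sheets = {
    "Sheet1": ["id", "name"],
    "Sheet2": ["id", "amount"],
}

# Same shape as the answer's comprehension: iterate the dict's values,
# then iterate each sheet's columns, collecting everything into one list.
col = [c for columns in sheets.values() for c in columns]
print(col)  # ['id', 'name', 'id', 'amount']
```

Note that duplicate headers (like `id` here) are kept; wrap the result in `set(...)` if you only want unique column names.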


2 Comments

Thanks Ben! I had never come across the collections.OrderedDict class. How do I access its elements?
@AlmogWoldenberg I'm not sure what you mean — `pd.read_excel` with `sheet_name=None` returns a regular dict for me. In any case, an OrderedDict can be used like a "regular" dict: `items()`, `keys()`, `values()`, just like in the code above. The one difference is that it keeps keys in the order they were added, but other than that I don't know enough about it.
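To illustrate the comment above: older pandas versions returned a `collections.OrderedDict` from `read_excel(..., sheet_name=None)`, but it supports the same access patterns as a plain dict. A quick sketch, again using lists of column names as stand-ins for DataFrames:

```python
from collections import OrderedDict

# Build an OrderedDict the same way read_excel would: one entry per sheet.
d = OrderedDict()
d["Sheet1"] = ["id", "name"]
d["Sheet2"] = ["id", "amount"]

# All the usual dict access patterns work.
print(list(d.keys()))   # ['Sheet1', 'Sheet2'] - insertion order preserved
print(d["Sheet1"])      # ['id', 'name']
for sheet, columns in d.items():
    print(sheet, columns)
```

Since Python 3.7 the built-in dict also preserves insertion order, which is why modern pandas can return a plain dict here.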
