Python - Pandas - Extract ENTIRE excel file as string

Question

I'd like to read a lot of Excel files using pandas (Python). When importing the data, I want ALL my columns to be stored as strings.

The problem is that I don't know the number of columns or even their names (they change every time). Would you have an easy solution for this problem?

What I tried to do:

import pandas as pd

converters = {i: str for i in range(99)}
df = pd.read_excel('example.xlsx', converters=converters)

But the index sometimes goes out of range, since the Excel files are all different.

Ideally I'd like to do:

df = pd.read_excel('example.xlsx', converters = ALL)

However, I haven't found anything that lets me do something like this so far...

Thank you for your help.

Can you share the error from using converters = { i : str for i in range(0,99)}?
– piRSquared, Commented Jan 12, 2017 at 15:54
piRSquared, "Index is out of Range". Which makes sense, since the Excel file is different every time. Sometimes a file has 99 columns, sometimes it has 10 columns. If the dictionary has more elements than there are columns, the index will be out of range.
– Jeremie, Commented Jan 12, 2017 at 16:06

MaxU - stand with Ukraine · Accepted Answer · 2017-01-12 16:24:44Z

4

UPDATE: I think we can use the xlrd module (the standard Excel reader for pandas) to get the number of columns, and then reuse the same ExcelFile object to read the data from the Excel file:

import pandas as pd

xl = pd.ExcelFile(fn)  # fn: path to the Excel file
ncols = xl.book.sheet_by_index(0).ncols  # xl.book is the underlying xlrd workbook
df = xl.parse(0, converters={i: str for i in range(ncols)})

OLD answer:

I think you would first have to get the number of columns:

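from openpyxl import load_workbook

# read_only=True replaces the use_iterators=True flag of older openpyxl releases
workbook = load_workbook(filename, read_only=True)
col_num = workbook.worksheets[0].max_column

# one str converter per column position
converters = {i: str for i in range(col_num)}
...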
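As a possible alternative to counting columns first, `read_excel` also takes a `dtype` argument, which comes close to the `converters = ALL` idea from the question. This is a minimal sketch, assuming a pandas release whose `read_excel` supports `dtype`; `read_excel_as_str` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def read_excel_as_str(path, **kwargs):
    """Read the first sheet of an Excel file with every column as str.

    Hypothetical helper for illustration. Extra keyword arguments
    (for example skiprows) are passed straight to pd.read_excel.
    """
    # dtype=str applies to all columns, so the column count is not needed.
    return pd.read_excel(path, dtype=str, **kwargs)
```

Usage would look like `df = read_excel_as_str('example.xlsx')`. Note that this swaps `converters` for `dtype`, so it is not the same mechanism as the snippets above.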

edited Jan 12, 2017 at 16:24

answered Jan 12, 2017 at 15:52


4 Comments

piRSquared Over a year ago

When I try to upvote again... it just takes it away. Which isn't what I want. How do I upvote twice? This is my next meta question.

Jeremie Over a year ago

Thanks MaxU: it works for most cases, but sometimes I have extra columns at the end of the file that are not part of the table I extract (I used skiprows to avoid them). So in your code col_num would be too high and the index would be out of range. A solution I found is to use read_excel twice: the first time to get len(df.columns) (after skipping the rows I don't need), and the second time with converters = { i : str for i in range(len(df.columns))}. Still, I would like to avoid reading the Excel files twice...

piRSquared Over a year ago

@user7410504 if you want to avoid reading it multiple times, it really should be in a better format. This is the reason we use formats, so we can avoid doing inefficient things.

Jeremie Over a year ago

Yeah I agree. It's just that I'm dealing with a lot of heavy files (that I didn't create myself). Reformatting is a pain :)
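The two-pass idea from the comments can be made lighter by limiting the first pass to the header row. The following is only a sketch, assuming a pandas release whose `read_excel` supports `nrows`; `read_table_as_str` is a hypothetical name, and `skiprows` stands in for whatever the real files need.

```python
import pandas as pd


def read_table_as_str(path, skiprows=None):
    """Size the converters from a header-only pass, then read the data."""
    # Pass 1: nrows=0 returns an empty frame that carries only the column labels.
    header = pd.read_excel(path, skiprows=skiprows, nrows=0)
    # Key the converters by position, sized to the columns actually present.
    converters = {i: str for i in range(len(header.columns))}
    # Pass 2: read the data with every column converted through str.
    return pd.read_excel(path, skiprows=skiprows, converters=converters)
```

The file is still opened twice, and how much work the first pass really saves depends on the pandas version and the Excel engine in use.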
