How can I parse 'Sheet1' and 'Sheet2' of an Excel file into a single list? I'm currently using xlrd, as shown in the code below.

Sheet1: [screenshot not included]

Sheet2: [screenshot not included]

My code:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
from __future__ import print_function
import xlrd
import sys

loc = 'excel.xlsx'
wb = xlrd.open_workbook(loc, encoding_override="iso-8859-5, cyrillic")
wb_name = wb.sheet_names()
count = len(wb_name)
data = []
column_excel =('Name', 'Course', 'Cost', 'level')
count_column = len(column_excel)
for i in range(count):
    ow = xlrd.open_workbook('excel.xlsx').sheet_by_index(i)
    for x in range (0, 100):
        for i in range(2):
            try:
                if ow.cell_value(0, x) == column_excel[i]:
                    ips = ow.col_values(x, 1)
                    data.append(ips)
                    break
            except IndexError:
                continue
print(data)

My results:

[['Andre'], [1], [200], [5],
['Sam'], [2], [100], [8],
[7], ['Antony'], [4], [150],
[9], ['Ben'], [3], [500]]

Expected output:

[['Andre'], [1], [200], [5],
['Sam'], [2], [100], [8],
['Antony'], [4], [7], [150],
['Ben'], [3], [9], [500]]

1 Answer

If you use pandas rather than xlrd to read the XLSX file, the code becomes much simpler. In addition, when DataFrames are combined (with pd.concat, or with DataFrame.append in pandas versions before 2.0), the columns are aligned by name, which helps here because the two sheets have different column orders.

See the official pandas read_excel documentation for details.

Sample code:
When read_excel() is called with a list of sheet names, it returns a dict of DataFrames (df_) keyed by sheet name. The final line of code combines those DataFrames into one.

import pandas as pd

df_ = pd.read_excel('courses.xlsx', sheet_name=['Sheet1', 'Sheet2'])
# DataFrame.append was removed in pandas 2.0; pd.concat aligns columns by name in the same way.
df = pd.concat(list(df_.values()), ignore_index=True)

Output (as a DataFrame):

      Name  Course  Cost  level
0    Andre       1   200      5
1      Sam       2   100      8
2  Anthony       4   150      7
3      Ben       3   500      9

Output (as a list):

>>> df.to_numpy().tolist()

[['Andre', 1, 200, 5],
 ['Sam', 2, 100, 8],
 ['Anthony', 4, 150, 7],
 ['Ben', 3, 500, 9]]
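
The alignment by column name can be checked without an Excel file. The following is a minimal sketch, assuming two in-memory DataFrames as hypothetical stand-ins for the sheets (sheet1 and sheet2 are illustrative names, and the column order given to sheet2 is an assumption):

```python
import pandas as pd

# Hypothetical stand-ins for the two sheets; sheet2 lists its columns in a different order.
sheet1 = pd.DataFrame(
    {"Name": ["Andre", "Sam"], "Course": [1, 2], "Cost": [200, 100], "level": [5, 8]}
)
sheet2 = pd.DataFrame(
    {"level": [7, 9], "Name": ["Anthony", "Ben"], "Course": [4, 3], "Cost": [150, 500]}
)

# pd.concat matches columns by label, not by position,
# so the combined frame keeps the column order of the first frame.
combined = pd.concat([sheet1, sheet2], ignore_index=True)

# One inner list per row (record).
records = combined.to_numpy().tolist()
print(records)
```

Because the columns are matched by label, the values from sheet2 end up under the correct headers even though its columns are in a different order.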

Acknowledgement:
This list output is not identical in format to the expected output in the question. I presume the question's format is a design flaw: this answer produces a list of records, whereas a list of individual fields may later become difficult to manage.
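
If the one-field-per-list format from the question is really required, it can be derived from the list of records. A minimal sketch, using two of the records above as sample data (the names records and fields are illustrative):

```python
# Sample records, in the shape returned by df.to_numpy().tolist().
records = [["Andre", 1, 200, 5], ["Sam", 2, 100, 8]]

# Wrap every value in its own single-element list, row by row.
fields = [[value] for row in records for value in row]

print(fields)
# [['Andre'], [1], [200], [5], ['Sam'], [2], [100], [8]]
```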

2 Comments

Additional information: If you are working with multiple Excel files, you can read them in a loop and append each resulting DataFrame to a list. You can later combine this list into one big DataFrame with big_df = pd.concat(list_variable).
@Dustin - Yes, absolutely correct, thank you for the additional information.
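
The loop described in the first comment might look roughly like this. It is only a sketch: combine_workbooks and its reader parameter are hypothetical names, and reader is included only so the function can be tried without real files. Passing sheet_name=None asks read_excel for every sheet in a workbook, returned as a dict of DataFrames.

```python
import pandas as pd


def combine_workbooks(paths, reader=pd.read_excel):
    """Read every sheet of every workbook in `paths` and stack them into one DataFrame."""
    frames = []
    for path in paths:
        # sheet_name=None returns a dict of {sheet name: DataFrame} for all sheets.
        sheets = reader(path, sheet_name=None)
        frames.extend(sheets.values())
    # Columns are aligned by name; ignore_index=True renumbers the rows from 0.
    return pd.concat(frames, ignore_index=True)
```

A call such as big_df = combine_workbooks(['courses.xlsx', 'more_courses.xlsx']) would then return a single DataFrame; the second file name is made up for illustration.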
