How to load data from an xlsx file using python

Question

This is my xlsx file:

(screenshot of the spreadsheet; image not preserved)

I want to convert this data to a dict like this:

{
    0:{
       'a':1,
       'b':100,
       'c':2,
       'd':10
    },
    1:{
       'a':8,
       'b':480,
       'c':3,
       'd':14
    }
...
}

Does anybody know a Python library that can do this, starting from line 124 and ending at line 141?

Thanks.

Your first output dict has data from lines 124 and 125; your second has data from line 126 ... please edit your question. Please also confirm that the data columns that you want are B, C, E, and G.
– John Machin, Commented Apr 2, 2011 at 4:43
xlrd (as of version 0.8.0) supports reading .xlsx files directly. (The "bolt-on" module referred to by John Machin in his answer was finally incorporated into the xlrd package.) Related: stackoverflow.com/questions/4371163/…
– John Y, Commented Mar 5, 2013 at 15:53
I think you mean d:12 for the first part; and how big is your file?
– Burhan Khalid, Commented Feb 27, 2014 at 12:42

John Machin · Accepted Answer · 2011-04-02 05:04:29Z

Options with xlrd:

(1) Your xlsx file doesn't look very large; save it as xls.

(2) Use xlrd plus the bolt-on beta-test module xlsxrd (find my e-mail address and ask for it); the combination will read data from xls and xlsx files seamlessly (same APIs; it examines the file contents to determine whether it's xls, xlsx, or an imposter).

In either case, something like the (untested) code below should do what you want:

# Choose ONE of the two imports below
from xlrd import open_workbook      # plain xlrd
# from xlsxrd import open_workbook  # bolt-on beta-test module (xls and xlsx)

# These could be function args in real live code
column_map = {
    # The numbers are zero-relative column indexes
    'a': 1,
    'b': 2,
    'c': 4,
    'd': 6,
    }
first_row_index = 124 - 1
last_row_index = 141 - 1
file_path = 'your_file.xls'

# The action starts here
book = open_workbook(file_path)
sheet = book.sheet_by_index(0) # first worksheet
result = {}
# range() and .items() run on both Python 2.7 and Python 3
# (the original used the Python 2-only xrange() and .iteritems())
for key0, row_index in enumerate(range(first_row_index, last_row_index + 1)):
    result[key0] = {
        key1: sheet.cell_value(row_index, column_index)
        for key1, column_index in column_map.items()
    }
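
The comment in the code above notes that the settings could be function arguments. A minimal sketch of that refactoring follows. The function name extract_rows is hypothetical, and the only assumption about the sheet object is that it exposes cell_value(row_index, column_index), as the sheet in the answer does.

```python
def extract_rows(sheet, column_map, first_line, last_line):
    """Return {0: {...}, 1: {...}, ...} for spreadsheet lines first_line..last_line.

    Line numbers are 1-based, as displayed in the spreadsheet program.
    column_map values are zero-relative column indexes, e.g. {'a': 1} is column B.
    """
    result = {}
    # Convert 1-based inclusive line numbers to 0-based row indexes
    row_indexes = range(first_line - 1, last_line)
    for key0, row_index in enumerate(row_indexes):
        result[key0] = {
            key1: sheet.cell_value(row_index, column_index)
            for key1, column_index in column_map.items()
        }
    return result


# Hypothetical usage with the values from the question:
#   book = open_workbook('your_file.xls')
#   sheet = book.sheet_by_index(0)
#   result = extract_rows(sheet, {'a': 1, 'b': 2, 'c': 4, 'd': 6}, 124, 141)
```

Keeping the 1-based to 0-based conversion in one place avoids repeating the "- 1" arithmetic at every call site.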

chfw · Accepted Answer · 2014-09-21 21:29:20Z

Suppose you had the data like this:

a,b,c,d
1,2,3,4
2,3,4,5
...

One of many potential answers in 2014 is:

import pyexcel


r = pyexcel.SeriesReader("yourfile.xlsx")
# make a filter function
filter_func = lambda row_index: row_index < 124 or row_index > 141
# apply the filter on the reader
r.filter(pyexcel.filters.RowIndexFilter(filter_func))
# get the data
data = pyexcel.utils.to_records(r)
print(data)

Now the data is an array of dictionaries:

[{
   'a':1,
   'b':100,
   'c':2,
   'd':10
},
{
   'a':8,
   'b':480,
   'c':3,
   'd':14
}...
]
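
The question asked for a dict keyed by a running index rather than a list. Assuming data is a list of dictionaries as shown above, plain Python can do the conversion. The sample values below are copied from that output and stand in for real data.

```python
# Sample records copied from the output shown above (stand-in for real data)
data = [
    {'a': 1, 'b': 100, 'c': 2, 'd': 10},
    {'a': 8, 'b': 480, 'c': 3, 'd': 14},
]

# enumerate() pairs each record with its position: 0, 1, 2, ...
result = dict(enumerate(data))

print(result[1]['b'])  # prints 480
```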

Documentation can be read here


John Machin · Accepted Answer · 2011-04-03 19:26:54Z

Another option is openpyxl. I've been meaning to try it out, but haven't gotten around to it yet, so I can't say how good it is.

edited Apr 3, 2011 at 19:26

John Machin


answered Apr 3, 2011 at 9:54

joshayers


1 Comment

joshayers Over a year ago

Since posting this answer, I've had a chance to try openpyxl. It's quite easy to use. I've managed to write out a fairly large spreadsheet - 20 tabs, each with 200 columns and 500 rows. It uses around 2GB of memory for that operation. It also has an optimized append-only writer that the author claims can write a spreadsheet of unlimited size, but I haven't had a reason to try it yet.
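
For the reading task in the question, a sketch with openpyxl might look like the following. It is untested against the asker's file. The function names rows_to_dict and load_range are hypothetical, the column indexes assume the wanted columns are B, C, E, and G (as suggested in the comments on the question), and values_only requires a reasonably recent openpyxl release.

```python
def rows_to_dict(rows, column_map):
    """Build {0: {'a': ..., ...}, 1: {...}} from an iterable of row tuples."""
    return {
        index: {key: row[col] for key, col in column_map.items()}
        for index, row in enumerate(rows)
    }


def load_range(file_path, column_map, first_line, last_line):
    """Read spreadsheet lines first_line..last_line (1-based, inclusive)."""
    from openpyxl import load_workbook  # third-party package

    # read_only streams the sheet instead of loading it all into memory;
    # data_only returns cached formula results rather than formula strings.
    book = load_workbook(file_path, read_only=True, data_only=True)
    try:
        sheet = book.worksheets[0]  # first worksheet
        rows = sheet.iter_rows(
            min_row=first_line,
            max_row=last_line,
            # Request enough columns so every mapped index exists in each row
            max_col=max(column_map.values()) + 1,
            values_only=True,
        )
        return rows_to_dict(rows, column_map)
    finally:
        book.close()


# Hypothetical usage; column_map values are zero-relative column indexes:
#   result = load_range('yourfile.xlsx', {'a': 1, 'b': 2, 'c': 4, 'd': 6}, 124, 141)
```

Splitting the pure conversion (rows_to_dict) from the file access makes the dict-building logic easy to check without a workbook on disk.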

Collin Anderson · Accepted Answer · 2014-02-27 12:14:32Z

Here's a very very rough implementation using just the standard library.

def xlsx(fname):
    import zipfile
    from xml.etree.ElementTree import iterparse
    z = zipfile.ZipFile(fname)
    strings = [el.text for e, el in iterparse(z.open('xl/sharedStrings.xml')) if el.tag.endswith('}t')]
    rows = []
    row = {}
    value = ''
    for e, el in iterparse(z.open('xl/worksheets/sheet1.xml')):
        if el.tag.endswith('}v'): # <v>84</v>
            value = el.text
        if el.tag.endswith('}c'): # <c r="A3" t="s"><v>84</v></c>
            if el.attrib.get('t') == 's':
                value = strings[int(value)]
            letter = el.attrib['r'] # AZ22
            while letter[-1].isdigit():
                letter = letter[:-1]
            row[letter] = value
            value = ''  # reset so a cell without <v> does not inherit the previous value
        if el.tag.endswith('}row'):
            rows.append(row)
            row = {}
    return dict(enumerate(rows))
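
To get from that output to the shape asked for in the question, the rows still have to be narrowed to lines 124 to 141 and the column letters renamed. The sketch below is one way to do that. The name select_range is hypothetical, and it assumes the input looks like what xlsx() returns: a dict of row index to {column letter: value}. Two caveats: values arrive as strings, and rows are numbered in the order they appear in the XML, so the indexes only match spreadsheet line numbers if no empty rows were omitted from the file.

```python
def select_range(rows_by_index, column_map, first_line, last_line):
    """Pick lines first_line..last_line (1-based, inclusive) and rename columns.

    rows_by_index: dict like {0: {'A': '1', 'B': 'x'}, 1: {...}, ...}
    column_map: output key -> column letter, e.g. {'a': 'B'}
    """
    result = {}
    for key0, row_index in enumerate(range(first_line - 1, last_line)):
        row = rows_by_index.get(row_index, {})
        # Cells absent from the row (empty cells) come back as None
        result[key0] = {key: row.get(letter) for key, letter in column_map.items()}
    return result


# Hypothetical usage with the function defined above:
#   result = select_range(xlsx('yourfile.xlsx'),
#                         {'a': 'B', 'b': 'C', 'c': 'E', 'd': 'G'}, 124, 141)
```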
