This is my xlsx file:

[image: screenshot of the spreadsheet]

I want to convert this data to a dict like this:

{
    0:{
       'a':1,
       'b':100,
       'c':2,
       'd':10
    },
    1:{
       'a':8,
       'b':480,
       'c':3,
       'd':14
    }
...
}

Does anybody know a Python library that can do this, starting at line 124 and ending at line 141?

Thanks.

  • Your first output dict has data from lines 124 and 125; your second has data from line 126 ... please edit your question. Please also confirm that the data columns that you want are B, C, E, and G. Commented Apr 2, 2011 at 4:43
  • xlrd (as of version 0.8.0) supports reading .xlsx files directly. (The "bolt-on" module referred to by John Machin in his answer was finally incorporated into the xlrd package.) Related: stackoverflow.com/questions/4371163/… Commented Mar 5, 2013 at 15:53
  • I think you mean d:12 for the first part; and how big is your file? Commented Feb 27, 2014 at 12:42

4 Answers

Options with xlrd:

(1) Your xlsx file doesn't look very large; save it as xls.

(2) Use xlrd plus the bolt-on beta-test module xlsxrd (find my e-mail address and ask for it); the combination will read data from xls and xlsx files seamlessly (same APIs; it examines the file contents to determine whether it's xls, xlsx, or an imposter).

In either case, something like the (untested) code below should do what you want:

# Choose one of the two imports below
from xlrd import open_workbook
# from xlsxrd import open_workbook  # the bolt-on module, if you have it

# These could be function args in real live code
column_map = {
    # The numbers are zero-relative column indexes
    'a': 1,
    'b': 2,
    'c': 4,
    'd': 6,
    }
first_row_index = 124 - 1
last_row_index = 141 - 1
file_path = 'your_file.xls'

# The action starts here
book = open_workbook(file_path)
sheet = book.sheet_by_index(0) # first worksheet
key0 = 0
result = {}
for row_index in range(first_row_index, last_row_index + 1):
    d = {}
    for key1, column_index in column_map.items():
        d[key1] = sheet.cell_value(row_index, column_index)
    result[key0] = d
    key0 += 1
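The comment in the code notes that the settings could be function arguments. A minimal sketch of that idea follows. The name `rows_to_dict` and the `FakeSheet` class are hypothetical; `FakeSheet` only stands in for an xlrd sheet so the example runs without a spreadsheet. The function relies on nothing but a `cell_value(row_index, column_index)` method with zero-relative indexes, as used above.

```python
def rows_to_dict(sheet, column_map, first_row, last_row):
    """Build {0: {...}, 1: {...}} from 1-based rows first_row..last_row.

    `sheet` is any object with a cell_value(row_index, column_index)
    method that takes zero-relative indexes, such as an xlrd sheet.
    """
    result = {}
    for key, row_index in enumerate(range(first_row - 1, last_row)):
        result[key] = {
            name: sheet.cell_value(row_index, column_index)
            for name, column_index in column_map.items()
        }
    return result


class FakeSheet:
    """Hypothetical stand-in for an xlrd sheet, backed by a list of lists."""

    def __init__(self, rows):
        self._rows = rows

    def cell_value(self, row_index, column_index):
        return self._rows[row_index][column_index]


# Two rows laid out like the question: data in columns B, C, E and G.
sheet = FakeSheet([
    ["", 1, 100, "", 2, "", 10],
    ["", 8, 480, "", 3, "", 14],
])
column_map = {"a": 1, "b": 2, "c": 4, "d": 6}
print(rows_to_dict(sheet, column_map, first_row=1, last_row=2))
```

With a real workbook you would pass `book.sheet_by_index(0)` as `sheet`, with `first_row=124` and `last_row=141`.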

Suppose you had the data like this:

a,b,c,d
1,2,3,4
2,3,4,5
...

One of many potential answers in 2014 is:

import pyexcel


r = pyexcel.SeriesReader("yourfile.xlsx")
# make a filter function
filter_func = lambda row_index: row_index < 124 or row_index > 141
# apply the filter on the reader
r.filter(pyexcel.filters.RowIndexFilter(filter_func))
# get the data
data = pyexcel.utils.to_records(r)
print(data)

Now data is a list of dictionaries:

[{
   'a':1,
   'b':100,
   'c':2,
   'd':10
},
{
   'a':8,
   'b':480,
   'c':3,
   'd':14
}...
]
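The question asks for a dict keyed by 0, 1, 2, ... rather than a list. Assuming the records come back in row order, one way to convert them is `dict(enumerate(...))`. The `records` list below is sample data copied from the question, not output from pyexcel.

```python
# Sample records in the shape shown above (values taken from the question).
records = [
    {"a": 1, "b": 100, "c": 2, "d": 10},
    {"a": 8, "b": 480, "c": 3, "d": 14},
]

# enumerate() pairs each record with its position; dict() turns the pairs
# into {0: {...}, 1: {...}, ...}.
result = dict(enumerate(records))
print(result[1]["b"])
```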

Documentation can be read here


Another option is openpyxl. I've been meaning to try it out, but haven't gotten around to it yet, so I can't say how good it is.

1 Comment

Since posting this answer, I've had a chance to try openpyxl. It's quite easy to use. I've managed to write out a fairly large spreadsheet - 20 tabs, each with 200 columns and 500 rows. It uses around 2GB of memory for that operation. It also has an optimized append-only writer that the author claims can write a spreadsheet of unlimited size, but I haven't had a reason to try it yet.
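For the task in the question, a rough openpyxl sketch could look like the code below. It is untested against the asker's file and assumes the data is on the first worksheet, in columns B, C, E and G, as a comment on the question suggests. The function names `rows_to_dict` and `read_xlsx_range` are made up for this example. openpyxl is imported inside the function so the pure-Python helper can be used without it.

```python
def rows_to_dict(rows, column_map):
    """Turn an iterable of row tuples into {0: {...}, 1: {...}, ...}.

    column_map maps each output key to a zero-based position in the tuple.
    """
    return {
        index: {name: row[pos] for name, pos in column_map.items()}
        for index, row in enumerate(rows)
    }


def read_xlsx_range(path, first_row=124, last_row=141):
    """Read rows first_row..last_row (1-based) from the first worksheet."""
    from openpyxl import load_workbook  # third-party package

    # data_only=True returns cached formula results instead of formulas.
    workbook = load_workbook(path, data_only=True)
    sheet = workbook.worksheets[0]
    # Each tuple covers columns B..G, so B, C, E, G sit at 0, 1, 3, 5.
    rows = sheet.iter_rows(
        min_row=first_row, max_row=last_row,
        min_col=2, max_col=7, values_only=True,
    )
    return rows_to_dict(rows, {"a": 0, "b": 1, "c": 3, "d": 5})


# The helper can be checked without a workbook:
sample = [(1, 100, None, 2, None, 10), (8, 480, None, 3, None, 14)]
print(rows_to_dict(sample, {"a": 0, "b": 1, "c": 3, "d": 5}))
```

Calling `read_xlsx_range("yourfile.xlsx")` should then return the dict for lines 124 to 141, if those assumptions hold for your file.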

Here's a very very rough implementation using just the standard library.

def xlsx(fname):
    import zipfile
    from xml.etree.ElementTree import iterparse
    z = zipfile.ZipFile(fname)
    strings = [el.text for e, el in iterparse(z.open('xl/sharedStrings.xml')) if el.tag.endswith('}t')]
    rows = []
    row = {}
    value = ''
    for e, el in iterparse(z.open('xl/worksheets/sheet1.xml')):
        if el.tag.endswith('}v'): # <v>84</v>
            value = el.text
        if el.tag.endswith('}c'): # <c r="A3" t="s"><v>84</v></c>
            if el.attrib.get('t') == 's': # shared string: value is an index
                value = strings[int(value)]
            letter = el.attrib['r'] # AZ22
            while letter[-1].isdigit():
                letter = letter[:-1]
            row[letter] = value
            value = '' # reset so an empty cell doesn't inherit the previous value
        if el.tag.endswith('}row'):
            rows.append(row)
            row = {}
    return dict(enumerate(rows))
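The rows returned above are keyed by column letter ('B', 'C', ...), while the xlrd answer uses zero-relative column indexes. A small sketch of how a cell reference such as AZ22 could be split and its letters converted to an index is shown below. The helper names `split_cell_ref` and `column_index` are invented for this example.

```python
import re


def split_cell_ref(ref):
    """Split a reference like 'AZ22' into ('AZ', 22)."""
    match = re.fullmatch(r"([A-Z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError("not a cell reference: %r" % (ref,))
    letters, digits = match.groups()
    return letters, int(digits)


def column_index(letters):
    """Convert column letters to a zero-relative index: 'A' -> 0, 'AA' -> 26."""
    index = 0
    for ch in letters:
        # Column letters form a base-26 number with digits 1..26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


letters, row_number = split_cell_ref("AZ22")
print(letters, row_number, column_index(letters))
```

The row number from the reference also gives a way to select lines 124 to 141 by their spreadsheet row number rather than by position in the list.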
