
I have a list of nested dictionaries that looks like this:

[{'posts': {'item_1': 1,
            'item_2': 8,
            'item_3': 105,
            'item_4': 324,
            'item_5': 313}},
 {'edits': {'item_1': 1,
            'item_2': 8,
            'item_3': 61,
            'item_4': 178,
            'item_5': 163}},
 {'views': {'item_1': 2345,
            'item_2': 330649,
            'item_3': 12920402,
            'item_4': 46199102,
            'item_5': 43094955}}]

I would like to write it to an Excel file in this format:

+--------+-------+-------+-----------+
|        | posts | edits |   views   |
+--------+-------+-------+-----------+
| item_1 |     1 |     1 |      2345 |
| item_2 |     8 |     8 |    330649 |
| item_3 |   105 |    61 |  12920402 |
| item_4 |   324 |   178 |  46199102 |
| item_5 |   313 |   163 |  43094955 |
+--------+-------+-------+-----------+

I am using the xlsxwriter library and have tried the following, and variations on it, without success:

for item in data:
    for col_name, data in item.iteritems():
        col += 1
        worksheet.write(row, col, col_name)
        for row_name, row_data in data.iteritems():
            col += 1
            worksheet.write(row, col, row_name)
            worksheet.write(row + 1, col, row_data)

I'm wondering whether it makes sense to rework my nested dictionary object, or whether it's possible to write it to Excel in its current form.

When I say "without success", I mean that I can get it to write certain things to the Excel file, like column names, row names, or the data, but I can't get it to produce the layout pictured above. I'm not getting errors; I suspect I just don't know how to unpack this object properly to loop through it. With the code above, I get a combination of row and column names on row 1 and all of the values on row 2.

My output for the code above is:

+--+-------+--------+--------+--------+--------+--------+-------+--------+--------+--------+--------+--------+-------+----------+----------+--------+----------+--------+
|  | posts | item_4 | item_5 | item_2 | item_3 | item_1 | edits | item_4 | item_5 | item_2 | item_3 | item_1 | views |  item_4  |  item_5  | item_2 |  item_3  | item_1 |
+--+-------+--------+--------+--------+--------+--------+-------+--------+--------+--------+--------+--------+-------+----------+----------+--------+----------+--------+
|  |       |    324 |    313 |      8 |    105 |      1 |       |    178 |    163 |      8 |     61 |      1 |       | 46199102 | 43094955 | 330649 | 12920402 |   2345 |
+--+-------+--------+--------+--------+--------+--------+-------+--------+--------+--------+--------+--------+-------+----------+----------+--------+----------+--------+
  • Can you be more specific than "without success"? See minimal reproducible example Commented Sep 8, 2015 at 17:21
  • @jonrsharpe updated for more clarity Commented Sep 8, 2015 at 17:26
  • But not by much. What output are you getting? Commented Sep 8, 2015 at 17:27
  • Have you considered processing the dictionary to look more like the output you want to start with (i.e. {'item1': {'posts': 8, 'edits': 1, ...}, ...})? Commented Sep 8, 2015 at 17:36
  • docs.python.org/2/tutorial/datastructures.html#dictionaries Commented Sep 8, 2015 at 17:38

3 Answers


As an alternative, this could be solved using csv as follows:

import csv
import itertools

nested = [
    {'posts': {'item_1': 1, 'item_2': 8, 'item_3': 105, 'item_4': 324, 'item_5': 313,}},
    {'edits': {'item_1': 1, 'item_2': 8, 'item_3': 61, 'item_4': 178, 'item_5': 163}},
    {'views': {'item_1': 2345, 'item_2': 330649, 'item_3': 12920402, 'item_4': 46199102, 'item_5': 43094955}}]

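# Each dict holds a single key: the column heading
headings = [next(iter(d)) for d in nested]
# Sort each column's (item, value) pairs so the rows line up by item name
entries = [sorted(d[col].items()) for d, col in zip(nested, headings)]

# Python 3: open in text mode with newline='', as the csv docs recommend
with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['items'] + headings)

    # Walk the sorted columns in parallel; pad a short column with an (item, value) placeholder
    for cols in itertools.zip_longest(*entries, fillvalue=('<n/a>', '<n/a>')):
        csv_output.writerow([cols[0][0]] + [col[1] for col in cols])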

This would give you output.csv as follows:

items,posts,edits,views
item_1,1,1,2345
item_2,8,8,330649
item_3,105,61,12920402
item_4,324,178,46199102
item_5,313,163,43094955
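The approach above pairs the sorted lists up by position, so it assumes every column contains exactly the same item names; if one column is missing an item, later values slide into the wrong rows. Below is a sketch of a variant that looks each value up by item name instead, using `csv.DictWriter` with `restval` to fill gaps. The function name `write_nested_csv` and the `'<n/a>'` placeholder are illustrative choices, and it assumes Python 3.7+ so that dicts keep insertion order (columns then follow the list order).

```python
import csv


def write_nested_csv(data, f, missing='<n/a>'):
    """Write [{metric: {item: value}}, ...] to the open file f as CSV,
    one row per item and one column per metric (hypothetical helper)."""
    # Merge the list into a single {metric: {item: value}} dict
    columns = {metric: per_item for entry in data for metric, per_item in entry.items()}

    # Invert to one record per item: {'items': name, 'posts': ..., 'edits': ..., ...}
    rows = {}
    for metric, per_item in columns.items():
        for name, value in per_item.items():
            rows.setdefault(name, {'items': name})[metric] = value

    # restval fills any metric an item lacks, so values never shift into other rows
    writer = csv.DictWriter(f, fieldnames=['items'] + list(columns), restval=missing)
    writer.writeheader()
    for name in sorted(rows):
        writer.writerow(rows[name])
```

Usage mirrors the original: `with open('output.csv', 'w', newline='') as f: write_nested_csv(nested, f)`. Because rows are keyed by item name, the number of rows no longer depends on the number of columns.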

7 Comments

Thanks for this! It works almost perfectly, but is there any way to shift the headings one column to the right? Currently the headings overlap with the row names.
I am trying this on a larger data set where there are more rows than columns, and in that case some of the rows seem to be left out of the CSV output. Do you think this is a side effect of your code above?
Yes, the original version was hard-coded to your 3 sample columns. I have updated it to build the columns automatically, so that should fix it, provided the overall structure remains the same.
In cases where I have more rows than columns, it still seems to be leaving off columns. I think the enumeration over the keys causes this. Is there another way to build the rows that doesn't depend on the number of columns?
I've not been able to reproduce it, can you give me a link to some sample data that causes it?

Presently you have a separate dict for each of posts, edits, and views, and each one is keyed by your "items", which seems redundant.

Alternatively, create a single dictionary keyed by your "items", and make the value of each item a dictionary of posts, edits, and views, like:

items = {
    'item_1': {'posts': 1, 'edits': 1, 'views': 2345},
    'item_2': {'posts': 8, 'edits': 8, 'views': 330649},
}

This way you can simply refer to items['item_2']['edits'] (which should yield 8) or items['item_1']['views'] (which should yield 2345), etc.
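You don't have to type the new structure out by hand; it can be built from the list you already have. A minimal sketch, assuming Python 3 and the `data` list from the question, that inverts `{metric: {item: value}}` into `{item: {metric: value}}`:

```python
from collections import defaultdict

data = [
    {'posts': {'item_1': 1, 'item_2': 8, 'item_3': 105, 'item_4': 324, 'item_5': 313}},
    {'edits': {'item_1': 1, 'item_2': 8, 'item_3': 61, 'item_4': 178, 'item_5': 163}},
    {'views': {'item_1': 2345, 'item_2': 330649, 'item_3': 12920402,
               'item_4': 46199102, 'item_5': 43094955}},
]

# Invert {metric: {item: value}} into {item: {metric: value}}
items = defaultdict(dict)
for entry in data:
    for metric, per_item in entry.items():
        for item_name, value in per_item.items():
            items[item_name][metric] = value
items = dict(items)  # plain dict, so a typo'd key raises KeyError instead of adding an entry

print(items['item_2']['edits'])  # 8
print(items['item_1']['views'])  # 2345
```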

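In your case, then, something like:

import xlsxwriter

# workbook/sheet setup; the file name is just an example
workbook = xlsxwriter.Workbook('output.xlsx')
worksheet = workbook.add_worksheet()

# write the headers -- this could be refined
headings = ['posts', 'edits', 'views']
for col, heading in enumerate(headings, start=1):
    worksheet.write(0, col, heading)

# write the data: one row per item, one column per heading
for row, itm in enumerate(sorted(items), start=1):
    worksheet.write(row, 0, itm)
    for col, heading in enumerate(headings, start=1):
        worksheet.write(row, col, items[itm][heading])

workbook.close()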
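If you would rather keep the list in its current shape, the same grid can be written straight from it. This is a sketch, not a definitive implementation: `write_nested` is a hypothetical helper name, it only assumes the worksheet object has a `write(row, col, value)` method (which xlsxwriter's `Worksheet` provides), and it assumes Python 3.7+ so that dicts keep insertion order.

```python
def write_nested(worksheet, data):
    """Write [{metric: {item: value}}, ...] as a grid: metric names across
    row 0, item names down column 0, values in between (hypothetical helper)."""
    # Merge the list into one {metric: {item: value}} dict; columns follow list order
    columns = {metric: per_item for entry in data for metric, per_item in entry.items()}
    # Union of item names across all metrics, sorted so the row order is stable
    item_names = sorted({name for per_item in columns.values() for name in per_item})

    # Header row: leave cell (0, 0) empty, metric names from column 1 onward
    for col, metric in enumerate(columns, start=1):
        worksheet.write(0, col, metric)

    # One row per item: name in column 0, then one value per metric
    for row, name in enumerate(item_names, start=1):
        worksheet.write(row, 0, name)
        for col, per_item in enumerate(columns.values(), start=1):
            # Leave the cell blank if this metric has no value for the item
            if name in per_item:
                worksheet.write(row, col, per_item[name])
```

With xlsxwriter you would create `workbook = xlsxwriter.Workbook('output.xlsx')`, call `write_nested(workbook.add_worksheet(), data)`, and then `workbook.close()`. Item names are sorted as strings, so `item_10` would come before `item_2`.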

Comments

import pandas as pd

data = [{'posts': {'item_1': 1,
                   'item_2': 8,
                   'item_3': 105,
                   'item_4': 324,
                   'item_5': 313, }
         },
        {'edits': {'item_1': 1,
                   'item_2': 8,
                   'item_3': 61,
                   'item_4': 178,
                   'item_5': 163}
         },
        {'views': {'item_1': 2345,
                   'item_2': 330649,
                   'item_3': 12920402,
                   'item_4': 46199102,
                   'item_5': 43094955}
         }]

# Build one single-column DataFrame per metric and join them side by side on the item index
final_df = pd.concat([pd.DataFrame.from_dict(d) for d in data], axis=1)

print(final_df)

final_df.to_excel('data.xlsx')

Comments
