
Total Python noob here, probably missing something obvious. I've searched everywhere and haven't found a solution yet, so I thought I'd ask for some help.

I'm trying to write a function that will build a nested dictionary from a large csv file. The input file is in the following format:

Product,Price,Cost,Brand,
blue widget,5,4,sony,
red widget,6,5,sony,
green widget,7,5,microsoft,
purple widget,7,6,microsoft,

etc...

The output dictionary I need would look like:

projects = { <Brand>: { <Product>: { 'Price': <Price>, 'Cost': <Cost> },},}

There would of course be many different brands, each containing different products. In the input file, the data is ordered alphabetically by brand name, but I know that it becomes unordered as soon as DictReader executes, so I definitely need a better way to handle the duplicates. The if statement as written is redundant.

Here's the non-working, useless code I have so far:

def build_dict(source_file):
  projects = {}
  headers = ['Product', 'Price', 'Cost', 'Brand']
  reader = csv.DictReader(open(source_file), fieldnames = headers, dialect = 'excel')
  current_brand = 'None'
  for row in reader:
    if Brand != current_brand:
      current_brand = Brand
    projects[Brand] = {Product: {'Price': Price, 'Cost': Cost}}
  return projects

source_file = 'merged.csv'
print build_dict(source_file)

I have of course imported the csv module at the top of the file.

What's the best way to do this? I feel like I'm way off course, but there is very little information available about creating nested dicts from a CSV, and the examples that are out there are highly specific and tend not to go into detail about why the solution actually works, so as someone new to Python, it's a little hard to draw conclusions.

Also, the input csv file doesn't normally have headers, but for the sake of trying to get a working version of this function, I manually inserted a header row. Ideally, there would be some code that assigns the headers.

Any help/direction/recommendation is much appreciated, thanks!

  • You have two separate problems: the first one is parsing, and the second - putting the data into the dictionary. I recommend that you do it without the csv.DictReader first - it's just confusing, and very easy to implement with the native split function. Next - you should be appending to the dictionary, not overwriting it (when you get another item of an existing brand). Commented Aug 31, 2012 at 18:41
  • I agree with @GuyAdini, you don't even need to use the csv reader. Just do it manually, it will give you more options as well. Commented Aug 31, 2012 at 18:45
  • Using split instead of csv is very tempting, but will only work if you know that (say) Product is never "SuperUltraWidget, Canadian version". All things being equal, I think it's safer to learn to use the csv module now. Commented Aug 31, 2012 at 18:50
  • Even using the csv library you will have to write code to catch the None and figure out how to fix it. Commented Sep 2, 2012 at 16:54
  • @GuyAdini - Thanks for the suggestion. Just to be sure I'm understanding correctly, you think it would be better to open the file, iterate through each line splitting on (','), and updating the dict based on indices? Thanks for taking the time to try to help. Commented Sep 4, 2012 at 18:42
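
The split-versus-csv point in the comments above can be illustrated with a minimal sketch. It uses Python 3 syntax and a hypothetical data line built from the product name mentioned in the comment; the quoting of that name is an assumption about how such a file would be written.

```python
import csv
import io

# Hypothetical row: the product name contains a comma, so it is quoted.
line = '"SuperUltraWidget, Canadian version",5,4,sony'

# Naive splitting cuts the quoted product name into two pieces.
naive_fields = line.split(',')

# csv.reader honours the quotes and keeps the product name in one field.
csv_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields), len(csv_fields))
print(csv_fields[0])
```

If the data is guaranteed never to contain quoted commas, split gives the same fields; the csv module only matters once that guarantee is gone.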

2 Answers

import csv
from collections import defaultdict

def build_dict(source_file):
    projects = defaultdict(dict)
    headers = ['Product', 'Price', 'Cost', 'Brand']
    with open(source_file, 'rb') as fp:
        reader = csv.DictReader(fp, fieldnames=headers, dialect='excel',
                                skipinitialspace=True)
        for rowdict in reader:
            if None in rowdict:
                del rowdict[None]
            brand = rowdict.pop("Brand")
            product = rowdict.pop("Product")
            projects[brand][product] = rowdict
    return dict(projects)

source_file = 'merged.csv'
print build_dict(source_file)

produces

{'microsoft': {'green widget': {'Cost': '5', 'Price': '7'},
               'purple widget': {'Cost': '6', 'Price': '7'}},
 'sony': {'blue widget': {'Cost': '4', 'Price': '5'},
          'red widget': {'Cost': '5', 'Price': '6'}}}

from your input data (where merged.csv doesn't have the headers, only the data.)

I used a defaultdict here, which is just like a dictionary, except that when you refer to a key that doesn't exist, it creates a default value (in this case a dict) instead of raising an exception. Then I pop Brand and Product, which returns and removes them, and store the remainder.
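
To see the difference in isolation, here is a minimal sketch in Python 3 syntax, using sample values from the question. The variable names are only for illustration.

```python
from collections import defaultdict

# A plain dict raises KeyError when the brand has not been seen yet.
plain = {}
try:
    plain['sony']['blue widget'] = {'Price': '5', 'Cost': '4'}
except KeyError:
    missing_key_failed = True
else:
    missing_key_failed = False

# defaultdict(dict) creates an empty inner dict on first access instead.
nested = defaultdict(dict)
nested['sony']['blue widget'] = {'Price': '5', 'Cost': '4'}
nested['sony']['red widget'] = {'Price': '6', 'Cost': '5'}

# Converting back to a plain dict, as build_dict does before returning.
result = dict(nested)
print(missing_key_failed, result)
```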

All that's left I think would be to turn the cost and price into numbers instead of strings.
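
One way that conversion might look is sketched below in Python 3 syntax. The helper name to_numbers is hypothetical, and using float is an assumption; int or decimal.Decimal may suit your prices better.

```python
def to_numbers(projects):
    """Return a copy of the nested dict with every Price/Cost as a float."""
    return {
        brand: {
            product: {field: float(value) for field, value in fields.items()}
            for product, fields in products.items()
        }
        for brand, products in projects.items()
    }

# Sample in the same shape that build_dict returns.
sample = {'sony': {'blue widget': {'Cost': '4', 'Price': '5'}}}
converted = to_numbers(sample)
print(converted)
```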

[modified to use DictReader directly rather than reader]


7 Comments

Thanks DSM, this is really helpful. However I am actually getting a None: [''] within every Product value dict. Since I'm not familiar with defaultdict, I'm not sure what's causing this. This is definitely much closer to a solution than what I had though so thanks for that.
A defaultdict is just a dict where if the key isn't found, it makes one and gives it the value specified (in this case, an empty dict). Could you give an example of the data line which seems to generate a None? Probably I just missed something obvious, which I'm known to do. :-/
Sure thing, for every dict that contains the price and cost, it's also creating a third key: value pair with None. So for every Product, it returns: {None: [''], 'Price': '27', 'Cost': '17'}
Hmm, puzzling: I don't see that at all. Is your merged.csv exactly in the format you've given? No extra commas at the end or anything?
I was just posting a comment about that... it seems that the person who edited my question removed the sentence where I stated that the spaces were added for legibility. I've just edited the main question to reflect the exact format. Sorry for the confusion, I just noticed that my original had been edited.
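
The None: [''] entry discussed in these comments can be reproduced with a small sketch (Python 3 syntax, one data line from the question read from an in-memory string rather than merged.csv). With four fieldnames and a trailing comma, the fifth, empty field has no header, so DictReader stores it as a list under its default restkey, which is None.

```python
import csv
import io

headers = ['Product', 'Price', 'Cost', 'Brand']

# One data line from the question, including its trailing comma.
data = io.StringIO('blue widget,5,4,sony,\n')

row = next(csv.DictReader(data, fieldnames=headers))

# The surplus field lands under the key None; pop it off if present.
extra = row.pop(None, None)
print(extra)
print(row)
```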

Here is another way to do what you want, different from DSM's answer. First, the code:

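import csv

new_dict = {}
# With no fieldnames given, DictReader takes the headers from the first row.
with open('merged.csv', 'rb') as csv_file:
    data = csv.DictReader(csv_file, delimiter=",")
    for row in data:
        # Fetch this brand's dict, or start a new one if the brand is unseen.
        dict_brand = new_dict.get(row['Brand'], dict())
        dict_brand[row['Product']] = {k: row[k] for k in ('Cost', 'Price')}
        new_dict[row['Brand']] = dict_brand
print new_dict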

In short, the main thing to work out is what the key-value pairs are at each level. What you want is a three-level dict. The first-level key is the Brand value of each row, so I look it up in the new dict with

dict_brand = new_dict.get(row['Brand'], dict())

which returns the existing dict for that brand if there is one, or a new empty dict if the brand hasn't been seen yet. The second (middle) level is probably the most complicated part: each Product value becomes a key in the brand's dict, and its value is the third-level dict holding that row's Price and Cost. I extract those like this:

dict_brand[row['Product']] = {k: row[k] for k in ('Cost', 'Price')}

Finally, the 'middle dict' is stored back in the new dict under the Brand key. The output is

{'sony': {'blue widget': {'Price': '5', 'Cost': '4'}, 
'red widget': {'Price': '6', 'Cost': '5'}}, 
'microsoft': {'purple widget': {'Price': '7', 'Cost': '6'}, 
'green widget': {'Price': '7', 'Cost': '5'}}}

That's that.
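
As a possible variation on the get-then-assign pattern above, dict.setdefault does the lookup and the store in one call. This is only a sketch in Python 3 syntax; the in-memory sample stands in for merged.csv and assumes a header row without trailing commas.

```python
import csv
import io

# In-memory stand-in for merged.csv, with the header row this answer relies on.
sample = io.StringIO(
    'Product,Price,Cost,Brand\n'
    'blue widget,5,4,sony\n'
    'red widget,6,5,sony\n'
    'green widget,7,5,microsoft\n'
)

new_dict = {}
for row in csv.DictReader(sample):
    # setdefault returns the brand's dict, creating and storing it if missing.
    products = new_dict.setdefault(row['Brand'], {})
    products[row['Product']] = {k: row[k] for k in ('Cost', 'Price')}

print(new_dict)
```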
