Creating a nested dictionary from a csv file in Python

Question

I am reading information from a CSV file, and I want to use a nested dictionary to map out the repetitive information in it. How do I build that nested dictionary from all rows of the file? Here is an example of the data (not the actual data, but the same concept):

State ,City/Region ,Questions ,Answers 
NY,Manhattan ,East/West Coast? ,East 
NY,Manhattan ,been there? ,yes
NY,Brooklyn ,East/West Coast? ,East 
NY,Brooklyn ,been there? ,yes
NY,Brooklyn ,Been to coney island? ,yes
NY,Queens ,East/West Coast? ,East 
NY,Queens ,been there? ,yes
NY ,Staten Island ,is island? ,yes
MA,Boston ,East/West Coast? ,East 
MA,Boston ,like it there? ,yes
MA,Pioneer Valley ,East/West Coast? ,East 
MA,Pioneer Valley ,city? ,no
MA,Pioneer Valley ,college town? ,yes
CA,Bay Area ,warm? ,yes
CA ,Bay Area ,East/West Coast? ,West 
CA ,SoCal ,north or south? ,south 
CA ,SoCal ,warm ,yes

So essentially, the master dictionary has 3 keys: NY, MA and CA. Each of them maps to a dictionary keyed by City/Region, and each City/Region maps to its questions and answers.
It would be a deeply nested dictionary, but I can't figure out the syntax to build it for every row in the file.

I've tried opening the file, looping over the lines with a for loop, and splitting each line on ",". Something like this:

for line in my_file:
    line=line.split(",") 
    MasterDict[line[0]] = {line[1] : {} }
    MasterDict[line[0]][line[1]] = {line[2] : line[3]}

What have you tried so far? What about your code isn't working?
– Iain Dwyer, Commented Jun 28, 2017 at 15:35
Try looking at the groupby function from itertools: docs.python.org/2/library/itertools.html#itertools.groupby
– iurii_n, Commented Jun 28, 2017 at 15:46
I've tried opening the file, looping over the lines with a for loop, and splitting each line on ",". Something like this: for line in my_file: line=line.split(",") MasterDict[line[0]] = {line[1] : {} } MasterDict[line[0]][line[1]] = {line[2] : line[3]}
– question610, Commented Jun 28, 2017 at 15:54

Thierry Lathuille · Accepted Answer · 2017-06-28 21:25:07Z

import csv
from collections import defaultdict
from functools import partial

defaultdict_of_dict = partial(defaultdict, dict)
master = defaultdict(defaultdict_of_dict)

with open("data.txt", 'r') as f:
    csv_reader = csv.reader(f)
    next(csv_reader)  # Skip the first line
    for row in csv_reader:
        state, city, question, answer = [field.strip() for field in row]
        master[state][city][question] = answer


print(master['NY']['Queens'])
# {'East/West Coast?': 'East', 'been there?': 'yes'}
print(master['NY']['Queens']['been there?'])
# yes

You can read the CSV file with the csv module, which takes care of the splitting.
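A minimal sketch of why the csv module is safer than splitting by hand, assuming a hypothetical row (not in the sample data) whose city field contains a quoted comma:

```python
import csv
import io

# Hypothetical sample: the second field contains a comma inside quotes.
sample = 'State,City/Region\nNY,"Manhattan, downtown"\n'

# Naive splitting breaks the quoted field into two pieces.
naive = sample.splitlines()[1].split(",")

# csv.reader understands the quoting and returns two fields.
parsed = list(csv.reader(io.StringIO(sample)))[1]

print(naive)   # ['NY', '"Manhattan', ' downtown"']
print(parsed)  # ['NY', 'Manhattan, downtown']
```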

The example data you gave is full of unneeded spaces. In case your real data has the same problem, we sanitize each field with strip.
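To see why the stripping matters, here is a small sketch using two rows copied from the sample data, where "CA" and "CA " differ only by a trailing space and would otherwise become two separate keys:

```python
# Two rows from the sample data, already split into fields.
rows = [
    ["CA", "Bay Area ", "warm? ", "yes"],
    ["CA ", "Bay Area ", "East/West Coast? ", "West "],
]

# Without strip, the trailing space produces two distinct state keys.
raw_states = {row[0] for row in rows}

# With strip, both rows land under the same key.
clean_states = {row[0].strip() for row in rows}

print(sorted(raw_states))    # ['CA', 'CA ']
print(sorted(clean_states))  # ['CA']
```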

To avoid having to create the missing keys in your dictionaries yourself, you can use a defaultdict. It creates missing keys on the fly with a default value.

For example, you could do:

from collections import defaultdict
d = defaultdict(dict)

to create a defaultdict with empty dicts as default values for missing keys, and use it like this:

d["new_key"]["subkey"] = 5
print(d)
# defaultdict(<class 'dict'>, {'new_key': {'subkey': 5}})

There's one difficulty in your case: you want a nested dictionary, so we need a defaultdict of defaultdict of dict.

The parameter we give to defaultdict must be a callable, so we can't write something like defaultdict(defaultdict(dict)), because defaultdict(dict) is a defaultdict instance, not a function. One way around this is to use functools.partial to create a defaultdict_of_dict function that we can pass to the main defaultdict.
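A short sketch of that point: passing an instance raises a TypeError, while a factory built with partial works. The lambda variant is shown only as an assumed alternative for comparison; the names via_partial and via_lambda are illustrative.

```python
from collections import defaultdict
from functools import partial

# Passing a defaultdict instance (not a callable) is rejected.
raised = False
try:
    defaultdict(defaultdict(dict))
except TypeError:
    raised = True

# Two ways to build a factory that returns a fresh defaultdict(dict).
via_partial = defaultdict(partial(defaultdict, dict))
via_lambda = defaultdict(lambda: defaultdict(dict))

via_partial["NY"]["Queens"]["been there?"] = "yes"
via_lambda["NY"]["Queens"]["been there?"] = "yes"

print(raised)                       # True
print(via_partial["NY"]["Queens"])  # {'been there?': 'yes'}
print(via_partial == via_lambda)    # True
```

Both factories behave the same here. One practical difference is that a defaultdict built with partial can be pickled, whereas one built with a lambda cannot.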

question610 · Accepted Answer · 2017-06-28 19:57:26Z

I figured out how to get it to work.

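import pprint

# my_file is assumed to be an already-open file object for the CSV
MasterDict = {}
my_file.readline()  # skip the header line
for line in my_file:
    # split on commas, then trim stray spaces and the trailing newline
    line = [field.strip() for field in line.split(",")]
    if line[0] not in MasterDict:
        MasterDict[line[0]] = {}
    if line[1]:
        if line[1] not in MasterDict[line[0]]:
            MasterDict[line[0]][line[1]] = []
        # each city maps to a list of (question, answer) tuples
        MasterDict[line[0]][line[1]].append((line[2], line[3]))
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(MasterDict)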
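The same membership checks can also be written with dict.setdefault. The following is only a sketch under a few assumptions: it uses a hypothetical in-memory stand-in for my_file built from three sample rows, reads it with csv.reader, and stores the answers in an inner dict rather than in a list of tuples.

```python
import csv
import io

# Hypothetical in-memory stand-in for the opened CSV file.
my_file = io.StringIO(
    "State ,City/Region ,Questions ,Answers \n"
    "NY,Queens ,East/West Coast? ,East \n"
    "NY,Queens ,been there? ,yes\n"
)

master = {}
reader = csv.reader(my_file)
next(reader)  # skip the header line
for state, city, question, answer in reader:
    # setdefault returns the existing inner dict, or inserts and returns a new one
    cities = master.setdefault(state.strip(), {})
    cities.setdefault(city.strip(), {})[question.strip()] = answer.strip()

print(master)
# {'NY': {'Queens': {'East/West Coast?': 'East', 'been there?': 'yes'}}}
```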


Ajax1234 · Accepted Answer · 2017-06-28 20:18:15Z

You can try this slightly shorter version:

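# myfile is assumed to hold the path to the CSV file
with open(myfile) as fh:
    f = [i.strip('\n').split(',') for i in fh]

# seed every state with a placeholder entry; f[0] is the header row
d = {i[0]: {i[1]: []} for i in f[1:]}

for i in f[1:]:
    if i[1] not in d[i[0]]:
        d[i[0]][i[1]] = i[2:]
    else:
        d[i[0]][i[1]].extend(i[2:])

print(d)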
