
I am reading information from a CSV file and want to use a nested dictionary to map out the repetitive information in it. How do I create a nested dictionary that covers all rows of the file? Here is an example of the data (not the actual data, but the same concept):

State ,City/Region ,Questions ,Answers 
NY,Manhattan ,East/West Coast? ,East 
NY,Manhattan ,been there? ,yes
NY,Brooklyn ,East/West Coast? ,East 
NY,Brooklyn ,been there? ,yes
NY,Brooklyn ,Been to coney island? ,yes
NY,Queens ,East/West Coast? ,East 
NY,Queens ,been there? ,yes
NY ,Staten Island ,is island? ,yes
MA,Boston ,East/West Coast? ,East 
MA,Boston ,like it there? ,yes
MA,Pioneer Valley ,East/West Coast? ,East 
MA,Pioneer Valley ,city? ,no
MA,Pioneer Valley ,college town? ,yes
CA,Bay Area ,warm? ,yes
CA ,Bay Area ,East/West Coast? ,West 
CA ,SoCal ,north or south? ,south 
CA ,SoCal ,warm ,yes 

So essentially, the master dictionary has 3 keys: NY, MA, and CA. Each of them maps to a dictionary keyed by City/Region, and each City/Region maps to its questions and answers.
So it would be a deeply nested dictionary, but I can't figure out the syntax to build it for every row in the file.

I've tried opening the file, looping over the lines, and splitting each line on ",". Something like this:

for line in my_file:
    line=line.split(",") 
    MasterDict[line[0]] = {line[1] : {} }
    MasterDict[line[0]][line[1]] = {line[2] : line[3]}
  • What have you tried so far? What about your code isn't working? Commented Jun 28, 2017 at 15:35
  • Try looking at the groupby function from itertools: docs.python.org/2/library/itertools.html#itertools.groupby Commented Jun 28, 2017 at 15:46
  • I've tried opening the file, use a for loop to read through the lines and split the lines by ",". Something like this for line in my_file: line=line.split(",") MasterDict[line[0]] = {line[1] : {} } MasterDict[line[0]][line[1]] = {line[2] : line[3]} Commented Jun 28, 2017 at 15:54

3 Answers

1
import csv
from collections import defaultdict
from functools import partial

defaultdict_of_dict = partial(defaultdict, dict)
master = defaultdict(defaultdict_of_dict)

with open("data.txt", 'r', newline='') as f:  # newline='' is what the csv docs recommend
    csv_reader = csv.reader(f)
    next(csv_reader)  # Skip the first line
    for row in csv_reader:
        state, city, question, answer = [field.strip() for field in row]
        master[state][city][question] = answer


print(master['NY']['Queens'])
# {'East/West Coast?': 'East', 'been there?': 'yes'}
print(master['NY']['Queens']['been there?'])
# yes

You can read the CSV file with the csv module, which takes care of the splitting.
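
As a small illustration of what the csv module handles for you, here is a sketch comparing csv.reader with a plain str.split on a line that contains a quoted comma. The sample line is made up for this example and is not taken from the question's data.

```python
import csv
import io

# Hypothetical sample line: the city field contains a comma inside quotes.
sample = 'NY,"Manhattan, Upper West Side",been there?,yes\n'

# A plain split breaks the quoted field into two pieces.
naive = sample.rstrip("\n").split(",")

# csv.reader understands the quoting and keeps the field whole.
parsed = next(csv.reader(io.StringIO(sample)))

print(len(naive))  # 5
print(parsed)      # ['NY', 'Manhattan, Upper West Side', 'been there?', 'yes']
```

If your real file never contains quoted fields, both approaches give the same result, but csv.reader is the safer default.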

The example data you gave is full of unneeded spaces. In case your real data looks the same, we sanitize each field with strip.

To avoid having to create the missing keys in your dictionaries yourself, you can use a defaultdict. It creates missing keys on the fly, with a default value.

For example, you could do:

from collections import defaultdict
d = defaultdict(dict)

to create a defaultdict with empty dicts as default values for missing keys, and use it like this:

d["new_key"]["subkey"] = 5
print(d)
# defaultdict(<class 'dict'>, {'new_key': {'subkey': 5}})

There's one difficulty in your case: you want a nested dictionary, so we need a defaultdict of defaultdict of dict.

The parameter we give to defaultdict must be a callable, so we can't write something like defaultdict(defaultdict(dict)): defaultdict(dict) is a defaultdict instance, not a callable. One way around that is to use functools.partial to create a defaultdict_of_dict function that we can pass to the outer defaultdict.
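
To make that last point concrete, here is a minimal sketch showing that a lambda can play the same role as functools.partial, and how to turn the result back into regular dicts afterwards. The names nested and to_plain_dict and the inline sample rows are assumptions made for this illustration.

```python
from collections import defaultdict

# Same nesting as master above, with a lambda as the callable default factory.
nested = defaultdict(lambda: defaultdict(dict))

# A few rows from the question, already split and stripped (assumed sample).
rows = [
    ("NY", "Queens", "East/West Coast?", "East"),
    ("NY", "Queens", "been there?", "yes"),
    ("CA", "SoCal", "warm", "yes"),
]
for state, city, question, answer in rows:
    nested[state][city][question] = answer


def to_plain_dict(d):
    """Recursively convert defaultdicts into regular dicts."""
    if isinstance(d, dict):
        return {key: to_plain_dict(value) for key, value in d.items()}
    return d


plain = to_plain_dict(nested)
print(plain["NY"]["Queens"]["been there?"])  # yes
print("TX" in plain)                         # False
```

Converting to plain dicts once the data is loaded can be worthwhile, because merely looking up a missing key on a defaultdict silently creates it.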


0

I figured out how to get it to work.

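import pprint

# my_file is assumed to be an already-open file object for the CSV
MasterDict = {}
my_file.readline()  # skip the header row
for line in my_file:
    # strip stray spaces and the trailing newline from every field
    line = [field.strip() for field in line.split(",")]
    if line[0] not in MasterDict:
        MasterDict[line[0]] = {}
    if line[1]:
        if line[1] not in MasterDict[line[0]]:
            MasterDict[line[0]][line[1]] = []
        MasterDict[line[0]][line[1]].append((line[2], line[3]))
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(MasterDict)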
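
The same membership checks can also be written with dict.setdefault. The following is only a sketch and differs from the code above: it stores each question as a key instead of appending tuples, and it reads from an in-memory sample (the lines list is an assumption standing in for the open file, with the header already skipped).

```python
# Stand-in for the lines of the file, header already skipped (assumed sample).
lines = [
    "NY,Brooklyn ,been there? ,yes\n",
    "NY,Brooklyn ,Been to coney island? ,yes\n",
    "MA,Boston ,like it there? ,yes\n",
]

master = {}
for line in lines:
    state, city, question, answer = [field.strip() for field in line.split(",")]
    # setdefault returns the existing inner dict, or inserts and returns a new one.
    master.setdefault(state, {}).setdefault(city, {})[question] = answer

print(master["NY"]["Brooklyn"])
# {'been there?': 'yes', 'Been to coney island?': 'yes'}
```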


0

You can try this slightly shorter version:

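# myfile is the path to the CSV file
with open(myfile) as fh:
    # split every non-blank line and strip stray spaces from each field
    f = [[field.strip() for field in line.split(',')] for line in fh if line.strip()]

d = {i[0]: {} for i in f[1:]}

for i in f[1:]:
    if i[1] not in d[i[0]]:
        d[i[0]][i[1]] = i[2:]
    else:
        d[i[0]][i[1]].extend(i[2:])

print(d)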
