Python - Dictionary from CSV file with Multiple Values per Key

Question

I am trying to make a dictionary from a CSV file in Python. Let's say the CSV contains:

Student   food      amount
John      apple       15
John      banana      20
John      orange      1
John      grape       3
Ben       apple       2
Ben       orange      4
Ben       strawberry  8
Andrew    apple       10
Andrew    watermelon  3

What I'm envisioning is a dictionary whose keys are the student names and whose values are lists in which each entry corresponds to a different food. I would have to count the number of unique food items in the second column, and that count would be the length of the vector. For example:

The value of [15,20,1,3,0,0] would correspond to [apple, banana, orange, grape, strawberry, watermelon] for  'John'. 
The value of [2,0,4,0,8,0] would correspond to [apple, banana, orange, grape, strawberry, watermelon] for 'Ben'.
The value of [10,0,0,0,0,3] would correspond to [apple, banana, orange, grape, strawberry, watermelon] for 'Andrew'

The expected output of the dict would look like this:

data = {'John': [15,20,1,3,0,0], 'Ben': [2,0,4,0,8,0], 'Andrew': [10,0,0,0,0,3]}

I'm having trouble creating the dictionary to begin with, and I'm not sure whether a dictionary is even the right approach. This is what I have so far:

import csv
data = {}
with open('data.csv', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        # keeps only the last row seen for each student
        data[row['Student']] = row

Thanks for taking the time to read. Any help would be greatly appreciated.

XrXr · Accepted Answer · 2014-02-19 23:24:51Z

3

Here is a version using a regular dictionary. defaultdict is definitely better, though.

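import csv

data = {}
# newline='' is what the csv module expects for text-mode files (Python 3)
with open('data.csv', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        if row['Student'] in data:
            data[row['Student']].append(row['amount'])
        else:
            # first row for this student: start a new list
            data[row['Student']] = [row['amount']]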

EDIT:

For matching indices:

import csv
from collections import defaultdict

# Hard-coded position of each fruit in the vector
fruit_to_index = {'apple': 0, 'banana': 1, 'orange': 2, 'grape': 3}
data = defaultdict(lambda: [0, 0, 0, 0])
with open('data.csv', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        index = fruit_to_index.get(row['food'])
        if index is not None:  # skip fruits that have no assigned position
            data[row['Student']][index] = int(row['amount'])

Printing data would give:

defaultdict(<function <lambda> at address>, 
{'John':  [15, 20, 1, 3], 
'Ben':    [2 , 0 , 4, 0], 
'Andrew': [10, 0 , 0, 0]})

I think this is what you want.

EDIT2: I wrote this when the list of fruits didn't include strawberry and watermelon, but they should be very easy to add. If the list is too large to hard-code, you can generate the fruit-to-index mapping from the file:

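# First pass: collect every distinct fruit in the file
set_of_fruits = set()
for row in reader:
    set_of_fruits.add(row['food'])
# Give each fruit its own position in the vector
fruit_to_index = {}
for c, e in enumerate(set_of_fruits):
    fruit_to_index[e] = c
# The reader is exhausted at this point: rewind with data_file.seek(0)
# and create a new csv.DictReader before the pass that fills data.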

Note that the iteration order of set_of_fruits is not guaranteed, so the index assigned to each fruit may differ between runs.

data = defaultdict(lambda:[0,0,0,0]) becomes

data = defaultdict(lambda: [0] * len(set_of_fruits))
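
Putting the pieces together, here is a minimal two-pass sketch. It assumes a comma-delimited file with the header Student,food,amount. The in-memory sample text, the build_vectors helper, and the use of sorted to pin down the column order are illustrative choices, not part of the code above.

```python
import csv
import io
from collections import defaultdict

# Assumed sample data: comma-delimited with a Student,food,amount header.
SAMPLE = """Student,food,amount
John,apple,15
John,banana,20
John,orange,1
John,grape,3
Ben,apple,2
Ben,orange,4
Ben,strawberry,8
Andrew,apple,10
Andrew,watermelon,3
"""

def build_vectors(csv_file):
    """Return (foods, data); data maps each student to a list of amounts
    aligned with the sorted list of foods (hypothetical helper)."""
    rows = list(csv.DictReader(csv_file))  # read once, reuse for both passes
    # Pass 1: fix a reproducible index for every distinct food.
    foods = sorted({row['food'] for row in rows})
    food_to_index = {food: i for i, food in enumerate(foods)}
    # Pass 2: fill each student's vector, leaving 0 for foods not eaten.
    data = defaultdict(lambda: [0] * len(foods))
    for row in rows:
        data[row['Student']][food_to_index[row['food']]] = int(row['amount'])
    return foods, dict(data)

foods, data = build_vectors(io.StringIO(SAMPLE))
print(foods)
print(data)
```

With a real file, pass the handle from open('data.csv', newline='') instead of the StringIO object. Because of the sort, the columns come out alphabetically (apple, banana, grape, ...) rather than in the order the foods first appear.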

edited Feb 19, 2014 at 23:24

answered Feb 19, 2014 at 22:09


3 Comments

user3330107 Over a year ago

Thanks. However, this only adds to the list but does not match indices to the food names. For example, since Ben did not eat a banana, that amount should be populated with a 0.

user3330107 Over a year ago

I want to avoid hard-coding the index of each fruit because, unfortunately, there are ~200 unique fruits in my CSV file.

XrXr Over a year ago

Read EDIT2. You can just use row['food'] to generate the list of fruits.

piokuc · Accepted Answer · 2014-02-19 22:12:13Z

1

Try this; I think this is what you want. Note the use of defaultdict: it could be done with a regular dictionary, but defaultdict is very handy in such cases:

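import csv
from collections import defaultdict

data = defaultdict(list)  # missing students start with an empty list
# newline='' is what the csv module expects for text-mode files (Python 3)
with open('data.csv', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        data[row['Student']].append(row['amount'])  # amounts are kept as strings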

edited Feb 19, 2014 at 22:12

answered Feb 19, 2014 at 22:05


2 Comments

user3330107 Over a year ago

Thanks. This only adds to the list but does not match indices to the food names.

piokuc Over a year ago

That's because you were not very precise describing your problem. Please correct the example expected output.

tzaman · Accepted Answer · 2014-02-19 22:15:20Z

0

You probably actually want a nested dictionary structure; keeping a list and then trying to match indices to food names will get hairy fast.

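import csv
from collections import defaultdict

data = defaultdict(dict)  # student -> {food: amount}
with open('data.csv', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        # convert the amount so the stored values are numbers, not strings
        data[row['Student']][row['food']] = int(row['amount'])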

This will give you a structure like so:

{'John': {'apple': 15, 'banana': 20, 'orange': 1, 'grape': 3},
 'Ben': {'apple': 2, 'orange': 4, 'strawberry': 8},  # etc.
}

That lets you look up particular foods without having to try to cross-reference another list to figure out where to find the counts, and supports any number of food items without having to fill your lists with zeros for all the missing ones.

If you want to be extra-fancy, you can use a nested defaultdict, so that looking up foods that didn't get entered will return zeros automatically instead of raising KeyError; just change the line that creates data to:

data = defaultdict(lambda: defaultdict(int))
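
As a rough illustration of that behavior (the entries below are made up for the example), a missing food reads as 0 instead of raising KeyError. One caveat worth knowing: merely looking up a missing key inserts it into the defaultdict.

```python
from collections import defaultdict

data = defaultdict(lambda: defaultdict(int))
data['Ben']['apple'] = 2

print(data['Ben']['apple'])   # stored value
print(data['Ben']['banana'])  # missing food falls back to int(), which is 0

# Side effect: the lookup above created the 'banana' key.
print(sorted(data['Ben']))
```

If that side effect is unwanted, reading with data['Ben'].get('banana', 0) leaves the inner dictionary unchanged.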

answered Feb 19, 2014 at 22:15


1 Comment

user3330107 Over a year ago

Thanks. I guess I should mention what the end goal is. I'm trying to do a cosine similarity of the vector of amounts between various students so all I need to ensure is that the indices to the food names match for each student and if they don't have that food name, then the amount would be populated with a 0
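
For that goal, a nested dictionary can be turned into aligned vectors just before the comparison. The following is only a sketch: to_vector and cosine_similarity are hypothetical helper names, the data is typed in by hand from the question's table, and the similarity is computed as the dot product divided by the product of the Euclidean norms.

```python
import math

# Hand-entered from the table in the question (assumed already parsed to ints).
data = {
    'John': {'apple': 15, 'banana': 20, 'orange': 1, 'grape': 3},
    'Ben': {'apple': 2, 'orange': 4, 'strawberry': 8},
    'Andrew': {'apple': 10, 'watermelon': 3},
}

# One shared, sorted food list keeps indices aligned across students.
foods = sorted({food for eaten in data.values() for food in eaten})

def to_vector(eaten):
    """Amounts in the order of `foods`, with 0 for foods not eaten."""
    return [eaten.get(food, 0) for food in foods]

def cosine_similarity(u, v):
    """Dot product over the product of norms; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

vectors = {student: to_vector(eaten) for student, eaten in data.items()}
print(cosine_similarity(vectors['John'], vectors['Andrew']))
```

Only the shared ordering matters for the similarity, so any fixed food order would work as long as every student uses the same one.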

nakedfanatic · Accepted Answer · 2014-02-19 22:17:44Z

0

Use the setdefault method of the dict.

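import csv

data = {}
# newline='' is what the csv module expects for text-mode files (Python 3)
with open('data.csv', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        # fetch the student's list, creating an empty one on first sight
        data.setdefault(row['Student'], []).append(row['amount'])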

If the key, e.g. "John", doesn't exist, setdefault creates it with the supplied default value and returns that value. In this case the default is an empty list.
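
A tiny standalone illustration of that behavior, using made-up (student, amount) pairs rather than the CSV file:

```python
data = {}
rows = [('John', '15'), ('John', '20'), ('Ben', '2')]  # illustrative pairs only
for student, amount in rows:
    # setdefault returns the existing list, or stores and returns a new empty one
    data.setdefault(student, []).append(amount)

print(data)
```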

answered Feb 19, 2014 at 22:17
