
I am trying to add a column with values from a dictionary. It is easiest to show with dummy data.

import pandas as pd
df = pd.DataFrame({'id':[1,2,3,2,5], 'grade':[5,2,2,1,3]})

dictionary = {'1':[5,8,6,3], '2':[1,2], '5':[8,6,2]}

Notice that not every id is in the dictionary, and that the values are lists. I want to find the rows in df whose id matches a key in the dictionary and put the corresponding list in a new column. The desired output looks like this:

output = pd.DataFrame({'id':[1,2,3,2,5], 'grade':[5,2,2,1,3], 'new_column':[[5,8,6,3],[1,2],[],[1,2],[8,6,2]]})

3 Answers

2

Is this what you want?

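# Index the frame by 'id' so the Series below aligns on it.
df = df.set_index('id')
# The keys are ints here so that they match the dtype of the index.
dictionary = {1:[5,8,6,3], 2:[1,2], 5:[8,6,2]}
df['new_column'] = pd.Series(dictionary)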

Note: The keys of the dictionary need to be the same type (int) as the index of the data frame; otherwise no key matches and the whole column is NaN.

>>> print(df)
    gender    new_column
id                      
1        0  [5, 8, 6, 3]
2        0        [1, 2]
3        1           NaN
4        1           NaN
5        1     [8, 6, 2]
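
To illustrate the note about key types, here is a minimal sketch. It assumes a frame with ids 1 to 5 and a 'gender' column, as in the output above; the names str_keys, int_keys, with_str_keys and with_int_keys are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5], 'gender': [0, 0, 1, 1, 1]}).set_index('id')

# The dictionary as given in the question, with string keys.
str_keys = {'1': [5, 8, 6, 3], '2': [1, 2], '5': [8, 6, 2]}
# The same data with the keys converted to int.
int_keys = {int(key): value for key, value in str_keys.items()}

# String keys never match the integer index, so every row is NaN.
with_str_keys = df.assign(new_column=pd.Series(str_keys))

# Integer keys align with the index; ids 3 and 4 have no key and stay NaN.
with_int_keys = df.assign(new_column=pd.Series(int_keys))
```

Converting the keys once, as in int_keys, is probably simpler than changing the dtype of the index.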

Update:

A better solution if the 'id' column contains duplicates (see comments below):

df['new_column'] = df['id'].map(dictionary)
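
Note that map leaves NaN for ids that have no key, while the desired output in the question has an empty list there. A rough sketch of one way to get that, using the question's data (int_keyed and mapped are names invented for this example, and the list comprehension is just one possible way to fill the gaps):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 2, 5], 'grade': [5, 2, 2, 1, 3]})
dictionary = {'1': [5, 8, 6, 3], '2': [1, 2], '5': [8, 6, 2]}

# Convert the string keys to int so they match the dtype of df['id'].
int_keyed = {int(key): value for key, value in dictionary.items()}

# map() looks up each id; ids without a key come back as NaN.
mapped = df['id'].map(int_keyed)

# Swap every non-list entry (the NaN values) for a fresh empty list.
df['new_column'] = [value if isinstance(value, list) else [] for value in mapped]
```

Rows that share an id (the two rows with id 2 here) end up pointing at the same list object, so copy the lists first if you plan to modify them in place.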

3 Comments

I updated this answer to set the index of df to the 'id' values before assigning the new data.
I noticed that in the data frame, the same id appears multiple times. Would that be okay with your code?
I think it will work, but an index should not have duplicates, so in that case it is probably not advisable. A better option then is to use Series.map as explained in this answer, like this: df['new_column'] = df['id'].map(dictionary).
0
import pandas as pd

df = pd.DataFrame({'id':[1,2,3,4,5], 'gender':[0,0,1,1,1]})

dictionary = {'1':[5,8,6,3], '2':[1,2], '5':[8,6,2]}

Then just create a list with the values you want and add it to your dataframe:

newValues = [dictionary.get(str(val), []) for val in df['id']]

df['new_column'] = newValues


>>> print(df)
   id  gender    new_column
0   1       0  [5, 8, 6, 3]
1   2       0        [1, 2]
2   3       1            []
3   4       1            []
4   5       1     [8, 6, 2]


0

You can construct your column using a defaultdict, a special dictionary that returns a default value (here []) for missing keys.

from collections import defaultdict
default_dictionary = defaultdict(list)
ids = [1,2,3,4,5]  # named ids to avoid shadowing the built-in id()
dictionary = {'1':[5,8,6,3], '2':[1,2], '5':[8,6,2]}
for n in dictionary:
    default_dictionary[n] = dictionary[n]
new_column = [default_dictionary[str(n)] for n in ids]

new_column is now [[5, 8, 6, 3], [1, 2], [], [], [8, 6, 2]], and you can pass it as the last column when calling pd.DataFrame(...).

2 Comments

There is a slightly nicer way to make a defaultdict from an existing dictionary: default_dictionary = defaultdict(list, **dictionary). It also makes sense to just assign the column at the end, like Bill did in his answer.
Oh, this is much better indeed. Thank you.
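
For reference, a small sketch of the shortcut from the comment. It passes the dictionary positionally rather than unpacking it with **, which is a variation on the comment's suggestion and also works when the keys are not strings; ids is a name chosen for this example.

```python
from collections import defaultdict

dictionary = {'1': [5, 8, 6, 3], '2': [1, 2], '5': [8, 6, 2]}
ids = [1, 2, 3, 4, 5]

# Build the defaultdict directly from the existing dictionary.
default_dictionary = defaultdict(list, dictionary)

# Missing keys ('3' and '4') yield a new empty list each.
new_column = [default_dictionary[str(n)] for n in ids]
```

Keep in mind that looking up a missing key in a defaultdict also inserts that key, so default_dictionary contains '3' and '4' afterwards.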
