207

I have a dataframe with two columns and I intend to convert it to a dictionary. The first column will be the key and the second will be the value.

Dataframe:

    id    value
0    0     10.2
1    1      5.7
2    2      7.4

How can I do this?


22 Answers

416

If lakes is your DataFrame, you can do something like

area_dict = dict(zip(lakes.id, lakes.value))

4 Comments

Solution: area_dict = dict(zip(lakes['id'], lakes['value']))
What if you wanted more than one column to be in the dictionary values? I am thinking something like area_dict = dict(zip(lakes.area, (lakes.count, lakes.other_column))). How would you make this happen?
If the second argument has multiple values, this won't work.
Many times using the dataframe index as the dictionary key is useful: dict(zip(lakes.index, lakes.values))
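Following up on the multi-column question in the comments above, one way is to zip the value columns together first. A sketch (the other column is made up for illustration):

```python
import pandas as pd

lakes = pd.DataFrame({'id': [0, 1, 2], 'value': [10.2, 5.7, 7.4]})

# single value column: key -> value
area_dict = dict(zip(lakes['id'], lakes['value']))

# several value columns: key -> tuple of values
lakes['other'] = [1, 2, 3]
multi = dict(zip(lakes['id'], zip(lakes['value'], lakes['other'])))
# multi == {0: (10.2, 1), 1: (5.7, 2), 2: (7.4, 3)}
```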
226

See the docs for to_dict. You can use it like this:

df.set_index('id').to_dict()

And to avoid the column name also becoming a level in the dict, select the single column first (in this case you are actually using Series.to_dict()):

df.set_index('id')['value'].to_dict()
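
For example, with the question's data the two forms differ in nesting:

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 2], 'value': [10.2, 5.7, 7.4]})

# whole-frame to_dict(): one outer level per remaining column
nested = df.set_index('id').to_dict()
# nested == {'value': {0: 10.2, 1: 5.7, 2: 7.4}}

# selecting the column first gives a flat Series.to_dict()
flat = df.set_index('id')['value'].to_dict()
# flat == {0: 10.2, 1: 5.7, 2: 7.4}
```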

3 Comments

Note that this command will lose data if there are duplicate values in the id column: >>> ptest = p.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) >>> ptest.set_index('id')['value'].to_dict()
I have to say, there is nothing in that docs link that would have given me the answer to this question.
Yeah, finding the simplest way to turn a name/value table (pandas DataFrame) into the correct dictionary was not obvious in any way from the documentation.
93
mydict = dict(zip(df.id, df.value))

1 Comment

Note: in case the index is the desired dictionary key, then do: dict(zip(df.index,df.value))
68

If you want a simple way to preserve duplicates, you could use groupby:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) 
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3
>>> {k: g["value"].tolist() for k,g in ptest.groupby("id")}
{'a': [1, 2], 'b': [3]}

2 Comments

Nice and elegant solution, but on a 50k rows table, it is about 6 times slower than my ugly solution below.
@dalloliogm: could you give an example table that happens for? If it's six times slower than a Python loop, there might be a performance bug in pandas.
35

The answers by joris in this thread and by punchagan in the duplicate thread are very elegant; however, they will not give correct results if the column used for the keys contains any duplicated values.

For example:

>>> ptest = p.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) 
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3

# note that in both cases the association a->1 is lost:
>>> ptest.set_index('id')['value'].to_dict()
{'a': 2, 'b': 3}
>>> dict(zip(ptest.id, ptest.value))
{'a': 2, 'b': 3}

If you have duplicated entries and do not want to lose them, you can use this ugly but working code:

>>> mydict = {}
>>> for x in range(len(ptest)):
...     currentid = ptest.iloc[x,0]
...     currentvalue = ptest.iloc[x,1]
...     mydict.setdefault(currentid, [])
...     mydict[currentid].append(currentvalue)
>>> mydict
{'a': [1, 2], 'b': [3]}

1 Comment

Excuse the formatting due to the lack of a code block in comments:

    from collections import defaultdict

    mydict = defaultdict(list)
    for (key, val) in ptest[["id", "value"]].itertuples(index=False):
        mydict[key].append(val)
16

Here is what I think is the simplest solution:

df.set_index('id').T.to_dict('records')

Example:

df= pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
df.set_index('id').T.to_dict('records')

If you have multiple values, like val1, val2, val3, etc., and you want them as lists, then use the below code:

df.set_index('id').T.to_dict('list')

Read more about the records orientation in the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html
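
A quick sketch of what the two orientations return, using made-up columns val1 and val2 and unique ids (with duplicate ids, the transposed columns collide as dict keys and data is lost):

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 2], 'val1': [10, 20, 30], 'val2': [1, 2, 3]})
t = df.set_index('id').T

# 'records': one dict per row of the transposed frame
records = t.to_dict('records')
# records == [{0: 10, 1: 20, 2: 30}, {0: 1, 1: 2, 2: 3}]

# 'list': id -> list of all its values
lists = t.to_dict('list')
# lists == {0: [10, 1], 1: [20, 2], 2: [30, 3]}
```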

Comments

14

You can use 'dict comprehension'

my_dict = {row[0]: row[1] for row in df.values}

1 Comment

Looping with pandas isn't the most efficient in terms of memory usage. See: engineering.upside.com/…
13

With pandas it can be done as:

If lakes is your DataFrame:

area_dict = lakes.to_dict('records')

3 Comments

There is no 'records' column in the given example. Also, in that case the index will be the key, which is not what we want.
@MichaelD 'records' is not a column. It's an option for the argument orient.
This will actually output a list of dictionaries in the following format: [{'area': 10, 'count': 7}, {'area': 20, 'count': 5}...] instead of a key->value dict.
9

In some pandas versions the code below might not work:

mydict = dict(zip(df.id, df.value))

so make it explicit

id_=df.id.values
value=df.value.values
mydict=dict(zip(id_,value))

Note: I used id_ because id shadows the built-in Python function id().

1 Comment

Agree, it did not work for me. But how can you do df.id? The column name id is not recognized as a DataFrame attribute, right? As in, an attribute written onto the DataFrame object. I must be misunderstanding something.
6

Here is an example of converting a dataframe with three columns A, B, and C (say A and B are the geographical coordinates of longitude and latitude, and C is the country region/state/etc.).

I want a dictionary in which each pair of A,B values (the dictionary key) maps to the value of C (the dictionary value) in the corresponding row. Each A,B pair is guaranteed to be unique here due to previous filtering, though the same value of C may occur for different A,B pairs. So I would do:

mydict = dict(zip(zip(df['A'],df['B']), df['C']))

Using pandas to_dict() also works:

mydict = df.set_index(['A','B']).to_dict(orient='dict')['C']

(none of the columns A or B are used as an index before executing the line creating the dictionary)

Both approaches are fast (less than one second on a dataframe with 85k rows on a ~2015 fast dual-core laptop).
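
A minimal sketch of both approaches, with made-up coordinate data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0], 'C': ['north', 'south']})

# tuple keys via nested zip
mydict = dict(zip(zip(df['A'], df['B']), df['C']))
# mydict == {(1.0, 3.0): 'north', (2.0, 4.0): 'south'}

# same result via a MultiIndex and to_dict
same = df.set_index(['A', 'B']).to_dict(orient='dict')['C']
assert same == mydict
```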

1 Comment

What is a "fast dual-core laptop"? That line would be better removed or replaced with a specific laptop and CPU model. Let us decide for ourselves if it is "fast".
4

Another (slightly shorter) solution for not losing duplicate entries:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3

>>> pdict = dict()
>>> for i in ptest['id'].unique().tolist():
...     ptest_slice = ptest[ptest['id'] == i]
...     pdict[i] = ptest_slice['value'].tolist()
...

>>> pdict
{'b': [3], 'a': [1, 2]}

Comments
3

You can also do this if you want to play around with pandas. However, I like punchagan's way.

# replicating your dataframe
lake = pd.DataFrame({'co tp': ['DE Lake', 'Forest', 'FR Lake', 'Forest'], 
                 'area': [10, 20, 30, 40], 
                 'count': [7, 5, 2, 3]})
lake.set_index('co tp', inplace=True)

# to get key value using pandas
area_dict = lake.set_index('area').T.to_dict('records')[0]
print(area_dict)

output: {10: 7, 20: 5, 30: 2, 40: 3}

Comments

3

If 'lakes' is your DataFrame, you can also do something like:

# Your dataframe
lakes = pd.DataFrame({'co tp': ['DE Lake', 'Forest', 'FR Lake', 'Forest'], 
                 'area': [10, 20, 30, 40], 
                 'count': [7, 5, 2, 3]})
lakes.set_index('co tp', inplace=True)

My solution:

area_dict = lakes.set_index("area")["count"].to_dict()

or @punchagan's solution (which I prefer). Note that count is also a DataFrame method, so attribute access (lakes.count) returns the method rather than the column; use bracket access:

area_dict = dict(zip(lakes.area, lakes['count']))

Both should work.

Comments
2

You can also do it this way:

area_dict = lakes.to_dict(orient='records')

1 Comment

This just repeats an existing answer by AnandSin from 2018.
1

You need a list as a dictionary value. This code will do the trick.

from collections import defaultdict
mydict = defaultdict(list)
for k, v in zip(df.id.values,df.value.values):
    mydict[k].append(v)

Comments

1

If you set the index, the resulting dictionary will contain unique key/value pairs:

from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
df['airline_enc'] = encoder.fit_transform(df['airline'])
dictAirline = df[['airline_enc','airline']].set_index('airline_enc').to_dict()

Comments

1

Many answers here use dict(zip(...)) syntax. It's also possible without zip.

mydict = dict(df.values)                        # {0.0: 10.2, 1.0: 5.7, 2.0: 7.4}
# or for faster code, convert to a list
mydict = dict(df.values.tolist())               # {0.0: 10.2, 1.0: 5.7, 2.0: 7.4}

If one column is int and the other is float as in the OP, then cast to object dtype and call dict().

mydict = dict(df.astype('O').values)            # {0: 10.2, 1: 5.7, 2: 7.4}
mydict = dict(df.astype('O').values.tolist())   # {0: 10.2, 1: 5.7, 2: 7.4}

If the index is meant to be the keys, it's even simpler.

mydict = df['value'].to_dict()                  # {0: 10.2, 1: 5.7, 2: 7.4}

Comments

1

Edit:

Same result could be reached by the following:

filter_list = df[df.Col.isin(criteria)][['Col1','Col2']].values.tolist()

Original Post:

I had a similar issue, where I was looking to filter a dataframe into a resulting list of lists.

This was my solution:

filter_df = df[df.Col.isin(criteria)][['Col1','Col2']]
filter_list = filter_df.to_dict(orient='tight')
filter_list = filter_list['data']

Result: list of lists

Source: pandas.DataFrame.to_dict

Comments

0

If there are duplicate values in the id column and you want to keep all of them in the dictionary, the code below can help:

df = pd.DataFrame([['a',1],['a',2],['a',4],['b',3],['b',4],['c',5]], columns=['id', 'value'])

df.groupby('id')['value'].apply(list).to_dict()

output : {'a': [1, 2, 4], 'b': [3, 4], 'c': [5]}

Comments

0

Here's a way to create a dict containing info from multiple rows. First set the column you want to use as the key as the index, then transpose and convert the dataframe to a dict. After the transpose, the key column's values become the column names, and all the other features become the values under each column.

df.set_index('key_col', inplace=True)
dct = df.T.to_dict()
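
A small example with placeholder names key_col, x, and y:

```python
import pandas as pd

df = pd.DataFrame({'key_col': ['a', 'b'], 'x': [1, 2], 'y': [10, 20]})
df.set_index('key_col', inplace=True)

dct = df.T.to_dict()
# dct == {'a': {'x': 1, 'y': 10}, 'b': {'x': 2, 'y': 20}}
```

Equivalently, df.to_dict(orient='index') produces the same row-wise mapping without the transpose.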

Comments
-1

This is my solution:

import pandas as pd
df = pd.read_excel('dic.xlsx')
df_T = df.set_index('id').T
dic = df_T.to_dict('records')
print(dic)

Comments
-1
def get_dict_from_pd(df, key_col, row_col):
    result = dict()
    for i in set(df[key_col].values):
        is_i = df[key_col] == i
        result[i] = list(df[is_i][row_col].values)
    return result

This is my solution; a basic loop.

Comments
