207

I have a dataframe with two columns and I intend to convert it to a dictionary. The first column will be the key and the second will be the value.

Dataframe:

    id    value
0    0     10.2
1    1      5.7
2    2      7.4

How can I do this?


22 Answers

416

If lakes is your DataFrame, you can do something like

area_dict = dict(zip(lakes.id, lakes.value))

4 Comments

Solution: area_dict = dict(zip(lakes['id'], lakes['value']))
What if you wanted more than one column to be in the dictionary values? I am thinking something like area_dict = dict(zip(lakes.area, (lakes.count, lakes.other_column))). How would you make this happen?
If the second argument has multiple values, this won't work.
Many times using the dataframe index as the dictionary key is useful: dict(zip(lakes.index, lakes.values))
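Following up on the multi-column question in the comments above, one way is to zip the value columns together first. A sketch (the other column is made up for illustration):

```python
import pandas as pd

lakes = pd.DataFrame({'id': [0, 1, 2], 'value': [10.2, 5.7, 7.4]})

# single value column: key -> value
area_dict = dict(zip(lakes['id'], lakes['value']))

# several value columns: key -> tuple of values
lakes['other'] = [1, 2, 3]
multi = dict(zip(lakes['id'], zip(lakes['value'], lakes['other'])))
# multi == {0: (10.2, 1), 1: (5.7, 2), 2: (7.4, 3)}
```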
226

See the docs for to_dict. You can use it like this:

df.set_index('id').to_dict()

And to avoid the column name also becoming a level in the dict, select the single column first (in this case you are actually using Series.to_dict()):

df.set_index('id')['value'].to_dict()
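
For example, with the question's data the two forms differ in nesting:

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 2], 'value': [10.2, 5.7, 7.4]})

# whole-frame to_dict(): one outer level per remaining column
nested = df.set_index('id').to_dict()
# nested == {'value': {0: 10.2, 1: 5.7, 2: 7.4}}

# selecting the column first gives a flat Series.to_dict()
flat = df.set_index('id')['value'].to_dict()
# flat == {0: 10.2, 1: 5.7, 2: 7.4}
```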

3 Comments

Note that this command will lose data if there are duplicate values in the id column: >>> ptest = p.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) >>> ptest.set_index('id')['value'].to_dict()
I have to say, there is nothing in that docs link that would have given me the answer to this question.
Yeah, finding the simplest way to turn a name/value table (pandas DataFrame) into the correct dictionary was not obvious in any way from the documentation.
93
mydict = dict(zip(df.id, df.value))

1 Comment

Note: in case the index is the desired dictionary key, then do: dict(zip(df.index,df.value))
68

If you want a simple way to preserve duplicates, you could use groupby:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) 
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3
>>> {k: g["value"].tolist() for k,g in ptest.groupby("id")}
{'a': [1, 2], 'b': [3]}

2 Comments

Nice and elegant solution, but on a 50k rows table, it is about 6 times slower than my ugly solution below.
@dalloliogm: could you give an example table that happens for? If it's six times slower than a Python loop, there might be a performance bug in pandas.
35

The answers by joris in this thread and by punchagan in the duplicate thread are very elegant; however, they will not give correct results if the column used for the keys contains any duplicated values.

For example:

>>> ptest = p.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) 
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3

# note that in both cases the association a->1 is lost:
>>> ptest.set_index('id')['value'].to_dict()
{'a': 2, 'b': 3}
>>> dict(zip(ptest.id, ptest.value))
{'a': 2, 'b': 3}

If you have duplicated entries and do not want to lose them, you can use this ugly but working code:

>>> mydict = {}
>>> for x in range(len(ptest)):
...     currentid = ptest.iloc[x,0]
...     currentvalue = ptest.iloc[x,1]
...     mydict.setdefault(currentid, [])
...     mydict[currentid].append(currentvalue)
>>> mydict
{'a': [1, 2], 'b': [3]}

1 Comment

Excuse the formatting due to the lack of a code block in comments:

    from collections import defaultdict

    mydict = defaultdict(list)
    for (key, val) in ptest[["id", "value"]].itertuples(index=False):
        mydict[key].append(val)
16

Here is what I think is the simplest solution:

df.set_index('id').T.to_dict('records')

Example:

df= pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
df.set_index('id').T.to_dict('records')

If you have multiple values, like val1, val2, val3, etc., and you want them as lists, then use the below code:

df.set_index('id').T.to_dict('list')

Read more about the records orientation in the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html
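
A quick sketch of what the two orientations return, using made-up columns val1 and val2 and unique ids (with duplicate ids, the transposed columns collide as dict keys and data is lost):

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 2], 'val1': [10, 20, 30], 'val2': [1, 2, 3]})
t = df.set_index('id').T

# 'records': one dict per row of the transposed frame
records = t.to_dict('records')
# records == [{0: 10, 1: 20, 2: 30}, {0: 1, 1: 2, 2: 3}]

# 'list': id -> list of all its values
lists = t.to_dict('list')
# lists == {0: [10, 1], 1: [20, 2], 2: [30, 3]}
```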

Comments

14

You can use 'dict comprehension'

my_dict = {row[0]: row[1] for row in df.values}

1 Comment

Looping with pandas isn't the most efficient in terms of memory usage. See: engineering.upside.com/…
13

With pandas it can be done as:

If lakes is your DataFrame:

area_dict = lakes.to_dict('records')

3 Comments

There is no 'records' column in the given example. Also, in that case the index will be the key, which is not what we want.
@MichaelD 'records' is not a column. It's an option for the argument orient.
This will actually output a list of dictionaries in the following format: [{'area': 10, 'count': 7}, {'area': 20, 'count': 5}...] instead of a key->value dict.
9

In some pandas versions the code below might not work:

mydict = dict(zip(df.id, df.value))

so make it explicit

id_=df.id.values
value=df.value.values
mydict=dict(zip(id_,value))

Note: I used id_ because id shadows the built-in Python function id().

1 Comment

Agree, it did not work for me. But how can you do df.id? The column name id is not recognized as a DataFrame attribute, right? As in, an attribute written onto the DataFrame object. I must be misunderstanding something.
6

Here is an example of converting a dataframe with three columns A, B, and C (say A and B are the geographical coordinates of longitude and latitude, and C is the country region/state/etc.).

I want a dictionary in which each pair of A,B values (the dictionary key) maps to the value of C (the dictionary value) in the corresponding row. Each A,B pair is guaranteed to be unique here due to previous filtering, though the same value of C may occur for different A,B pairs. So I would do:

mydict = dict(zip(zip(df['A'],df['B']), df['C']))

Using pandas to_dict() also works:

mydict = df.set_index(['A','B']).to_dict(orient='dict')['C']

(none of the columns A or B are used as an index before executing the line creating the dictionary)

Both approaches are fast (less than one second on a dataframe with 85k rows on a ~2015 fast dual-core laptop).
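
A minimal sketch of both approaches, with made-up coordinate data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0], 'C': ['north', 'south']})

# tuple keys via nested zip
mydict = dict(zip(zip(df['A'], df['B']), df['C']))
# mydict == {(1.0, 3.0): 'north', (2.0, 4.0): 'south'}

# same result via a MultiIndex and to_dict
same = df.set_index(['A', 'B']).to_dict(orient='dict')['C']
assert same == mydict
```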

1 Comment

What is a "fast dual-core laptop"? That line would be better removed or replaced with a specific laptop and CPU model. Let us decide for ourselves if it is "fast".
4

Another (slightly shorter) solution for not losing duplicate entries:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3

>>> pdict = dict()
>>> for i in ptest['id'].unique().tolist():
...     ptest_slice = ptest[ptest['id'] == i]
...     pdict[i] = ptest_slice['value'].tolist()
...

>>> pdict
{'b': [3], 'a': [1, 2]}

Comments
3

You can also do this if you want to play around with pandas. However, I like punchagan's way.

# replicating your dataframe
lake = pd.DataFrame({'co tp': ['DE Lake', 'Forest', 'FR Lake', 'Forest'], 
                 'area': [10, 20, 30, 40], 
                 'count': [7, 5, 2, 3]})
lake.set_index('co tp', inplace=True)

# to get key value using pandas
area_dict = lake.set_index('area').T.to_dict('records')[0]
print(area_dict)

output: {10: 7, 20: 5, 30: 2, 40: 3}

Comments

3

If 'lakes' is your DataFrame, you can also do something like:

# Your dataframe
lakes = pd.DataFrame({'co tp': ['DE Lake', 'Forest', 'FR Lake', 'Forest'], 
                 'area': [10, 20, 30, 40], 
                 'count': [7, 5, 2, 3]})
lakes.set_index('co tp', inplace=True)

My solution:

area_dict = lakes.set_index("area")["count"].to_dict()

or @punchagan's solution (which I prefer). Note that count is also a DataFrame method, so attribute access (lakes.count) returns the method rather than the column; use bracket access:

area_dict = dict(zip(lakes.area, lakes['count']))

Both should work.

Comments
2

You can also do it this way:

area_dict = lakes.to_dict(orient='records')

1 Comment

This just repeats an existing answer by AnandSin from 2018.
1

You need a list as a dictionary value. This code will do the trick.

from collections import defaultdict
mydict = defaultdict(list)
for k, v in zip(df.id.values,df.value.values):
    mydict[k].append(v)

Comments

1

If you set the index, the resulting dictionary will contain unique key/value pairs:

from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
df['airline_enc'] = encoder.fit_transform(df['airline'])
dictAirline = df[['airline_enc','airline']].set_index('airline_enc').to_dict()

Comments

1

Many answers here use dict(zip(...)) syntax. It's also possible without zip.

mydict = dict(df.values)                        # {0.0: 10.2, 1.0: 5.7, 2.0: 7.4}
# or for faster code, convert to a list
mydict = dict(df.values.tolist())               # {0.0: 10.2, 1.0: 5.7, 2.0: 7.4}

If one column is int and the other is float as in the OP, then cast to object dtype and call dict().

mydict = dict(df.astype('O').values)            # {0: 10.2, 1: 5.7, 2: 7.4}
mydict = dict(df.astype('O').values.tolist())   # {0: 10.2, 1: 5.7, 2: 7.4}

If the index is meant to be the keys, it's even simpler.

mydict = df['value'].to_dict()                  # {0: 10.2, 1: 5.7, 2: 7.4}

Comments

1

Edit:

Same result could be reached by the following:

filter_list = df[df.Col.isin(criteria)][['Col1','Col2']].values.tolist()

Original Post:

I had a similar issue, where I was looking to filter a dataframe into a resulting list of lists.

This was my solution:

filter_df = df[df.Col.isin(criteria)][['Col1','Col2']]
filter_list = filter_df.to_dict(orient='tight')
filter_list = filter_list['data']

Result: list of lists

Source: pandas.DataFrame.to_dict

Comments

0

If there are duplicate values in the id column and you want to keep all of them in the dictionary, the code below can help:

df = pd.DataFrame([['a',1],['a',2],['a',4],['b',3],['b',4],['c',5]], columns=['id', 'value'])

df.groupby('id')['value'].apply(list).to_dict()

output : {'a': [1, 2, 4], 'b': [3, 4], 'c': [5]}

Comments

0

Here's a way to create a dict containing info from multiple rows. First set the column you want to use as the key as the index, then transpose and convert the dataframe to a dict. After the transpose, the key column's values become the column names, and all the other features become the values under each column.

df.set_index('key_col', inplace=True)
dct = df.T.to_dict()
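
A small example with placeholder names key_col, x, and y:

```python
import pandas as pd

df = pd.DataFrame({'key_col': ['a', 'b'], 'x': [1, 2], 'y': [10, 20]})
df.set_index('key_col', inplace=True)

dct = df.T.to_dict()
# dct == {'a': {'x': 1, 'y': 10}, 'b': {'x': 2, 'y': 20}}
```

Equivalently, df.to_dict(orient='index') produces the same row-wise mapping without the transpose.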

Comments
-1

This is my solution:

import pandas as pd
df = pd.read_excel('dic.xlsx')
df_T = df.set_index('id').T
dic = df_T.to_dict('records')
print(dic)

Comments
-1
def get_dict_from_pd(df, key_col, row_col):
    result = dict()
    for i in set(df[key_col].values):
        is_i = df[key_col] == i
        result[i] = list(df[is_i][row_col].values)
    return result

This is my solution; a basic loop.

Comments
