Create a dataframe from a dict where values are variable-length lists

Question

I have a dict where the values are lists, for example:

my_dict = {1: [964725688, 6928857],
           ...

           22: [1667906, 35207807, 685530997, 35207807],
           ...
           }

In this example, the max items in a list is 4, but it could be greater than that.

I would like to convert it to a dataframe like:

1  964725688
1  6928857
...
22 1667906
22 35207807
22 685530997
22 35207807

Slight difference: that question had a fixed number of items in each list, whereas in my case the number of items varies. – spitfiredd, commented May 11, 2017 at 18:52
Possible duplicate of Converting a dictionary with lists for values into a dataframe – ivan_pozdeev, commented May 12, 2017 at 1:22

galaxyan · Accepted Answer · 2017-05-11 18:41:34Z

3

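import pandas as pd

my_dict = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

# One [key, element] row per list element (Python 3: items() instead of iteritems())
df = pd.DataFrame([[k, ele] for k, v in my_dict.items() for ele in v])

print(df)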

   0   1        
0   1  964725688
1   1    6928857
2  22    1667906
3  22   35207807
4  22  685530997
5  22   35207807
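If you want meaningful column names instead of the default 0 and 1, the same comprehension can be passed a `columns` argument. A minimal sketch in Python 3; the names `key` and `value` are illustrative assumptions, not part of the answer:

```python
import pandas as pd

my_dict = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

# Same comprehension as above, with named columns.
# 'key' and 'value' are illustrative names chosen for this sketch.
df = pd.DataFrame(
    [(k, ele) for k, v in my_dict.items() for ele in v],
    columns=["key", "value"],
)
print(df)
```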

answered May 11, 2017 at 18:41

galaxyan


1 Comment

hjmnzs Over a year ago

This is a nice solution!

piRSquared · Accepted Answer · 2017-05-11 18:54:19Z

2

First idea (pandas)

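import numpy as np
import pandas as pd

s = pd.Series(my_dict)
# Flatten all lists; repeat each key once per element of its list
pd.Series(
    np.concatenate(s.values),
    index=s.index.repeat(s.str.len())
)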

1     964725688
1       6928857
22      1667906
22     35207807
22    685530997
22     35207807
dtype: int64
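To see how this works, here is a sketch that splits the pandas version into named intermediate steps. The names `lengths`, `flat`, `keys`, and `result` are illustrative and not from the answer:

```python
import numpy as np
import pandas as pd

my_dict = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

s = pd.Series(my_dict)                     # one row per key; each value is a list
lengths = s.str.len()                      # length of each key's list
flat = np.concatenate(s.values)            # all list elements, end to end
keys = s.index.repeat(lengths.to_numpy())  # each key repeated once per element

result = pd.Series(flat, index=keys)
print(lengths.tolist())  # [2, 4]
print(keys.tolist())     # [1, 1, 22, 22, 22, 22]
```

`s.str.len()` works here because the `.str` accessor also accepts object columns holding lists.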

Faster! (numpy)

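import numpy as np
import pandas as pd

values = list(my_dict.values())
lens = [len(value) for value in values]  # number of elements per key
keys = list(my_dict.keys())
# Flatten the values and repeat each key to match
pd.Series(np.concatenate(values), np.repeat(keys, lens))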

1     964725688
1       6928857
22      1667906
22     35207807
22    685530997
22     35207807
dtype: int64
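The question asks for a dataframe, so the resulting Series may need one more step. A minimal sketch that names the index and moves it into a column; the column names `key` and `value` are illustrative assumptions:

```python
import numpy as np
import pandas as pd

my_dict = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

values = list(my_dict.values())
lens = [len(value) for value in values]
keys = list(my_dict.keys())
s = pd.Series(np.concatenate(values), np.repeat(keys, lens))

# Name the index, then turn it into a column: a two-column DataFrame.
# 'key' and 'value' are illustrative names chosen for this sketch.
df = s.rename_axis("key").reset_index(name="value")
print(df)
```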

Interesting (pd.concat)

pd.concat({k: pd.Series(v) for k, v in my_dict.items()}).reset_index(1, drop=True)

1     964725688
1       6928857
22      1667906
22     35207807
22    685530997
22     35207807
dtype: int64
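Not part of this answer: pandas 0.25 added `Series.explode`, which performs the same reshaping in a single call. A sketch assuming pandas >= 0.25. `explode` returns object dtype, which is why the result is cast. An empty list would become a NaN row, and the `int64` cast would then fail.

```python
import pandas as pd

my_dict = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

# explode emits one row per list element and repeats the index label for each;
# the result has object dtype, so cast it back to integers.
s = pd.Series(my_dict).explode().astype("int64")
print(s)
```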

edited May 11, 2017 at 18:54

answered May 11, 2017 at 18:46

piRSquared


Allen Qin · Accepted Answer · 2017-05-11 21:36:34Z

1

import pandas as pd

# Load the dict directly into a DataFrame; shorter lists are padded with NaN
df = pd.DataFrame.from_dict(my_dict, orient='index')

# Unstack to a long Series, drop the NaN padding, and sort by key if needed
df.unstack().dropna().sort_index(level=1)
Out[382]: 
0  1     964725688.0
1  1       6928857.0
0  22      1667906.0
1  22     35207807.0
2  22    685530997.0
3  22     35207807.0
dtype: float64
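The output above is float64 because the NaN padding forces a float dtype. Its first index level also holds each element's position within its list. Here is a sketch that restores integer values and keeps only the key, assuming pandas >= 0.24 for `Series.droplevel`:

```python
import pandas as pd

my_dict = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

df = pd.DataFrame.from_dict(my_dict, orient="index")
s = df.unstack().dropna().sort_index(level=1)

# With the NaNs dropped, the values can be cast back to integers.
# Level 0 is the position within each list; drop it to keep only the key.
s = s.astype("int64").droplevel(0)
print(s)
```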

answered May 11, 2017 at 21:36

Allen Qin


1 Comment

spitfiredd Over a year ago

My solution is similar to this.

Abdou · Accepted Answer · 2017-05-11 19:13:28Z

1

A slightly more functional approach, using zip and reduce:

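from functools import reduce  # required in Python 3, where reduce is no longer a builtin
import pandas as pd


d = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

# Build (key, value) records per key, then concatenate the record lists with reduce
df = pd.DataFrame(reduce(lambda x, y: x + y, [list(zip([k] * len(v), v)) for k, v in d.items()]))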

print(df)

#     0          1
# 0   1  964725688
# 1   1    6928857
# 2  22    1667906
# 3  22   35207807
# 4  22  685530997
# 5  22   35207807

Each key is repeated once per element of its list and zipped with that list to create (key, value) records. reduce then concatenates the per-key record lists into a single list, which is passed to pd.DataFrame.

I hope this helps.
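`reduce(lambda x, y: x + y, ...)` copies the accumulated list at every step, so it slows down as the number of keys grows. Here is a sketch of the same record-building idea that uses `itertools.chain.from_iterable` instead. This is a swapped-in standard-library technique, not the answer's own:

```python
import itertools

import pandas as pd

d = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

# chain.from_iterable flattens the per-key (key, value) records lazily,
# without building an intermediate list for every reduce step.
records = itertools.chain.from_iterable(
    zip([k] * len(v), v) for k, v in d.items()
)
df = pd.DataFrame(list(records))
print(df)
```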

answered May 11, 2017 at 19:13

Abdou

