
I have a dict whose values are lists, for example:

my_dict = {1: [964725688, 6928857],
           ...

           22: [1667906, 35207807, 685530997, 35207807],
           ...
           }

In this example the longest list has 4 items, but lists could be longer than that.

I would like to convert it to a dataframe like:

1  964725688
1  6928857
...
22 1667906
22 35207807
22 685530997
22 35207807

4 Answers

3
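import pandas as pd

my_dict = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

# One [key, element] row per list element (use iteritems() instead of items() on Python 2).
df = pd.DataFrame([[k, ele] for k, v in my_dict.items() for ele in v])

print(df)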

   0   1        
0   1  964725688
1   1    6928857
2  22    1667906
3  22   35207807
4  22  685530997
5  22   35207807
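If you want named columns instead of the default 0 and 1, the same comprehension can be passed with a `columns` argument. This is only a sketch; the names `key` and `value` are my own choice and are not from the question.

```python
import pandas as pd

my_dict = {1: [964725688, 6928857],
           22: [1667906, 35207807, 685530997, 35207807]}

# Build one (key, element) tuple per list element.
rows = [(key, element) for key, values in my_dict.items() for element in values]

# "key" and "value" are illustrative column names, not part of the question.
df = pd.DataFrame(rows, columns=["key", "value"])
print(df)
```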

1 Comment

This is a nice solution!
2

First Idea
pandas

import numpy as np
import pandas as pd

s = pd.Series(my_dict)
pd.Series(
    np.concatenate(s.values),
    s.index.repeat(s.str.len())
)

1     964725688
1       6928857
22      1667906
22     35207807
22    685530997
22     35207807
dtype: int64

Faster!
numpy

values = list(my_dict.values())
lens = [len(value) for value in values]
keys = list(my_dict.keys())
pd.Series(np.concatenate(values), np.repeat(keys, lens))

1     964725688
1       6928857
22      1667906
22     35207807
22    685530997
22     35207807
dtype: int64

Interesting
pd.concat

pd.concat({k: pd.Series(v) for k, v in my_dict.items()}).reset_index(1, drop=True)

1     964725688
1       6928857
22      1667906
22     35207807
22    685530997
22     35207807
dtype: int64
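On newer pandas (I believe 0.25 and later) there is also `Series.explode`, which does the repeat-and-flatten step in one call. A minimal sketch, assuming that version is available; the column names `key` and `value` are made up for illustration. Note that `explode` returns an object-dtype Series, so the cast back to `int64` is explicit here.

```python
import pandas as pd

my_dict = {1: [964725688, 6928857],
           22: [1667906, 35207807, 685530997, 35207807]}

# Each list element becomes its own row; the dict key is repeated in the index.
exploded = pd.Series(my_dict).explode().astype("int64")

# Optional: turn the keyed Series into a two-column DataFrame.
# "key" and "value" are illustrative names.
df = exploded.rename_axis("key").reset_index(name="value")
print(df)
```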

1
# Load the dict directly into a DataFrame without loops
df = pd.DataFrame.from_dict(my_dict, orient='index')

# Unstack, drop NaN, and sort if you need to.
df.unstack().dropna().sort_index(level=1)
Out[382]: 
0  1     964725688.0
1  1       6928857.0
0  22      1667906.0
1  22     35207807.0
2  22    685530997.0
3  22     35207807.0
dtype: float64
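The result is float64 because the shorter lists are padded with NaN when the wide frame is built. If you need integers back, a cast after `dropna()` should work. This is a sketch, not a tested recipe for every input: it assumes all values fit exactly in a float64 (integers above 2**53 would lose precision on the way through), and the variable names are mine.

```python
import pandas as pd

my_dict = {1: [964725688, 6928857],
           22: [1667906, 35207807, 685530997, 35207807]}

# Wide frame: one row per key, shorter lists padded with NaN.
wide = pd.DataFrame.from_dict(my_dict, orient="index")

# Long form: drop the NaN padding, order by dict key, then restore integers.
long_form = wide.unstack().dropna().sort_index(level=1).astype("int64")

# Drop the positional level so only the dict key remains in the index.
flat = long_form.droplevel(0)
print(flat)
```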

1 Comment

My solution is similar to this.
1

Slightly on the functional side using zip and reduce:

from functools import reduce  # if working with Python3
import pandas as pd


d = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

df = pd.DataFrame(reduce(lambda x,y: x+y, [list(zip([k]*len(v), v)) for k,v in d.items()]))

print(df)

#     0          1
# 0   1  964725688
# 1   1    6928857
# 2  22    1667906
# 3  22   35207807
# 4  22  685530997
# 5  22   35207807

We zip the keys and the values to create records (extended through a reduce operation). The records are then passed to the pd.DataFrame function.

I hope this helps.
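One caveat with `reduce(lambda x, y: x + y, ...)`: it copies the growing list on every step, so it can get slow for large dicts. A possible alternative that keeps the same zip-based records is `itertools.chain.from_iterable`. A minimal sketch, swapping in `chain` for the `reduce` call used above:

```python
from itertools import chain

import pandas as pd

d = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

# zip pairs each key with every element of its list;
# chain.from_iterable then flattens the per-key groups lazily.
records = chain.from_iterable(zip([k] * len(v), v) for k, v in d.items())

df = pd.DataFrame(list(records))
print(df)
```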
