Pandas Dataframe from Python nested dictionary

Question

I'm trying to create a Pandas dataframe from a Python nested dictionary that looks like this:

dictionary = {'user1' : {'a': np.array([1,2,3,4]),
                         'b': np.array([6,7,8,9])},

              'user2' : {'a': np.array([2,3,4,5]),
                         'b': np.array([7,8,9,1])}}

I'd like the data frame to look like this:

      a_w a_x a_y a_z b_w b_x b_y b_z
user1  1   2   3   4   6   7   8   9
user2  2   3   4   5   7   8   9   1

EDIT: (where w, x, y, z are markers that indicate what each value in the array represents)

I've tried to modify the solutions in these questions: Nested dictionary to multiindex dataframe where dictionary keys are column labels

Construct pandas DataFrame from items in nested dictionary

but cannot get the correct form.

Any help would be great, thank you.

Not sure why you would like to have a dataframe with duplicated headers...
– Mark Wang, Commented Jul 7, 2019 at 0:16
Is there any specific reason to use numpy arrays? Is it allowed to use plain lists instead to answer your question?
– amanb, Commented Jul 7, 2019 at 9:37

user3483203 · Accepted Answer · 2019-07-07 01:57:24Z

3

You can do the entire thing with a dictionary comprehension, and use enumerate to track the index of each element, giving you some semblance of ordering.

d = {
  k: {f'{ik}_{idx}': el for ik, iv in v.items() for idx, el in enumerate(iv)}
  for k, v in dictionary.items()
}

pd.DataFrame.from_dict(d, orient='index')

       a_0  a_1  a_2  a_3  b_0  b_1  b_2  b_3
user1    1    2    3    4    6    7    8    9
user2    2    3    4    5    7    8    9    1
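The edited question asks for letter suffixes (a_w, a_x, ...) rather than positional ones. A minimal sketch of the same comprehension, assuming the labels are w, x, y, z in array order (the `markers` list is a name introduced here for illustration), swaps enumerate for zip:

```python
import numpy as np
import pandas as pd

dictionary = {
    'user1': {'a': np.array([1, 2, 3, 4]), 'b': np.array([6, 7, 8, 9])},
    'user2': {'a': np.array([2, 3, 4, 5]), 'b': np.array([7, 8, 9, 1])},
}

# Assumed labels, one per array position (hypothetical name, not from the answer).
markers = ['w', 'x', 'y', 'z']

# Build one flat {column_name: value} dict per user.
flat = {
    user: {
        f'{key}_{marker}': value
        for key, arr in inner.items()
        for marker, value in zip(markers, arr)
    }
    for user, inner in dictionary.items()
}

df = pd.DataFrame.from_dict(flat, orient='index')
print(df)
```

Note that zip stops at the shorter input, so an array with more than four elements would be silently truncated; checking the lengths first may be worthwhile if the arrays can vary.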

answered Jul 7, 2019 at 1:57


Mark Wang · Accepted Answer · 2019-07-07 08:57:55Z

1

Having duplicated column names is rarely a good idea, but here you go:

Update 2

result = (pd.concat({key: pd.DataFrame(val, index=['w','x','y','z']) for key, val in dictionary.items()})
            .unstack(-1))

You know what, I'm gonna leave the multiindex in the column rather than having _ concatenation. It's often more flexible to leave it this way.
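If flat a_w-style names are needed after all, the two column levels can be joined afterwards. A rough sketch, assuming the same sample dictionary as in the question (the `flat` variable is a name introduced here for illustration):

```python
import numpy as np
import pandas as pd

dictionary = {
    'user1': {'a': np.array([1, 2, 3, 4]), 'b': np.array([6, 7, 8, 9])},
    'user2': {'a': np.array([2, 3, 4, 5]), 'b': np.array([7, 8, 9, 1])},
}

# One small frame per user, rows labelled w..z, then move those labels into the columns.
result = pd.concat(
    {key: pd.DataFrame(val, index=['w', 'x', 'y', 'z']) for key, val in dictionary.items()}
).unstack(-1)

# Each column label is a tuple such as ('a', 'w'); join the parts with '_'.
flat = result.copy()
flat.columns = [f'{outer}_{inner}' for outer, inner in result.columns]
print(flat)
```

Keeping `result` untouched and flattening a copy leaves the MultiIndex version available for selections like result['a'].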

Update 1

result = (pd.concat({key:pd.DataFrame(val) for key,val in dictionary.items()})
            .unstack(-1).droplevel(1,axis=1))

Original

result = (pd.concat({key:pd.DataFrame(val) for key,val in dictionary.items()})
            .unstack(-1).T
            .reset_index(level=1,drop=True).T)

result
        a   a   a   a   b   b   b   b
user1   1   2   3   4   6   7   8   9
user2   2   3   4   5   7   8   9   1

edited Jul 7, 2019 at 8:57

answered Jul 7, 2019 at 0:21


3 Comments

jottbe Over a year ago

Nice! By the way, you can avoid the transpose operations (which can be expensive and can spoil your column types) by using result.columns.droplevel(1) instead of reset_index.

sk1995 Over a year ago

Thanks a lot for the answer. Indeed, you're right about the column names. I made a typo and the column names should be indexed by one of 4 letters: a_w, a_x, a_y, a_z, b_w, b_x, b_y, b_z. I've updated the question. Is it an easy modification of your answer? Thanks again.

Mark Wang Over a year ago

@jottbe haha correct! I totally forgot that! Actually, since pandas 0.24, you can apply droplevel directly on a dataframe and control the axis. See modified answer.
