I'm trying to create a pandas DataFrame from a nested Python dictionary that looks like this:

import numpy as np
import pandas as pd

dictionary = {'user1' : {'a': np.array([1,2,3,4]),
                         'b': np.array([6,7,8,9])},

              'user2' : {'a': np.array([2,3,4,5]),
                         'b': np.array([7,8,9,1])}}

I'd like the data frame to look like this:

      a_w a_x a_y a_z b_w b_x b_y b_z
user1  1   2   3   4   6   7   8   9
user2  2   3   4   5   7   8   9   1

EDIT: (where w, x, y, z are markers that tell what each value in the array represents)

I've tried to adapt the solutions from these questions:

Nested dictionary to multiindex dataframe where dictionary keys are column labels

Construct pandas DataFrame from items in nested dictionary

but cannot get the correct form.

Any help would be great, thank you.

  • not sure why you would like to have dataframe with duplicated headers... Commented Jul 7, 2019 at 0:16
  • See updated reply. Commented Jul 7, 2019 at 8:58
  • Is there any specific reason to use numpy arrays? Is it allowed to use plain lists instead to answer your question? Commented Jul 7, 2019 at 9:37
  • @amanb Yes, the data are large 3d numpy arrays. Commented Jul 9, 2019 at 16:20

2 Answers

You can do the entire thing with a dictionary comprehension, and use enumerate to track the index of each element, giving you some semblance of ordering.

d = {
  k: {f'{ik}_{idx}': el for ik, iv in v.items() for idx, el in enumerate(iv)}
  for k, v in dictionary.items()
}

pd.DataFrame.from_dict(d, orient='index')

       a_0  a_1  a_2  a_3  b_0  b_1  b_2  b_3
user1    1    2    3    4    6    7    8    9
user2    2    3    4    5    7    8    9    1
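
To get the a_w ... b_z names from the edited question instead of numeric suffixes, the same comprehension can pair each value with a marker via zip rather than enumerate. This is only a sketch: the `markers` list is an assumption (it is not defined in the question), and it presumes every array has exactly as many elements as there are markers.

```python
import numpy as np
import pandas as pd

dictionary = {'user1': {'a': np.array([1, 2, 3, 4]),
                        'b': np.array([6, 7, 8, 9])},
              'user2': {'a': np.array([2, 3, 4, 5]),
                        'b': np.array([7, 8, 9, 1])}}

# Assumed marker names, one per array position (not part of the original data)
markers = ['w', 'x', 'y', 'z']

# Build one flat {column_name: value} dict per user, e.g. 'a_w': 1
d = {
    user: {f'{key}_{marker}': el
           for key, arr in inner.items()
           for marker, el in zip(markers, arr)}
    for user, inner in dictionary.items()
}

df = pd.DataFrame.from_dict(d, orient='index')
print(df)
```

Note that zip silently truncates if an array is longer than the marker list, so a length check may be worth adding for real data.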

Having duplicated column names is rarely a good idea, but here you go:

Update 2

result = (pd.concat({key:pd.DataFrame(val,index=['w','x','y','z']) for key,val in dictionary.items()})
            .unstack(-1))

You know what, I'm gonna leave the multiindex in the column rather than having _ concatenation. It's often more flexible to leave it this way.
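
If the flat a_w style names from the question are still wanted, the MultiIndex columns can be joined afterwards. The following is a sketch under the same assumptions as Update 2 (each array has four elements labelled w, x, y, z); the `flat` name is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

dictionary = {'user1': {'a': np.array([1, 2, 3, 4]),
                        'b': np.array([6, 7, 8, 9])},
              'user2': {'a': np.array([2, 3, 4, 5]),
                        'b': np.array([7, 8, 9, 1])}}

# Same construction as Update 2: columns become a MultiIndex of (key, marker)
result = (pd.concat({key: pd.DataFrame(val, index=['w', 'x', 'y', 'z'])
                     for key, val in dictionary.items()})
            .unstack(-1))

# With the MultiIndex, all values for one key can be selected at once
a_block = result['a']          # sub-frame with columns w, x, y, z

# Optional: collapse the two column levels into 'a_w', 'a_x', ...
flat = result.copy()
flat.columns = [f'{key}_{marker}' for key, marker in flat.columns]
print(flat)
```

Keeping `result` with its MultiIndex and flattening only a copy leaves both forms available.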

Update 1

result = (pd.concat({key:pd.DataFrame(val) for key,val in dictionary.items()})
            .unstack(-1).droplevel(1,axis=1))

Original

result = (pd.concat({key:pd.DataFrame(val) for key,val in dictionary.items()})
            .unstack(-1).T
            .reset_index(level=1,drop=True).T)

result
        a   a   a   a   b   b   b   b
user1   1   2   3   4   6   7   8   9
user2   2   3   4   5   7   8   9   1

3 Comments

Nice! Btw. you can avoid the transpose operations (which could be expensive and can spoil your column types). You can do that by using result.columns.droplevel(1) instead of reset_index.
Thanks a lot, for the answer. Indeed, you're right about the column names. I made a typo and the column names should be indexed by one of 4 letters: a_w, a_x, a_y, a_z, b_w, b_x, b_y, b_z. I've updated the question. Is it an easy modification of your answer? Thanks again.
@jottbe haha correct! I totally forgot that! Actually, since pandas 0.24, you can apply droplevel directly on a DataFrame and control the axis. See modified answer.
