0

Say I have the following dictionaries:

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': []}    
column_data_1      = {'foo': [2, 4, 5],       'bar': [2, 3],    'baz': []}

How can I create a multi-index DataFrame using these dictionaries?

It should be something like:

index_1  index_2     column_data_1
foo      A           2
         B           4
         C           5
bar      X           2
         Y           3
baz      np.NaN      np.NaN 

Note:

If NaN indices are not supported by Pandas, we can drop the empty entries in the dictionaries above.

Ideally, I would like the DataFrame to capture somehow the fact that those entries are missing if possible. However, the most important thing is being able to index the dataframe using the indices in multilevel_indices.

2 Answers

2

Use pd.concat, passing the outer-level labels through keys:

import pandas as pd

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': []}
column_data_1      = {'foo': [2, 4, 5],       'bar': [2, 3], 'baz': []}

# One Series per outer key; the keys become the first index level.
pd.concat([pd.Series(column_data_1[k], index=multilevel_indices[k]) for k in multilevel_indices],
          keys=list(multilevel_indices.keys()))

Results in:

foo  A    2
     B    4
     C    5
bar  X    2
     Y    3
dtype: float64
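
Since the question's main requirement is being able to look rows up by the labels in multilevel_indices, here is a minimal sketch of that on top of the concat approach. The names pieces, df, foo_rows and bar_y are only illustrative, and the sketch assumes empty entries such as baz are skipped explicitly rather than passed to concat:

```python
import pandas as pd

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': []}
column_data_1 = {'foo': [2, 4, 5], 'bar': [2, 3], 'baz': []}

# Assumption: outer keys with no inner labels are dropped up front.
keys = [k for k in multilevel_indices if multilevel_indices[k]]
pieces = [pd.Series(column_data_1[k], index=multilevel_indices[k]) for k in keys]

# names= labels the two index levels; to_frame names the data column.
df = pd.concat(pieces, keys=keys, names=['index_1', 'index_2']).to_frame('column_data_1')

foo_rows = df.loc['foo']                        # all rows under 'foo', indexed by index_2
bar_y = df.loc[('bar', 'Y'), 'column_data_1']   # a single value, here 3
```

Selecting with a single outer label returns the sub-frame for that key, while a full (outer, inner) tuple returns one cell.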

Also, as @CT Zhu mentioned, if you change [] to [None] for baz in both dictionaries, you can keep track of those entries:

baz  NaN    None
foo  A         2
     B         4
     C         5
bar  X         2
     Y         3
dtype: object

1

The original data will not produce a NaN index entry, but a small change will: use [None] instead of [] for baz.

from itertools import chain
import pandas as pd

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': [None]}
column_data_1      = {'foo': [2, 4, 5],       'bar': [2, 3], 'baz': [None]}
# `codes` was called `labels` in older pandas releases.
# Rows follow the dictionaries' iteration order.
mindex = pd.MultiIndex(levels=[list(multilevel_indices.keys()), list(chain(*multilevel_indices.values()))],
                       codes=[list(chain(*[[i]*len(v[1]) for i, v in enumerate(multilevel_indices.items())])),
                              list(range(sum(map(len, multilevel_indices.values()))))],
                       names=['index_1', 'index_2'])
print(pd.DataFrame(list(chain(*column_data_1.values())), index=mindex, columns=['column_data_1']))


                 column_data_1
index_1 index_2               
baz     NaN                NaN
foo     A                    2
        B                    4
        C                    5
bar     X                    2
        Y                    3

[6 rows x 1 columns]
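
The same frame can probably be built with less bookkeeping by listing the (outer, inner) pairs and letting pandas work out the levels and codes. This is a sketch of an alternative, not the construction used above; tuples and values are illustrative names, and it assumes both dictionaries share the same keys and per-key lengths:

```python
import pandas as pd

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': [None]}
column_data_1 = {'foo': [2, 4, 5], 'bar': [2, 3], 'baz': [None]}

# One (outer, inner) pair per row, in dictionary order.
tuples = [(outer, inner)
          for outer, inners in multilevel_indices.items()
          for inner in inners]

# Flatten the data in the same key order so values line up with the pairs.
values = [v for outer in multilevel_indices for v in column_data_1[outer]]

mindex = pd.MultiIndex.from_tuples(tuples, names=['index_1', 'index_2'])
df = pd.DataFrame({'column_data_1': values}, index=mindex)

foo_b = df.loc[('foo', 'B'), 'column_data_1']   # lookup by both levels
```

The None placeholders show up as missing values in both the index_2 level and the data column, so the baz entry stays visible in the frame.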
