Pandas: DataFrame within DataFrame

Question

I need to create a DataFrame that contains columns of DataFrames. The DataFrames that go in the column have different sizes, and I am getting a StopIteration exception. This doesn't happen when the DataFrames are the same size. I know a Panel is more suitable for this, but I need a DataFrame in this case.

import pandas as pd

a = pd.DataFrame({'cat1': ['one', 'two', 'three'], 'cat2': ['four', 'five', 'six']})
b = pd.DataFrame({'cat1': ['ten', 'eleven'], 'cat2': ['twelve', 'thirteen']})
pd.DataFrame({'col1': {'row1': a, 'row2': b}})  # fails here because a and b differ in length

If I remove the 'three' and 'six' items from 'cat1' and 'cat2' respectively, then this works fine. Any idea how I can achieve this?

I haven't seen any mention of a DataFrame of DataFrames in the pandas author's "Python for Data Analysis" book. What is your end goal, please?
– Maxim Egorushkin, Commented Jul 30, 2013 at 18:55
I have a list of securities going down and a bunch of fields going across. Some of these fields result in a table (i.e. holders list or dividend history), and I wanted to combine these with scalar values (price, pct change, name, etc.). I already have a panel view, but wanted to have a single view of the entire table. This is merely with the intention of being able to generalize the approach within the code, i.e. I can always take DF.ix['security','field'] regardless of the field shape. I guess the only right way is to do this with a panel[security][field]. I was just trying my luck for generalization.
– jorge.santos, Commented Jul 30, 2013 at 21:14

Jeff · Accepted Answer · 2013-07-30 19:32:24Z

7

This is not a good idea: you lose all efficiency because the values are stored as object dtype, so operations will be quite slow (they cannot be done via C-level base types like float/int). It is better to use a multi-level index, which can easily encompass what I think you want:

In [20]: a
Out[20]: 
    cat1  cat2
0    one  four
1    two  five
2  three   six

In [21]: b
Out[21]: 
     cat1      cat2
0     ten    twelve
1  eleven  thirteen

In [22]: pd.concat([a, b], keys=['row1', 'row2'])
Out[22]: 
          cat1      cat2
row1 0     one      four
     1     two      five
     2   three       six
row2 0     ten    twelve
     1  eleven  thirteen
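
As a minimal sketch of how the concatenated frame can stand in for the nested one, the outer index level plays the role of the original row label. The variable names `combined`, `sub`, and `cell` are illustrative and not part of the original answer.

```python
import pandas as pd

a = pd.DataFrame({'cat1': ['one', 'two', 'three'], 'cat2': ['four', 'five', 'six']})
b = pd.DataFrame({'cat1': ['ten', 'eleven'], 'cat2': ['twelve', 'thirteen']})

# Stack both frames; `keys` labels the outer level of the resulting MultiIndex.
combined = pd.concat([a, b], keys=['row1', 'row2'])

# Selecting on the outer level gives back the rows of one original frame.
sub = combined.loc['row1']

# A full (outer, inner) index tuple plus a column label addresses a single cell.
cell = combined.loc[('row2', 1), 'cat1']
```

Because each sub-frame keeps its own row count, frames of different lengths sit side by side without any padding.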

edited Jul 30, 2013 at 19:32

answered Jul 30, 2013 at 19:17


3 Comments

Phillip Cloud Over a year ago

There's also the option to create a hierarchically indexed DataFrame using Panel.to_frame(filter_observations=False).

jorge.santos Over a year ago

Thank you, Jeff. The idea of doing this is that I need to combine these DataFrames with a bunch of scalar values. For example, row1: DF_a, np.nan, 104, 105 | row2: np.nan, DF_b, 234, 213, assuming I have the columns Cat1, Cat2, Scalar1, Scalar2. I guess this is still possible using the multi-index approach; would I just need to broadcast the scalar value across all items of cat1/cat2? Thanks again.

Jeff Over a year ago

You don't need to be that fancy; df['scalar1'] = 234 will work.
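
A rough sketch of the scalar assignment on the multi-indexed frame follows. The single-value assignment is the one suggested above; the per-key variant, including the `scalar2_by_key` mapping and the `combined` name, is an assumption added for illustration, using the values from the earlier comment.

```python
import pandas as pd

a = pd.DataFrame({'cat1': ['one', 'two', 'three'], 'cat2': ['four', 'five', 'six']})
b = pd.DataFrame({'cat1': ['ten', 'eleven'], 'cat2': ['twelve', 'thirteen']})
combined = pd.concat([a, b], keys=['row1', 'row2'])

# A single scalar is broadcast to every row automatically.
combined['scalar1'] = 234

# Hypothetical per-key scalars: look up each row's outer index label.
scalar2_by_key = {'row1': 105, 'row2': 213}
outer = combined.index.get_level_values(0)
combined['scalar2'] = [scalar2_by_key[key] for key in outer]
```

The scalar columns keep a numeric dtype here, so they avoid the object-dtype slowdown described in the answer.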
