
I need to create a DataFrame that contains columns of DataFrames. The DataFrames that go in the column have different sizes, and I am getting a StopIteration exception. This doesn't happen when the DataFrames are the same size. I know a Panel is more suitable for this, but I need a DataFrame in this case.

import pandas as pd

a = pd.DataFrame({'cat1': ['one', 'two', 'three'], 'cat2': ['four', 'five', 'six']})
b = pd.DataFrame({'cat1': ['ten', 'eleven'], 'cat2': ['twelve', 'thirteen']})
pd.DataFrame({'col1': {'row1': a, 'row2': b}})

If I remove the 'three' and 'six' items from 'cat1', 'cat2' respectively, then this works fine. Any idea how I can achieve this?

  • I haven't seen a mention of a DataFrame of DataFrames in the pandas author's "Python for Data Analysis" book. What is your end goal, please? Commented Jul 30, 2013 at 18:55
  • I have a list of securities going down and a bunch of fields going across. Some of these fields result in a table (e.g. holders list or dividend history), and I wanted to combine these with scalar values (price, pct change, name, etc.). I already have a Panel view, but wanted a single view of the entire table. The intention is merely to generalize the approach within the code, i.e. I can always take DF.ix['security','field'] regardless of the field's shape. I guess the only right way is to do this with panel[security][field]. I was just trying my luck at generalization. Commented Jul 30, 2013 at 21:14

1 Answer


This is not a good idea: you lose all efficiency, because the cells are stored as object dtype and operations will be quite slow (they cannot be carried out on C-level base types such as float/int). A better approach is a multi-level index (MultiIndex), which can easily encompass what I think you want:

In [20]: a
Out[20]: 
    cat1  cat2
0    one  four
1    two  five
2  three   six

In [21]: b
Out[21]: 
     cat1      cat2
0     ten    twelve
1  eleven  thirteen

In [22]: pd.concat([a, b], keys=['row1', 'row2'])
Out[22]: 
          cat1      cat2
row1 0     one      four
     1     two      five
     2   three       six
row2 0     ten    twelve
     1  eleven  thirteen
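A self-contained sketch of the same idea, which also shows how to pull one sub-table or a single cell back out with `.loc` (`.ix`, used in the question comments, has since been removed from pandas). Passing a dict to `pd.concat` uses its keys as the outer index level; the level names `'security'` and `'row'` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

a = pd.DataFrame({'cat1': ['one', 'two', 'three'], 'cat2': ['four', 'five', 'six']})
b = pd.DataFrame({'cat1': ['ten', 'eleven'], 'cat2': ['twelve', 'thirteen']})

# A mapping passed to concat supplies the outer index level from its keys,
# so frames of different lengths stack without any padding.
# The level names are hypothetical labels chosen for readability.
combined = pd.concat({'row1': a, 'row2': b}, names=['security', 'row'])

# Selecting on the outer level returns that key's sub-table.
row1 = combined.loc['row1']

# A single cell is addressed by (outer key, inner row) plus the column.
value = combined.loc[('row2', 1), 'cat2']

print(combined)
print(row1)
print(value)
```

Because every cell keeps an ordinary column dtype, vectorized operations on `cat1`/`cat2` still work across all keys at once.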

3 Comments

There's also the option to create a hierarchically indexed DataFrame using Panel.to_frame(filter_observations=False).
Thank you Jeff. The reason for doing this is that I need to combine these DataFrames with another bunch of scalar values. For example, row1: DF_a, np.nan, 104, 105 | row2: np.nan, DF_b, 234, 213, assuming I have the columns Cat1, Cat2, Scalar1, Scalar2. I guess this is still possible using the multi-index approach; would I just need to broadcast the scalar value across all items of cat1/cat2? Thanks again
You don't need to be that fancy: df['scalar1'] = 234 will work.
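`df['scalar1'] = 234` assigns the same value to every row. If each outer key needs its own scalar, as in the row1/row2 example from the comment, one option is to map the outer index level through a dict. A minimal sketch, assuming the hypothetical column names `scalar1`/`scalar2` and the per-key values quoted in that comment:

```python
import pandas as pd

a = pd.DataFrame({'cat1': ['one', 'two', 'three'], 'cat2': ['four', 'five', 'six']})
b = pd.DataFrame({'cat1': ['ten', 'eleven'], 'cat2': ['twelve', 'thirteen']})
df = pd.concat({'row1': a, 'row2': b})

# Outer level of the MultiIndex: 'row1' x3, then 'row2' x2.
outer = df.index.get_level_values(0)

# Map each outer key to its scalar, so every row of a sub-table gets that
# sub-table's value (hypothetical values taken from the comment's example).
df['scalar1'] = outer.map({'row1': 104, 'row2': 234})
df['scalar2'] = outer.map({'row1': 105, 'row2': 213})

print(df)
```

This broadcasts each scalar within its group instead of storing it once per key, which keeps the columns numeric.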
