Pandas DataFrame inside DataFrames

Question

I'm trying to train a multi-object classifier. For that, I have my dataset info stored in a pandas DataFrame which currently looks like this:

|IMAGE_PATHS---|LABELS------------------------------------------|
|path_to_image1|[[c11 x11 y11 w11 h11],[c12 x12 y12 w12 h12]...]| 
|path_to_image2|[[c21 x21 y21 w21 h21],[c22 x22 y22 w22 h22]...]| 
|...

But in this form the data is not easy to work with. For example, if I want to see all unicorns labeled in my images, I would need to iterate over all elements in each row and look for them. If these labels were DataFrames, I could easily filter them as df[df["label"] == "unicorn"].

So is there a way to easily create a DataFrame inside this DataFrame or some other cool trick out there?

DataFrames aren't really meant to store complex objects. As you've realized, you lose a lot of the built-in functionality, even with something as simple as a list. Storing a DataFrame inside a DataFrame doesn't really fix this. In this case you should probably consider reshaping your data, perhaps into a long format with a MultiIndex.
– ALollz, Commented Sep 5, 2019 at 20:57
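The long format suggested in the comment could look roughly like the sketch below. It is only an illustration under assumptions: the sample rows are made up, each inner list is assumed to be [class, x, y, w, h] with the class stored as a string, every image is assumed to have at least one box, and the names `long_df`, `box`, and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data mirroring the layout in the question.
df = pd.DataFrame({
    "IMAGE_PATHS": ["path_to_image1", "path_to_image2"],
    "LABELS": [
        [["unicorn", 10, 20, 30, 40], ["horse", 5, 5, 50, 60]],
        [["unicorn", 1, 2, 3, 4]],
    ],
})

# explode() turns each inner [class, x, y, w, h] list into its own row.
exploded = df.explode("LABELS").reset_index(drop=True)

# Split the five-element lists into named columns (names are assumptions).
cols = ["label", "x", "y", "w", "h"]
boxes = pd.DataFrame(exploded["LABELS"].tolist(), columns=cols)
long_df = pd.concat([exploded[["IMAGE_PATHS"]], boxes], axis=1)

# Number the boxes within each image and build a two-level MultiIndex.
long_df["box"] = long_df.groupby("IMAGE_PATHS").cumcount()
long_df = long_df.set_index(["IMAGE_PATHS", "box"])

# Filtering by class is now an ordinary boolean mask.
unicorns = long_df[long_df["label"] == "unicorn"]
print(unicorns)
```

With one row per bounding box, the filter from the question, `df[df["label"] == "unicorn"]`, works directly, and the image path is still available from the first index level.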

Derek Eden · Accepted Answer · 2019-09-05 22:11:53Z

If your labels are just nested lists, you can do this:

df[df['LABELS'].apply(lambda x: 'unicorn' in [item for sublist in x for item in sublist])]

This flattens the sublists into a single list inside the lambda function, checks whether that list contains 'unicorn', and uses the resulting boolean mask to return the filtered DataFrame.
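To try the one-liner on concrete data, here is a small self-contained sketch. The sample rows are invented, and the helper `has_class` is a hypothetical variant that assumes the class is the first element of each inner list, so it compares only that element instead of flattening everything.

```python
import pandas as pd

# Hypothetical sample data; each inner list is assumed to be [class, x, y, w, h].
df = pd.DataFrame({
    "IMAGE_PATHS": ["path_to_image1", "path_to_image2", "path_to_image3"],
    "LABELS": [
        [["unicorn", 10, 20, 30, 40], ["horse", 5, 5, 50, 60]],
        [["horse", 1, 2, 3, 4]],
        [["unicorn", 7, 8, 9, 10]],
    ],
})

# The filter from the answer: flatten every box, then test membership.
flat_mask = df["LABELS"].apply(
    lambda x: "unicorn" in [item for sublist in x for item in sublist]
)

def has_class(boxes, name):
    """Return True if any box in the row has `name` as its first element."""
    return any(box[0] == name for box in boxes)

# Variant that inspects only the class slot of each box.
class_mask = df["LABELS"].apply(lambda boxes: has_class(boxes, "unicorn"))

filtered = df[class_mask]
print(filtered["IMAGE_PATHS"].tolist())
```

Both masks select the same rows on this sample. The variant only avoids accidental matches if a coordinate value could ever equal the class name being searched for.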
