I'm trying to train a multi-object classifier. My dataset info is stored in a pandas DataFrame, which currently looks like this:

|IMAGE_PATHS---|LABELS------------------------------------------|
|path_to_image1|[[c11 x11 y11 w11 h11],[c12 x12 y12 w12 h12]...]| 
|path_to_image2|[[c21 x21 y21 w21 h21],[c22 x22 y22 w22 h22]...]| 
|...

But this layout is not easy to work with. For example, if I want to see all unicorns labeled in my images, I need to iterate over all elements in each row and look for them. If these labels were DataFrames, I could easily filter them with df[df["label"] == "unicorn"].

So is there a way to easily create a DataFrame inside this DataFrame or some other cool trick out there?

  • DataFrames aren't really meant to store complex objects. As you've realized, you lose a lot of the inherent functionality, even with something as simple as a list. Storing a DataFrame inside of a DataFrame doesn't really fix this. In this case you should probably consider reshaping your data, perhaps into a long format with a MultiIndex. Commented Sep 5, 2019 at 20:57
  • I'll give the MultiIndex a try, thanks. Commented Sep 8, 2019 at 10:10
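
The long-format reshaping suggested in the comment could look roughly like the sketch below. It is only an illustration under stated assumptions: the toy data, the column names `label`, `x`, `y`, `w`, `h`, and the index level `box` are all hypothetical, each box is assumed to be a list of the form [class, x, y, w, h], every image is assumed to have at least one box, and `DataFrame.explode` needs pandas 0.25 or newer.

```python
import pandas as pd

# Hypothetical toy data in the question's layout: one row per image,
# LABELS holds a list of [class, x, y, w, h] boxes.
df = pd.DataFrame({
    "IMAGE_PATHS": ["img1.png", "img2.png"],
    "LABELS": [
        [["unicorn", 10, 20, 30, 40], ["cat", 5, 5, 10, 10]],
        [["dog", 1, 2, 3, 4]],
    ],
})

# One row per box: explode the list column, then split each box into columns.
long_df = df.explode("LABELS").reset_index(drop=True)
boxes = pd.DataFrame(long_df["LABELS"].tolist(),
                     columns=["label", "x", "y", "w", "h"])
long_df = pd.concat([long_df[["IMAGE_PATHS"]], boxes], axis=1)

# Number the boxes within each image and index by (image, box).
long_df["box"] = long_df.groupby("IMAGE_PATHS").cumcount()
long_df = long_df.set_index(["IMAGE_PATHS", "box"])

# The filter from the question now works directly.
unicorns = long_df[long_df["label"] == "unicorn"]
```

With one row per box, the usual pandas tools (boolean masks, `groupby`, `value_counts`) apply to the labels without any per-row iteration.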

1 Answer

If your labels are just nested lists, you can do this:

df[df['LABELS'].apply(lambda x: 'unicorn' in [item for sublist in x for item in sublist])]

This flattens the sublists into a single list inside the lambda function, checks whether that list contains 'unicorn', and uses the resulting boolean mask to return the filtered df.
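
As a quick check, here is the same filter run on a small made-up DataFrame. The data is hypothetical, and the second mask is a variant that is not part of the answer: it assumes the class is the first element of each box and is stored as a string, so only that entry is compared instead of every coordinate.

```python
import pandas as pd

# Hypothetical toy data: each box is [class, x, y, w, h].
df = pd.DataFrame({
    "IMAGE_PATHS": ["img1.png", "img2.png"],
    "LABELS": [
        [["unicorn", 10, 20, 30, 40], ["cat", 5, 5, 10, 10]],
        [["dog", 1, 2, 3, 4]],
    ],
})

# The filter from the answer: flatten each row's boxes, then test membership.
mask = df["LABELS"].apply(
    lambda boxes: "unicorn" in [item for box in boxes for item in box]
)
filtered = df[mask]

# Variant (assumption: class is the first entry of each box).
mask_cls = df["LABELS"].apply(
    lambda boxes: any(box[0] == "unicorn" for box in boxes)
)
filtered_cls = df[mask_cls]
```

Both masks keep whole image rows, so `filtered` still contains the other boxes of any image that has a unicorn.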
