I want to filter a DataFrame on a column of lists, keeping only the rows whose lists contain nothing but specific items.

This is my original table:

id code
1 [Hes3086, Hes3440, Hes3220]
2 [Hes3440, Nee8900]
3 [Hes1337, Hes3440]
4 [Nee8900, Hes3440]
5 [Hes1337, Nee8900]
6 [Hes3220, Nee8900]
7 [Hes3220, Nee8900, Hes3440]

I want the rows whose lists contain only items from this set: Hes3440, Nee8900, Hes3220

Which should generate the following output:

id code
2 [Hes3440, Nee8900]
4 [Nee8900, Hes3440]
6 [Hes3220, Nee8900]
7 [Hes3220, Nee8900, Hes3440]

I am able to filter the dataset by making sure that at least one of the desired items is in each row, but this is not what I want.

Would appreciate any help!

thanks, M

1 Answer

Use set.issubset with Series.map to build a boolean mask, then filter with boolean indexing:

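# Items that are allowed to appear in each list
L = ['Hes3440','Nee8900','Hes3220']

# Keep only the rows where every item of the list is in L
df = df[df.code.map(lambda x: set(x).issubset(L))]
print(df)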
   id                         code
1   2           [Hes3440, Nee8900]
3   4           [Nee8900, Hes3440]
5   6           [Hes3220, Nee8900]
6   7  [Hes3220, Nee8900, Hes3440]
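For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It assumes the code column holds real Python lists (not strings that look like lists); the names allowed, mask and filtered are only illustrative and do not come from the question.

```python
import pandas as pd

# Rebuild the table from the question (assumes real Python lists in "code")
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "code": [
        ["Hes3086", "Hes3440", "Hes3220"],
        ["Hes3440", "Nee8900"],
        ["Hes1337", "Hes3440"],
        ["Nee8900", "Hes3440"],
        ["Hes1337", "Nee8900"],
        ["Hes3220", "Nee8900"],
        ["Hes3220", "Nee8900", "Hes3440"],
    ],
})

# Illustrative name: the items a row is allowed to contain
allowed = {"Hes3440", "Nee8900", "Hes3220"}

# True where every item of the row's list is in the allowed set
mask = df["code"].map(lambda codes: set(codes).issubset(allowed))

# Boolean indexing keeps only the rows where the mask is True
filtered = df[mask]
print(filtered)
```

Assigning to a new name instead of overwriting df keeps the original table around, which can help when comparing the rows before and after the filter.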

List comprehension alternative:

df = df[[set(x).issubset(L) for x in df.code]]
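One edge case that may matter: an empty list is a subset of any set, so a row with [] passes the issubset check. The sketch below uses a small made-up table (the edge frame and all variable names are hypothetical, not from the question) to show how one might exclude empty lists, and how the subset test differs from the "at least one item" filter described in the question.

```python
import pandas as pd

allowed = {"Hes3440", "Nee8900", "Hes3220"}

# Hypothetical data: an empty list, an all-allowed list, and a mixed list
edge = pd.DataFrame({
    "id": [1, 2, 3],
    "code": [[], ["Hes3440"], ["Hes1337", "Hes3440"]],
})

# "Only allowed items": the subset test (<= is the operator form of issubset)
only_allowed = edge["code"].map(lambda codes: set(codes) <= allowed)

# Lists that contain at least one element
non_empty = edge["code"].map(len) > 0

# "At least one allowed item": the filter the question did NOT want
any_allowed = edge["code"].map(lambda codes: not allowed.isdisjoint(codes))

# Combine masks with & to require non-empty lists made only of allowed items
strict = edge[only_allowed & non_empty]
print(strict)
```

Whether empty lists should be kept depends on the data, so treat the non_empty mask as optional.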