I have a DataFrame in which one column has lists as entries. For a given value x, I want to get a pd.Series of booleans telling me whether x is in each list. For example, given the DataFrame

    index    lists
    0        []
    1        [1, 2]
    2        [1]
    3        [3, 4]

I want to do something like df.lists.contains(1) and get back False, True, True, False.

I am aware I can do this with a Python loop or comprehension, but I would ideally like a Pandas solution analogous to df.mod, df.isin etc.

1 Answer

In [79]: df['lists'].apply(lambda c: 1 in c)
Out[79]:
0    False
1     True
2     True
3    False
Name: lists, dtype: bool
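
For anyone who wants to reproduce this outside an IPython session, here is a minimal self-contained sketch. The way the DataFrame is built is my assumption (the question only shows its printed form), and `contains_value` is a hypothetical helper name, not a pandas method.

```python
import pandas as pd

# Assumed reconstruction of the question's DataFrame: one column of Python lists.
df = pd.DataFrame({"lists": [[], [1, 2], [1], [3, 4]]})


def contains_value(series, x):
    """Return a boolean Series: True where x is an element of that row's list."""
    # Series.apply calls the lambda once per element (here, once per list).
    return series.apply(lambda c: x in c)


mask = contains_value(df["lists"], 1)
print(mask)
```

The result keeps the original index, so it can be used directly as a boolean mask, e.g. `df[mask]`.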

PS: I think a list comprehension might be faster in this case.

Timing for a 40,000-row DataFrame:

In [81]: df = pd.concat([df] * 10**4, ignore_index=True)

In [82]: df.shape
Out[82]: (40000, 2)

In [83]: %timeit df['lists'].apply(lambda c: 1 in c)
22.5 ms ± 87.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [84]: %timeit [1 in x for x in df['lists']]
4.87 ms ± 25.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
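
Note that the list comprehension returns a plain Python list, not a pd.Series. If you need a Series aligned with the DataFrame, one way is to wrap it and pass the index explicitly. A rough sketch, assuming a small frame with a non-default index (the index labels 10 to 13 and the variable names are made up for illustration):

```python
import pandas as pd

# Assumed example frame; the non-default index is only there to show alignment.
df = pd.DataFrame({"lists": [[], [1, 2], [1], [3, 4]]}, index=[10, 11, 12, 13])

x = 1

# Build the booleans with a comprehension, then reattach the original index
# so the result lines up with df row by row.
mask = pd.Series([x in lst for lst in df["lists"]], index=df.index, name="lists")

# The mask can then be used for ordinary boolean indexing.
filtered = df[mask]
print(filtered)
```

Without `index=df.index` the new Series would get a fresh 0..n-1 index, which may not line up with `df` when it is used as a mask.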

2 Comments

Oh! I forgot about apply somehow -- how stupid of me! Why would the list comprehension be faster?
@Alex, apply is a slightly optimized for ... loop under the hood, so a list comprehension is often faster than DataFrame.apply(...)
