
I have a Pandas Series of lists of arbitrary length:

s = pd.Series([[1,2,3], [4,6], [7,8,9,10]])

and a list of elements

l = [1,2,3,6,7,8]

I want to return every element of the series s whose values are all contained in l, and None otherwise. I want to do something like this, but applied to each list in the series:

s.where(s.isin(l), None)

So the output would be a series:

pd.Series([[1,2,3], None, None])
2
  • are all lists in the series equal length? Commented Sep 15, 2021 at 20:53
  • Edited to reflect, no. Commented Sep 15, 2021 at 20:55

5 Answers

3

You can use the magic of python sets:

s.apply(set(l).issuperset)

Output:

0     True
1    False
2    False
dtype: bool

Then use where, with the previous output as the mask, to replace the non-matching rows with None:

s.where(s.apply(set(l).issuperset), None)

Output:

0    [1, 2, 3]
1         None
2         None
dtype: object
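
For reference, here is the same approach as a self-contained script. The helper name keep_if_subset is made up for this illustration (it is not a pandas function), and the sketch assumes the values inside the lists are hashable, since they end up in a set.

```python
import pandas as pd


def keep_if_subset(series, allowed):
    """Keep each list whose values are all in `allowed`; replace the rest with None.

    `keep_if_subset` is a hypothetical helper name used only for this sketch.
    """
    allowed_set = set(allowed)  # build the set once and reuse it for every row
    mask = series.apply(allowed_set.issuperset)  # True where every value is allowed
    return series.where(mask, None)


s = pd.Series([[1, 2, 3], [4, 6], [7, 8, 9, 10]])
l = [1, 2, 3, 6, 7, 8]

result = keep_if_subset(s, l)
print(result)
```

Building the set once outside apply avoids re-creating it for each row, and an empty list counts as a match here because the empty set is a subset of every set.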

3

You can explode the series, use isin with l, and then use all with the parameter level=0 (equivalent to groupby.all on the index).

print(s.explode().isin(l).all(level=0))
0     True
1    False
2    False
dtype: bool

Use this Boolean mask in where to get your expected result:

s1 = s.where(s.explode().isin(l).all(level=0), None)
print(s1)
0    [1, 2, 3]
1         None
2         None
dtype: object

Thanks to a comment from @mozway: the level=0 parameter of all is deprecated (and was removed in pandas 2.0), so the solution should use groupby.all instead:

s1 = s.where(s.explode().isin(l).groupby(level=0).all(), None)
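
One edge case worth knowing about with the explode approach is the empty list. The sketch below uses a modified example series (the last row is deliberately empty, which is not in the question) to compare the explode-based mask with a set-based one.

```python
import pandas as pd

# Modified example data: the last row is deliberately an empty list
s = pd.Series([[1, 2, 3], [4, 6], []])
l = [1, 2, 3, 6, 7, 8]

# explode gives one row per list item and repeats the original index;
# an empty list turns into a single NaN row
exploded = s.explode()

# NaN is not in l, so the group for the empty list evaluates to False
mask_explode = exploded.isin(l).groupby(level=0).all()

# Set-based mask for comparison: the empty set is a subset of any set
mask_set = s.apply(set(l).issuperset)

print(mask_explode.tolist())  # [True, False, False]
print(mask_set.tolist())      # [True, False, True]
```

So if the series can contain empty lists, the two masks disagree on those rows, and which behaviour is right depends on what you want an empty list to mean.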

1 Comment

NB. all(level=0) is getting deprecated, the new syntax is groupby(level=0).all() ;)
0

@TomNash, you can combine the all function with a generator expression inside a loop:

s = pd.Series([[1,2,3], [4,5,6], [7,8,9]])

l = [1,2,3,6,7,8]

final_list = []
for x in s:
    if all(item in l for item in x):
        final_list.append(x)
    else:
        final_list.append(None)

print(final_list)

OUTPUT:

[[1, 2, 3], None, None]
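
The loop above can also be written as an actual list comprehension. This is only a sketch of the same idea: the names allowed and final_series are mine, and converting l to a set assumes the values are hashable.

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
l = [1, 2, 3, 6, 7, 8]

allowed = set(l)  # set membership tests avoid scanning the whole list each time

# Keep x when every item is allowed, otherwise put None in its place
final_list = [x if all(item in allowed for item in x) else None for x in s]

# Wrap the result back into a Series that shares the original index
final_series = pd.Series(final_list, index=s.index)

print(final_list)
print(final_series)
```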

0
import pandas as pd

s = pd.Series([[1,2,3], [4,6], [7,8,9,10]])
l = [1,2,3,6,7,8]
new_series = []
for i in range(len(s)):
    # s_in_l counts the values of s[i] that are NOT in l
    s_in_l = 0
    for j in range(len(s[i])):
        if s[i][j] not in l:
            s_in_l = s_in_l + 1
    # keep the list only if none of its values is missing from l
    if s_in_l == 0:
        new_series.append(s[i])
    else:
        new_series.append(None)
new_series = pd.Series(new_series)
print(new_series)

output:

0    [1, 2, 3]
1         None
2         None
dtype: object

1 Comment

what's the performance penalty of this solution versus the upvoted ones?
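
One way to answer that for your own data is to time both variants. The sketch below uses made-up random data (2,000 short lists) and hypothetical function names; the numbers will vary by machine and pandas version, so treat it as a measuring template rather than a benchmark result.

```python
import random
import timeit

import pandas as pd

random.seed(0)
# Made-up test data: 2,000 lists of 1 to 5 integers drawn from 0..9
s = pd.Series(
    [[random.randrange(10) for _ in range(random.randint(1, 5))] for _ in range(2000)]
)
l = [1, 2, 3, 6, 7, 8]


def nested_loops(s, l):
    # Same logic as the explicit double loop in this answer
    out = []
    for i in range(len(s)):
        missing = 0
        for j in range(len(s[i])):
            if s[i][j] not in l:
                missing += 1
        out.append(s[i] if missing == 0 else None)
    return pd.Series(out)


def set_based(s, l):
    # Same logic as the set-based answer
    return s.where(s.apply(set(l).issuperset), None)


for fn in (nested_loops, set_based):
    seconds = timeit.timeit(lambda: fn(s, l), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

I would expect the double loop to be slower, because it indexes into the series repeatedly and searches a list instead of a set, but it is better to measure than to guess.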
0

You can check whether each element of s is a subset of l with the issubset method, as follows:

s.apply(lambda x: x if set(x).issubset(l) else None)

or make use of the NumPy function setdiff1d (after import numpy as np), as follows:

s.apply(lambda x: x if (len(np.setdiff1d(x, l)) == 0) else None)

Result:

0    [1, 2, 3]
1         None
2         None
dtype: object
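
Put together as a runnable script, with the imports included, the two variants look like this. The variable names by_subset and by_setdiff are just for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 6], [7, 8, 9, 10]])
l = [1, 2, 3, 6, 7, 8]

# Variant 1: set(x).issubset accepts any iterable, so l can stay a list
by_subset = s.apply(lambda x: x if set(x).issubset(l) else None)

# Variant 2: setdiff1d returns the values of x that are not in l;
# an empty result means every value of x was found in l
by_setdiff = s.apply(lambda x: x if len(np.setdiff1d(x, l)) == 0 else None)

print(by_subset)
print(by_setdiff)
```

The set variant needs hashable values, while setdiff1d sorts its inputs internally and therefore needs values that can be compared with each other.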
