I have a data frame with one column. Each value in this column is a list. For example,

     A
0   [1, 3, 4]
1   [43, 1, 42]
2   [50, 3]

I want to compute the set intersection of every pair of lists to find their common elements, and produce a data frame like the one below.

    0           1             2
0   [1, 3, 4]   [1]           [3]
1   [1]         [43, 1, 42]   []
2   [3]         []            [50, 3]

Is there an elegant way of doing this without looping over every pair of rows?

1 Answer

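We can apply set to convert every value in A to a set, then broadcast the set intersection operator (&) across all pairs. Indexing with a[:, None] turns the 1-D array into a column, so combining it with a yields a square grid with one cell per pair of rows; because the array has object dtype, NumPy calls each pair's own & operator: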

import pandas as pd

df = pd.DataFrame({'A': [[1, 3, 4], [43, 1, 42], [50, 3]]})

# Convert to set
a = df['A'].apply(set).values
# Broadcast set intersection
new_df = pd.DataFrame(a[:, None] & a)

new_df:

           0            1        2
0  {1, 3, 4}          {1}      {3}
1        {1}  {1, 42, 43}       {}
2        {3}           {}  {50, 3}
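
As a rough sanity check, the broadcast result should agree with an explicit comparison of every pair of rows. The following is a minimal sketch that reuses the example data; the name expected is only illustrative and not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({'A': [[1, 3, 4], [43, 1, 42], [50, 3]]})

# Object array of sets, built the same way as above
a = df['A'].apply(set).values
# Broadcast version: a column of sets combined with a row of sets
new_df = pd.DataFrame(a[:, None] & a)

# Explicit pairwise version, written as nested loops for comparison
expected = [[x & y for y in a] for x in a]

# Every cell of the broadcast result matches the looped result
assert new_df.values.tolist() == expected
```

The loop still runs in Python either way, so the broadcast form is mainly a gain in brevity, not necessarily in speed.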

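Alternatively, np.vectorize can convert each resulting set back to a list if lists are needed (it can also replace apply for the initial conversion to sets). Because sets are unordered, the element order within each resulting list is not guaranteed to match the original lists: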

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [[1, 3, 4], [43, 1, 42], [50, 3]]})

# Convert to set (using vectorize instead of apply):
a = np.vectorize(set, otypes=['O'])(df['A'])
# Broadcast set intersection and convert back to list
new_df = pd.DataFrame(
    np.vectorize(list, otypes=['O'])(a[:, None] & a)
)

new_df:

           0            1        2
0  [1, 3, 4]          [1]      [3]
1        [1]  [1, 42, 43]       []
2        [3]           []  [50, 3]
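
If a stable element order matters, one option is to sort each intersection instead of calling list on it, and to reuse the original index as row and column labels. This is a minimal sketch, assuming the list elements are mutually comparable; the names to_sorted_list and result are illustrative and not part of the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [[1, 3, 4], [43, 1, 42], [50, 3]]})

# Object array of sets, as in the examples above
a = df['A'].apply(set).values

# Illustrative helper: sorted() returns a list in ascending order,
# so the result no longer depends on set iteration order
to_sorted_list = np.vectorize(sorted, otypes=['O'])

# Label rows and columns with the original index of df
result = pd.DataFrame(
    to_sorted_list(a[:, None] & a),
    index=df.index,
    columns=df.index,
)
```

With the example data, result.loc[1, 1] is [1, 42, 43] and result.loc[1, 2] is an empty list.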