detect overlapping values in 2 dataframes

Question

import pandas as pd

df1 = pd.DataFrame({"fields": [["boy", "apple", "toy", "orange", "bear", "eat"],
                               ["orange", "girl", "red"]]})
df2 = pd.DataFrame({"other fields": [["boy", "girl", "orange"]]})

I want to add a column to df1 indicating whether each value in fields also appears in other fields. Sample output:

| fields | overlap? |
|--------|----------|
| boy    | Y        |
| apple  | N        |
| toy    | N        |
| orange | Y        |
| bear   | N        |
| eat    | N        |
| orange | Y        |
| girl   | Y        |
| red    | N        |

First I will explode fields on df1, but I am not sure what the next steps are to check for overlapping values between the two dataframes. Thanks!

Why is the first orange in df1 a 'Y' for overlap, but the second is not?
– Emi OB, Commented Sep 15, 2022 at 10:19

sophocles · Accepted Answer · 2022-09-15 13:02:16Z

3

You can also do it without apply. As you said, you can explode df1, then use isin to check whether each value exists in df2. That returns True / False, which you can map to 'Y' / 'N'. Because other fields also holds a list, explode it as well so isin compares against the individual strings:

df1_exp = df1.explode('fields', ignore_index=True)
df1_exp['overlap'] = df1_exp['fields'].isin(df2['other fields'].explode()).map({True: 'Y', False: 'N'})


   fields overlap
0     boy       Y
1   apple       N
2     toy       N
3  orange       Y
4    bear       N
5     eat       N
6  orange       Y
7    girl       Y
8     red       N
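
The explode, isin and map steps can be wrapped up as a minimal, self-contained sketch. The helper name `flag_overlap` and its parameters are hypothetical, introduced only for illustration, and the sketch assumes both columns hold lists of strings as in the question:

```python
import pandas as pd


def flag_overlap(left, right, left_col="fields", right_col="other fields"):
    """Hypothetical helper: explode both list columns and mark each left value Y/N."""
    # One row per value; ignore_index gives a clean 0..n-1 index.
    out = left.explode(left_col, ignore_index=True)
    # Flatten the lookup side too, so isin compares strings with strings.
    lookup = set(right[right_col].explode())
    out["overlap"] = out[left_col].isin(lookup).map({True: "Y", False: "N"})
    return out


df1 = pd.DataFrame({"fields": [["boy", "apple", "toy", "orange", "bear", "eat"],
                               ["orange", "girl", "red"]]})
df2 = pd.DataFrame({"other fields": [["boy", "girl", "orange"]]})

result = flag_overlap(df1, df2)
print(result)
```

Exploding the lookup column means every row of df2 contributes to the comparison, not only the first one.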

edited Sep 15, 2022 at 13:02

answered Sep 15, 2022 at 10:31


Guy · Accepted Answer · 2022-09-15 10:31:45Z

2

You can use isin to find overlapping values after exploding both dataframes, then convert the booleans to Y/N using np.where:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"fields": [["boy", "apple", "toy", "orange", "bear", "eat"], ["orange", "girl", "red"]]})
df2 = pd.DataFrame({"other fields": [["boy", "girl", "orange"]]})

df1 = df1.explode('fields', ignore_index=True)
df1['overlap'] = np.where(df1['fields'].isin(df2['other fields'].explode()), 'Y', 'N')
print(df1)

Output

   fields overlap
0     boy       Y
1   apple       N
2     toy       N
3  orange       Y
4    bear       N
5     eat       N
6  orange       Y
7    girl       Y
8     red       N

answered Sep 15, 2022 at 10:31


Lucas M. Uriarte · Accepted Answer · 2022-09-15 10:29:13Z

1

This should work:

df1 = df1.explode("fields")
# flatten df2 once so membership is tested against the individual strings
other = df2["other fields"].explode().values
df1["overlap"] = df1["fields"].apply(lambda x: "Y" if x in other else "N")

    fields  overlap
0   boy     Y
0   apple   N
0   toy     N
0   orange  Y
0   bear    N
0   eat     N
1   orange  Y
1   girl    Y
1   red     N

answered Sep 15, 2022 at 10:29


1 Comment

Emi OB Over a year ago

You've got a Y for both entries of orange, only the first has one in the example. OP hasn't clarified if this is desired or a typo
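
If only the first occurrence of a repeated value should be flagged, which the question does not confirm, one possible variant combines isin with duplicated. This is a sketch under that assumption, not something the original poster asked for; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"fields": [["boy", "apple", "toy", "orange", "bear", "eat"],
                               ["orange", "girl", "red"]]})
df2 = pd.DataFrame({"other fields": [["boy", "girl", "orange"]]})

exploded = df1.explode("fields", ignore_index=True)
in_other = exploded["fields"].isin(df2["other fields"].explode())
# duplicated() is False for the first appearance of a value and True for repeats.
first_seen = ~exploded["fields"].duplicated()
exploded["overlap"] = np.where(in_other & first_seen, "Y", "N")
print(exploded)
```

Here the second orange (row 6) gets N because it repeats a value already seen in row 3.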

gtomer · Accepted Answer · 2022-09-15 10:32:10Z

1

You can try .isin():

df1 = df1.explode("fields")
df1["overlap"] = df1["fields"].isin(df2["other fields"][0])

You can later replace the True/False values with Y/N.
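
A minimal sketch of that last step, assuming the sample data from the question. Note that `[0]` reads only the list stored in the first row of df2, which is enough here because df2 has a single row; `ignore_index=True` is an addition for a clean index:

```python
import pandas as pd

df1 = pd.DataFrame({"fields": [["boy", "apple", "toy", "orange", "bear", "eat"],
                               ["orange", "girl", "red"]]})
df2 = pd.DataFrame({"other fields": [["boy", "girl", "orange"]]})

exploded = df1.explode("fields", ignore_index=True)
# Boolean mask: True where the value appears in the first row's list of df2.
is_overlap = exploded["fields"].isin(df2["other fields"][0])
# Translate the booleans into the requested Y/N labels.
exploded["overlap"] = is_overlap.map({True: "Y", False: "N"})
print(exploded)
```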

answered Sep 15, 2022 at 10:32


Anoushiravan R · Accepted Answer · 2022-09-15 10:51:32Z

1

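Another way is np.select. I normally use it on large datasets, where some other methods can be slow. Explode df2 as well so isin compares against the individual strings, and reset the index to get the index column shown below:

df1 = df1.explode(column='fields').reset_index()
df1['overlap'] = np.select([df1.fields.isin(df2['other fields'].explode())], ['Y'], 'N')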

   index  fields overlap
0      0     boy       Y
1      0   apple       N
2      0     toy       N
3      0  orange       Y
4      0    bear       N
5      0     eat       N
6      1  orange       Y
7      1    girl       Y
8      1     red       N
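
With a single condition, np.select gives the same result as np.where; np.select mainly pays off when there are several conditions to label. A quick check on the sample data, with illustrative variable names:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"fields": [["boy", "apple", "toy", "orange", "bear", "eat"],
                               ["orange", "girl", "red"]]})
df2 = pd.DataFrame({"other fields": [["boy", "girl", "orange"]]})

# reset_index keeps the original row number of df1 in an "index" column.
exploded = df1.explode("fields").reset_index()
mask = exploded["fields"].isin(df2["other fields"].explode()).to_numpy()

via_select = np.select([mask], ["Y"], default="N")  # list of conditions, list of choices
via_where = np.where(mask, "Y", "N")                # single condition
print((via_select == via_where).all())
```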

answered Sep 15, 2022 at 10:51
