I have two pandas DataFrames, shown below:

df1

Type      season    name        qty
Fruit     summer    Mango        12
Fruit     summer    watermelon   23
Fruit     summer    blueberries  200
vegetable summer    Peppers      24


df2

Availability       season          name      city
  YEs              summer          Mango     Pune
  Yes              summer          Peppers   Mumbai
  Yes              summer          Tomatoes  Mumbai    

I want to compare the season and name columns of df2 with those of df1 and add an extra column called status to df1, where 1 means the row has a match in df2 and 0 means it does not. For the data above, the expected result is:

df1
Type       season    name        qty   status
Fruit      summer    Mango        12     1
Fruit      summer    watermelon   23     0
Fruit      summer    blueberries  200    0
vegetable  summer    Peppers      24     1

2 Answers

Here's another option using merge with how='left':

df1.merge(
    df2[['season', 'name']].assign(status=1),
    how='left').fillna(0)

Output:

        Type  season         name  qty  status
0      Fruit  summer        Mango   12     1.0
1      Fruit  summer   watermelon   23     0.0
2      Fruit  summer  blueberries  200     0.0
3  vegetable  summer      Peppers   24     1.0
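
As a minimal sketch of the same left-merge idea, the variant below rebuilds the question's sample frames and returns status as an integer rather than a float. The names `keys`, `lookup` and `result` are illustrative, and the `drop_duplicates()` call is an added assumption: it guards against repeated (season, name) pairs in df2 multiplying rows of df1. Filling only the status column also avoids overwriting missing values that df1 might have in other columns.

```python
import pandas as pd

# Sample frames rebuilt from the question
df1 = pd.DataFrame({
    "Type": ["Fruit", "Fruit", "Fruit", "vegetable"],
    "season": ["summer"] * 4,
    "name": ["Mango", "watermelon", "blueberries", "Peppers"],
    "qty": [12, 23, 200, 24],
})
df2 = pd.DataFrame({
    "Availability": ["YEs", "Yes", "Yes"],
    "season": ["summer"] * 3,
    "name": ["Mango", "Peppers", "Tomatoes"],
    "city": ["Pune", "Mumbai", "Mumbai"],
})

keys = ["season", "name"]

# One row per distinct key pair in df2, flagged with status=1
lookup = df2[keys].drop_duplicates().assign(status=1)

# Left merge keeps every row of df1; unmatched rows get NaN in status
result = df1.merge(lookup, on=keys, how="left")

# Fill only the status column, then cast it back to int
result["status"] = result["status"].fillna(0).astype(int)
print(result)
```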

You can use .isin in the following way:

df1["status"] = list(zip(df1.season, df1.name))
df1["status"] = df1["status"].isin(list(zip(df2.season, df2.name)))

Output

df1
        Type  season         name  qty  status
0      Fruit  summer        Mango   12    True
1      Fruit  summer   watermelon   23   False
2      Fruit  summer  blueberries  200   False
3  vegetable  summer      Peppers   24    True
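
If the 1/0 status from the question is wanted instead of True/False, the boolean result can be cast to int. The sketch below is one possible variant, not the answer's own code: it compares (season, name) pairs through `pd.MultiIndex.from_frame` instead of `zip`, and the names `keys` and `pairs` are illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the question (only the columns needed here)
df1 = pd.DataFrame({
    "Type": ["Fruit", "Fruit", "Fruit", "vegetable"],
    "season": ["summer"] * 4,
    "name": ["Mango", "watermelon", "blueberries", "Peppers"],
    "qty": [12, 23, 200, 24],
})
df2 = pd.DataFrame({
    "season": ["summer"] * 3,
    "name": ["Mango", "Peppers", "Tomatoes"],
})

keys = ["season", "name"]

# Treat each (season, name) row as a single tuple-like key
pairs = pd.MultiIndex.from_frame(df2[keys])

# isin returns a boolean array; astype(int) turns it into 1/0
df1["status"] = pd.MultiIndex.from_frame(df1[keys]).isin(pairs).astype(int)
print(df1)
```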

Performance (vs. @perl's answer)

data = {'Type': {0: 'Fruit', 1: 'Fruit', 2: 'Fruit', 3: 'vegetable'},
 'season': {0: 'summer', 1: 'summer', 2: 'summer', 3: 'summer'},
 'name': {0: 'Mango', 1: 'watermelon', 2: 'blueberries', 3: 'Peppers'},
 'qty': {0: 12, 1: 23, 2: 200, 3: 24}}

#@perl's answer
%%timeit 
df1 = pd.DataFrame(data) 
df1.merge( 
     df2[['season', 'name']].assign(status=1), 
     how='left').fillna(0)
                                                                       
#5.44 ms ± 56.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#my answer
%%timeit
df1["status"] = list(zip(df1.season, df1.name))
df1["status"].isin(list(zip(df2.season, df2.name)))

#434 µs ± 4.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Old (and wrong) answer

You can use .isin with .to_dict:

cols = ['season', 'name']
df1['status'] = df1[cols].isin(df2[cols].to_dict('list')).all(axis=1).astype('int')

Output

df1
        Type  season         name  qty  status
0      Fruit  summer        Mango   12       1
1      Fruit  summer   watermelon   23       0
2      Fruit  summer  blueberries  200       0
3  vegetable  summer      Peppers   24       1
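
A small sketch of why this column-wise check is wrong: it tests each column independently, so a row counts as matched whenever its season appears anywhere in df2 and its name appears anywhere in df2, even if the two never occur together in the same row. The two-row frames below are made up for illustration, and a row-wise left merge is used as the reference for comparison.

```python
import pandas as pd

# No (season, name) pair is shared between the two frames
df1 = pd.DataFrame({"season": ["winter", "summer"], "name": ["Mango", "Peppers"]})
df2 = pd.DataFrame({"season": ["summer", "winter"], "name": ["Mango", "Peppers"]})

cols = ["season", "name"]

# Column-wise check: each column is tested against df2 on its own
columnwise = df1[cols].isin(df2[cols].to_dict("list")).all(axis=1).astype(int)

# Row-wise check: a left merge only matches complete (season, name) pairs
rowwise = (
    df1.merge(df2[cols].assign(status=1), on=cols, how="left")["status"]
    .fillna(0)
    .astype(int)
)

print(columnwise.tolist())  # [1, 1]  false matches
print(rowwise.tolist())     # [0, 0]  no pair actually matches
```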

3 Comments

This is not a correct answer and, IMO, should not be accepted. If you set the Mango season to 'winter' in df1 and the Peppers season to 'winter' in df2, there should be no matches, i.e. status should be 0 for all rows, but this method returns status = 1 for both Mango and Peppers.
@perl I didn't realize, I'm so sorry. I updated it and added a new approach that actually works
OK, sure. Regarding performance, it's kind of useless to test it on a sample with 4 rows. Hardly anyone will care about the difference in execution time between 5ms and 0.4ms. Of course, it is more important on large datasets, but if you run it on larger samples, merge performs better than isin(list(...))
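
To check the large-sample behaviour on your own machine, a rough harness along these lines could be used. It is only a sketch under stated assumptions: the synthetic data, the sizes, and the names `big1`, `big2`, `with_merge` and `with_isin` are all made up for illustration, and no timing results are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 50_000  # assumed sample size; increase it to stress both approaches

seasons = np.array(["summer", "winter", "spring", "autumn"])
names = np.array([f"item{i}" for i in range(1_000)])

# Synthetic stand-ins for df1 and df2
big1 = pd.DataFrame({"season": rng.choice(seasons, n), "name": rng.choice(names, n)})
big2 = pd.DataFrame(
    {"season": rng.choice(seasons, 2_000), "name": rng.choice(names, 2_000)}
).drop_duplicates()

keys = ["season", "name"]


def with_merge():
    # Left merge against the flagged key pairs, then fill and cast
    out = big1.merge(big2[keys].assign(status=1), on=keys, how="left")
    return out["status"].fillna(0).astype(int)


def with_isin():
    # Build tuples per row and test membership in df2's tuples
    pairs = pd.Series(list(zip(big1["season"], big1["name"])))
    return pairs.isin(list(zip(big2["season"], big2["name"]))).astype(int)


# Report the best of a few runs for each approach
for fn in (with_merge, with_isin):
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f"{fn.__name__}: {best * 1000:.2f} ms per call")
```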
